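Handling the __VIEWSTATE and __EVENTVALIDATION parameters when crawling by POST with Python's Scrapy framework
1. Target site
2. Running into the __VIEWSTATE and __EVENTVALIDATION parameters
3. What the parameters mean and which ones are required
4. How the value of __EVENTTARGET leads to two different approaches
5. Analyzing the __VIEWSTATE and __EVENTVALIDATION parameters
6. Full code
6.1 Using the "next page" approach
Run result:
6.2 Using the "jump to page N" approach (ordinary POST pagination with a page number)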
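1. Target site
The page below is an example of a site whose requests carry the __VIEWSTATE and __EVENTVALIDATION parameters: commodity housing information, http://180.141.32.142/htmlaspx/TMSFW-CZ/HPMS/SPFInfoList.aspx
2. Running into the __VIEWSTATE and __EVENTVALIDATION parameters
I have run into this kind of request several times now. The POST parameters of the request are shown in the screenshot below:
[Screenshot: POST parameters of the request]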
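3. What the parameters mean and which ones are required
The two main parameters are __VIEWSTATE and __EVENTVALIDATION. Both must be sent with every request.
There is also the __EVENTTARGET parameter. Some blog posts say that other fields have to be added as well, but in the cases I have seen these three were enough. To be safe it does no harm to add the others too, since their values do not change.
__EVENTTARGET determines how the page is turned. There are generally two ways.
For the crawler we choose PageNavigator1$LnkBtnNext, which means turning the page by clicking "next page". The other way is to click a page number directly or jump to a given page, in which case the value is PageNavigator1$LnkBtnGoto.
The value sent when clicking "next page" is shown in the screenshot below:
[Screenshot: request parameters after clicking "next page"]
The corresponding value is PageNavigator1$LnkBtnNext
The value sent when jumping to a given page is shown in the screenshot below:
[Screenshot: request parameters after jumping to a page]
The corresponding value is PageNavigator1$LnkBtnGoto
In that case the target page number must also be sent, in the PageNavigator1$txtNewPageIndex parameter.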
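4. How the value of __EVENTTARGET leads to two different approaches
First the value of __EVENTTARGET. As said above, there are two ways to handle it.
4.1. First approach: clicking "next page", PageNavigator1$LnkBtnNext
That is, __EVENTTARGET is set to PageNavigator1$LnkBtnNext.
With the "next page" approach the pages must be requested one after another, each request passing on the values taken from the previous page. Only three parameters need to be sent:
__EVENTTARGET, __VIEWSTATE and __EVENTVALIDATION.
The PageNavigator1$txtNewPageIndex parameter (which page to jump to) can be ignored.
The request parameters then look like the following. The __VIEWSTATE and __EVENTVALIDATION values can be copied from the browser.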
payload = {
    "__EVENTTARGET": "PageNavigator1$LnkBtnNext",
    # Both values are long Base64 strings (abbreviated here with "...");
    # paste the full strings copied from the browser's request.
    "__VIEWSTATE": "zOm7apMaa5ad6gjX5DhqwY5EwCwNTqMkpbtpcBkjYJHFgf0cxkF5fKPIk3Mj...",
    "__EVENTVALIDATION": "eT4WmyJjgTkHGjSisJKNF+X+cGGK5uwbsBnK4m9+kuhoPimZAZ20qtwG...",
}
4.2 Second approach: jumping to a page, PageNavigator1$LnkBtnGoto
That is, __EVENTTARGET is set to PageNavigator1$LnkBtnGoto, and the target page number is sent in PageNavigator1$txtNewPageIndex.
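The two parameter sets described in this section can be put side by side in a small sketch. The field names are the ones from the article; the helper names build_next_payload and build_goto_payload are hypothetical, and the token values are dummies rather than real page values.

```python
from urllib.parse import urlencode

# Field names and values taken from the article.
NEXT_TARGET = "PageNavigator1$LnkBtnNext"
GOTO_TARGET = "PageNavigator1$LnkBtnGoto"
PAGE_INDEX_FIELD = "PageNavigator1$txtNewPageIndex"


def build_next_payload(viewstate, eventvalidation):
    """Form fields for a "next page" postback (hypothetical helper)."""
    return {
        "__EVENTTARGET": NEXT_TARGET,
        "__VIEWSTATE": viewstate,
        "__EVENTVALIDATION": eventvalidation,
    }


def build_goto_payload(viewstate, eventvalidation, page):
    """Form fields for a "jump to page N" postback (hypothetical helper)."""
    payload = build_next_payload(viewstate, eventvalidation)
    payload["__EVENTTARGET"] = GOTO_TARGET
    payload[PAGE_INDEX_FIELD] = str(page)  # form values are sent as strings
    return payload


if __name__ == "__main__":
    # Dummy tokens; the real ones come from the page source.
    print(urlencode(build_goto_payload("vs-token", "ev-token", 3)))
```

Printing the URL-encoded body is a quick way to compare what the crawler sends with what the browser's developer tools show for the same click.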
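5. Analyzing the __VIEWSTATE and __EVENTVALIDATION parameters
Next, let's look at where the __VIEWSTATE and __EVENTVALIDATION values come from.
Open the developer tools on the first page and look at the response: the HTML source of the first page already contains both values, as the screenshot below shows:
[Screenshot: __VIEWSTATE and __EVENTVALIDATION in the first page's source]
If the main URL (the same URL the POST goes to) is requested without any parameters, whether by POST or by GET, the server always returns the first page. Knowing this, we can request the first page in any way we like with no form data at all, although it is best to send headers to reduce the risk of being blocked. The response is always the first page, and the two values we need can then be extracted from it with XPath.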
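To see what this extraction step does without running Scrapy, here is a minimal sketch that uses only the standard library. The class name, the function name and the HTML fragment are made up for illustration, and the token values are short dummies; a real page carries much longer Base64 strings.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of all hidden form fields found in the HTML text."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


# Made-up page fragment with dummy tokens, for illustration only.
SAMPLE_HTML = """
<form method="post" action="SPFInfoList.aspx">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="dummy-viewstate" />
  <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="dummy-validation" />
  <input type="text" name="PageNavigator1$txtNewPageIndex" value="1" />
</form>
"""

if __name__ == "__main__":
    fields = extract_hidden_fields(SAMPLE_HTML)
    print(fields["__VIEWSTATE"], fields["__EVENTVALIDATION"])
```

Collecting every hidden field rather than just two also covers the "add the other fields to be safe" advice from section 3, since the whole dict can be merged into the form data.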
# resp is the Scrapy response of the page that was just fetched
VIEWSTATE = resp.xpath("//input[@name='__VIEWSTATE']/@value").extract_first()
EVENTVALIDATION = resp.xpath("//input[@name='__EVENTVALIDATION']/@value").extract_first()
6. Full code
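6.1 Using the "next page" approach
Note: this spider uses inline requests. I plan to write a separate post about them later.
Scrapy is an asynchronous framework by design, which is exactly why inline requests are used here. They come from the inline_requests module, which is installed with the scrapy-inline-requests package.
The documentation is here for reference:
Scrapy Inline Requests 0.3.1 documentation, Installation: https://scrapy-inline-requests.readthedocs.io/en/stable/installation.html
Why inline requests? Scrapy schedules requests asynchronously, so responses do not come back in a fixed order. Without inline requests (or some other way of issuing the requests strictly one after another), the "next page" approach cannot work.
For a POST crawl in Scrapy, start_requests() is overridden, and the requests built there play the same role as the start_urls list: they are all issued right at the start. Each "next page" request, however, needs the parameters parsed from the previous page, and requests that were all generated up front cannot be matched with them, so the next page is never returned. That is why inline requests are added: the whole sequence stays inside one function, and each response is available before the next request is built.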
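The dependency between consecutive requests can be illustrated with a toy model. Everything below is an assumption made for the sketch: fake_next_postback is a stand-in that only imitates "next page relative to the page that issued this token", and it is not how ASP.NET actually encodes its state.

```python
def fake_next_postback(viewstate):
    """Toy server: return (page_number, new_viewstate) for a "next page"
    postback relative to the page that issued `viewstate`."""
    current_page = int(viewstate.split("-")[1])
    next_page = current_page + 1
    return next_page, f"vs-{next_page}"


def crawl_all_upfront(first_viewstate, count):
    # Every request reuses the token of page 1, as happens when all
    # requests are generated before any response has been parsed.
    return [fake_next_postback(first_viewstate)[0] for _ in range(count)]


def crawl_chained(first_viewstate, count):
    # Each request uses the token parsed from the previous response.
    pages, viewstate = [], first_viewstate
    for _ in range(count):
        page, viewstate = fake_next_postback(viewstate)
        pages.append(page)
    return pages


if __name__ == "__main__":
    print(crawl_all_upfront("vs-1", 3))
    print(crawl_chained("vs-1", 3))
```

In this model the up-front variant keeps getting page 2, while the chained variant walks through pages 2, 3 and 4. The inline-requests loop in the spider plays the role of crawl_chained.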
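Note: the __EVENTTARGET value sent with the request must match the approach. The "next page" approach must not include the parameters that belong to the "jump to page" approach, otherwise the response is always the first page.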
import scrapy
from inline_requests import inline_requests


class GuangxiCzXmlSpider(scrapy.Spider):
    """Guangxi, Chongzuo: project information (five certificates, project info)."""
    name = 'Guangxi_Cz_All'
    allowed_domains = ['180.141.32.142']
    start_urls = ['http://180.141.32.142/htmlaspx/TMSFW-CZ/HPMS/SPFInfoList.aspx']
    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "Content-Type": "application/x-www-form-urlencoded",
        "Host": "180.141.32.142",
    }
    payload = {
        "__EVENTTARGET": "PageNavigator1$LnkBtnNext",
        # Long Base64 values copied from the browser (abbreviated here with "...").
        # Both are replaced by the values parsed from each response before use.
        "__VIEWSTATE": "zOm7apMaa5ad6gjX5DhqwY5EwCwNTqMkpbtpcBkjYJHFgf0cxkF5fKPIk3Mj...",
        "__EVENTVALIDATION": "eT4WmyJjgTkHGjSisJKNF+X+cGGK5uwbsBnK4m9+kuhoPimZAZ20qtwG...",
    }

    @inline_requests
    def parse(self, response, **kwargs):
        for i in range(1, 9):
            if i == 1:
                # First page: plain request, no form data needed.
                resp = yield scrapy.Request(url=self.start_urls[0], dont_filter=True, headers=self.headers)
            else:
                # Later pages: "next page" postback with the tokens of the previous page.
                resp = yield scrapy.FormRequest(url=self.start_urls[0], dont_filter=True,
                                                formdata=self.payload, headers=self.headers)
            # Carry the fresh tokens over to the next request.
            VIEWSTATE = resp.xpath("//input[@name='__VIEWSTATE']/@value").extract_first()
            EVENTVALIDATION = resp.xpath("//input[@name='__EVENTVALIDATION']/@value").extract_first()
            self.payload['__VIEWSTATE'] = VIEWSTATE
            self.payload['__EVENTVALIDATION'] = EVENTVALIDATION
            # Project names: second column of the result table, skipping the two header rows.
            pro_name = resp.xpath("//div[@class='resultlist']//table//tr[position()>2]//td[2]//text()").extract()
            print(pro_name)
            print("*" * 60)
Run result:
[Screenshot: console output]
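6.2 Using the "jump to page N" approach (ordinary POST pagination with a page number)
Because every request carries its own page number, inline requests are not needed here.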
import scrapy


class GuangxiCzXmlSpider(scrapy.Spider):
    """Guangxi, Chongzuo: project information (five certificates, project info)."""
    name = 'Guangxi_Cz_All'
    allowed_domains = ['180.141.32.142']
    start_urls = ['http://180.141.32.142/htmlaspx/TMSFW-CZ/HPMS/SPFInfoList.aspx']
    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "Content-Type": "application/x-www-form-urlencoded",
        "Host": "180.141.32.142",
    }
    payload = {
        "__EVENTTARGET": "PageNavigator1$LnkBtnGoto",
        # Long Base64 values copied from the browser (abbreviated here with "...");
        # paste the full strings, since this spider sends them as they are.
        "__VIEWSTATE": "zOm7apMaa5ad6gjX5DhqwY5EwCwNTqMkpbtpcBkjYJHFgf0cxkF5fKPIk3Mj...",
        "__EVENTVALIDATION": "eT4WmyJjgTkHGjSisJKNF+X+cGGK5uwbsBnK4m9+kuhoPimZAZ20qtwG...",
        "PageNavigator1$txtNewPageIndex": "1",
    }

    def start_requests(self):
        for i in range(1, 9):
            if i == 1:
                # First page: plain request, no form data needed.
                yield scrapy.Request(url=self.start_urls[0], dont_filter=True, headers=self.headers,
                                     callback=self.parse)
            else:
                # Later pages: "jump to page i" postback.
                self.payload['PageNavigator1$txtNewPageIndex'] = str(i)
                yield scrapy.FormRequest(url=self.start_urls[0], dont_filter=True, headers=self.headers,
                                         callback=self.parse, formdata=self.payload)

    def parse(self, response, **kwargs):
        # Project names: second column of the result table, skipping the two header rows.
        pro_name = response.xpath("//div[@class='resultlist']//table//tr[position()>2]//td[2]//text()").extract()
        print(pro_name)
        print("*" * 60)
Run result:
[Screenshot: console output]
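The printed output also shows that the pages come back in no fixed order, because the Scrapy engine processes the requests asynchronously.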
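If the page order matters, one option is to tag every result with its page number and sort at the end. The sketch below only simulates out-of-order arrival with threads and a random delay; fake_fetch and crawl_pages are hypothetical names, not part of the spider. In Scrapy itself the page number could be handed to the callback through the request's cb_kwargs or meta argument.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_fetch(page):
    """Toy stand-in for a page request with an unpredictable delay."""
    time.sleep(random.uniform(0.0, 0.02))
    return page, [f"project-{page}-{n}" for n in range(1, 3)]


def crawl_pages(pages):
    """Fetch pages concurrently, then return their rows in page order."""
    results = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(fake_fetch, p) for p in pages]
        for future in as_completed(futures):  # completion order varies
            page, names = future.result()
            results[page] = names  # keyed by page so the order can be restored
    return [results[p] for p in sorted(results)]


if __name__ == "__main__":
    for names in crawl_pages(range(1, 5)):
        print(names)
```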