asp.net – How to submit a query to an .aspx page in Python
As an overview, you will need to perform four main tasks:
>submit requests to the web site,
>retrieve the responses from the site,
>parse these responses,
>have some logic to iterate over the tasks above, with parameters associated with navigation (to the "next" page in the results list)
The http request and response handling is done with methods and classes from Python's standard library urllib and urllib2. Parsing the html pages can be done with Python's standard library HTMLParser, or with other modules such as Beautiful Soup.
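Before the full Python 2 snippet below, the request-building part of steps 1 and 2 can be sketched in isolation. This sketch is written for Python 3, where urllib2 was folded into urllib.request; the field name 'q' and the short User-Agent string are placeholders, and no request is actually sent.

```python
# Minimal sketch of building (not sending) a POST request with the
# standard library, Python 3 style. Calling urlopen(req) would perform
# the actual network round-trip.
import urllib.parse
import urllib.request

uri = 'http://legistar.council.nyc.gov/Legislation.aspx'
headers = {
    'User-Agent': 'Mozilla/5.0',  # placeholder browser string
    'Content-Type': 'application/x-www-form-urlencoded',
}
# 'q'/'york' is a placeholder form field; supplying a data payload
# is what makes urllib use the POST method
body = urllib.parse.urlencode([('q', 'york')]).encode('ascii')
req = urllib.request.Request(uri, data=body, headers=headers)
print(req.get_method())  # POST
```

The same pattern appears in Python 2 form in the full snippet below, with urllib.urlencode and urllib2.Request.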
The following snippet demonstrates requesting and receiving a search on the site indicated in the question. The site is ASP-driven, so we need to ensure that we send several form fields, some of them with "horrible" values, because the ASP logic uses these fields to maintain state and, to some extent, to authenticate that the request is a genuine submission. The request must be sent with the http POST method, as that is what the ASP application expects. The main difficulty lies in identifying the form fields and associated values that ASP expects (getting pages with Python is the easy part).
This code is functional, or more precisely, it was functional until I removed most of the VSTATE value and possibly introduced a typo or two by adding the comments.
import urllib
import urllib2
uri = 'http://legistar.council.nyc.gov/Legislation.aspx'
# the http headers are useful to simulate a particular browser (some sites
# deny access to non-browsers, i.e. bots etc.)
# also needed to pass the content type.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.13) Gecko/2009073022 Firefox/3.0.13',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Content-Type': 'application/x-www-form-urlencoded'
}
# we group the form fields and their values in a list (any
# iterable, actually) of name-value tuples. This helps
# with clarity and also makes it easy to encode them later.
formFields = (
    # the viewstate is actually 800+ characters in length! I truncated it
    # for this sample code. It can be lifted from the first page
    # obtained from the site. It may be ok to hardcode this value, or
    # it may have to be refreshed each time / each day, by essentially
    # running an extra page request-and-parse, for this specific value.
    (r'__VSTATE', r'7TzretNIlrZiKb7EOB3AQE ... ...2qd6g5xD8CGXm5EftXtNPt+H8B'),
    # following are more of these ASP form fields
    (r'__VIEWSTATE', r''),
    (r'__EVENTVALIDATION', r'/wEWDwL+raDpAgKnpt8nAs3q+pQOAs3q/pQOAs3qgpUOAs3qhpUOAoPE36ANAve684YCAoOs79EIAoOs89EIAoOs99EIAoOs39EIAoOs49EIAoOs09EIAoSs99EI6IQ74SEV9n4XbtWm1rEbB6Ic3/M='),
    # the following fields were sent empty in this search; their values
    # were truncated in the original post, so empty strings stand in here
    (r'ctl00_RadScriptManager1_HiddenField', ''),
    (r'ctl00_tabTop_ClientState', ''),
    (r'ctl00_ContentPlaceHolder1_menuMain_ClientState', ''),
    (r'ctl00_ContentPlaceHolder1_gridMain_ClientState', ''),
    # but then we come to fields of interest: the search
    # criteria, the collections to search from, etc.
    # Check boxes
    (r'ctl00$ContentPlaceHolder1$chkOptions$0', 'on'),  # file number
    (r'ctl00$ContentPlaceHolder1$chkOptions$1', 'on'),  # legislative text
    (r'ctl00$ContentPlaceHolder1$chkOptions$2', 'on'),  # attachment
    # etc. (not all listed)
    (r'ctl00$ContentPlaceHolder1$txtSearch', 'york'),            # the search text
    (r'ctl00$ContentPlaceHolder1$lstYears', 'All Years'),        # years to include
    (r'ctl00$ContentPlaceHolder1$lstTypeBasic', 'All Types'),    # types to include
    (r'ctl00$ContentPlaceHolder1$btnSearch', 'Search Legislation')  # the Search button itself
)
# these have to be encoded
encodedFields = urllib.urlencode(formFields)
req = urllib2.Request(uri,encodedFields,headers)
f = urllib2.urlopen(req)  # that's the actual call to the http site.

# *** here would normally be the in-memory parsing of f's
#     contents, but instead I store this to a file;
#     this is useful during design, allowing one to have a
#     sample of what is to be parsed in a text editor, for analysis.
try:
    fout = open('tmp.htm', 'w')
    fout.writelines(f.readlines())
    fout.close()
except IOError:
    print('Could not open output file\n')
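As the comments in the snippet note, the __VSTATE / __VIEWSTATE value may have to be refreshed by requesting the first page and parsing the hidden field out of it. A minimal sketch of that extraction follows, using a regular expression; the sample_html string is an invented stand-in for the page body that urlopen() would return, and the field values shown are not real viewstates.

```python
# Sketch: pull ASP.NET hidden state fields out of a page body.
# sample_html is a made-up stand-in for the real first-page response.
import re

sample_html = (
    '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="dDwtNTA3" />'
    '<input type="hidden" name="__EVENTVALIDATION" value="/wEWAg==" />'
)

def extract_hidden_field(html, name):
    # hidden ASP.NET state fields are plain <input type="hidden"> tags;
    # capture whatever sits in their value attribute
    m = re.search(r'name="%s"[^>]*value="([^"]*)"' % re.escape(name), html)
    return m.group(1) if m else None

viewstate = extract_hidden_field(sample_html, '__VIEWSTATE')
eventvalidation = extract_hidden_field(sample_html, '__EVENTVALIDATION')
print(viewstate)        # dDwtNTA3
print(eventvalidation)  # /wEWAg==
```

The extracted values would then be substituted into formFields before encoding and posting the search request.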
So much for getting the initial page. As stated above, one then needs to parse the page, i.e. find the parts of interest and gather them as appropriate, and store them to file/database/wherever. This job can be done in very many ways: using html parsers, or XSLT-type technologies (indeed after parsing the html to xml), or even, for crude jobs, simple regular expressions. Also, one item typically extracted is the "next info", i.e. a link of sorts that can be used in a new request to the server to get subsequent pages.
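To illustrate the crudest of those options, here is a sketch of pulling the "next page" link out of the saved HTML with a simple regular expression. The HTML fragment and its link text are invented for the example; a real results page would need its actual pager markup inspected first.

```python
# Sketch of the "crude regular expressions" approach: find the anchor
# whose text is "Next" in an (invented) fragment of a results page.
import re

page = (
    '<a href="Legislation.aspx?page=1">1</a>'
    '<a href="Legislation.aspx?page=2">2</a>'
    '<a href="Legislation.aspx?page=2">Next</a>'
)

# collect every (href, link-text) pair in the page
links = re.findall(r'<a href="([^"]+)">([^<]+)</a>', page)
next_links = [href for href, text in links if text.strip().lower() == 'next']
print(next_links)  # ['Legislation.aspx?page=2']
```

The href found this way would feed the navigation parameters of the next POST request, closing the iteration loop described in the task list at the top.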
This should give you a rough flavor of what "long hand" html scraping is about. There are many other approaches to this, such as dedicated utilities, scripts in Mozilla's (FireFox) GreaseMonkey plug-in, XSLT, etc.