A small web-scraping example
Scraping the top twenty universities from the Best Chinese Universities Ranking
Code:
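```python
# CrawUnivRankingA.py
import requests
import bs4
from bs4 import BeautifulSoup


def getHTMLText(url):
    """Download a page and return its text, or "" if the request fails."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        # Guess the real encoding from the content so Chinese text decodes correctly
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return ""


def fillUnivList(ulist, html):
    """Append [rank, school name, total score] for each table row to ulist."""
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find('tbody')
    if tbody is None:  # empty or unexpected page: nothing to parse
        return
    for tr in tbody.children:
        # .children also yields whitespace strings; keep only Tag elements
        if isinstance(tr, bs4.element.Tag):
            tds = tr('td')  # shorthand for tr.find_all('td')
            ulist.append([tds[0].string, tds[1].string, tds[3].string])


def printUnivList(ulist, num):
    """Print the first num entries as an aligned table."""
    # {3} is the fill character: chr(12288) is the full-width (CJK) space
    tplt = "{0:^10}\t{1:{3}^10}\t{2:^10}"
    # Column headers: rank, school name, total score
    print(tplt.format("排名", "学校名称", "总分", chr(12288)))
    for u in ulist[:num]:  # slicing avoids IndexError if fewer than num rows
        print(tplt.format(u[0], u[1], u[2], chr(12288)))


def main():
    uinfo = []
    url = 'http://www.zuihaodaxue.cn/zuihaodaxuepaiming2018.html'
    html = getHTMLText(url)
    fillUnivList(uinfo, html)
    printUnivList(uinfo, 20)  # top twenty


if __name__ == "__main__":
    main()
```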
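In printUnivList, the school-name field `{1:{3}^10}` takes its fill character from argument 3, `chr(12288)`, which is U+3000, the full-width ideographic space. In a monospaced terminal a Chinese character usually occupies about two ASCII columns, so padding Chinese names with ordinary spaces tends to throw the columns out of line, while full-width padding keeps text and padding the same width. The stdlib-only sketch below isolates that trick; `center_cjk` is a hypothetical helper and the university names are sample strings, not data scraped from the page.

```python
# Sketch of the alignment trick used in printUnivList: center a field in a
# fixed width, padding with the full-width space U+3000 (chr(12288)) so that
# CJK text and its padding have the same display width.
FULLWIDTH_SPACE = chr(12288)  # U+3000 IDEOGRAPHIC SPACE


def center_cjk(text, width=10, fill=FULLWIDTH_SPACE):
    """Hypothetical helper: center `text` in `width` characters using `fill`."""
    # Nested replacement fields: the fill character and width are themselves
    # format arguments, mirroring "{1:{3}^10}" in the original template.
    return "{0:{1}^{2}}".format(text, fill, width)


if __name__ == "__main__":
    for name in ["清华大学", "北京大学", "浙江大学"]:  # sample names only
        print("[" + center_cjk(name) + "]")
```

Calling `center_cjk("清华大学")` returns the four characters with three U+3000 characters on each side, 10 characters in total. Passing `fill=" "` instead is a quick way to see the misalignment that the full-width fill is meant to avoid.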
Reposted from: https://www.cnblogs.com/Staceyacm/p/10782047.html