Learning Python: Web Crawlers (小甲鱼)
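This follows the tutorial example closely.
Image URLs are located with plain string searches and then downloaded.
Images are saved in the current directory.
Downloaded GIFs do not animate...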
import urllib.request
import time


def open_url(url):
    # return htmlpage (the raw bytes of the response)
    print(url)
    req = urllib.request.Request(url)
    req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36")
    response = urllib.request.urlopen(req)
    return response.read()


def getInitialpage():
    # return how many pages we have (as a string)
    url = "http://jandan.net/ooxx"
    html = open_url(url)
    html = html.decode("utf-8")
    index = html.find("span class=\"current-comment-page\"")
    # the page number sits between "[" and "]" after the marker
    beginindex = html.find("[", index)
    endindex = html.find("]", index)
    initialpage = html[(beginindex+1):endindex]
    return initialpage


def getpiclist(pageurl):
    html = open_url(pageurl)
    html = html.decode("utf-8")
    piclist = list()
    for i in range(html.count("[查看原圖]</a><br /><img")):
        # jump to the next marker, then take the first quoted string after it
        index = html.find("[查看原圖]</a><br /><img")
        html = html[index:]
        beginindex = html.find("\"")
        endindex = html.find("\"", (beginindex+1))
        picurl = html[beginindex+1:endindex]
        html = html[endindex:]
        piclist.append(picurl)
    return piclist


def savepic(piclist):
    for picurl in piclist:
        # picurl is protocol-relative ("//host/path"), so prepend the scheme
        html = open_url("http:{}".format(picurl))
        filename = picurl.split("/")[-1]
        print(filename)
        with open(filename, "wb") as f:
            f.write(html)
        time.sleep(1)


def test(page):
    initialpage = int(getInitialpage())
    for i in range((initialpage-page), (initialpage+1)):
        pageurl = "http://jandan.net/ooxx/page-{}#comments".format(i)
        piclist = getpiclist(pageurl)
        savepic(piclist)


if __name__ == "__main__":
    test(1)

Addendum:
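The urllib.request module has a retrieval function for downloads, urllib.request.urlretrieve, which can replace the code in savepic() above; with it, animated GIFs display correctly.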
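The replacement described above might look roughly like the following. This is only a sketch under a few assumptions: the names savepic_with_urlretrieve and install_user_agent, and the scheme, dest_dir and delay parameters, are hypothetical and not part of the original script. Because urlretrieve takes a URL string rather than a Request object, the User-Agent header is set here by installing a global opener.

```python
import os
import time
import urllib.request

USER_AGENT = ("Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36")


def install_user_agent(user_agent=USER_AGENT):
    # Hypothetical helper: urlretrieve cannot take per-request headers,
    # so install an opener that adds the header to every urlopen call.
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-Agent", user_agent)]
    urllib.request.install_opener(opener)


def savepic_with_urlretrieve(piclist, scheme="http:", dest_dir=".", delay=1.0):
    # Hypothetical variant of savepic(): each entry in piclist is assumed to be
    # protocol-relative ("//host/path/name.gif"), as in the original script.
    saved = []
    for picurl in piclist:
        filename = os.path.join(dest_dir, picurl.split("/")[-1])
        # Download the resource straight to a local file.
        urllib.request.urlretrieve(scheme + picurl, filename)
        print(filename)
        saved.append(filename)
        time.sleep(delay)  # pause between downloads, as the original does
    return saved
```

To use it in the script, call install_user_agent() once at startup and replace the savepic(piclist) call in test() with savepic_with_urlretrieve(piclist).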