A Hands-On requests Scraping Walkthrough: Python Edition
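Contents
I. Introduction
II. Practice
1) Fetch the Baidu home page and print it
2) Fetch a picture of a handsome guy and download it locally
4) Fetch a beauty video and download it locally
5) Scrape Sogou keyword search results
6) Scrape Baidu Translate
7) Scrape the Douban movie chart
8) Scrape JK-uniform avatar images
Summary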
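I. Introduction
The previous two articles covered the basics and the advanced usage of requests in detail, with plenty of examples. Building on those two, this article is devoted entirely to hands-on practice.
requests series: Basics, Advanced
Environment: Jupyter
If you are not familiar with Jupyter, see my article: Jupyter installation and usage tutorial
II. Practice
1) Fetch the Baidu home page and print it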
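# -*- coding: utf-8 -*-
import requests

url = 'http://www.baidu.com'
r = requests.get(url)
# Let requests guess the encoding from the body so Chinese text decodes correctly
r.encoding = r.apparent_encoding
print(r.text)

Output: the HTML source of the Baidu home page is printed.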
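The request above assumes the server answers promptly and successfully. A minimal sketch of a more defensive variant follows; the helper name `fetch_html` and the 10-second timeout are assumptions made for illustration and are not part of the original code.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the decoded body of url, raising on network or HTTP errors."""
    r = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of printing an error page
    r.raise_for_status()
    # Same encoding guess as in the snippet above
    r.encoding = r.apparent_encoding
    return r.text
```

Calling `fetch_html('http://www.baidu.com')` returns the same text as the snippet above, but raises `requests.HTTPError` on a 4xx/5xx status and `requests.Timeout` if the server stays silent for longer than the timeout.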
2) Fetch a picture of a handsome guy and download it locally
Photo link: just click it
Now let's download this picture:
Code:
import requests

# NOTE: this looks like a Bing image-detail page URL rather than a direct image URL.
# If the saved file is not a valid image, point src at the image file itself.
src = 'https://cn.bing.com/images/search?view=detailV2&ccid=yj6ElAFe&id=D93F105743FB238DEB0F368C30CD9881AEB3B8E8&thid=OIP.yj6ElAFeZl8v6dYUhuMgqAHaHa&mediaurl=https%3A%2F%2Ftse1-mm.cn.bing.net%2Fth%2Fid%2FR-C.ca3e8494015e665f2fe9d61486e320a8%3Frik%3D6LizroGYzTCMNg%26riu%3Dhttp%253a%252f%252fp4.music.126.net%252f9Fpqj1WM0H7fjlRQc3-TSw%253d%253d%252f109951165325278290.jpg%26ehk%3Dr9puRRQ%252fYEoDToUqJ%252bOt%252fBhB69sKQ8Zl0cwQXKrOWng%253d%26risl%3D%26pid%3DImgRaw%26r%3D0&riu=http%253a%252f%252fp2.music.126.net%252fPFVNR3tU9DCiIY71NdUDcQ%253d%253d%252f109951165334518246.jpg&ehk=o08VEDcuKybQIPsOGrNpQ2glID%252fIiEV7cw%252bFo%252fzopiM%253d&risl=1&pid=ImgRaw&r=0&exph=1024&expw=1024&q=%E5%BC%A0%E6%9D%B0&simid=608056275652971603&form=IRPRST&ck=61FF572F08E45A84E73B7ECCF670E32A&selectedindex=3&ajaxhist=0&ajaxserp=0&vt=0&sim=11'
r = requests.get(src)
# r.content holds the raw response bytes, so the file is opened in binary mode
with open('bizhi.jpg', 'wb') as f:
    f.write(r.content)
print('Download complete')

Output: the script prints the completion message and writes bizhi.jpg to the working directory.
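The link used in this step carries a `mediaurl` query parameter that appears to hold the address of the image file itself. A minimal sketch of pulling that parameter out with the standard library follows; the helper name `extract_media_url` and the shortened example.com URL are illustrative assumptions, and there is no guarantee that Bing keeps this parameter stable.

```python
from urllib.parse import urlparse, parse_qs


def extract_media_url(detail_url):
    """Return the decoded `mediaurl` query parameter, or None if it is absent."""
    query = parse_qs(urlparse(detail_url).query)
    values = query.get('mediaurl')
    return values[0] if values else None


# Hypothetical, shortened URL in the same shape as the link used in this step
detail_url = ('https://cn.bing.com/images/search?view=detailV2'
              '&mediaurl=https%3A%2F%2Fexample.com%2Fpic.jpg&q=test')
print(extract_media_url(detail_url))  # prints https://example.com/pic.jpg
```

The returned string can then be passed to `requests.get` in place of the detail-page URL.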
4) Fetch a beauty video and download it locally
Suppose I have a video link: 美女變身 (a "beauty transformation" clip)
Screenshot:
Code:
Output:
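For a video, the request pattern is the same as for the image in step 2), but the file is usually much larger, so streaming it to disk in chunks avoids holding it all in memory. A minimal sketch, assuming the link points directly at the video file: the helper name `download_file`, the chunk size, and the placeholder URL in the usage note are all hypothetical, and this is not the author's original code.

```python
import requests


def download_file(url, path, chunk_size=64 * 1024, timeout=10):
    """Stream url to path in chunks and return the number of bytes written."""
    written = 0
    # stream=True defers the body download until iter_content is consumed
    with requests.get(url, stream=True, timeout=timeout) as r:
        r.raise_for_status()
        with open(path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                written += len(chunk)
    return written
```

A call would look like `download_file('https://example.com/video.mp4', 'video.mp4')`, where the URL is a placeholder for a direct video link.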
5) Scrape Sogou keyword search results
Code:
import requests

# Target URL
url = 'https://www.sogou.com/web'
kw = input('enter a word: ')
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'
}
param = {
    'query': kw
}
# Send the request with a browser-like User-Agent
response = requests.get(url=url, params=param, headers=header)
# Get the response data
content = response.text
fileName = kw + '.html'
# Save the data locally
with open(fileName, 'w', encoding='utf-8') as fp:
    fp.write(content)
print(fileName, 'scraped successfully!')

Output: type 美女 and press Enter; the result page is saved as 美女.html.
URL details:
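To see the URL that the `params` dictionary produces, the request can be prepared without being sent. This is a small sketch using `requests.Request(...).prepare()`; the keyword is hard-coded here instead of being read with `input`, purely for illustration.

```python
import requests

url = 'https://www.sogou.com/web'
param = {'query': '美女'}

# Build the request without sending it, then inspect the final URL
prepared = requests.Request('GET', url, params=param).prepare()
print(prepared.url)
# prints https://www.sogou.com/web?query=%E7%BE%8E%E5%A5%B3
```

The keyword is percent-encoded as UTF-8 and appended as the `query` parameter, which is what the address bar shows after a manual search.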
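6) Scrape Baidu Translate
Find the endpoint by inspecting the network requests:
From this we get the endpoint (https://fanyi.baidu.com/sug) and the request method (POST):
Code: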
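import json
import requests

url = 'https://fanyi.baidu.com/sug'
word = input('Enter a word or sentence to translate: ')
data = {
    'kw': word
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2626.106 Safari/537.36'
}
response = requests.post(url=url, data=data, headers=headers)
dic_obj = response.json()
# print(dic_obj)
filename = word + '.json'
# ensure_ascii=False keeps Chinese characters readable in the saved file
with open(filename, 'w', encoding='utf-8') as fp:
    json.dump(dic_obj, fp=fp, ensure_ascii=False)
# Second suggestion; raises IndexError if fewer than two entries come back
j = dic_obj['data'][1]['v']
print(j)

Test result: the full response is saved to a .json file named after the input, and the v field of the second suggestion is printed.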
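Indexing `dic_obj['data'][1]` fails when the endpoint returns fewer than two suggestions. A minimal sketch of reading every suggestion safely follows; the helper name `extract_translations`, the `k` and `errno` keys, and the sample payload are assumptions for illustration, since only the `data` list and its `v` field are confirmed by the code above.

```python
def extract_translations(dic_obj):
    """Return a list of (k, v) pairs from a sug-style response dict."""
    entries = dic_obj.get('data') or []
    return [(item.get('k'), item.get('v')) for item in entries]


# Hypothetical response in the shape the code above relies on
sample = {
    'errno': 0,
    'data': [
        {'k': 'dog', 'v': 'n. 狗'},
        {'k': 'dogs', 'v': 'n. 狗 (复数)'},
    ],
}
for k, v in extract_translations(sample):
    print(k, '->', v)
```

An empty or missing `data` list simply yields an empty result instead of an exception.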
7) Scrape the Douban movie chart
Target URL:
https://movie.douban.com/chart
Code:
import json
import requests

url = 'https://movie.douban.com/j/chart/top_list?'
params = {
    'type': '11',
    'interval_id': '100:90',
    'action': '',
    'start': '0',
    'limit': '20',
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2626.106 Safari/537.36'
}
response = requests.get(url=url, params=params, headers=headers)
dic_obj = response.json()
print(dic_obj)
# ensure_ascii=False keeps Chinese titles readable in the saved file
with open('douban.json', 'w', encoding='utf-8') as fp:
    json.dump(dic_obj, fp=fp, ensure_ascii=False)

Output: the response is printed and also saved as douban.json.
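The `start` and `limit` parameters suggest that the endpoint is paginated. A minimal sketch of building the parameters for successive pages follows, assuming `start` is a zero-based offset and `limit` is the page size; the helper name `page_params` is hypothetical and the endpoint's actual behavior should be confirmed in the browser's network panel.

```python
def page_params(page, page_size=20):
    """Build query parameters for one page of the chart (page is 0-based)."""
    return {
        'type': '11',
        'interval_id': '100:90',
        'action': '',
        'start': str(page * page_size),  # assumed to be an offset into the list
        'limit': str(page_size),
    }


for page in range(3):
    p = page_params(page)
    print(p['start'], p['limit'])
```

Each dictionary can be passed as `params=` to the `requests.get` call shown above, ideally with a short pause between requests.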
8) Scrape JK-uniform avatar images
import os
import re
import time
import urllib.request

import requests

header = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.162 Safari/537.36'
}
url = 'https://cn.bing.com/images/async?q=jk%E5%88%B6%E6%9C%8D%E5%A5%B3%E7%94%9F%E5%A4%B4%E5%83%8F&first=118&count=35&relp=35&cw=1177&ch=705&tsc=ImageBasicHover&datsrc=I&layout=RowBased&mmasync=1&SFX=4'
request = requests.get(url=url, headers=header)
c = request.text
# Capture the src attribute of each thumbnail; re.S lets . match newlines
pattern = re.compile(
    r'<div class="imgpt".*?<div class="img_cont hoff">.*?src="(.*?)".*?</div>', re.S
)
items = re.findall(pattern, c)
# print(items)
os.makedirs('photo', exist_ok=True)
for a in items:
    print(a)
for a in items:
    print('Downloading image: ' + a)
    # Name each file after the current Unix timestamp
    filename = 'photo/' + str(int(time.time())) + '.jpg'
    urllib.request.urlretrieve(a, filename)
    print('Saved ' + filename)
    # Pause between downloads; this also keeps the second-resolution names unique
    time.sleep(2)

Output: each image URL is printed, then each image is downloaded in turn.
The downloaded images are saved in the photo folder.
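The regular expression in this step can be tried offline before running it against the live page. A small sketch follows, applying the same pattern to a hand-written fragment; the sample HTML and the example.com URLs are hypothetical and only mimic the structure the pattern expects.

```python
import re

pattern = re.compile(
    r'<div class="imgpt".*?<div class="img_cont hoff">.*?src="(.*?)".*?</div>',
    re.S,
)

# Hypothetical HTML fragment in the shape the pattern expects
sample_html = '''
<div class="imgpt"><a href="#"><div class="img_cont hoff">
<img class="mimg" src="https://example.com/a.jpg" alt="a"/></div></a></div>
<div class="imgpt"><a href="#"><div class="img_cont hoff">
<img class="mimg" src="https://example.com/b.jpg" alt="b"/></div></a></div>
'''

items = pattern.findall(sample_html)
print(items)
# prints ['https://example.com/a.jpg', 'https://example.com/b.jpg']
```

If the live page returns an empty list, the markup has probably changed and the class names in the pattern need to be updated.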
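Summary:
If you do not understand the request headers or URLs used in this article, or do not know how to find them, please work through the basics article first.
If anything in this article is incorrect, please point it out. Thanks!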