Python homework, June 14
```python
# Crawler principle: fetch -> extract the useful data -> save it
# The internet: the browser sends HTTP requests
# Pearvideo: analyze the video files and save them locally
# Today's topics:
#   - detailed use of the requests module
#   - the selenium module
# Library categories:
#   1. request libraries
#   2. parsing libraries
#   3. storage libraries
import requests
import re
import uuid

# Exploratory code from class, kept commented out:
# response = requests.get(url='https://www.pearvideo.com/video_1566066')
#
# An item on the list page looks like:
# <a href="video_1461493" class="actplay openapp" target="_blank">
#   <div class="video-main">
#     <img class="img" src="https://image.pearvideo.com/cont/20181023/cont-1461493-11648067.png"
#          alt="Do I have to help my ex-wife repay her debts after divorce?">
#     <div class="vdo-time">02:35</div>
#   </div>
#   <div class="vdo-summary">
#     <p class="vdo-tt">Do I have to help my ex-wife repay her debts after divorce?</p>
#     <div class="none vdo-open-tips"></div>
#   </div>
# </a>
#
# re.findall('regular expression', 'text to parse', 'regex flags')
# print(response.status_code)
# print(response.text)
# res = re.findall('<a href="video_(.*?)"', response.text, re.S)
# .  matches any character at the current position
# *  matches zero or more of the preceding token
# ?  makes the match lazy: stop as soon as possible
# print(res)
# for m_id in res:
#     detail_url = 'https://www.pearvideo.com/video_' + m_id
#     print(detail_url)  # then fetch the video detail page
```
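The commented-out exploration above hinges on `re.findall` with the lazy quantifier `.*?` and the `re.S` flag. A minimal, self-contained sketch of that pattern (the HTML snippet below is made up to mirror the pearvideo list markup, so it runs offline):

```python
import re

# Two fake list-page entries; ".*?" stops at the first closing quote,
# and re.S lets "." match across newlines.
html = '''
<a href="video_1461493" class="actplay openapp" target="_blank">
<a href="video_1566066" class="actplay openapp" target="_blank">
'''

ids = re.findall('<a href="video_(.*?)"', html, re.S)
print(ids)  # ['1461493', '1566066']
```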
```python
# 1. Send the request
def get_page(url):
    response = requests.get(url)
    return response

# 2. Parse the data
def parse_index(text):
    res = re.findall('<a href="video_(.*?)"', text, re.S)
    detail_url_list = []
    for m_id in res:
        detail_url = 'https://www.pearvideo.com/video_' + m_id
        detail_url_list.append(detail_url)
    return detail_url_list

def parse_detail(text):
    # The detail page embeds the real video address as srcUrl="..."
    movie_url = re.findall('srcUrl="(.*?)"', text, re.S)[0]
    return movie_url

# 3. Save the data
def save_movie(movie_url):
    response = requests.get(movie_url)
    with open(f'{uuid.uuid4()}.mp4', 'wb') as f:
        f.write(response.content)
        f.flush()

# Import the thread-pool module
from concurrent.futures import ThreadPoolExecutor

if __name__ == '__main__':
    # Sequential version:
    # index_res = get_page(url='https://www.pearvideo.com/')
    # detail_url_list = parse_index(index_res.text)
    # for detail_url in detail_url_list:
    #     detail_res = get_page(url=detail_url)
    #     movie_url = parse_detail(detail_res.text)
    #     print(movie_url)
    #     save_movie(movie_url)

    # Thread-pool version. The original imported `pool` from
    # multiprocessing, which is a module with no submit() method;
    # submit() belongs to a ThreadPoolExecutor instance:
    pool = ThreadPoolExecutor(50)
    url = 'https://www.pearvideo.com/'
    pool.submit(get_page, url)
```

Homework:
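The lesson was presumably building toward chaining the three steps through the pool rather than just submitting `get_page`. A hedged sketch of that submit/callback pattern, with stand-in functions (no real network calls) so it runs anywhere:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for get_page: returns fake page text instead of fetching.
def fake_get_page(url):
    return f'<html from {url}>'

# Callback fired when the submitted task finishes; future.result()
# holds fake_get_page's return value. A real version would parse the
# text and save the video here.
results = []
def parse_and_save(future):
    text = future.result()
    results.append(text.upper())

pool = ThreadPoolExecutor(4)
future = pool.submit(fake_get_page, 'https://www.pearvideo.com/')
future.add_done_callback(parse_and_save)
pool.shutdown(wait=True)  # wait for all submitted tasks to finish
print(results)
```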
```python
'''
Homepage: https://movie.douban.com/top250
Method: GET
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.146 Safari/537.36

Regex task: extract the movie detail-page url, image link, movie name,
rating, and number of ratings.
<div class="item">.*?href="(.*?)">.*?src="(.*?)".*?<span class="title">(.*?)</span>.*?<span class="rating_num".*?>(.*?)</span>.*?<span>(.*?)人评价
'''
import requests
import re

url = 'https://movie.douban.com/top250'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
}

# 1. Request the Douban Top 250 page and get the response
response = requests.get(url, headers=headers)
# print(response.text)

# 2. Extract the data with a regex: detail-page url, image link,
# movie name, intro line, rating, ratings count, tagline.
# The original pattern had two typos ('<p class>==<p class>' and
# '<spam class="inq">') and unpacked the groups in the wrong order;
# the pattern below follows the actual order of the elements on the
# page (note Douban uses simplified 人评价, not 人評價).
movie_content_list = re.findall(
    '<div class="item">.*?href="(.*?)">.*?src="(.*?)"'
    '.*?<span class="title">(.*?)</span>.*?<p class="">(.*?)</p>'
    '.*?<span class="rating_num".*?>(.*?)</span>'
    '.*?<span>(.*?)人评价.*?<span class="inq">(.*?)</span>',
    response.text,
    re.S)

for movie_content in movie_content_list:
    # Unpack one movie per tuple
    detail_url, movie_jpg, name, yu, point, num, jili = movie_content
    data = (f'Name: {name}, detail page: {detail_url}, '
            f'image: {movie_jpg}, rating: {point}, '
            f'ratings count: {num}, intro: {yu}, tagline: {jili}\n')
    # print(data)

    # 3. Save the data: append each movie's info to a file
    with open('li.txt', 'a', encoding='utf-8') as f:
        f.write(data)
```

Reposted from: https://www.cnblogs.com/jjjpython1/p/11061838.html
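The group order of that pattern can be sanity-checked without hitting Douban by running it against a made-up snippet (all names and URLs below are placeholders, not real page data):

```python
import re

# Fake Top-250 item laid out in the same element order as the page:
# link, image, title, intro <p>, rating, ratings count, tagline.
snippet = '''
<div class="item">
<a href="https://movie.douban.com/subject/1292052/">
<img src="https://img.example/poster.jpg">
<span class="title">The Shawshank Redemption</span>
<p class="">Director: Frank Darabont</p>
<span class="rating_num" property="v:average">9.7</span>
<span>3000000人评价</span>
<span class="inq">Hope sets you free.</span>
</div>
'''

pattern = ('<div class="item">.*?href="(.*?)">.*?src="(.*?)"'
           '.*?<span class="title">(.*?)</span>.*?<p class="">(.*?)</p>'
           '.*?<span class="rating_num".*?>(.*?)</span>'
           '.*?<span>(.*?)人评价.*?<span class="inq">(.*?)</span>')

matches = re.findall(pattern, snippet, re.S)
for detail_url, img, name, intro, point, num, quote in matches:
    print(name, point, num)  # The Shawshank Redemption 9.7 3000000
```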