Scraping the NetEase Cloud Music Hot Songs Chart with XPath and Multiprocessing

Published: 2024/9/30 | Programming Q&A | 豆豆

Tool used: a music external-link conversion tool (the link.hhtjim.com prefix that appears in the code).

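The page source of the NetEase Cloud Music site, opened directly, does not contain the song information, so the URL has to be adjusted first. Looking at the view-source URL shows that removing the # from it makes the full content appear: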

View-source URL as opened by right-click: view-source:https://music.163.com/#/discover/toplist?id=3778678
After removing the #: view-source:https://music.163.com/discover/toplist?id=3778678
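The URL adjustment above can be sketched as a small helper. This is only an illustration: the function name strip_hash_route is hypothetical and not part of the original script. A likely explanation for why the trick works is that everything after # is a URL fragment, which the browser never sends to the server, so the hash URL only returns the outer page shell.

```python
def strip_hash_route(url: str) -> str:
    """Turn a '#/' style page URL into the plain URL the server can answer.

    Hypothetical helper for illustration; the article does this step by hand.
    """
    # Only the first '/#/' is replaced; URLs without it are returned unchanged.
    return url.replace('/#/', '/', 1)


if __name__ == '__main__':
    print(strip_hash_route('https://music.163.com/#/discover/toplist?id=3778678'))
```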

With the source available, the code can be written:

import requests
from lxml import etree
import os
from multiprocessing import Pool

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'
}

# Create the storage directory
pathname = './music/'
if not os.path.exists(pathname):
    os.mkdir(pathname)


# Collect the song links from the chart page
def get_urls(url):
    try:
        response = requests.get(url=url, headers=headers)
        music = etree.HTML(response.text)
        # The song list sits in a hidden <ul class="f-hide">
        music_urls = music.xpath('//ul[@class="f-hide"]/li')
        musiclist = []
        for music_url in music_urls:
            url = music_url.xpath('./a/@href')[0]
            name = music_url.xpath('./a/text()')[0]
            musiclist.append({'key': name, 'url': 'https://link.hhtjim.com/163/' + url.split('=')[-1] + '.mp3'})
        # Start the multiprocess download
        pool.map(get_music, musiclist)
    except Exception:
        print('get_urls failed')


# Download one song; `url` is a dict with the keys 'key' (song name) and 'url'
def get_music(url):
    try:
        # Skip songs that are already downloaded, so a network problem does not force a full re-crawl
        if os.path.exists(pathname + url['key'] + '.mp3'):
            print('歌曲已存在')  # "song already exists"
        else:
            response = requests.get(url=url['url'], headers=headers)
            with open(pathname + url['key'] + '.mp3', 'wb') as f:
                f.write(response.content)
            print('正在下載：' + url['key'], url['url'])  # "downloading: <name> <link>"
    except Exception:
        print('get_music failed')


if __name__ == '__main__':
    # URL of the chart page (the view-source URL without the #)
    url = 'https://music.163.com/discover/toplist?id=3778678'
    # Open the process pool
    pool = Pool()
    get_urls(url)

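The song links in the code are built by concatenation: the song id taken from each href is appended to the route of the music external-link tool.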
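A minimal sketch of that concatenation step on its own. It assumes the href values have the form /song?id=<number>, which is what the split('=') call in the script implies. The names LINK_BASE and build_mp3_link are hypothetical, and parsing the query string with urllib.parse is a swapped-in alternative to the original split.

```python
from urllib.parse import urlparse, parse_qs

# Prefix of the external-link tool, copied from the article's code
LINK_BASE = 'https://link.hhtjim.com/163/'


def build_mp3_link(href: str) -> str:
    """Build the download link for one song from its href (assumed '/song?id=...')."""
    query = parse_qs(urlparse(href).query)
    song_id = query['id'][0]  # raises KeyError if the href carries no id parameter
    return LINK_BASE + song_id + '.mp3'


if __name__ == '__main__':
    print(build_mp3_link('/song?id=1300994613'))
```

Reading the id from the parsed query rather than from the last '=' keeps the result correct if the href ever carries additional parameters.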

Console output (正在下載 means "downloading"):

正在下載：那個女孩 https://link.hhtjim.com/163/1300994613.mp3
正在下載：Lemon https://link.hhtjim.com/163/536622304.mp3
正在下載：給未來 https://link.hhtjim.com/163/1377131180.mp3
正在下載：四塊五 https://link.hhtjim.com/163/1365221826.mp3
正在下載：再也沒有 https://link.hhtjim.com/163/480580003.mp3
正在下載：云煙成雨 https://link.hhtjim.com/163/513360721.mp3
正在下載：你是人間四月天 https://link.hhtjim.com/163/1344897943.mp3
正在下載：靜悄悄 https://link.hhtjim.com/163/553815178.mp3
正在下載：我的名字 https://link.hhtjim.com/163/554241732.mp3
正在下載：我的一個道姑朋友 https://link.hhtjim.com/163/1367452194.mp3
正在下載：感謝你曾來過 https://link.hhtjim.com/163/460578140.mp3
正在下載：心安理得 https://link.hhtjim.com/163/474739467.mp3
正在下載：煙火里的塵埃 https://link.hhtjim.com/163/29004400.mp3

Open the folder to check that the downloads succeeded.
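The same check can also be done programmatically. The sketch below is an assumption-laden illustration, not part of the original script: missing_downloads is a hypothetical helper, and treating an empty file as a failed download is an assumption (the script writes whatever the server returns, so a non-empty file is not proof of a valid mp3).

```python
import os
import tempfile


def missing_downloads(folder, names):
    """Return the song names whose .mp3 file is absent or empty in `folder`."""
    missing = []
    for name in names:
        path = os.path.join(folder, name + '.mp3')
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing


if __name__ == '__main__':
    # Demo on a throwaway directory instead of the real ./music/ folder
    with tempfile.TemporaryDirectory() as demo_dir:
        with open(os.path.join(demo_dir, 'Lemon.mp3'), 'wb') as f:
            f.write(b'fake audio bytes')
        print(missing_downloads(demo_dir, ['Lemon', '那個女孩']))
```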

Done.
