Scraping Videos from a NetEase Cloud Music User's Activity Feed (Ⅲ): Implementing the Scraper
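Recap
With "Scraping Videos from a NetEase Cloud Music User's Activity Feed (Ⅰ)" and "Scraping Videos from a NetEase Cloud Music User's Activity Feed (Ⅱ)" as groundwork, writing the scraper itself is no longer the hard part.
Implementation
Straight to the code!
First, the encryption code: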
encrypt_api.py
import base64
from Cryptodome.Cipher import AES
import os
import json
import binascii

# Source: https://blog.csdn.net/tzs_1041218129/article/details/52789153
# Source: https://github.com/darknessomi/musicbox/blob/master/NEMbox/encrypt.py
# Based on the two sources above, with small changes so that it works with my code

__all__ = ['encrypt_data']

MODULUS = ('00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7'
           'b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280'
           '104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932'
           '575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b'
           '3ece0462db0a22b8e7')
PUBKEY = '010001'
NONCE = b'0CoJUm6Qyw8W8jud'


def aes(text, key):
    # Pad the input to a multiple of the 16-byte AES block size
    pad = 16 - len(text) % 16
    text = text + bytearray([pad] * pad)
    # AES.MODE_CBC has the value 2, which the original code passed directly
    encryptor = AES.new(key, AES.MODE_CBC, b'0102030405060708')
    ciphertext = encryptor.encrypt(text)
    return base64.b64encode(ciphertext)


def rsa(text, pubkey, modulus):
    # Reverse the bytes, then compute text ** pubkey mod modulus
    text = text[::-1]
    rs = pow(int(binascii.hexlify(text), 16),
             int(pubkey, 16), int(modulus, 16))
    return format(rs, 'x').zfill(256)


def encrypt_data(dict_data):
    """
    text = {
        "ids": "[\"12A059550A712E4DDB3013DCDE3C92B4\", \"5B0AF067CBB42F7789F7B97E13827565\"]",
        "resolution": "1080",
        "csrf_token": ""
    }
    """
    text = json.dumps(dict_data).encode('utf-8')
    # Random 16-character key for the second AES pass
    secret = binascii.hexlify(os.urandom(16))[:16]
    params = aes(aes(text, NONCE), secret)
    encSecKey = rsa(secret, PUBKEY, MODULUS)
    data = {
        "params": params,
        "encSecKey": encSecKey
    }
    return data

Next comes the main script.
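Before moving on, the two steps of encrypt_api.py that do not depend on the AES library can be looked at in isolation. The sketch below uses only the standard library. It mirrors the padding done inside aes() (which is PKCS#7 padding) and the arithmetic inside rsa(). The function names pkcs7_pad, pkcs7_unpad and textbook_rsa_hex are made up for illustration, and the usage lines use a toy key rather than the real MODULUS. Treat it as a reading aid, not as a replacement for the module.

```python
import binascii

BLOCK_SIZE = 16  # AES block size in bytes


def pkcs7_pad(data: bytes) -> bytes:
    """Append N bytes of value N so the length becomes a multiple of 16."""
    pad = BLOCK_SIZE - len(data) % BLOCK_SIZE
    return data + bytes([pad] * pad)


def pkcs7_unpad(padded: bytes) -> bytes:
    """Undo pkcs7_pad by reading the pad length from the last byte."""
    return padded[:-padded[-1]]


def textbook_rsa_hex(text: bytes, pubkey_hex: str, modulus_hex: str) -> str:
    """Reverse the bytes, read them as one big integer, and compute
    message ** exponent mod modulus, formatted as 256 hex characters."""
    message = int(binascii.hexlify(text[::-1]), 16)
    cipher = pow(message, int(pubkey_hex, 16), int(modulus_hex, 16))
    return format(cipher, 'x').zfill(256)


# Illustrative values only: a short payload and a toy key (exponent 3, modulus 33)
padded = pkcs7_pad(b'{"ids": "[]"}')
print(len(padded), padded[-1])
print(textbook_rsa_hex(b'\x02', '03', '21')[-4:])
```

Note that an input whose length is already a multiple of 16 still gains a full extra block of padding, which is what allows the pad length to be read back unambiguously.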
The main script is fairly rough; it uses time.sleep to wait for PhantomJS to finish loading the page.
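A fixed sleep either wastes time or is too short on a slow connection. One possible alternative, shown here only as a sketch, is to poll for a condition until it holds or a deadline passes. The helper name wait_until and its parameters are hypothetical and are not part of the original script; Selenium also ships its own WebDriverWait class for the same purpose.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Call predicate() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within %.1f s' % timeout)
        time.sleep(interval)


# Hypothetical use with the driver from main.py, after switching to g_iframe:
# items = wait_until(
#     lambda: driver.find_elements_by_css_selector('div.info.f-pa'))
```

Because an empty list is falsy, the commented call would keep polling until at least one matching element is present, assuming the selector is the right signal that the feed has loaded.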
main.py
import requests
import csv
from selenium import webdriver
import time

from encrypt_api import encrypt_data

#############################################
# BEGIN URL and HTTP request header settings
header = {
    'Accept': '*/*',
    'Accept-Encoding': 'gzip,deflate,sdch',
    'Accept-Language': 'zh-CN,zh;q=0.8,gl;q=0.6,zh-TW;q=0.4',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Host': 'music.163.com',
    'Referer': 'http://music.163.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.117 Safari/537.36',
}

event_url = 'http://music.163.com/#/user/event?id=343142613'
enent_mv_api_url = 'http://music.163.com/weapi/cloudvideo/playurl'
# END URL and HTTP request header settings
#############################################

# Note: webdriver.PhantomJS and find_elements_by_css_selector are only
# available in older Selenium releases (3.x)
driver = webdriver.PhantomJS(executable_path=r'E:\Study\phantomjs-2.1.1-windows\bin\phantomjs.exe')
driver.get(event_url)
# Wait 3 s and hope the page has finished loading
time.sleep(3)
# Scroll to the bottom of the page
js = "document.getElementById('g_iframe').contentWindow.scrollTo(0,9999999)"
driver.execute_script(js)
time.sleep(3)
# The tmp/ directory must already exist
driver.get_screenshot_as_file('tmp/test_screenshot.png')
driver.switch_to.frame('g_iframe')
event_mv_list = [{'details': i.text, 'id': i.get_attribute('data-vid')}
                 for i in driver.find_elements_by_css_selector('div.info.f-pa') if i is not None]
# event_mv_list[0]
# {'details': '【耳机体验】3DC音效《BINGBIAN病变》秋仁 - by 自由者音效\n47149\n04:10', 'id': '5B0AF067CBB42F7789F7B97E13827565'}

print('Found %d videos' % len(event_mv_list))
if len(event_mv_list):
    event_mv_list_ids = [i['id'] for i in event_mv_list]
    data = {
        'ids': str(event_mv_list_ids),
        "resolution": "1080",
        "csrf_token": ""
    }
    data = encrypt_data(data)
    sess = requests.session()
    resp = sess.post(enent_mv_api_url, data=data, headers=header)
    if resp.status_code == 200:
        # Save the video URLs, one per line
        with open('tmp/mv_urls.txt', 'w') as f:
            for each in resp.json()['urls']:
                f.write(each['url']+'\n')
        # Save the raw JSON response
        with open('tmp/resp_json.txt', 'w') as f:
            f.write(resp.text)
        # Save the scraped video info together with the JSON fields
        with open('tmp/mv_info.csv', 'w', newline='') as f:
            writer = csv.DictWriter(f, ['name','like','time','id','url','size','validityTime','r'])
            writer.writeheader()
            csv_dict_data = resp.json()['urls']
            for i, each in enumerate(csv_dict_data):
                # 'details' holds three lines: title, like count, duration
                each['name'], each['like'], each['time'] = event_mv_list[i]['details'].split('\n')
                writer.writerow(each)

Result
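The resulting mv_urls.txt, with one video URL per line, looks like this:
[Screenshot of mv_urls.txt]
Download
With these URLs in hand, any download tool can fetch the videos; multi-threaded downloaders such as IDM work well. Note, however, that the downloaded files may not carry the MV titles shown on the web page, so you may need mv_info.csv to rename them afterwards. This is not hard.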
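The renaming step could look roughly like the sketch below. It assumes that the download tool saves each file under the last path segment of its URL, which may not hold for every tool, and it relies only on the 'name' and 'url' columns that main.py writes to mv_info.csv. The function names safe_filename, build_rename_plan and rename_downloads are hypothetical.

```python
import csv
import os
import posixpath
import re
from urllib.parse import urlparse


def safe_filename(name):
    """Replace characters that Windows does not allow in file names."""
    return re.sub(r'[\\/:*?"<>|]', '_', name).strip()


def build_rename_plan(rows):
    """Map each downloaded file name to a readable one.

    Assumption: the downloaded file name is the last URL path segment.
    """
    plan = {}
    for row in rows:
        downloaded = posixpath.basename(urlparse(row['url']).path)
        extension = posixpath.splitext(downloaded)[1]
        plan[downloaded] = safe_filename(row['name']) + extension
    return plan


def rename_downloads(download_dir, csv_path):
    """Rename the files in download_dir that appear in the CSV."""
    # Opened with the platform default encoding, as main.py wrote it
    with open(csv_path, newline='') as f:
        plan = build_rename_plan(csv.DictReader(f))
    for old_name, new_name in plan.items():
        old_path = os.path.join(download_dir, old_name)
        if os.path.exists(old_path):
            os.rename(old_path, os.path.join(download_dir, new_name))


# Hypothetical call; 'downloads' is an example directory name:
# rename_downloads('downloads', 'tmp/mv_info.csv')
```

Keeping the plan as a separate dictionary makes it easy to print and check the mapping before any file is actually renamed.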
More
The code above is stored in GetCloudMusicVideoOnEvent.