
Scraping Videos from Bilibili (哔哩哔哩): Java and Python Implementations

Published: 2023/12/8 · python · 豆豆

This article, collected and curated by 生活随笔, explains how to scrape videos from Bilibili (哔哩哔哩) with both a Java and a Python implementation. It is shared here as a reference.

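1. Analyzing the real URLs of Bilibili videos

I have always felt that a programming language is just a tool; the ideas are what matter 🐶. Let's start by working out where Bilibili actually serves its video files from.

1.1 Getting the video's metadata

Open any Bilibili video in a desktop browser, right-click and choose Inspect (or press F12), and look at the page source. One of the script tags contains an assignment that starts with window.__playinfo__=.

That looks like the video's metadata. Let's copy it out and format it: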

window.__playinfo__ = {
  "data": {
    "accept_format": "flv720,flv480,mp4",
    "accept_description": ["高清 720P", "清晰 480P", "流暢 360P"],
    "accept_quality": [64, 32, 16],
    "dash": {
      "video": [
        {
          "id": 64,
          "baseUrl": "http://cn-sdbz-cu-v-05.bilivideo.com/upgcxcode/18/80/187918018/187918018-1-30064.m4s?expires=1589032200&platform=pc&ssig=7TagzkxicmXQCX-eJG1rWw&oi=1894210281&trid=4b3d732f515544e49c843d5f2c87f64bu&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&mid=388810686&logo=80000000",
          "base_url": "http://cn-sdbz-cu-v-05.bilivideo.com/upgcxcode/18/80/187918018/187918018-1-30064.m4s?expires=1589032200&platform=pc&ssig=7TagzkxicmXQCX-eJG1rWw&oi=1894210281&trid=4b3d732f515544e49c843d5f2c87f64bu&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&mid=388810686&logo=80000000",
          "backupUrl": [
            "http://cn-sdyt-cu-v-11.bilivideo.com/upgcxcode/18/80/187918018/187918018-1-30064.m4s?expires=1589032200&platform=pc&ssig=7TagzkxicmXQCX-eJG1rWw&oi=1894210281&trid=4b3d732f515544e49c843d5f2c87f64bu&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&mid=388810686&logo=40000000",
            "http://cn-hbcd2-cu-v-07.bilivideo.com/upgcxcode/18/80/187918018/187918018-1-30064.m4s?expires=1589032200&platform=pc&ssig=7TagzkxicmXQCX-eJG1rWw&oi=1894210281&trid=4b3d732f515544e49c843d5f2c87f64bu&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&mid=388810686&logo=40000000"
          ],
          "backup_url": [
            "http://cn-sdyt-cu-v-11.bilivideo.com/upgcxcode/18/80/187918018/187918018-1-30064.m4s?expires=1589032200&platform=pc&ssig=7TagzkxicmXQCX-eJG1rWw&oi=1894210281&trid=4b3d732f515544e49c843d5f2c87f64bu&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&mid=388810686&logo=40000000",
            "http://cn-hbcd2-cu-v-07.bilivideo.com/upgcxcode/18/80/187918018/187918018-1-30064.m4s?expires=1589032200&platform=pc&ssig=7TagzkxicmXQCX-eJG1rWw&oi=1894210281&trid=4b3d732f515544e49c843d5f2c87f64bu&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&mid=388810686&logo=40000000"
          ],
          "bandwidth": 1883922,
          "mimeType": "video/mp4",
          "mime_type": "video/mp4",
          "width": 720,
          "height": 1280,
          "SegmentBase": {"Initialization": "0-974", "indexRange": "975-1162"},
          "segment_base": {"initialization": "0-974", "index_range": "975-1162"},
          "codecid": 7
        }
      ],
      "audio": [
        {
          "id": 30280,
          "baseUrl": "http://cn-sdyt-cu-v-05.bilivideo.com/upgcxcode/18/80/187918018/187918018-1-30280.m4s?expires=1589032200&platform=pc&ssig=ud9zkd5aAUp7mB4yPjI_LA&oi=1894210281&trid=4b3d732f515544e49c843d5f2c87f64bu&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&mid=388810686&logo=80000000",
          "base_url": "http://cn-sdyt-cu-v-05.bilivideo.com/upgcxcode/18/80/187918018/187918018-1-30280.m4s?expires=1589032200&platform=pc&ssig=ud9zkd5aAUp7mB4yPjI_LA&oi=1894210281&trid=4b3d732f515544e49c843d5f2c87f64bu&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&mid=388810686&logo=80000000",
          "backupUrl": [
            "http://cn-hbcd2-cu-v-14.bilivideo.com/upgcxcode/18/80/187918018/187918018-1-30280.m4s?expires=1589032200&platform=pc&ssig=ud9zkd5aAUp7mB4yPjI_LA&oi=1894210281&trid=4b3d732f515544e49c843d5f2c87f64bu&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&mid=388810686&logo=40000000",
            "http://cn-sdjn2-cu-v-05.bilivideo.com/upgcxcode/18/80/187918018/187918018-1-30280.m4s?expires=1589032200&platform=pc&ssig=ud9zkd5aAUp7mB4yPjI_LA&oi=1894210281&trid=4b3d732f515544e49c843d5f2c87f64bu&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&mid=388810686&logo=40000000"
          ],
          "backup_url": [
            "http://cn-hbcd2-cu-v-14.bilivideo.com/upgcxcode/18/80/187918018/187918018-1-30280.m4s?expires=1589032200&platform=pc&ssig=ud9zkd5aAUp7mB4yPjI_LA&oi=1894210281&trid=4b3d732f515544e49c843d5f2c87f64bu&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&mid=388810686&logo=40000000",
            "http://cn-sdjn2-cu-v-05.bilivideo.com/upgcxcode/18/80/187918018/187918018-1-30280.m4s?expires=1589032200&platform=pc&ssig=ud9zkd5aAUp7mB4yPjI_LA&oi=1894210281&trid=4b3d732f515544e49c843d5f2c87f64bu&nfc=1&nfb=maPYqpoel5MI3qOUX6YpRA==&mid=388810686&logo=40000000"
          ],
          "bandwidth": 319474,
          "mimeType": "audio/mp4",
          "mime_type": "audio/mp4",
          "SegmentBase": {"Initialization": "0-919", "indexRange": "920-1107"},
          "segment_base": {"initialization": "0-919", "index_range": "920-1107"}
        }
      ]
    }
  },
  "session": "996ecc0413599104d175e5c254e70fb7",
  "videoFrame": {}
}

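I removed some irrelevant fields. From the data above we can conclude the following:

Bilibili serves video and audio as separate streams.

The real audio and video URLs can be read from this JavaScript object (baseUrl, plus the backupUrl mirrors).

Three quality levels are offered: "高清 720P", "清晰 480P" and "流暢 360P" (HD 720P, SD 480P, smooth 360P), matching the accept_quality codes 64, 32 and 16. Don't ask me why there's no 1080P; I have no idea either 😢.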
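Because the streams sit under data.dash.video and data.dash.audio, choosing one is a plain dictionary lookup. Below is a minimal sketch that assumes the structure shown above; the helper name pick_streams and the rule "highest id wins, then highest bandwidth" are my own assumptions, not something Bilibili documents. It also collects the backupUrl mirrors, which you could fall back to if the main baseUrl fails.

```python
from typing import Any, Dict, List, Optional, Tuple


def pick_streams(playinfo: Dict[str, Any],
                 quality: Optional[int] = None) -> Tuple[List[str], List[str]]:
    """Return candidate URLs (baseUrl first, then backups) for one video and one audio stream.

    Hypothetical helper. `quality` is matched against the video entries' "id"
    (64 = 720P in the sample); if it is omitted or not present, the highest id is used.
    The audio stream with the highest bandwidth is chosen.
    """
    dash = playinfo["data"]["dash"]
    videos = dash["video"]
    if quality is not None:
        # Fall back to all streams if the requested quality is not offered
        videos = [v for v in videos if v["id"] == quality] or videos
    video = max(videos, key=lambda v: (v["id"], v.get("bandwidth", 0)))
    audio = max(dash["audio"], key=lambda a: a.get("bandwidth", 0))

    def urls(stream: Dict[str, Any]) -> List[str]:
        # Main URL first, then any mirrors listed in backupUrl
        return [stream["baseUrl"]] + list(stream.get("backupUrl") or [])

    return urls(video), urls(audio)
```

With the sample above, pick_streams(info) would return the single 720P baseUrl followed by its two backupUrl mirrors, together with the audio baseUrl and its two mirrors.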

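Next, let's look at the requests that Bilibili's own player sends.

Every request carries the byte range of the file it is asking for, in a Range header.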
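Range and Content-Range are standard HTTP headers: the client asks for bytes=start-end (both ends inclusive), and a 206 Partial Content response reports which bytes it returned, with the total file size after the slash. The code later in this article relies on this to learn a file's size from the small Initialization range. A sketch of two helpers; the function names are hypothetical.

```python
import re
from typing import Optional, Tuple


def range_header(start: int, end: Optional[int] = None) -> dict:
    """Build a Range header. `end` is inclusive; None means "until the end of the file"."""
    return {"Range": "bytes=%d-%s" % (start, "" if end is None else end)}


def parse_content_range(value: str) -> Tuple[int, int, Optional[int]]:
    """Parse a header such as "bytes 0-974/12345678" into (start, end, total).

    total is None when the server reports "*" (length unknown).
    """
    m = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if m is None:
        raise ValueError("unexpected Content-Range: %r" % value)
    start, end, total = m.groups()
    return int(start), int(end), None if total == "*" else int(total)
```

For example, requesting range_header(0, 974) for the video above should come back with a Content-Range whose third field is the size of the whole .m4s file.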

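1.2 Approach to downloading a Bilibili video

Request the page of the video you want to download and fetch its HTML.

Parse the video's basic information out of the page, including the audio and video URLs.

Download the audio and video files, sending a byte range (Range header) with each request. (Note: the implementation in this article does not use multiple threads; it requests each whole file at once.)

Merge the downloaded audio and video files into one complete video file.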

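1.3 Third-party libraries and software used

Third-party software

ffmpeg: used to merge the audio and video files. Official site: http://ffmpeg.org/
- ffmpeg is a powerful tool for processing audio and video; search for it if you want to learn more. For this article you only need to download it and put the path of the ffmpeg executable into the corresponding variable in the code (ffmpegPath in Python, FFMPEG_PATH in Java).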
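A small sketch for locating ffmpeg, assuming it is either on your PATH or unpacked somewhere you can point to; find_ffmpeg and its fallback argument are hypothetical names, and its return value is what you would assign to ffmpegPath.

```python
import os
import shutil
from typing import Optional


def find_ffmpeg(fallback: Optional[str] = None) -> str:
    """Return the ffmpeg executable to use.

    Prefers an ffmpeg found on PATH; otherwise uses `fallback`
    (e.g. the path of a downloaded ffmpeg binary) if that file exists.
    """
    on_path = shutil.which("ffmpeg")
    if on_path:
        return on_path
    if fallback and os.path.isfile(fallback):
        return fallback
    raise FileNotFoundError("ffmpeg not found on PATH and no valid fallback given")
```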

Python

requests: for sending HTTP requests
ffmpeg-python: a convenient Python wrapper for driving ffmpeg
HTMLParser: Python's built-in HTML parser (html.parser module)

Java

hutool: an open-source Java utility library by a Chinese developer. Highly recommended.
jsoup: an excellent Java library for parsing HTML

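2. Implementation

Note: the code in this article downloads the audio and video in a single thread. If you use multiple threads, you must compute each request's byte range carefully and, once the downloads finish, join the pieces back together in the correct order.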
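For the multithreaded variant the note describes, the two fiddly parts are splitting the file into non-overlapping inclusive ranges and merging the pieces in file order. A sketch assuming total_size is the number read from Content-Range; split_ranges and merge_parts are hypothetical helpers, and the threading itself (for example with concurrent.futures) is left out.

```python
import shutil
from typing import List, Tuple


def split_ranges(total_size: int, parts: int) -> List[Tuple[int, int]]:
    """Split bytes [0, total_size) into `parts` inclusive (start, end) ranges, in file order."""
    if total_size <= 0 or parts <= 0:
        raise ValueError("total_size and parts must be positive")
    parts = min(parts, total_size)  # never create empty ranges
    chunk, extra = divmod(total_size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        size = chunk + (1 if i < extra else 0)  # spread the remainder over the first parts
        ranges.append((start, start + size - 1))  # HTTP ranges are inclusive
        start += size
    return ranges


def merge_parts(part_paths: List[str], out_path: str) -> None:
    """Concatenate part files in list order, regardless of which thread finished first."""
    with open(out_path, "wb") as out:
        for path in part_paths:
            with open(path, "rb") as f:
                shutil.copyfileobj(f, out)
```

For instance, ["bytes=%d-%d" % r for r in split_ranges(10, 3)] gives ['bytes=0-3', 'bytes=4-6', 'bytes=7-9'].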

2.1 Python implementation

Parsing the HTML

import json
from html.parser import HTMLParser


class BiliHTMLParser(HTMLParser):
    """Subclass of HTMLParser used to parse the video page's HTML."""

    def __init__(self):
        super().__init__()
        self.isTitle = 0      # > 0 while inside a <title> tag
        self.videoName = ""   # video name
        self.videoInfo = {}   # parsed window.__playinfo__ data

    def handle_starttag(self, tag, attrs):
        if tag != 'title':
            return
        self.isTitle += 1

    def handle_endtag(self, tag):
        if tag == 'title' and self.isTitle:
            self.isTitle -= 1

    def handle_data(self, data):
        """Collect the current page's video information.
        :param data: text content of the current tag
        """
        if data and self.isTitle:  # the text inside <title> is the video name
            self.videoName = data
        if data.startswith('window.__playinfo__='):
            # Keep everything after `window.__playinfo__=`
            infoStr = data.split('window.__playinfo__=')[-1]
            self.videoInfo = json.loads(infoStr)  # JSON string -> dict
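The Java version further down pulls the JSON out with a regular expression instead of an HTML parser, and the same idea works in Python. A minimal sketch, assuming the tag appears exactly as <script>window.__playinfo__=...</script> with no attributes (the same assumption the Java regex makes); extract_playinfo is a hypothetical name.

```python
import json
import re
from typing import Any, Dict, Optional

# Capture everything between "<script>window.__playinfo__=" and the next "</script>"
PLAYINFO_RE = re.compile(r"<script>window\.__playinfo__=(.*?)</script>", re.S)


def extract_playinfo(html: str) -> Optional[Dict[str, Any]]:
    """Return the parsed window.__playinfo__ object, or None if the page has none."""
    m = PLAYINFO_RE.search(html)
    return json.loads(m.group(1)) if m else None
```

The parser class above is a good fit when you also want the title; the regex is shorter but breaks as soon as the script tag gains attributes.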

Getting the video information

import requests


def getVideo(videoInfo, videoName):
    """
    :param videoInfo: video information dict
    :param videoName: video name
    """
    dash = videoInfo['data']['dash']
    # URL of the first video stream and the byte range of its initialization segment
    videoBaseUrl = dash['video'][0]['baseUrl']
    videoBaseRange = dash['video'][0]['SegmentBase']['Initialization']
    # The same for the first audio stream
    audioBaseUrl = dash['audio'][0]['baseUrl']
    audioBaseRange = dash['audio'][0]['SegmentBase']['Initialization']
    # Download the files
    videoSize = getVideoInfo(videoBaseUrl, videoBaseRange)
    videoFileName = downloadFile(videoBaseUrl, videoSize, "video", videoName)
    audioSize = getVideoInfo(audioBaseUrl, audioBaseRange)
    audioFileName = downloadFile(audioBaseUrl, audioSize, "audio", videoName)
    # Merge them
    outFilePath = "./%s/%s.mp4" % (videoName, videoName)
    mergeFiles(videoFileName, audioFileName, outFilePath)


def getVideoInfo(baseUrl, byteRange):
    """Get the total size in bytes of a video or audio file.
    :param baseUrl: stream URL
    :param byteRange: a small byte range such as "0-974"
    :return: total size (string)
    """
    # videoUrl is a module-level global holding the video page URL (used as Referer)
    headers = {
        'Referer': videoUrl,
        'Range': 'bytes=%s' % byteRange,
    }
    videoRes = requests.get(url=baseUrl, headers=headers)
    # Content-Range looks like "bytes 0-974/<total>"; the part after "/" is the total size
    total = videoRes.headers['Content-Range'].split('/')[-1]
    print('Total size of the resource: %s bytes' % total)
    return total

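Downloading audio and video

import os
import requests


def downloadFile(url, totalSize, mediaType, videoName):
    """Download a resource.
    :param url: resource URL
    :param totalSize: total size of the resource in bytes
    :param mediaType: "video" or "audio"
    :param videoName: video name
    :return: path of the downloaded file
    """
    headers = {
        'Referer': videoUrl,  # module-level global: the video page URL
        # Byte ranges are inclusive, so the last byte index is totalSize - 1
        'Range': "bytes=0-%d" % (int(totalSize) - 1),
    }
    fileDir = "./%s" % videoName
    os.makedirs(fileDir, exist_ok=True)
    fileName = "./%s/%s-%s.mp4" % (videoName, videoName, mediaType)
    if not os.path.exists(fileName):
        print("Start downloading: %s" % mediaType)
        res = requests.get(url=url, headers=headers, stream=True)
        res.raise_for_status()
        # Write in chunks instead of holding the whole file in memory
        with open(fileName, 'wb') as file_obj:
            for chunk in res.iter_content(chunk_size=1024 * 1024):
                file_obj.write(chunk)
        print("Finished downloading: %s" % mediaType)
    return fileName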
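The page title is used directly as a directory and file name here. Titles can contain characters that are not allowed in file names (for example / everywhere, or : and ? on Windows), so you may want to clean videoName first. A hedged sketch; safe_name, the replacement character and the length limit are my own choices.

```python
import re

# "/" and backslash, characters Windows forbids in file names, and ASCII control characters
_ILLEGAL = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_name(title: str, replacement: str = "_", max_len: int = 100) -> str:
    """Turn a video title into a string usable as a directory or file name."""
    name = _ILLEGAL.sub(replacement, title).strip().rstrip(". ")
    return name[:max_len] or "video"  # never return an empty name
```

For example, safe_name('a/b:c?') returns 'a_b_c_'.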

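Merging audio and video

import ffmpeg  # provided by the ffmpeg-python package


def mergeFiles(videoFilePath, audioFilePath, outFilePath):
    """Merge the audio and video files without re-encoding."""
    print("Start merging audio and video")
    videoFile = ffmpeg.input(videoFilePath)
    audioFile = ffmpeg.input(audioFilePath)
    # vcodec/acodec='copy' copies both streams as-is instead of transcoding them
    stream = ffmpeg.output(videoFile, audioFile, outFilePath, vcodec='copy', acodec='copy')
    # ffmpegPath: module-level global pointing to the ffmpeg executable
    ffmpeg.run(stream, cmd=ffmpegPath)
    print("Finished merging audio and video")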
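If you would rather not depend on ffmpeg-python, the same merge can be done by running ffmpeg directly, the way the Java version does with ProcessBuilder. A sketch; the function names are hypothetical, and the -y flag (overwrite the output file without asking) is my addition, not part of the original command.

```python
import subprocess
from typing import List


def build_merge_command(ffmpeg_path: str, video_path: str,
                        audio_path: str, out_path: str) -> List[str]:
    """Build the ffmpeg command line used by the Java version (plus -y)."""
    return [
        ffmpeg_path,
        "-y",                 # overwrite out_path if it already exists
        "-i", video_path,
        "-i", audio_path,
        "-vcodec", "copy",    # copy the video stream, no re-encoding
        "-acodec", "copy",    # copy the audio stream, no re-encoding
        out_path,
    ]


def merge_with_subprocess(ffmpeg_path: str, video_path: str,
                          audio_path: str, out_path: str) -> None:
    """Run ffmpeg; raises subprocess.CalledProcessError on a non-zero exit status."""
    subprocess.run(build_merge_command(ffmpeg_path, video_path, audio_path, out_path),
                   check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems when the video title contains spaces.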

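2.2 Java implementation

Parsing the HTML

// Imports used by the Java snippets in this section (hutool, jsoup and the JDK)
import cn.hutool.http.HttpRequest;
import cn.hutool.http.HttpResponse;
import cn.hutool.json.JSONArray;
import cn.hutool.json.JSONObject;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parse the HTML page to get the video's information */
private static void htmlParser() {
    HttpResponse res = HttpRequest.get(VIDEO_URL).timeout(2000).execute();
    String html = res.body();
    Document document = Jsoup.parse(html);
    Element title = document.getElementsByTag("title").first();
    // Video name
    VIDEO_INFO.videoName = title.text();
    // Extract the JSON assigned to window.__playinfo__
    Pattern pattern = Pattern.compile("(?<=<script>window\\.__playinfo__=).*?(?=</script>)");
    Matcher matcher = pattern.matcher(html);
    if (matcher.find()) {
        VIDEO_INFO.videoInfo = new JSONObject(matcher.group());
    } else {
        System.err.println("No video information found, exiting!");
        return;
    }
    getVideoInfo();
}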

Getting the video information

/** Parse the detailed video and audio stream information */
private static void getVideoInfo() {
    JSONObject dash = VIDEO_INFO.videoInfo.getJSONObject("data").getJSONObject("dash");
    // Video: URL of the first stream and the byte range of its initialization segment
    JSONObject video = dash.getJSONArray("video").getJSONObject(0);
    VIDEO_INFO.videoBaseUrl = video.getStr("baseUrl");
    VIDEO_INFO.videoBaseRange = video.getJSONObject("SegmentBase").getStr("Initialization");
    // Request that small range; Content-Range ("bytes 0-974/<total>") reveals the total size
    HttpResponse videoRes = HttpRequest.get(VIDEO_INFO.videoBaseUrl)
            .header("Referer", VIDEO_URL)
            .header("Range", "bytes=" + VIDEO_INFO.videoBaseRange)
            .header("User-Agent", USER_AGENT)
            .timeout(2000)
            .execute();
    VIDEO_INFO.videoSize = videoRes.header("Content-Range").split("/")[1];
    // Audio: same steps
    JSONObject audio = dash.getJSONArray("audio").getJSONObject(0);
    VIDEO_INFO.audioBaseUrl = audio.getStr("baseUrl");
    VIDEO_INFO.audioBaseRange = audio.getJSONObject("SegmentBase").getStr("Initialization");
    HttpResponse audioRes = HttpRequest.get(VIDEO_INFO.audioBaseUrl)
            .header("Referer", VIDEO_URL)
            .header("Range", "bytes=" + VIDEO_INFO.audioBaseRange)
            .header("User-Agent", USER_AGENT)
            .timeout(2000)
            .execute();
    VIDEO_INFO.audioSize = audioRes.header("Content-Range").split("/")[1];
    downloadFile();
}

Downloading audio and video

/** Download the audio and video files */
private static void downloadFile() {
    // Directory where the audio and video are saved
    SAVE_PATH = "." + File.separator + VIDEO_INFO.videoName;
    File fileDir = new File(SAVE_PATH);
    if (!fileDir.exists()) {
        fileDir.mkdirs();
    }
    // Download the video (the Range end is inclusive, hence size - 1)
    File videoFile = new File(SAVE_PATH + File.separator + VIDEO_INFO.videoName + "_video.mp4");
    if (!videoFile.exists()) {
        System.out.println("-------------- Downloading video file --------------");
        HttpResponse videoRes = HttpRequest.get(VIDEO_INFO.videoBaseUrl)
                .header("Referer", VIDEO_URL)
                .header("Range", "bytes=0-" + (Long.parseLong(VIDEO_INFO.videoSize) - 1))
                .header("User-Agent", USER_AGENT)
                .execute();
        videoRes.writeBody(videoFile);
        System.out.println("-------------- Video file downloaded --------------");
    }
    // Download the audio
    File audioFile = new File(SAVE_PATH + File.separator + VIDEO_INFO.videoName + "_audio.mp4");
    if (!audioFile.exists()) {
        System.out.println("-------------- Downloading audio file --------------");
        HttpResponse audioRes = HttpRequest.get(VIDEO_INFO.audioBaseUrl)
                .header("Referer", VIDEO_URL)
                .header("Range", "bytes=0-" + (Long.parseLong(VIDEO_INFO.audioSize) - 1))
                .header("User-Agent", USER_AGENT)
                .execute();
        audioRes.writeBody(audioFile);
        System.out.println("-------------- Audio file downloaded --------------");
    }
    mergeFiles(videoFile, audioFile);
}

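Merging audio and video

/** Merge the audio and video files with ffmpeg (streams are copied, not re-encoded) */
private static void mergeFiles(File videoFile, File audioFile) {
    System.out.println("-------------- Merging audio and video --------------");
    String outFile = SAVE_PATH + File.separator + VIDEO_INFO.videoName + ".mp4";
    List<String> command = new ArrayList<>();
    command.add(FFMPEG_PATH);
    command.add("-i");
    command.add(videoFile.getAbsolutePath());
    command.add("-i");
    command.add(audioFile.getAbsolutePath());
    command.add("-vcodec");
    command.add("copy");
    command.add("-acodec");
    command.add("copy");
    command.add(outFile);
    ProcessBuilder builder = new ProcessBuilder(command);
    try {
        int exitCode = builder.inheritIO().start().waitFor();
        if (exitCode == 0) {
            System.out.println("-------------- Audio and video merged --------------");
        } else {
            System.err.println("Merging failed, ffmpeg exit code: " + exitCode);
        }
    } catch (IOException e) {
        System.err.println("Merging audio and video failed!");
        e.printStackTrace();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();  // restore the interrupt flag
        System.err.println("Merging audio and video was interrupted!");
    }
}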

3. Source code

Source code on GitHub
