Scraping Weibo Comments with a Python Crawler

Published: 2024/7/23 python 35 豆豆


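Original title: Scraping Weibo comments with a Python crawler

Web scraping is a skill most Python programmers pick up sooner or later, and many people choose Weibo as a practice target. How hard Weibo is to scrape depends on which version of the site you target, so it makes suitable practice both for newcomers to Python and for experienced programmers. This article walks through a code example that scrapes Weibo comments with Python.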

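1. Scraping Weibo

As with a QQ Zone (Qzone) crawler, you can scrape a Sina Weibo user's profile information, posts, followers, followees, and comments.

A crawler can fetch more than 13 million Weibo items per day, depending on network conditions.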
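To put that figure in perspective, here is a rough back-of-the-envelope conversion. It assumes the rate is sustained evenly over a full 24 hours, which the source does not state.

```python
# Convert the quoted daily throughput into a per-second rate.
# Assumption: the 13 million figure is spread evenly over one day.
ITEMS_PER_DAY = 13_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

items_per_second = ITEMS_PER_DAY / SECONDS_PER_DAY
print(f"{items_per_second:.1f} items per second")
```

That works out to roughly 150 items per second, well beyond what a single sequential script that pauses between requests would reach.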

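Difficulty, from hardest to easiest: desktop web site > mobile app > mobile site. The mobile site (m.weibo.cn, used in the code below) is the easiest version of Weibo to scrape.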

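2. Scraping Weibo comments with Python

Step 1: Determine the id of the Weibo post whose comments will be fetched

# -*- coding:utf-8 -*-
import re
import time

import pandas as pd
import requests

# The {} placeholder is filled with the page number when requesting.
urls = 'https://m.weibo.cn/api/comments/show?id=4073157046629802&page={}'

headers = {
    # The HTTP header name is "Cookie"; paste your own cookie string here.
    'Cookie': 'Your cookies',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/66.0.3359.181 Safari/537.36',
}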
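The `{}` in the URL template is a `str.format` placeholder for the page number. A minimal sketch of how the per-page URLs are built follows; `page_url` is a hypothetical helper name used only for illustration.

```python
# URL template from the article; {} takes the page number.
urls = 'https://m.weibo.cn/api/comments/show?id=4073157046629802&page={}'


def page_url(page):
    """Return the comment API URL for one page (hypothetical helper)."""
    return urls.format(page)


# Print the URLs for the first three pages.
for page in range(1, 4):
    print(page_url(page))
```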

Step 2: Match HTML tags

# Matches opening and closing tags such as <a href="...">, </a> and <br/>.
tags = re.compile(r'</?\w+[^>]*>')
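A minimal sketch of how a tag-stripping pattern of this kind behaves, assuming the intended pattern is `</?\w+[^>]*>`. The sample string is made up for illustration and is not real Weibo output.

```python
import re

# Matches opening and closing tags such as <a href="...">, </a> and <br/>.
tags = re.compile(r'</?\w+[^>]*>')

# Hand-written sample resembling comment text with embedded markup.
sample = 'Reply <a href="https://m.weibo.cn/n/someone">@someone</a>: thanks<br/>'
cleaned = tags.sub('', sample)
print(cleaned)  # Reply @someone: thanks
```

A regular expression like this is fine for the simple inline markup found in comment text, but it is not a general HTML parser.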

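Step 3: Define the comment-extraction function

comments = []  # comment and reply texts collected across all pages
ids = []       # matching ids, in the same order as comments


def get_comment(url):
    # Fetch one page of comments and decode the JSON body.
    j = requests.get(url, headers=headers).json()
    comment_data = j['data']['data']
    for data in comment_data:
        try: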

Step 4: Strip HTML tags from the text with the regular expression

            comment = tags.sub('', data['text'])  # strip HTML tags
            reply = tags.sub('', data['reply_text'])
            weibo_id = data['id']
            reply_id = data['reply_id']
            comments.append(comment)
            comments.append(reply)
            ids.append(weibo_id)
            ids.append(reply_id)
        except KeyError:
            # Entries missing any of these fields are skipped.
            pass
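The extraction logic can be tried without touching the network by feeding it a hand-written dictionary. The sketch below is a variant, not the article's exact function: `extract_rows` is a hypothetical name, the payload shape (`payload['data']['data']`) is taken from the code above, and treating the reply fields as optional is an assumption.

```python
import re

tags = re.compile(r'</?\w+[^>]*>')


def extract_rows(payload):
    """Return (id, text) pairs from one decoded response (hypothetical helper).

    Assumes payload['data']['data'] is a list of comment dicts.
    """
    rows = []
    for item in payload.get('data', {}).get('data', []):
        rows.append((item['id'], tags.sub('', item['text'])))
        # Assumption: reply fields may be absent, so they are optional here.
        if 'reply_id' in item and 'reply_text' in item:
            rows.append((item['reply_id'], tags.sub('', item['reply_text'])))
    return rows


# Hand-written sample data, not a real API response.
sample = {'data': {'data': [
    {'id': 1, 'text': 'Nice <b>post</b>'},
    {'id': 2, 'text': 'Agreed', 'reply_id': 1, 'reply_text': 'Nice <b>post</b>'},
]}}
rows = extract_rows(sample)
print(rows)
```

Unlike a `try`/`except KeyError` around the whole body, this variant keeps a comment even when it has no reply fields.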

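Step 5: Crawl the comments

# The page range is an assumption; the original does not say how many pages to fetch.
for page in range(1, 11):
    get_comment(urls.format(page))
    time.sleep(1)  # pause between requests

df = pd.DataFrame({'ID': ids, '評論': comments})  # '評論' means "comment"
df = df.drop_duplicates()
# gb18030 is a Chinese encoding; '觀察者網' (Guancha) names the output file.
df.to_csv('觀察者網.csv', index=False, encoding='gb18030')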
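A paged crawl like this can be sketched as a loop that formats the page number into the URL, stops when a page comes back empty, and pauses between requests. In the sketch below, `crawl_pages`, `fake_fetch`, and the `example.invalid` URL are all hypothetical stand-ins so that it runs without network access; stopping on an empty page is an assumption about how the end of the comments shows up.

```python
import time


def crawl_pages(fetch, url_template, max_pages, delay=1.0):
    """Call fetch(url) for pages 1..max_pages, pausing between requests.

    fetch is any callable returning a list of rows for one page;
    an empty list is treated as "no more comments" and ends the loop.
    """
    rows = []
    for page in range(1, max_pages + 1):
        page_rows = fetch(url_template.format(page))
        if not page_rows:
            break
        rows.extend(page_rows)
        time.sleep(delay)  # be polite to the server
    return rows


def fake_fetch(url):
    """Stand-in fetcher: pages 1 and 2 have one row each, later pages are empty."""
    page = int(url.rsplit('=', 1)[1])
    return [(page, f'comment from page {page}')] if page <= 2 else []


result = crawl_pages(fake_fetch, 'https://example.invalid/comments?page={}',
                     max_pages=5, delay=0)
print(result)
```

Passing the fetcher in as an argument keeps the loop testable; in real use it would wrap the request and extraction steps shown earlier.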

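That concludes the example of scraping Weibo comments with Python. If you are new to scraping, Weibo is a good target to practice on.

Original article: https://www.py.cn/spider/example/22977.html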
