Scraping book information from Douban Reading with Python's BeautifulSoup library
Using the BeautifulSoup library, fetch information on the top 250 books. The fields to scrape are: the book title, the title's URL link, the author, the publisher and publication date, the price, the rating, and the one-line comment. Save all of this to a txt file, with the fields aligned into neat columns. (I have only just started learning web scraping; if there are mistakes in the code, please point them out.)
The target URL is: https://book.douban.com/top250
The code is as follows:
import requests
from bs4 import BeautifulSoup
import time

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36"}
book_list = []

def book_info(url):
    book_data = requests.get(url, headers=headers)
    soup = BeautifulSoup(book_data.text, "lxml")
    book_is = soup.select("div.pl2 > a")        # title links
    lines = soup.select("p.pl")                 # author / publisher / date / price
    marks = soup.select("span.rating_nums")     # ratings
    comtents = soup.select("span.inq")          # one-line comments
    for book_i, line, mark, comtent in zip(book_is, lines, marks, comtents):
        line = line.get_text().split("/")
        data = {
            "book_name": book_i.get_text().replace('\n', '').replace(' ', ''),
            "book_url": book_i['href'],
            "line": ' '.join(line),
            "mark": mark.get_text(),
            "comtent": comtent.get_text(),
        }
        book_list.append(data)

if __name__ == '__main__':
    urls = ["https://book.douban.com/top250?start={}".format(i) for i in range(0, 250, 25)]
    for url in urls:
        book_info(url)
        time.sleep(1)  # be polite: pause between page requests
    # open the file once and let the context manager close it;
    # encoding='utf-8' means no encoding error handling is needed on write
    with open(r'D:\Python爬蟲\doubanbook.txt', "a+", encoding='utf-8') as fo:
        for word in book_list:
            fo.write(word["book_name"] + " " + word["book_url"] + " "
                     + word["line"] + " " + word["mark"] + "分 "
                     + word["comtent"] + "\n\n")
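The task asks for the fields to be aligned into neat columns, which plain string concatenation does not achieve. Below is a minimal sketch of one way to pad the book-title column, counting CJK characters as double-width (an assumption that matches most monospace fonts); the `rows` data and the `pad` helper are hypothetical, not part of the script above.

```python
def pad(text, width):
    # Left-justify text, treating characters above U+2E80 (CJK range)
    # as visual width 2 -- a rough heuristic, not a full wcwidth().
    visual = sum(2 if ord(ch) > 0x2E80 else 1 for ch in text)
    return text + " " * max(width - visual, 0)

# hypothetical sample rows standing in for entries of book_list
rows = [
    {"book_name": "红楼梦", "mark": "9.6"},
    {"book_name": "活着", "mark": "9.4"},
]

lines = [pad(r["book_name"], 20) + r["mark"] for r in rows]
```

With this padding, the rating column starts at the same visual offset in every line regardless of the title's length.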
Partial screenshot of the results:
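One caveat about the `zip()` approach in the script: `span.inq` (the one-line comment) is missing for some books, so the four selected lists can fall out of step and entries get silently dropped or mismatched. A more robust sketch selects each book's own container first and then looks up the fields inside it, tolerating a missing comment. The HTML below is a trimmed, hypothetical stand-in for Douban's real markup.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the page: the second book has no span.inq.
html = """
<table><tr><td><div class="pl2"><a href="https://book.douban.com/subject/1/">Book A</a></div>
<p class="pl">Author/Press/2000/29.00</p><span class="rating_nums">9.0</span><span class="inq">Great.</span></td></tr></table>
<table><tr><td><div class="pl2"><a href="https://book.douban.com/subject/2/">Book B</a></div>
<p class="pl">Author/Press/2001/39.00</p><span class="rating_nums">8.5</span></td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

books = []
for table in soup.select("table"):          # one <table> per book on the real page
    a = table.select_one("div.pl2 > a")
    inq = table.select_one("span.inq")      # may be None for some books
    books.append({
        "book_name": a.get_text(strip=True),
        "book_url": a["href"],
        "mark": table.select_one("span.rating_nums").get_text(),
        "comtent": inq.get_text() if inq else "",
    })
```

Because each field is looked up within its own row, a book without a comment still keeps its title, URL, and rating correctly paired.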