Qidian (起点中文网) font anti-scraping: the page displays digits and letters, but the page source shows garbled characters or blanks
I took over the following piece of code:
# -*- coding: utf-8 -*-
"""
Created on Tue Mar 23 14:38:01 2021

@author: xinyi
"""
import xlwt
import requests
from lxml import etree
import time

# One [title, author, style, complete, introduce, word] row per book.
all_info_list = []


def get_info(url):
    """Scrape one listing page and append a row per book to all_info_list."""
    html = requests.get(url)
    selector = etree.HTML(html.text)
    infos = selector.xpath('//ul[@class="all-img-list cf"]/li')
    for info in infos:
        title = info.xpath('div[2]/h4/a/text()')[0]
        author = info.xpath('div[2]/p[1]/a[1]/text()')[0]
        style_1 = info.xpath('div[2]/p[1]/a[2]/text()')[0]
        style_2 = info.xpath('div[2]/p[1]/a[3]/text()')[0]
        style = style_1 + '·' + style_2
        complete = info.xpath('div[2]/p[1]/span/text()')[0]
        introduce = info.xpath('div[2]/p[2]/text()')[0].strip()
        word = info.xpath('div[2]/p[3]/span/text()')[0].strip('万字')
        info_list = [title, author, style, complete, introduce, word]
        all_info_list.append(info_list)
    time.sleep(1)  # pause one second between pages


if __name__ == '__main__':
    urls = ['http://a.qidian.com/?page={}'.format(str(i)) for i in range(1, 101)]
    for url in urls:
        get_info(url)

    # Write the header row, then one spreadsheet row per book.
    header = ['title', 'author', 'style', 'complete', 'introduce', 'word']
    book = xlwt.Workbook(encoding='utf-8')
    sheet = book.add_sheet('Sheet1')
    for h in range(len(header)):
        sheet.write(0, h, header[h])
    i = 1
    for row in all_info_list:
        j = 0
        for data in row:
            sheet.write(i, j, data)
            j += 1
        i += 1
    book.save('xiaoshuo.xls')

3. Final code
# -*- coding: utf-8 -*-
"""
Created on Tue Mar 23 14:38:01 2021

@author: xinyi
"""
import xlwt
import requests
from lxml import etree
import time

# One [title, author, style, complete, introduce, word] row per book.
all_info_list = []


def get_info(url):
    """Scrape one listing page and append a row per book to all_info_list."""
    html = requests.get(url)
    selector = etree.HTML(html.text)
    infos = selector.xpath('//ul[@class="all-img-list cf"]/li')
    for info in infos:
        title = info.xpath('div[2]/h4/a/text()')[0]
        author = info.xpath('div[2]/p[1]/a[1]/text()')[0]
        style_1 = info.xpath('div[2]/p[1]/a[2]/text()')[0]
        style_2 = info.xpath('div[2]/p[1]/a[3]/text()')[0]
        style = style_1 + '·' + style_2
        complete = info.xpath('div[2]/p[1]/span/text()')[0]
        introduce = info.xpath('div[2]/p[2]/text()')[0].strip()
        word = info.xpath('div[2]/p[3]/span/text()')[0].strip('万字')
        info_list = [title, author, style, complete, introduce, word]
        all_info_list.append(info_list)
    time.sleep(1)  # pause one second between pages


if __name__ == '__main__':
    urls = ['http://a.qidian.com/?page={}'.format(str(i)) for i in range(1, 101)]
    for url in urls:
        get_info(url)

    # Write the header row, then one spreadsheet row per book.
    header = ['title', 'author', 'style', 'complete', 'introduce', 'word']
    book = xlwt.Workbook(encoding='utf-8')
    sheet = book.add_sheet('Sheet1')
    for h in range(len(header)):
        sheet.write(0, h, header[h])
    i = 1
    for row in all_info_list:
        j = 0
        for data in row:
            sheet.write(i, j, data)
            j += 1
        i += 1
    book.save('xiaoshuo.xls')

4. References
Reference 1
Reference 2
Reference 3
Reference 4
Reference 5
Reference 6: mango
Thanks to the authors of these excellent articles.
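The symptom named in the title (digits visible in the browser, garbled or blank in the page source) is commonly produced by a custom web font that draws digit shapes for otherwise meaningless codepoints. The listings in this post read the `word` field straight from the `span` text and do not show a decoding step, so the following is only a minimal sketch under stated assumptions: that the font's character map points each obfuscated codepoint at a glyph named in English (`zero` to `nine`, `period`). The helper names `load_cmap` and `decode_obfuscated`, the sample codepoints, and the file name `font.woff` are hypothetical and do not come from the original code.

```python
# Assumption: the obfuscating font names its digit glyphs in English.
GLYPH_TO_CHAR = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "period": ".",
}


def load_cmap(font_path):
    """Read {codepoint: glyph name} from a downloaded font file (needs fontTools)."""
    from fontTools.ttLib import TTFont  # imported lazily; only needed for real fonts
    return TTFont(font_path).getBestCmap()


def decode_obfuscated(text, cmap):
    """Replace every character whose glyph name is a known digit or period."""
    decoded = []
    for ch in text:
        glyph_name = cmap.get(ord(ch))  # None when the font does not map ch
        decoded.append(GLYPH_TO_CHAR.get(glyph_name, ch))
    return "".join(decoded)


# Hypothetical cmap and codepoints, standing in for load_cmap('font.woff').
sample_cmap = {100389: "one", 100390: "two", 100391: "period", 100392: "five"}
raw = "".join(chr(cp) for cp in (100389, 100391, 100390, 100392)) + "万字"
print(decode_obfuscated(raw, sample_cmap).strip("万字"))  # prints 1.25
```

In a scraper like the one above, a function of this kind would be applied to the `word` text before the unit suffix is stripped. Locating and downloading the font referenced by the page's `@font-face` rule is a separate step that this sketch leaves out.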