Searching 51CTO Recommended Blogs with Python and Saving the Results to Excel
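1. Background
I have recently been learning web scraping. The Requests module fetches the pages, BeautifulSoup extracts the content I need, and the xlsxwriter module saves that content to an Excel file. I am recording the process here so that the same approach can later be adapted to scrape other content and persist it to a file, a database, and so on.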
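The paragraph above mentions a database as an alternative storage target. A minimal sketch of that idea, assuming the scraped result is a dict mapping blog title to URL and using the standard-library sqlite3 module. The function name `save_results`, the table name `blog`, and the sample entry are hypothetical and are not part of the original project.

```python
import sqlite3


def save_results(db_path, results):
    """Store a {title: url} mapping in a SQLite table and return the row count."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "CREATE TABLE IF NOT EXISTS blog "
                "(name TEXT PRIMARY KEY, url TEXT NOT NULL)"
            )
            # A repeated title overwrites the earlier row, like a dict key would
            conn.executemany(
                "INSERT OR REPLACE INTO blog (name, url) VALUES (?, ?)",
                results.items(),
            )
        return conn.execute("SELECT COUNT(*) FROM blog").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    # Made-up sample data, for illustration only
    sample = {"Example post": "https://example.com/post-1"}
    print(save_results(":memory:", sample))
```

Using the title as the primary key mirrors the dict-based result, where a later entry with the same title replaces the earlier one.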
2. Code
I wrote two modules, geturl3 and getexcel3, and call them from main.
Git source repository
The contents of geturl3.py are as follows:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# @Author : kaliarch
import sys
from urllib.parse import quote

import requests
from bs4 import BeautifulSoup


class get_urldic:
    # Read the search keyword and page count, then build one URL per page
    def get_url(self):
        urlList = []
        first_url = 'https://blog.51cto.com/search/result?q='
        after_url = '&type=&page='
        try:
            search = input("Please input search name:")
            page = int(input("Please input page:"))
        except Exception as e:
            print('Input error:', e)
            sys.exit(1)
        for num in range(1, page + 1):
            # Percent-encode the keyword so spaces and reserved characters stay valid
            url = first_url + quote(search) + after_url + str(num)
            urlList.append(url)
        print("Please wait....")
        return urlList, search

    # Fetch the raw HTML of every search page
    def get_html(self, urlList):
        response_list = []
        for r_num in urlList:
            request = requests.get(r_num)
            response = request.content
            response_list.append(response)
        return response_list

    # Extract blog_name and blog_url from every page
    def get_soup(self, html_doc):
        result = {}
        for g_num in html_doc:
            soup = BeautifulSoup(g_num, 'html.parser')
            context = soup.find_all('a', class_='m-1-4 fl')
            for i in context:
                title = i.get_text()
                result[title.strip()] = i['href']
        return result


if __name__ == '__main__':
    blog = get_urldic()
    urllist, search = blog.get_url()
    html_doc = blog.get_html(urllist)
    result = blog.get_soup(html_doc)
    for k, v in result.items():
        print('search blog_name is:%s,blog_url is:%s' % (k, v))
The contents of getexcel3.py are as follows:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# @Author : kaliarch
import xlsxwriter


class create_excle:
    def __init__(self):
        self.tag_list = ["blog_name", "blog_url"]

    def create_workbook(self, search=" "):
        excle_name = search + '.xlsx'  # Define the Excel file name
        workbook = xlsxwriter.Workbook(excle_name)
        worksheet_M = workbook.add_worksheet(search)
        print('create %s....' % excle_name)
        return workbook, worksheet_M

    def col_row(self, worksheet):
        worksheet.set_row(0, 17)
        worksheet.set_column('A:A', 58)
        worksheet.set_column('B:B', 58)

    def shell_format(self, workbook):
        # Format for the merged title row
        merge_format = workbook.add_format({
            'bold': 1,
            'border': 1,
            'align': 'center',
            'valign': 'vcenter',
            'fg_color': '#FAEBD7'
        })
        # Format for the column names
        name_format = workbook.add_format({
            'bold': 1,
            'border': 1,
            'align': 'center',
            'valign': 'vcenter',
            'fg_color': '#E0FFFF'
        })
        # Format for the data rows
        normal_format = workbook.add_format({
            'align': 'center',
        })
        return merge_format, name_format, normal_format

    # Write the title and the column names
    def write_title(self, worksheet, search, merge_format):
        title = search + " search results"
        worksheet.merge_range('A1:B1', title, merge_format)
        print('write title success')

    def write_tag(self, worksheet, name_format):
        tag_row = 1
        tag_col = 0
        for num in self.tag_list:
            worksheet.write(tag_row, tag_col, num, name_format)
            tag_col += 1
        print('write tag success')

    # Write the content
    def write_context(self, worksheet, con_dic, normal_format):
        # Data starts at row 2, below the title (row 0) and the column names (row 1)
        row = 2
        for k, v in con_dic.items():
            col = 0
            worksheet.write(row, col, k, normal_format)
            col += 1
            worksheet.write(row, col, v, normal_format)
            row += 1
        print('write context success')

    # Close the Excel workbook
    def workbook_close(self, workbook):
        workbook.close()


if __name__ == '__main__':
    print('This is create excel mode')
The contents of main.py are as follows:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# @Author : kaliarch
import geturl3
import getexcel3


# Get the dictionary of blog names and URLs
def get_dic():
    blog = geturl3.get_urldic()
    urllist, search = blog.get_url()
    html_doc = blog.get_html(urllist)
    result = blog.get_soup(html_doc)
    return result, search


# Write the results to Excel
def write_excle(urldic, search):
    excle = getexcel3.create_excle()
    workbook, worksheet = excle.create_workbook(search)
    excle.col_row(worksheet)
    merge_format, name_format, normal_format = excle.shell_format(workbook)
    excle.write_title(worksheet, search, merge_format)
    excle.write_tag(worksheet, name_format)
    excle.write_context(worksheet, urldic, normal_format)
    excle.workbook_close(workbook)


def main():
    url_dic, search_name = get_dic()
    write_excle(url_dic, search_name)


if __name__ == '__main__':
    main()
3. Results
Run the code, then enter the search keyword and the number of result pages to fetch.
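The keyword and page count entered here determine which search URLs are requested, one per page. A minimal sketch of that step, assuming the query-string layout used in geturl3.py (`q`, `type`, `page`) and using `urllib.parse.urlencode` to escape the keyword. The helper name `build_search_urls` is hypothetical, and whether the site treats `+` as a space in the keyword is an assumption.

```python
from urllib.parse import urlencode

# Search endpoint as it appears in geturl3.py
BASE_URL = "https://blog.51cto.com/search/result"


def build_search_urls(keyword, pages):
    """Return one search URL per page, numbered from 1 to `pages`."""
    if pages < 1:
        raise ValueError("pages must be at least 1")
    return [
        # urlencode escapes spaces, '&', '#' and non-ASCII text in the keyword
        BASE_URL + "?" + urlencode({"q": keyword, "type": "", "page": num})
        for num in range(1, pages + 1)
    ]


if __name__ == "__main__":
    for url in build_search_urls("python docker", 2):
        print(url)
```

Validating the page count up front avoids silently producing an empty URL list when 0 or a negative number is entered.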
An Excel file named after the search keyword is generated. Open it to view the content that was written.
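The layout of that sheet follows from getexcel3.py: a title on row 0 (merged across A1:B1), the column names on row 1, and one result per row from row 2 onward. The sketch below computes that layout as plain `(row, col, value)` tuples so the row arithmetic can be checked without xlsxwriter. The function name `plan_cells` is hypothetical and is not part of the original modules.

```python
def plan_cells(search, results, headers=("blog_name", "blog_url")):
    """Return (row, col, value) tuples for the title, column names and results."""
    # Row 0: the title (merged across A:B in the real worksheet)
    cells = [(0, 0, search + " search results")]
    # Row 1: one column name per column
    cells += [(1, col, name) for col, name in enumerate(headers)]
    # Rows 2 and below: one result per row, title in column 0 and URL in column 1
    for offset, (title, url) in enumerate(results.items()):
        row = 2 + offset
        cells.append((row, 0, title))
        cells.append((row, 1, url))
    return cells


if __name__ == "__main__":
    # Made-up sample data, for illustration only
    for cell in plan_cells("docker", {"Post A": "https://example.com/a"}):
        print(cell)
```

With n results the last data row is n + 1, so every entry in the dict gets its own row.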
With this you can search for and save the recommended 51CTO blogs you need, and you can run it for several keywords.
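When running several searches, note that each keyword is used directly as the worksheet name. Excel limits worksheet names to 31 characters and disallows the characters `[ ] : * ? / \`, so a keyword outside those limits is expected to be rejected when the worksheet is created. A minimal sketch of a sanitizing step, assuming it would be applied to the keyword before the worksheet is added. The helper `safe_sheet_name` and its fallback value are hypothetical.

```python
import re

# Characters Excel does not allow in a worksheet name
INVALID_SHEET_CHARS = re.compile(r"[\[\]:*?/\\]")
MAX_SHEET_NAME_LEN = 31


def safe_sheet_name(keyword, fallback="results"):
    """Return a worksheet-safe version of `keyword`, or `fallback` if nothing is left."""
    cleaned = INVALID_SHEET_CHARS.sub("_", keyword).strip()
    # Truncate first, then drop apostrophes the name may not start or end with
    cleaned = cleaned[:MAX_SHEET_NAME_LEN].strip("'")
    return cleaned or fallback


if __name__ == "__main__":
    print(safe_sheet_name("docker/k8s: tips?"))
```

The same cleaned value could also serve as the file name stem, since `/` and `\` in a keyword would otherwise be read as path separators.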
Reposted from: https://blog.51cto.com/kaliarch/2067103