Web Scraping Project (II): Collecting Worldwide COVID-19 Data Since March 2
This content comes from the Heima Programmer (黑马程序员) tutorial.
Collecting worldwide COVID-19 data since March 2

Steps:

Ⅰ. Refactor the code from Project (I) to improve extensibility

- Encapsulate the functionality in a class
- Turn each small piece of functionality into a method
- Start the spider via a run method
```python
import requests
import re
import json
from bs4 import BeautifulSoup


class CoronaSpider(object):
    def __init__(self):
        self.home_url = 'https://ncov.dxy.cn/ncovh5/view/pneumonia'

    def get_content_from_url(self, url):
        # Send a request and decode the response body
        response = requests.get(url)
        return response.content.decode()

    def parse_home_page(self, home_page):
        # Extract the JSON embedded in the page's script tag
        soup = BeautifulSoup(home_page, 'lxml')
        script = soup.find(id='getListByCountryTypeService2true')
        text = script.text
        # Grab the bracketed JSON array and parse it into Python data
        json_str = re.findall(r'\[.+\]', text)[0]
        data = json.loads(json_str)
        return data

    def save(self, data, path):
        # Save the data as JSON
        with open(path, 'w') as fp:
            json.dump(data, fp, ensure_ascii=False)

    def crawl_last_day(self):
        # Collect the latest day's data for every country
        home_page = self.get_content_from_url(self.home_url)
        last_data = self.parse_home_page(home_page)
        self.save(last_data, r'E:\Jupyter_workspace\study\python\爬蟲\last_day_nature_num1.json')

    def run(self):
        self.crawl_last_day()


if __name__ == '__main__':
    spider = CoronaSpider()
    spider.run()
```
Clearly this produces the same result as before, so the refactor is complete.
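The key trick in `parse_home_page` is that the page embeds its data as a JSON array inside a script tag, so a greedy regex can cut the array out of the surrounding JavaScript. Here is a minimal, standalone sketch of that step; the script text below is a hypothetical miniature of the real payload, which is a much larger array of country records.

```python
import re
import json

# Hypothetical trimmed-down version of the script tag's text on the page
script_text = (
    'try { window.getListByCountryTypeService2true = '
    '[{"provinceName": "美国", "confirmedCount": 1678}]'
    '}catch(e){}'
)

# Same technique as parse_home_page: grab the bracketed JSON array
# (the greedy .+ spans from the first '[' to the last ']'),
# then turn it into Python objects.
json_str = re.findall(r'\[.+\]', script_text)[0]
data = json.loads(json_str)
print(data[0]['provinceName'])  # 美国
```

Because `.+` is greedy, this only works cleanly when the script body contains a single top-level array, which is the case on this page.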
Ⅱ. Collect worldwide data since March 2

- Load the latest day's per-country data
- Iterate over the countries to get each one's URL for historical data since March 2
- Send a request and get the JSON string of each country's history since March 2
- Parse each country's JSON string into Python data and append it to a list
- Save that list as JSON
```python
import requests
import re
import json
from bs4 import BeautifulSoup
from tqdm import tqdm


class CoronaSpider(object):
    def __init__(self):
        self.home_url = 'https://ncov.dxy.cn/ncovh5/view/pneumonia'

    def get_content_from_url(self, url):
        # Send a request and decode the response body
        response = requests.get(url)
        return response.content.decode()

    def parse_home_page(self, home_page):
        # Extract the JSON embedded in the page's script tag
        soup = BeautifulSoup(home_page, 'lxml')
        script = soup.find(id='getListByCountryTypeService2true')
        text = script.text
        # Grab the bracketed JSON array and parse it into Python data
        json_str = re.findall(r'\[.+\]', text)[0]
        data = json.loads(json_str)
        return data

    def save(self, data, path):
        # Save the data as JSON. ensure_ascii is a json.dump parameter that
        # controls the encoding; ensure_ascii=False keeps Chinese characters
        # readable, but dumping kept failing with it on my machine, so it is
        # omitted here. Add it back if you need readable Chinese output.
        with open(path, 'w') as fp:
            json.dump(data, fp)

    def crawl_last_day_corona_virus(self):
        # Collect the latest day's data for every country
        home_page = self.get_content_from_url(self.home_url)
        last_data_corona_virus = self.parse_home_page(home_page)
        self.save(last_data_corona_virus,
                  r'E:\Jupyter_workspace\study\python\爬蟲\last_day_nature_num111.json')

    def crawl_corona_virus(self):
        # 1. Load the latest day's per-country data
        with open(r'E:\Jupyter_workspace\study\python\爬蟲\last_day_nature_num111.json') as fp:
            last_day_corona_virus = json.load(fp)
        corona_virus = []
        # 2. Iterate over the countries and get each one's historical-data URL
        for country in tqdm(last_day_corona_virus,
                            'Fetching worldwide data since January 23'):
            statistics_data_url = country['statisticsData']
            # 3. Request the country's historical JSON string
            statistics_data_json_str = self.get_content_from_url(statistics_data_url)
            # 4. Parse it, keep the "data" list, and tag each daily record
            #    with the country it belongs to
            statistics_data = json.loads(statistics_data_json_str)['data']
            for one_day in statistics_data:
                one_day['provinceName'] = country['provinceName']
                one_day['countryShortCode'] = country['countryShortCode']
            corona_virus.extend(statistics_data)
        # 5. Save the combined list as JSON
        self.save(corona_virus, r'E:\Jupyter_workspace\study\python\爬蟲\corona_virus.json')

    def run(self):
        self.crawl_corona_virus()


if __name__ == '__main__':
    spider = CoronaSpider()
    spider.run()
```
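About the `ensure_ascii=False` failure mentioned in the `save` method: a likely cause is the file's encoding rather than `json.dump` itself. On Windows, `open(path, 'w')` uses the system codepage (often GBK) by default, which can choke on non-ASCII output. A minimal sketch of a save that sidesteps this by writing UTF-8 explicitly (the file name below is just for the demo):

```python
import json
import os
import tempfile


def save_utf8(data, path):
    # Open the file as UTF-8 explicitly so the non-ASCII output produced by
    # ensure_ascii=False never hits the platform's default codec.
    with open(path, 'w', encoding='utf-8') as fp:
        json.dump(data, fp, ensure_ascii=False)


# Quick check with a record containing Chinese text
path = os.path.join(tempfile.gettempdir(), 'corona_demo.json')
save_utf8([{'provinceName': '美国'}], path)
with open(path, encoding='utf-8') as fp:
    print(fp.read())  # [{"provinceName": "美国"}]
```

Remember to pass the same `encoding='utf-8'` when reading the file back with `json.load`.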
Each country's historical data is a JSON file like this one: https://file1.dxycdn.com/2020/0315/831/3402160489185731552-135.json
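The inner loop of `crawl_corona_virus` can be sketched on its own: parse one country's statistics JSON, keep only its `data` list, and tag every daily record with the country it belongs to. The JSON below is a hypothetical trimmed-down stand-in for the real file (the `success` and `dateId` fields are illustrative assumptions; only the `data` key is taken from the spider's code).

```python
import json

# Hypothetical miniature of one country's statisticsData JSON; the real
# file's "data" list holds one record per day since the outbreak.
statistics_data_json_str = json.dumps({
    "success": True,
    "data": [
        {"dateId": 20200302, "confirmedCount": 100},
        {"dateId": 20200303, "confirmedCount": 124},
    ],
})
country = {"provinceName": "美国", "countryShortCode": "USA"}

# Same merge as in crawl_corona_virus: keep the "data" list and stamp each
# daily record with the country's name and code, so records from different
# countries stay distinguishable after they are combined into one list.
statistics_data = json.loads(statistics_data_json_str)['data']
for one_day in statistics_data:
    one_day['provinceName'] = country['provinceName']
    one_day['countryShortCode'] = country['countryShortCode']
print(statistics_data[0]['countryShortCode'])  # USA
```

Without this tagging step, `corona_virus.extend(statistics_data)` would merge anonymous daily records with no way to tell which country each one came from.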