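Web scraping in practice: scraping plus data analysis, a Chongqing electrical engineer walks you through every tourist attraction in Chongqing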
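Winter break is here, and of course that means getting out to have fun. As a born-and-bred Chongqing power-grid engineer, today I'll use Python web scraping plus data analysis to show you which places in Chongqing are worth visiting.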
First, here is the final result: a map of where the scenic spots are located.
Data source: Qunar Travel (去哪儿旅行)
URL: 去哪儿旅行-重庆 (Qunar Travel, Chongqing)
The data is fetched with requests and comes back as JSON.
Part 1: Scraping
Searching the data: a first try
```python
import requests

keyword = "重庆"
page = 1  # fetch the first page only
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3947.100 Safari/537.36"}
url = f'http://piao.qunar.com/ticket/list.json?keyword={keyword}&region=&from=mpl_search_suggest&page={page}'
res = requests.request("GET", url, headers=headers)
try:
    res_json = res.json()
    data = res_json['data']
    print(data)
except (ValueError, KeyError):  # body was not JSON, or it had no 'data' key
    pass
```

Result:
The JSON comes back as a dictionary, so we need to find the keys we are interested in.
Search results
It turns out the part we care about is sightList.
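Rather than scanning the printed dictionary by eye, a small recursive helper can list every key path in the response, which makes a key like sightList easy to spot. This is a minimal sketch; `key_paths` and the sample dictionary are hypothetical, and it assumes that all items in a list share the same shape, so only the first item is explored.

```python
def key_paths(obj, prefix=""):
    """Yield dotted paths to every key in a nested dict/list structure.

    List items are shown as [0]; only the first element is explored,
    on the assumption that all elements have the same keys.
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path
            yield from key_paths(value, path)
    elif isinstance(obj, list) and obj:
        yield from key_paths(obj[0], f"{prefix}[0]")


# Hypothetical, trimmed-down sample shaped like the Qunar response
sample = {
    "data": {
        "sightList": [
            {"sightName": "Example Sight", "qunarPrice": 10},
        ],
    },
}

for path in key_paths(sample):
    print(path)
```

With the real response, `for p in key_paths(res.json()): print(p)` prints paths such as `data.sightList[0].sightName`.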
Knowing that sightList holds the data, we can change the code to:
```python
import requests

keyword = "重庆"
page = 1
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3947.100 Safari/537.36"}
url = f'http://piao.qunar.com/ticket/list.json?keyword={keyword}&region=&from=mpl_search_suggest&page={page}'
res = requests.request("GET", url, headers=headers)
res_json = res.json()
sightLists = res_json['data']['sightList']  # sightList holds the entries we want
for sight in sightLists:
    print(sight)
```

Next, extract the individual fields by changing the code to:
```python
import requests

keyword = "重庆"
page = 1  # first page only
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3947.100 Safari/537.36"}
url = f'http://piao.qunar.com/ticket/list.json?keyword={keyword}&region=&from=mpl_search_suggest&page={page}'
res = requests.request("GET", url, headers=headers)
res_json = res.json()
sightLists = res_json['data']['sightList']  # sightList holds the entries we want
for sight in sightLists:
    # dict.get returns None when a field is missing
    name = sight.get('sightName')          # name
    districts = sight.get('districts')     # district (address)
    star = sight.get('star')               # grade
    qunarPrice = sight.get('qunarPrice')   # lowest price
    saleCount = sight.get('saleCount')     # number of purchases
    score = sight.get('score')             # rating
    point = sight.get('point')             # coordinates
    intro = sight.get('intro')             # introduction
    print('Name: {0}, district: {1}, grade: {2}, price: {3}, saleCount: {4}, '
          'rating: {5}, coordinates: {6}, intro: {7}'.format(
              name, districts, star, qunarPrice, saleCount, score, point, intro))
```

Next, we need to write the data into a table.
```python
import requests
import pandas as pd

keyword = "重庆"
page = 1  # first page only
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3947.100 Safari/537.36"}
url = f'http://piao.qunar.com/ticket/list.json?keyword={keyword}&region=&from=mpl_search_suggest&page={page}'
res = requests.request("GET", url, headers=headers)
sightLists = res.json()['data']['sightList']  # sightList holds the entries we want

fields = ['sightName', 'districts', 'star', 'qunarPrice',
          'saleCount', 'score', 'point', 'intro']
# one row per sight; missing fields become None
rows = [[sight.get(f) for f in fields] for sight in sightLists]
shuju = pd.DataFrame(rows, columns=['名称', '地址', '星级', '最低价格',
                                    '购买人数', '评分', '坐标位置', '介绍'])
# mode='a' appends, so every page goes into the same file;
# header=False because the file is read back later with header=None
shuju.to_csv('重庆景点数据.csv', mode='a', index=False, header=False)
```

Crawling multiple pages
The example above used a single page to work out the rough code; now we need to crawl multiple pages.
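The post does not show its multi-page loop, so here is a minimal sketch of one way to do it. It assumes an empty sightList means there are no more pages; `parse_sights`, `crawl`, `max_pages` and `delay` are hypothetical names, and the one-second pause is an arbitrary courtesy value. requests is imported inside `crawl` so the parsing part can be used and tested without network access.

```python
import time

BASE_URL = "http://piao.qunar.com/ticket/list.json"
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3947.100 Safari/537.36"}
FIELDS = ["sightName", "districts", "star", "qunarPrice",
          "saleCount", "score", "point", "intro"]


def parse_sights(res_json):
    """Turn one page of JSON into a list of rows (missing fields become None)."""
    sights = (res_json.get("data") or {}).get("sightList") or []
    return [[sight.get(field) for field in FIELDS] for sight in sights]


def crawl(keyword, max_pages=100, delay=1.0):
    """Fetch pages 1..max_pages, stopping early at the first empty page."""
    import requests  # imported here so parse_sights works without it

    rows = []
    for page in range(1, max_pages + 1):
        params = {"keyword": keyword, "region": "",
                  "from": "mpl_search_suggest", "page": page}
        res = requests.get(BASE_URL, params=params, headers=HEADERS, timeout=10)
        page_rows = parse_sights(res.json())
        if not page_rows:  # assumption: an empty sightList means no more pages
            break
        rows.extend(page_rows)
        time.sleep(delay)  # pause between requests to go easy on the server
    return rows
```

To save the result, something like `pd.DataFrame(crawl("重庆"), columns=['名称', '地址', '星级', '最低价格', '购买人数', '评分', '坐标位置', '介绍']).to_csv('重庆景点数据.csv', index=False, header=False)` writes all pages in one go, so append mode is no longer needed.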
Part 2: Data analysis
We have scraped the data; now let's analyze it.
1. Reading the data
```python
import pandas as pd
import matplotlib.pyplot as plt

plt.rcParams['font.sans-serif'] = ['SimHei']  # a font that can render Chinese labels
plt.rcParams['axes.unicode_minus'] = False    # draw the minus sign correctly with SimHei

df = pd.read_csv('重庆景点数据.csv', header=None,
                 names=['名称', '地址', '星级', '最低价格', '购买人数', '评分', '坐标位置', '介绍'])
df = df.drop_duplicates()  # remove duplicate rows; this left 470 rows
print(df.head())
```

After removing duplicate rows, 470 scenic spots in Chongqing remain.
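`drop_duplicates()` only removes rows that are identical in every column, so a sight that shows up on two pages with, say, a different purchase count survives twice. The sketch below is one hedged alternative: it keeps the first row per sight name and coerces the numeric columns so that sorting is numeric. `load_sights`, `NUMERIC` and the sample CSV are hypothetical, and a name-based dedup may give a count other than the 470 reported above.

```python
import io

import pandas as pd

COLUMNS = ['名称', '地址', '星级', '最低价格', '购买人数', '评分', '坐标位置', '介绍']
NUMERIC = ['最低价格', '购买人数', '评分']


def load_sights(source):
    """Read the headerless CSV, dedupe by sight name, force numeric dtypes."""
    df = pd.read_csv(source, header=None, names=COLUMNS)
    # keep the first row per sight name instead of requiring identical rows
    df = df.drop_duplicates(subset='名称', keep='first')
    for col in NUMERIC:
        # anything that is not a number (blank, stray text) becomes NaN
        df[col] = pd.to_numeric(df[col], errors='coerce')
    return df.reset_index(drop=True)


# Hypothetical sample: 景点A appears twice with different purchase counts
sample_csv = io.StringIO(
    '景点A,渝中区,5,100,20,4.8,"106.5,29.5",介绍A\n'
    '景点A,渝中区,5,100,25,4.8,"106.5,29.5",介绍A\n'
    '景点B,南岸区,无,,,,"106.6,29.5",介绍B\n'
)
df = load_sights(sample_csv)
print(df[['名称', '最低价格', '购买人数']])
```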
2. Scenic spot price analysis
Top 20 highest prices
Top 20 lowest prices
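Only the code for the lowest-price chart appears in the post. A hedged sketch that returns either end of the ranking from one helper is shown here; `top_n_by` and the sample data are hypothetical. Unlike `pivot_table`, which averages rows that share a name, it assumes names are already unique, and it drops blank values so they do not crowd the ranking.

```python
import pandas as pd


def top_n_by(df, col, n=20, highest=True):
    """Return the n sights with the largest (or smallest) value in `col`.

    Rows where `col` is NaN are dropped first.
    """
    ranked = df.dropna(subset=[col]).set_index('名称')[col]
    return ranked.nlargest(n) if highest else ranked.nsmallest(n)


# Hypothetical miniature dataset
df = pd.DataFrame({
    '名称': ['A', 'B', 'C', 'D'],
    '最低价格': [120.0, 0.0, None, 45.0],
})
print(top_n_by(df, '最低价格', n=2))                 # two most expensive
print(top_n_by(df, '最低价格', n=2, highest=False))  # two cheapest
```

The same helper covers the rating and sales rankings in the next sections, e.g. `top_n_by(df, '评分').plot(kind='barh')` followed by `plt.title(...)` and `plt.show()`.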
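```python
# mean lowest price per sight name (pivot_table averages duplicate names)
df_qunarPrice = df.pivot_table(index='名称', values='最低价格')
df_qunarPrice.sort_values('最低价格', inplace=True, ascending=True)
# print(df_qunarPrice[:20])  # ascending sort, so this is the lowest-price Top 20
df_qunarPrice[:20].plot(kind='barh')
plt.title('最低Top20')
plt.show()
```

3. Scenic spot rating analysis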
Top 20 highest ratings
No rating (perhaps the site simply hasn't recorded a rating for these places yet)
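To see how many spots fall into the "no rating" group, a small summary helps. This is a minimal sketch; `rating_summary` is hypothetical, and treating a score of 0 the same as a missing score is an assumption about how the site stores unrated spots.

```python
import pandas as pd


def rating_summary(df, col='评分'):
    """Count rated vs. unrated sights and list the unrated ones.

    Assumption: a missing score and a score of 0 both mean "not rated yet".
    """
    rated = df[col].notna() & (df[col] > 0)
    return {
        'rated': int(rated.sum()),
        'unrated': int((~rated).sum()),
        'unrated_names': df.loc[~rated, '名称'].tolist(),
    }


# Hypothetical miniature dataset
df = pd.DataFrame({'名称': ['A', 'B', 'C'], '评分': [4.8, None, 0.0]})
print(rating_summary(df))
# {'rated': 1, 'unrated': 2, 'unrated_names': ['B', 'C']}
```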
4. Monthly sales analysis
Top 20 highest
Top 20 lowest (perhaps the site has no data for these places, or perhaps they are free)
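The two explanations offered for the bottom of the sales ranking (missing data, or a free attraction) can be checked row by row. A sketch under hypothetical rules: a blank 购买人数 means no sales data, and a price of 0 or a blank price means the spot is free or has no ticket listed. `classify_low_sales` and the reason labels are made up for illustration.

```python
import pandas as pd


def classify_low_sales(df, n=20):
    """Label the n lowest-selling sights with a likely reason (hypothetical rules)."""
    # NaN sales sort first so "no data" rows are included in the bottom n
    low = df.sort_values('购买人数', na_position='first').head(n)

    def reason(row):
        if pd.isna(row['购买人数']):
            return 'no sales data'
        if pd.isna(row['最低价格']) or row['最低价格'] == 0:
            return 'free or no ticket price listed'
        return 'genuinely low sales'

    return low.assign(reason=low.apply(reason, axis=1))[['名称', '购买人数', 'reason']]


# Hypothetical miniature dataset
df = pd.DataFrame({
    '名称': ['A', 'B', 'C', 'D'],
    '购买人数': [None, 0, 3, 500],
    '最低价格': [10.0, 0.0, 25.0, 50.0],
})
print(classify_low_sales(df, n=3))
```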
5. Distribution of scenic spot grades
```python
df_star = df["星级"].value_counts()  # spots per grade, already sorted in descending order
print(df_star)
```

Find the names of the graded spots, i.e. those of 3 stars and above:
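```python
# '无' means "no grade"; keep only graded spots, highest grade first
print(df[df["星级"] != '无'].sort_values("星级", ascending=False)['名称'])
```

Only part of the output is shown here; the full list is long.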
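The grade counts can also be drawn as a chart with pyecharts. This is a hedged sketch, not the author's chart: `grade_pairs`, `grade_pie` and the title string are hypothetical. It uses the pyecharts 1.x `Pie` API, which is imported inside `grade_pie` so the counting step runs without pyecharts installed.

```python
import pandas as pd


def grade_pairs(df, col='星级', missing='无'):
    """Count spots per grade as (grade, count) pairs, skipping the '无' placeholder."""
    counts = df.loc[df[col] != missing, col].value_counts()
    return [(str(grade), int(n)) for grade, n in counts.items()]


def grade_pie(pairs, title='重庆景点等级分布'):
    """Build a pyecharts pie chart from (grade, count) pairs."""
    from pyecharts import options as opts
    from pyecharts.charts import Pie

    return (
        Pie()
        .add('', pairs)
        .set_global_opts(title_opts=opts.TitleOpts(title=title))
        .set_series_opts(label_opts=opts.LabelOpts(formatter='{b}: {c}'))
    )


# Hypothetical miniature dataset
df = pd.DataFrame({'星级': ['5', '4', '5', '无', '3']})
print(grade_pairs(df))
```

With pyecharts installed, `grade_pie(grade_pairs(df)).render('grade_pie.html')` writes the chart to an HTML file.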
6. Mapping the scenic spot locations
First, save the data to a local file.
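The map code reads a file with 名称, lon and lat columns, but the post does not show how that file was produced. A minimal sketch, assuming the scraped 坐标位置 value is a "longitude,latitude" string: `add_lon_lat` is a hypothetical helper, and if your data turns out to be "latitude,longitude", swap the two column names.

```python
import pandas as pd


def add_lon_lat(df, col='坐标位置'):
    """Split a 'lon,lat' string column into numeric lon / lat columns.

    Assumes the coordinate string is "longitude,latitude".
    Rows whose coordinates cannot be parsed are dropped.
    """
    parts = df[col].astype(str).str.split(',', n=1, expand=True)
    parts = parts.reindex(columns=[0, 1])  # make sure both parts exist
    out = df.copy()
    out['lon'] = pd.to_numeric(parts[0], errors='coerce')
    out['lat'] = pd.to_numeric(parts[1], errors='coerce')
    return out.dropna(subset=['lon', 'lat'])


# Hypothetical miniature dataset
df = pd.DataFrame({
    '名称': ['A', 'B', 'C'],
    '坐标位置': ['106.57,29.56', None, 'bad'],
})
print(add_lon_lat(df)[['名称', 'lon', 'lat']])
```

On the real data, `add_lon_lat(df)[['名称', 'lon', 'lat']].to_csv('data重庆.csv', index=False)` produces a file in the shape the map code expects.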
Drawing the map
```python
import pandas as pd
from pyecharts import options
from pyecharts.charts import Geo
from pyecharts.globals import GeoType

stations = pd.read_csv('data重庆.csv', delimiter=',')

g = Geo().add_schema(maptype="重庆")
# register every sight's coordinates under its name
for _, s in stations.iterrows():
    g.add_coordinate(s['名称'], s['lon'], s['lat'])  # name, longitude, latitude
# give every point the value 1
data_pair = [(name, 1) for name in stations['名称']]
# draw the points
g.add('', data_pair, type_=GeoType.EFFECT_SCATTER, symbol_size=2)
g.set_series_opts(label_opts=options.LabelOpts(is_show=False))
g.set_global_opts(title_opts=options.TitleOpts(title="重庆景点分布图by-yudengwu"))
# save the result as HTML
result = g.render('stations.html')
```

The main urban districts clearly have the most places to visit.
Author: 电气-余登武 (Yu Dengwu). Writing this took real effort; if you found it useful, please leave a like.