Scraping Baidu Migration (百度迁徙) data and generating Excel files
Baidu Migration crawler
- 1. Background
- 2. Partial code
- 3. Results
- 4. Executable (.exe) download link
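1. Background
A post on my school's "confession wall" (表白墙) offered payment for scraping Baidu Migration data, so I took the job.
As required, the script generates three Excel files from the data scraped for each day.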
2. Partial code
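The huiyan.baidu.com endpoints used in the listing below answer with JSONP, that is, a JSON payload wrapped in a callback call such as jsonp_1605340876623_8581344({...}). As a minimal sketch of the unwrapping step, the wrapper can be removed by slicing between the first "(" and the last ")". The helper name `strip_jsonp` and the sample response are made up for illustration and are not part of the original script; only the `errno` / `data` / `list` keys are taken from the listing.

```python
import json


def strip_jsonp(text: str) -> str:
    """Return the JSON payload inside a JSONP response such as cb({...})."""
    start = text.index("(") + 1   # the first "(" opens the callback call
    end = text.rindex(")")        # the last ")" closes it
    return text[start:end]


# Made-up sample response; the value 3.5 is not a real migration figure.
sample = 'jsonp_1605340876623_8581344({"errno": 0, "data": {"list": {"20201114": 3.5}}})'
payload = json.loads(strip_jsonp(sample))
print(payload["data"]["list"]["20201114"])
```

Slicing on the last ")" instead of deleting every ")" keeps the payload intact when a value itself contains a parenthesis, and `json.loads` already decodes `\uXXXX` escapes, so no separate decoding pass should be needed.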
import json
import os
import time

import requests
import xlwings as xw

# NOTE: `code` (a dict mapping city name -> Baidu city ID) and `date_constant`
# (the name of the output date folder) are defined in parts of the script
# that are not shown here.


def JsonTextConvert(text):
    """Strip the JSONP wrapper, callback(...), and return the JSON payload."""
    # Slice between the first "(" and the last ")" so that parentheses inside
    # the payload survive; json.loads() decodes \uXXXX escapes on its own.
    return text[text.index('(') + 1:text.rindex(')')]


def UrlFormate(rankMethod, dt, name, migrationType, date):
    """Build the request URL; `date` is a YYYYMMDD string."""
    # The callback name embeds the Unix timestamp of midnight on `date`.
    timeUnix = int(time.mktime(time.strptime(date, "%Y%m%d")))
    ID = code[name]
    if migrationType == 'in' or migrationType == 'out' or rankMethod == 'historycurve':
        url = 'http://huiyan.baidu.com/migration/{0}.jsonp?dt={1}&id={2}&type=move_{3}&date={4}&callback=jsonp_{5}000_0000000'.format(
            rankMethod, dt, ID, migrationType, date, timeUnix)
    elif rankMethod == 'internalflowhistory':
        url = 'http://huiyan.baidu.com/migration/{0}.jsonp?dt={1}&id={2}&date={3}&callback=jsonp_{4}000_0000000'.format(
            rankMethod, dt, ID, date, timeUnix)
    else:
        # Previously this case fell through and failed on an unbound `url`.
        raise ValueError('unsupported rankMethod: {0}'.format(rankMethod))
    return url


def GetData(cityName, moveType, date, rankMethod):
    """Return data['list'] for one request, or 501 if the API reports errno 501."""
    # rankMethod is one of 'cityrank', 'historycurve', 'internalflowhistory'.
    response = requests.get(UrlFormate(rankMethod, 'city', cityName, moveType, date), timeout=10)
    rawData = json.loads(JsonTextConvert(response.text))
    if rawData['errno'] == 501:
        return 501
    return rawData['data']['list']


def write_Excel(data, data_time, move_type):
    """Write `data` (a list of rows) to d:/qianxi/<date_constant>/<date>_<type>.xlsx."""
    folder = 'd:/qianxi/' + date_constant
    os.makedirs(folder, exist_ok=True)  # make sure the target folder exists
    name = folder + '/' + data_time + "_" + move_type + ".xlsx"
    app = xw.App(visible=True, add_book=False)
    wb = app.books.add()
    sht = wb.sheets[0]  # first sheet, whatever its localized name is
    sht.range('A1').options(expand='table').value = data
    print(sht.range('A1').value)
    wb.save(name)
    wb.close()  # close the workbook
    app.quit()  # quit Excel


def function_cityrank(rankMethod, move_type, date_time):
    """Build the city-by-city matrix for one direction ('in' or 'out')."""
    # Header row: an empty corner cell followed by every city name.
    result = [[''] + list(code)]
    for a in code:
        tags = GetData(a, move_type, date_time, rankMethod)
        if tags == 501:
            return 501
        list_data = {tag['city_name']: tag['value'] for tag in tags}
        # One row per city: its name, then the value for each city (0 if absent).
        result.append([a] + [list_data.get(i, 0) for i in code])
    print(result)
    write_Excel(result, date_time, move_type)


def function_historycurve(date_time):
    """Build the per-city table of move-in, move-out and internal flow."""
    result = [['city_name', 'move_in', 'move_out', 'internal']]
    # Example request:
    # http://huiyan.baidu.com/migration/internalflowhistory.jsonp?dt=city&id=440100&date=20201114&callback=jsonp_1605340876623_8581344
    for a in code:
        tags_in = GetData(a, 'in', date_time, 'historycurve')
        tags_out = GetData(a, 'out', date_time, 'historycurve')
        # internalflowhistory takes no move type (the original passed the
        # built-in `type` here by mistake).
        tips = GetData(a, None, date_time, 'internalflowhistory')
        # Each result maps date -> value. The original tested all three
        # lookups against tags_in; each series is now checked on its own.
        result.append([a] + [series.get(date_time, 0) for series in (tags_in, tags_out, tips)])
    print(result)
    write_Excel(result, date_time, "規模")  # 規模 = "scale"


def create_document(date_time):
    """Generate the three workbooks for one date (YYYYMMDD)."""
    tag = function_cityrank('cityrank', 'in', date_time)
    if tag == 501:
        print('No data found for this date')
        return 501
    print('in_excel generated')
    function_cityrank('cityrank', 'out', date_time)
    print('out_excel generated')
    function_historycurve(date_time)
    print('guimo_excel generated')

3. Results
A qianxi folder is created under D:\, then a folder for the date inside it, and finally the three required Excel files.
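The path layout described above can be sketched with `pathlib`. This is an illustration only: the helper `output_path` is hypothetical, and it assumes that the date folder is named after the same date string used in the file names (in the listing the folder name comes from a `date_constant` variable whose definition is not shown).

```python
from pathlib import PureWindowsPath


def output_path(date: str, move_type: str, root: str = "d:/qianxi") -> PureWindowsPath:
    """Build <root>/<date>/<date>_<move_type>.xlsx as a Windows path."""
    return PureWindowsPath(root) / date / f"{date}_{move_type}.xlsx"


# The three workbooks written for one day; 規模 ("scale") is the suffix
# the listing passes for the third file.
for move_type in ("in", "out", "規模"):
    print(output_path("20201114", move_type))
```

`PureWindowsPath` only manipulates the path string, so the sketch runs on any operating system without touching the disk.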
xxxx_in.xlsx
規模.xlsx (the "scale" workbook)
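For orientation, each in/out workbook is a square table: the header row lists every city, and each following row starts with a city name followed by that city's value for every city in the header, with 0 where the API returned no entry. A minimal sketch with made-up city names and values follows; `build_matrix` is a hypothetical helper that mirrors what `function_cityrank` does in the listing, without any network calls.

```python
def build_matrix(cities, ranks):
    """cities: ordered city names; ranks: {city: {other_city: value}}."""
    rows = [[""] + list(cities)]  # header row, top-left cell left empty
    for city in cities:
        values = ranks.get(city, {})
        # Missing pairs are filled with 0, as in the original script.
        rows.append([city] + [values.get(other, 0) for other in cities])
    return rows


# Made-up sample data, not real migration figures.
cities = ["A", "B", "C"]
ranks = {"A": {"B": 1.2}, "B": {"A": 0.8, "C": 0.5}}
for row in build_matrix(cities, ranks):
    print(row)
```

The resulting list of rows is the shape that `write_Excel` receives and writes starting at cell A1.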
4. Executable (.exe) download link