Scraping Lagou's Asynchronously Loaded Pages with Python

Published: 2023/12/20 python 19 豆豆


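Below is the simple process I used to fetch information from Lagou's asynchronously loaded pages.

It collects every Python job posting in Shenzhen and saves the results to MongoDB.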

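(On asynchronous loading: some people say you should first download the whole page you want to scrape, save it locally, and then check whether the content you need is in it. If it is there, the page is not loaded asynchronously; if it is missing, it is. That approach works perfectly well. My own test is different: if I click somewhere on the page and the URL stays the same while the content changes, I call that asynchronous loading. There are many ways to load content asynchronously, of course, so each site has to be analyzed on its own terms.)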
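The first check described above (download the raw page and look for the content) can be sketched roughly as follows. This is only an illustration: `fetch_html` and `appears_in_static_html` are hypothetical helper names, the standard-library `urllib` is used here instead of `requests`, and the marker is whatever text you can see in the browser.

```python
import urllib.request


def fetch_html(url, headers=None):
    """Download the page source exactly as the server sends it (no JavaScript runs)."""
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode('utf-8', errors='replace')


def appears_in_static_html(html, marker):
    """Return True if the marker text is already present in the raw HTML."""
    return marker in html


# Offline illustration with two hand-written pages (not real Lagou markup)
static_page = '<ul><li>Python Engineer</li></ul>'
dynamic_page = '<div id="job-list"></div><script src="app.js"></script>'

print(appears_in_static_html(static_page, 'Python Engineer'))   # content is in the source
print(appears_in_static_html(dynamic_page, 'Python Engineer'))  # content must arrive later
```

If a job title that is visible in the browser is missing from what `fetch_html` returns, the listing is most likely filled in by a later request.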

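All of this could easily be wrapped in classes, each with its own responsibility (which leads naturally to the Scrapy framework). I will post a tutorial on scraping with Scrapy later.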
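As a rough idea of what "each with its own responsibility" could look like, here is a minimal sketch. The class name `LagouCrawler` and its two callables are hypothetical and not part of the original script; fetching and storing are passed in so that each part can be swapped or tested separately.

```python
class LagouCrawler:
    """Hypothetical wrapper: one callable fetches a page, another stores it."""

    def __init__(self, fetch_page, save_results):
        # fetch_page(page_number) -> list of result dicts for that page
        # save_results(results)   -> stores one page of results
        self.fetch_page = fetch_page
        self.save_results = save_results

    def run(self, total_pages):
        """Fetch pages 1..total_pages in order and return how many results were saved."""
        saved = 0
        for page in range(1, total_pages + 1):
            results = self.fetch_page(page)
            self.save_results(results)
            saved += len(results)
        return saved


# Usage with stand-ins: a fake fetcher and an in-memory list instead of MongoDB
stored = []
crawler = LagouCrawler(lambda page: [{'page': page}], stored.extend)
count = crawler.run(3)
print(count, stored)
```

In a real run the two callables would wrap the HTTP request and the MongoDB write instead of the stand-ins used here.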

There is also selenium + phantomjs, of course.

Here is the code:

import requests
import pymongo

# The Referer header is required. Everything before the '?' must be present;
# the query string after it can be omitted without affecting the result.
headers = {
    'Referer': 'https://www.lagou.com/jobs/list_Python?px=default&city=%E6%B7%B1%E5%9C%B3',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0',
}

pagenum = 1
# This could be a list instead: collect all the technology names on the page first,
# then loop over them in a nested loop when fetching job postings
key = 'Python'
first = 'true'  # optional; I have not found it to have any effect

# first: whether this is the first page; kd: search keyword; pn: page number
post_data = {'first': first, 'kd': key, 'pn': pagenum}

json_url = 'https://www.lagou.com/jobs/positionAjax.json?px=default&city=%E6%B7%B1%E5%9C%B3&needAddtionalResult=false&isSchoolJob=0'

# Fetch the JSON response and return its 'content' object
def get_content(post_data):
    r = requests.post(json_url, headers=headers, data=post_data)
    datas = r.json()
    return datas['content']

# Get the MongoDB collection
def get_connect():
    client = pymongo.MongoClient('localhost', 27017)
    lagou = client['panpan']
    lagoudt = lagou['lagou']
    return lagoudt

# Write the results to the database
def to_mongo(results):
    lagou = get_connect()
    for result in results:
        # insert_one replaces the old Collection.insert, which newer PyMongo versions no longer provide
        lagou.insert_one({
            'positionName': result['positionName'],
            'positionLables': ','.join(result['positionLables']),
            'workYear': result['workYear'],
            'education': result['education'],
            'salary': result['salary'],
            'city': result['city'],
            'financeStage': result['financeStage'],
            'industryField': result['industryField'],
            'createTime': result['createTime'],
            'positionAdvantage': result['positionAdvantage'],
            'companySize': result['companySize'],
            'district': result['district'],
            'companyShortName': result['companyShortName'],
            'companyFullName': result['companyFullName'],
            'firstType': result['firstType'],
            'secondType': result['secondType'],
            'subwayline': result['subwayline'],
            'stationname': result['stationname'],
            'linestaion': result['linestaion'],
        })

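# Total number of pages. Note: 'pageSize' is used as the page count here; if it
# turns out to be the number of results per page, the page count has to be
# derived from the total number of results instead.
total_page = get_content(post_data)['pageSize']

# Loop over every page
for page in range(1, total_page + 1):
    print(page)  # log the current page number
    post_data = {'first': 'false', 'kd': key, 'pn': page}
    data = get_content(post_data)
    to_mongo(data['positionResult']['result'])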
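The long hand-written dictionary in `to_mongo` could also be built from a list of field names. The following is a minimal sketch, not the original method: `FIELDS` and `to_document` are names invented here, and it assumes that a missing key should be stored as `None` rather than raise `KeyError` as the code above does.

```python
# Field names copied from the dictionary in to_mongo (spelling as used by the site)
FIELDS = [
    'positionName', 'workYear', 'education', 'salary', 'city',
    'financeStage', 'industryField', 'createTime', 'positionAdvantage',
    'companySize', 'district', 'companyShortName', 'companyFullName',
    'firstType', 'secondType', 'subwayline', 'stationname', 'linestaion',
]


def to_document(result, fields=FIELDS):
    """Build the document to store from one entry of the result list."""
    # Assumption: a missing field becomes None instead of raising KeyError
    doc = {name: result.get(name) for name in fields}
    # The labels arrive as a list and are stored as one comma-separated string
    doc['positionLables'] = ','.join(result.get('positionLables') or [])
    return doc


# Made-up record for illustration only
example = {'positionName': 'Python Developer', 'positionLables': ['backend', 'web']}
print(to_document(example))
```

Each document returned by `to_document` could then be passed to `insert_one` inside the loop.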

This is clearly an example of asynchronous loading. I won't go over it again, since it was covered above.

[Image]

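You can tell at a glance that this asynchronous loading is done with Ajax, and the JSON returned in the Response is exactly the content we need, so we can simply take it directly. Without further ado, see the code above. If you have questions, leave me a comment. I am also just starting to learn, so please point out anything I got wrong.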
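To make "simply take it directly" concrete, here is a small sketch that walks the same nesting the script relies on (`content`, then `positionResult`, then `result`). The sample body is made up and heavily trimmed, all of its values are invented, and `extract_results` is a hypothetical helper name.

```python
import json

# Hypothetical, heavily trimmed response body; every value here is invented
SAMPLE_BODY = '''
{"content": {"pageSize": 15,
             "positionResult": {"result": [
                 {"positionName": "Python Developer", "salary": "15k-25k"},
                 {"positionName": "Backend Engineer", "salary": "20k-30k"}]}}}
'''


def extract_results(body):
    """Return the list of job postings from a response body, or [] if any level is missing."""
    payload = json.loads(body)
    content = payload.get('content') or {}
    position_result = content.get('positionResult') or {}
    return position_result.get('result') or []


names = [job['positionName'] for job in extract_results(SAMPLE_BODY)]
print(names)
```

Returning an empty list when a level is missing is a design choice made for this sketch; it keeps a page loop from crashing when the site answers with something other than the expected JSON.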

[Image]
