Scraping 51job (前程无忧) Data and Storing It in a Database

Published: 2023/12/14 · Database · 35 豆豆

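This article, collected by 生活随笔, shows how to scrape job listings from 51job (前程无忧) and store them in an Excel spreadsheet or a MySQL database. It is shared here for reference.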

The code is as follows:

import re
import ssl
import urllib.parse
import urllib.request

import pymysql
import xlwt

# Disable HTTPS certificate verification (insecure; kept from the original script).
ssl._create_default_https_context = ssl._create_unverified_context

# Each regex match is a tuple in this order:
# 0 job details URL, 1 job title, 2 company URL, 3 company name, 4 salary,
# 5 work location, 6 publish date, 7 company type, 8 education,
# 9 requirements, 10 company size, 11 industry.
# COLUMN_ORDER rearranges a match into the order of the Excel header
# and of the MySQL columns.
COLUMN_ORDER = (3, 2, 7, 10, 11, 5, 1, 4, 0, 6, 8, 9)


def getContent(name, j):
    """Fetch search-result page j + 1 for the keyword and return its HTML."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4503.5 Safari/537.36",
        "Connection": "keep-alive",
    }
    j = j + 1
    # Percent-encode the keyword so that non-ASCII input still yields a valid URL.
    keyword = urllib.parse.quote(name)
    url = "https://search.51job.com/list/000000,000000,0000,00,9,99,%s,2,%d.html?lang=c&postchannel=0000&workyear=99&cotype=99&degreefrom=99&jobterm=99&companysize=99&ord_field=0&dibiaoid=0&line=&welfare=" % (keyword, j)
    # Request object (URL + request headers).
    req = urllib.request.Request(url, headers=headers)
    # Read the page body.
    page = urllib.request.urlopen(req).read()
    # Decode as GBK to avoid garbled Chinese text.
    return page.decode("GBK")


def getItem(content):
    """Use a regular expression to pull the job-related fields out of the page."""
    pattern = re.compile(r'"job_href":"(.+?)","job_name":"(.+?)".+?"company_href":"(.+?)","company_name":"(.+?)","providesalary_text":"(.*?)".+?"workarea_text":"(.*?)","updatedate":"(.*?)".*?"companytype_text":"(.*?)","degreefrom":"(.*?)".*?"attribute_text":(.*?),"companysize_text":"(.*?)",.*?,"companyind_text":"(.*?)".*?')
    return re.findall(pattern, content)


def saveExcel(jobs):
    """Write the extracted records to an Excel workbook, one row per job."""
    wb = xlwt.Workbook()
    sheet = wb.add_sheet("Data Analysis 50")
    header = ["Company name", "Company URL", "Company type", "Company size",
              "Industry", "Work location", "Job title", "Salary",
              "Job details", "Publish date", "Education", "Requirements"]
    # Header row.
    for i, v in enumerate(header):
        sheet.write(0, i, v)
    # Data rows, rearranged into header order.
    for i, row in enumerate(jobs):
        for col, idx in enumerate(COLUMN_ORDER):
            sheet.write(i + 1, col, row[idx])
    wb.save("51job2.xls")


def saveMysql(jobs):
    """Insert the extracted records into the MySQL table sjfx."""
    conn = pymysql.connect(host="localhost", user="root", password="123",
                           database="xmmysql", charset="utf8")
    # Create a cursor (a query session); SQL statements run through it.
    cursor = conn.cursor()
    # Placeholders let the driver escape the values; the original built the
    # statement with string formatting, which breaks on quotes in the data.
    sql = ("insert into sjfx(name,wz,leix,gm,hy,gzdd,gwmz,dy,gwxq,fbsj,xl,zpyq) "
           "values(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)")
    for row in jobs:
        # The original inserted i[0]..i[11] directly, which put the job URL
        # into the name column; reorder to match the column list instead.
        cursor.execute(sql, [row[idx] for idx in COLUMN_ORDER])
    conn.commit()  # Commit once after all rows are queued.
    cursor.close()
    conn.close()


jobs = []
name = input("Enter the industry you want to search for: ")
for j in range(0, 201):
    print("Fetching page %s; please do not interrupt or exit the program." % (j + 1))
    html = getContent(name, j)
    jobs.extend(getItem(html))

# saveExcel(jobs)
# saveMysql(jobs)
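
The parsing and storage steps of the script above can be tried without touching the network or a MySQL server. The sketch below is a minimal offline dry run under several assumptions: `SAMPLE` is a made-up record shaped to fit the script's regular expression (it is not real 51job output), `COLUMN_ORDER`, `to_row`, and `save_rows` are hypothetical helpers introduced for illustration, and the standard-library `sqlite3` module stands in for MySQL, so the placeholders are `?` rather than pymysql's `%s`.

```python
import re
import sqlite3

# Same field pattern as getItem() in the script, split across lines for readability.
PATTERN = re.compile(
    r'"job_href":"(.+?)","job_name":"(.+?)".+?"company_href":"(.+?)",'
    r'"company_name":"(.+?)","providesalary_text":"(.*?)".+?'
    r'"workarea_text":"(.*?)","updatedate":"(.*?)".*?'
    r'"companytype_text":"(.*?)","degreefrom":"(.*?)".*?'
    r'"attribute_text":(.*?),"companysize_text":"(.*?)",.*?,'
    r'"companyind_text":"(.*?)".*?'
)

# Column names from the script's insert statement, and the match index
# that belongs in each column (taken from the Excel-writing code).
COLUMNS = ("name", "wz", "leix", "gm", "hy", "gzdd",
           "gwmz", "dy", "gwxq", "fbsj", "xl", "zpyq")
COLUMN_ORDER = (3, 2, 7, 10, 11, 5, 1, 4, 0, 6, 8, 9)

# Made-up record (an assumption, not real site output). The company name
# contains a single quote on purpose.
SAMPLE = (
    '{"job_href":"https://example.invalid/job/1","job_name":"Data Analyst",'
    '"jobid":"1","company_href":"https://example.invalid/co/1",'
    '"company_name":"O\'Brien Data Ltd","providesalary_text":"10-15k",'
    '"workarea":"000000","workarea_text":"Shanghai","updatedate":"12-14",'
    '"iscommunicate":"","companytype_text":"Private","degreefrom":"6",'
    '"workyear":"3","attribute_text":["Shanghai","3 years","Bachelor"],'
    '"companysize_text":"50-150","companyind1_text":"Internet",'
    '"companyind_text":"Internet"}'
)


def to_row(match):
    """Rearrange one regex match into table-column order."""
    return tuple(match[i] for i in COLUMN_ORDER)


def save_rows(conn, rows):
    """Insert rows with a parameterized statement (sqlite3 uses ? placeholders)."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    sql = f"INSERT INTO sjfx ({', '.join(COLUMNS)}) VALUES ({placeholders})"
    conn.executemany(sql, rows)
    conn.commit()


# In-memory database standing in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE sjfx ({', '.join(c + ' TEXT' for c in COLUMNS)})")

rows = [to_row(m) for m in PATTERN.findall(SAMPLE)]
save_rows(conn, rows)

stored = conn.execute("SELECT name, gwmz, dy FROM sjfx").fetchall()
print(stored)
```

Because the values are passed as query parameters instead of being formatted into the SQL string, the quote character in the sample company name cannot break the statement. With pymysql the same idea is expressed as `cursor.execute(sql, params)` with `%s` placeholders.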
