Generating Company Business Licenses with Pillow
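I did this project for a school practical-training assignment, but I think it turned out well, so I am sharing it. Without further ado, let's get straight to it.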
Step 1 Crawler: company page URLs from the Qichacha (企查查) desktop site
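The Qichacha desktop site only lets logged-in users view company information, so register an account first and then use selenium to simulate the login.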
    import time

    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver import ActionChains
    from selenium.webdriver.common.by import By

    a = []  # collected company page URLs


    def login(driver):
        driver.delete_all_cookies()
        url = "https://www.qcc.com/weblogin?back=%2F"  # Qichacha login page
        driver.get(url)
        time.sleep(10)
        # Switch to password login
        driver.find_element(By.XPATH, '/html/body/div[1]/div[3]/div/div[2]/div[1]/div[2]/a').click()
        time.sleep(1)
        # Enter the account name and password
        driver.find_element(By.XPATH, '/html/body/div[1]/div[3]/div/div[2]/div[3]/form/div[1]/input').send_keys(username)
        driver.find_element(By.XPATH, '/html/body/div[1]/div[3]/div/div[2]/div[3]/form/div[2]/input').send_keys(password)
        button = driver.find_element(By.XPATH, '/html/body/div[1]/div[3]/div/div[2]/div[3]/form/div[3]/div/div/div[1]/span')
        # Drag the slider
        ActionChains(driver).click_and_hold(button).perform()
        ActionChains(driver).move_by_offset(xoffset=308, yoffset=0).perform()
        ActionChains(driver).release().perform()
        time.sleep(2)
        # Click the login button
        driver.find_element(By.XPATH, '/html/body/div[1]/div[3]/div/div[2]/div[3]/form/div[4]/button/strong').click()
        time.sleep(0.5)

After the simulated login, type the kind of company you want to crawl into the search bar. I searched for 游戏 ("game"). The rest of login() walks through the search result pages:

        url_a = [
            # Search results for "游戏", one URL per province filter; {} is the page number
            'https://www.qcc.com/web/search?key=%E6%B8%B8%E6%88%8F&p={}&filter=%7B%22rchain%22%3A%5B%7B%22pr%22%3A%22GD%22%7D%5D%7D',
            'https://www.qcc.com/web/search?key=%E6%B8%B8%E6%88%8F&p={}&filter=%7B%22rchain%22%3A%5B%7B%22pr%22%3A%22BJ%22%7D%5D%7D',
            'https://www.qcc.com/web/search?key=%E6%B8%B8%E6%88%8F&p={}&filter=%7B%22rchain%22%3A%5B%7B%22pr%22%3A%22JS%22%7D%5D%7D',
            'https://www.qcc.com/web/search?key=%E6%B8%B8%E6%88%8F&p={}&filter=%7B%22rchain%22%3A%5B%7B%22pr%22%3A%22SH%22%7D%5D%7D',
            'https://www.qcc.com/web/search?key=%E6%B8%B8%E6%88%8F&p={}&filter=%7B%22rchain%22%3A%5B%7B%22pr%22%3A%22ZJ%22%7D%5D%7D',
            'https://www.qcc.com/web/search?key=%E6%B8%B8%E6%88%8F&p={}&filter=%7B%22rchain%22%3A%5B%7B%22pr%22%3A%22SC%22%7D%5D%7D',
            'https://www.qcc.com/web/search?key=%E6%B8%B8%E6%88%8F&p={}&filter=%7B%22rchain%22%3A%5B%7B%22pr%22%3A%22SD%22%7D%5D%7D',
            'https://www.qcc.com/web/search?key=%E6%B8%B8%E6%88%8F&p={}&filter=%7B%22rchain%22%3A%5B%7B%22pr%22%3A%22HB%22%7D%5D%7D',
            'https://www.qcc.com/web/search?key=%E6%B8%B8%E6%88%8F&p={}&searchIndex=%7B%22scope%22%3A%22it%22%7D',
        ]
        num = 1
        for r in url_a:
            for j in range(1, 6):  # result pages 1 to 5
                driver.get(r.format(j))
                for i in range(1, 20):  # table rows 1 to 19 on each page
                    d = driver.find_element(By.XPATH, '/html/body/div[1]/div[2]/div[2]/div[4]/div/div[2]/div/table/tr[{}]/td[3]/div/a[1]'.format(i))
                    print('第{}條----->>>'.format(num), d.get_attribute("href"))  # company page URL
                    num += 1
                    a.append(d.get_attribute("href"))
                time.sleep(5)
        da = pd.DataFrame(a)
        # No header row, because the next script reads this file with header=None
        da.to_csv('./data.csv', header=False)


    def main():
        option = webdriver.ChromeOptions()
        option.add_experimental_option('excludeSwitches', ['enable-automation'])  # hide the webdriver from detection
        option.add_argument("--disable-blink-features=AutomationControlled")
        option.add_argument("--no-sandbox")
        option.add_argument("--disable-dev-shm-usage")
        # Do not load images
        option.add_experimental_option("prefs", {"profile.managed_default_content_settings.images": 2})
        # Selenium 3 style; Selenium 4 takes a Service object instead of executable_path
        driver = webdriver.Chrome(executable_path=r"D:\chrome\chromedriver.exe", options=option)
        driver.set_page_load_timeout(15)
        try:
            login(driver)
        finally:
            driver.quit()


    if __name__ == '__main__':
        username = 'your_username'  # account name
        password = 'your_password'  # password
        headers = {  # request headers (not used by selenium in this script)
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36'
        }
        main()

That is the code for collecting the company page URLs.
[Figure: part of the collected URLs]
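The search URLs in the script differ only in the province code inside the percent-encoded filter parameter. As a minimal sketch of how such a URL could be assembled rather than pasted by hand, the helper below uses only the standard library. The function name search_url and the provinces list are illustrative and not part of the original script; the parameter names key, p and filter are taken from the URLs above.

```python
import json
from urllib.parse import quote

BASE = "https://www.qcc.com/web/search"


def search_url(keyword, page, province=None):
    """Build one search-result URL, optionally filtered by a province code."""
    url = "{}?key={}&p={}".format(BASE, quote(keyword, safe=""), page)
    if province is not None:
        # Compact JSON (no spaces), then percent-encode every reserved character
        filt = json.dumps({"rchain": [{"pr": province}]}, separators=(",", ":"))
        url += "&filter=" + quote(filt, safe="")
    return url


# Province codes as they appear in the original URL list
provinces = ["GD", "BJ", "JS", "SH", "ZJ", "SC", "SD", "HB"]
urls = [search_url("游戏", page, pr) for pr in provinces for page in range(1, 6)]
print(urls[0])
```

For page 1 and province GD this reproduces the first URL of the original list with its {} placeholder filled in.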
Next, crawl the company information from the URLs collected above.
    import csv
    import time

    import pandas as pd
    import requests
    from lxml import etree

    base_url = 'https://www.qcc.com/web/search?key='  # search URL (not used below)
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0'
    }
    data = pd.read_csv('./data.csv', header=None)
    data.columns = ['id', 'url']
    f = open('企查查.csv', 'w', newline='')
    writer = csv.writer(f)
    title = ['統一社會信用代碼', '名稱', '類型', '所屬地區', '法定代表人', '注冊資本', '成立日期', '營業期限', '經營范圍']
    writer.writerow(title)  # header row

    INFO = '//*[@id="cominfo"]/div[2]/table'
    # XPath of each field in the company info table, in the same order as title
    field_xpaths = [
        INFO + '/tr[1]/td[2]/text()',   # unified social credit code
        INFO + '/tr[1]/td[4]/text()',   # name
        INFO + '/tr[5]/td[2]/text()',   # company type
        INFO + '/tr[6]/td[4]/text()',   # region
        INFO + '/tr[2]/td[2]/div/span[2]/span/a/text()',  # legal representative
        INFO + '/tr[3]/td[2]/text()',   # registered capital
        INFO + '/tr[2]/td[6]/text()',   # date of establishment
        INFO + '/tr[5]/td[4]/text()',   # business term
        INFO + '/tr[10]/td[2]/text()',  # business scope
    ]


    def first_or_none(html, xpath):
        """Return the first XPath match, or None when the field is missing."""
        try:
            return html.xpath(xpath)[0]
        except Exception:
            return None


    def getCompanyData(url, num):
        try:
            response = requests.get(url, headers=headers)
            response.encoding = "utf-8"
            html = etree.HTML(response.text)
            row = [first_or_none(html, xp) for xp in field_xpaths]
            writer.writerow(row)  # values are written in the same order as the header
            print('第{}條------->>>'.format(num), row[1])
            time.sleep(30)
        except Exception:
            time.sleep(10)
            print('錯誤')


    for j, i in enumerate(data.url):
        getCompanyData(i, j + 1)
    f.close()

[Figure: part of the crawled results]
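The positional XPath expressions in the script each pick one cell of the company info table and fall back to None when the cell is missing. The sketch below shows that pattern on a tiny HTML fragment. Two things here are assumptions: the fragment is a hypothetical, heavily simplified stand-in for the real page, and it swaps in the standard library's xml.etree.ElementTree (which supports only a subset of XPath) for lxml so that it runs without extra packages.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a company detail page (not real data)
SAMPLE = """
<html><body>
  <div id="cominfo">
    <div>header</div>
    <div>
      <table>
        <tr><td>Credit code</td><td>91440000EXAMPLE000</td><td>Name</td><td>Example Games Ltd.</td></tr>
      </table>
    </div>
  </div>
</body></html>
"""


def first_text(root, path):
    """Return the stripped text of the first element matching path, or None."""
    node = root.find(path)
    if node is None or node.text is None:
        return None
    return node.text.strip()


root = ET.fromstring(SAMPLE)
# Row 1, cell 2 exists in the fragment; row 10 does not
code = first_text(root, ".//*[@id='cominfo']/div[2]/table/tr[1]/td[2]")
missing = first_text(root, ".//*[@id='cominfo']/div[2]/table/tr[10]/td[2]")
print(code, missing)
```

Because the paths depend on row and cell positions, any change in the page layout silently yields None, so it may be worth logging which fields came back empty.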
My save settings at the time were not right, and I could not work out how to fix them afterwards. If anyone knows, please tell me.
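The exact problem with the saved file is not shown, so the following is only a guess at a more robust way to save: write a header row, key each value by column name so the order cannot drift, and use an explicit encoding. This is a minimal sketch using the standard csv module. The function save_rows, the shortened FIELDS list and the sample row are illustrative assumptions, and utf-8-sig is chosen here because the byte order mark helps spreadsheet programs detect UTF-8.

```python
import csv
import os
import tempfile

# A shortened, illustrative subset of the column names used by the crawler
FIELDS = ['統一社會信用代碼', '名稱', '法定代表人']


def save_rows(path, rows):
    """Write dict rows to a CSV file with a header and an explicit encoding."""
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)  # None values are written as empty cells


# Made-up sample row; the key order does not matter with DictWriter
rows = [{"名稱": "示例公司", "統一社會信用代碼": None, "法定代表人": "張三"}]
path = os.path.join(tempfile.mkdtemp(), "demo.csv")
save_rows(path, rows)
print(path)
```

When reading such a file back with pandas, pass the same encoding, for example pd.read_csv(path, encoding='utf-8-sig').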
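To get on with the remaining tasks, I got a correct information table from a classmate.
[Figure: part of the classmate's information table]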
That completes the crawler part. Next comes generating the business licenses.
Step 2 Generating the business licenses
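Each piece of crawled information is drawn onto a template image at the pixel position of the matching field.
The template image is the blank license saved as model.jpg.
[Figure: the blank license template]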
    # coding:utf-8
    import os

    import pandas as pd
    from PIL import Image, ImageDraw, ImageFont

    word_size = 18  # text size (not used below)
    font = ImageFont.truetype("./simsun.ttc", 25)
    font_1 = ImageFont.truetype("./simsun.ttc", 20)
    data = pd.read_csv('./data_qiye.csv', encoding='gbk')
    os.makedirs('./imgs', exist_ok=True)  # the output folder must exist before saving

    for i in range(min(120, len(data))):  # at most the first 120 companies
        im1 = Image.open('./model.jpg')
        draw = ImageDraw.Draw(im1)
        strs = data.loc[i]
        # Unified social credit code
        draw.text((650, 568), str(strs['統一社會信用代碼']), (0, 0, 0), font=font_1)
        # One field per line, 40 px apart
        for row, _p in enumerate(['企業名稱', '企業類型', '注冊地址', '法定代表人', '注冊資本', '成立日期', '營業期限']):
            print(strs[_p])
            draw.text((345, 640 + row * 40), str(strs[_p]), (0, 0, 0), font=font)
        # Business scope: 30 characters per line, 33 px apart, at most 8 lines
        scope = str(strs['經營范圍'])
        for lo in range(0, int(len(scope) / 30) + 1):
            draw.text((345, 925 + lo * 33), scope[lo * 30:lo * 30 + 30], (0, 0, 0), font=font_1)
            if lo == 7:
                break
        im1.save('./imgs/{}.jpg'.format(strs['統一社會信用代碼']))

[Figure: one of the generated license images]
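The positioning still has shortcomings, so the filled-in text does not line up with the labels printed on the template. My skills are not quite there yet: I tried several times without success and eventually gave up. Still, this is essentially all of the code, and anything that needs improving can be changed directly in it.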
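One way to make the positions easier to tune is to separate the layout arithmetic from the drawing. The sketch below computes the same coordinates as the script above (fields 40 px apart starting at y=640, business scope wrapped at 30 characters with lines 33 px apart, at most 8 lines) and returns them as a list, without touching Pillow. The names layout, wrap_fixed and the constants are illustrative and not part of the original script.

```python
# Coordinates copied from the generation script; adjust them to fit the template
CODE_XY = (650, 568)
FIELD_X, FIELD_Y0, FIELD_STEP = 345, 640, 40
SCOPE_Y0, SCOPE_STEP = 925, 33
SCOPE_WIDTH, SCOPE_MAX_LINES = 30, 8
FIELDS = ['企業名稱', '企業類型', '注冊地址', '法定代表人', '注冊資本', '成立日期', '營業期限']


def wrap_fixed(text, width, max_lines):
    """Split text into chunks of `width` characters, keeping at most max_lines."""
    lines = [text[i:i + width] for i in range(0, len(text), width)]
    return lines[:max_lines]


def layout(record):
    """Return a list of ((x, y), text) pairs describing what to draw where."""
    ops = [(CODE_XY, str(record['統一社會信用代碼']))]
    for row, name in enumerate(FIELDS):
        ops.append(((FIELD_X, FIELD_Y0 + row * FIELD_STEP), str(record[name])))
    scope_lines = wrap_fixed(str(record['經營范圍']), SCOPE_WIDTH, SCOPE_MAX_LINES)
    for row, line in enumerate(scope_lines):
        ops.append(((FIELD_X, SCOPE_Y0 + row * SCOPE_STEP), line))
    return ops


# Made-up record for illustration only
demo = {name: name for name in FIELDS}
demo['統一社會信用代碼'] = 'DEMO-CODE'
demo['經營范圍'] = '游' * 65
for xy, text in layout(demo):
    print(xy, text)
```

Each pair can then be passed to draw.text(xy, text, fill=(0, 0, 0), font=font), and shifting a whole block of text becomes a one-constant change.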
The dataset, data_qiye.csv, is uploaded with this post's resources.