Selenium Example 2: Scraping Comics by Taking Screenshots
The overall approach has three steps:
Simulate a browser -> capture the browser's current screen -> save the comic screenshot
Target URL
http://www.1kkk.com/ch1000-514226/
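(1) Get the browser (simulate a browser)

    def getBrowser(self):
        # PhantomJS is a headless browser. Selenium 4 removed this driver,
        # so this call needs Selenium 3 or earlier.
        broswer = webdriver.PhantomJS()
        try:
            broswer.get(self.startUrl)
        except Exception:
            print("error url")
        return broswer

(2) Open the browser's developer tools, work out how many pages need to be scraped, then locate the "next page" element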
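The code is as follows:

    def saveCartoon(self, broswer):
        # broswer.title.split('_')[0]
        cartoonTitle = '1漫畫'
        self.createDir(cartoonTitle)
        os.chdir(cartoonTitle)
        # /html/body/div[2]/h1/font/span[2] holds the number of pages in the comic.
        # find_element_by_* is the Selenium 3 API; later Selenium 4 releases removed it.
        sumPage = int(broswer.find_element_by_xpath('//font[@class="zf40"]/span[2]').text)
        i = 1
        while i <= sumPage:
            imgName = str(i) + '.png'
            broswer.get_screenshot_as_file(imgName)
            i = i + 1
            # Turn the page automatically, then wait for the next image to load
            NextTag = broswer.find_element_by_id('next')
            NextTag.click()
            time.sleep(5)

(3) Directory-creation function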
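
    def createDir(self, dirName):
        if os.path.exists(dirName):
            print("directory already exists")
        else:
            try:
                os.makedirs(dirName)
            except OSError:
                print("create directory failed")
            else:
                print("create directory succeeded")

(4) The scraped result is one screenshot per page, saved as 1.png, 2.png, and so on inside the 1漫畫 directory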
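Because saveCartoon names each file str(i) + '.png', an alphabetical directory listing would place 10.png before 2.png. The following is a minimal sketch, not part of the original script, for listing the saved pages in reading order. list_pages is a hypothetical helper name, and the sketch assumes all screenshots sit in one directory.

```python
import os


def list_pages(directory):
    """Return the numbered .png files in directory, ordered by page number."""
    names = [
        n for n in os.listdir(directory)
        # keep only files such as "12.png"; anything else is ignored
        if n.endswith(".png") and n[:-4].isdigit()
    ]
    # Sort on the integer before ".png" so that 2.png comes before 10.png
    return sorted(names, key=lambda n: int(n[:-4]))
```

Calling list_pages('1漫畫') after a run should then return the screenshots in the order they were captured.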
Complete code

    from selenium import webdriver
    import os
    import time


    class GetCartoon(object):
        def __init__(self):
            self.startUrl = 'http://www.1kkk.com/ch1-406302'
            self.broswer = self.getBrowser()
            self.saveCartoon(self.broswer)
            self.broswer.quit()

        def getBrowser(self):
            # PhantomJS needs Selenium 3 or earlier; Selenium 4 removed this driver
            broswer = webdriver.PhantomJS()
            try:
                broswer.get(self.startUrl)
            except Exception:
                print("error url")
            return broswer

        def saveCartoon(self, broswer):
            # broswer.title.split('_')[0]
            cartoonTitle = '1漫畫'
            self.createDir(cartoonTitle)
            os.chdir(cartoonTitle)
            # /html/body/div[2]/h1/font/span[2] holds the number of pages in the comic
            sumPage = int(broswer.find_element_by_xpath('//font[@class="zf40"]/span[2]').text)
            i = 1
            while i <= sumPage:
                imgName = str(i) + '.png'
                broswer.get_screenshot_as_file(imgName)
                i = i + 1
                # Turn the page automatically, then wait for the next image to load
                NextTag = broswer.find_element_by_id('next')
                NextTag.click()
                time.sleep(5)

        def createDir(self, dirName):
            if os.path.exists(dirName):
                print("directory already exists")
            else:
                try:
                    os.makedirs(dirName)
                except OSError:
                    print("create directory failed")
                else:
                    print("create directory succeeded")


    if __name__ == '__main__':
        GC = GetCartoon()
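The loop in saveCartoon clicks "next" once more after the last screenshot and relies on os.chdir to decide where files land. The sketch below is one possible variation, not the author's method. save_pages, click_next, delay and FakeBrowser are hypothetical names introduced here, and the only thing assumed of the browser object is a get_screenshot_as_file(path) method, which Selenium drivers provide.

```python
import os
import time


def save_pages(browser, total_pages, out_dir, click_next, delay=0.0):
    """Screenshot each page into out_dir and return the saved file paths.

    browser    -- any object with get_screenshot_as_file(path)
    click_next -- hypothetical callable that advances the reader by one page
    """
    os.makedirs(out_dir, exist_ok=True)  # no error if the directory exists
    saved = []
    for page in range(1, total_pages + 1):
        path = os.path.join(out_dir, f"{page}.png")
        browser.get_screenshot_as_file(path)
        saved.append(path)
        if page < total_pages:  # do not click "next" after the last page
            click_next()
            time.sleep(delay)  # give the next image time to load
    return saved


class FakeBrowser:
    """Stand-in for a WebDriver so the loop can be tried without a browser."""

    def __init__(self):
        self.page = 1

    def get_screenshot_as_file(self, path):
        # Write a placeholder instead of a real screenshot
        with open(path, "wb") as f:
            f.write(b"page %d" % self.page)
        return True

    def next_page(self):
        self.page += 1
```

With the Selenium 3 driver used in this article, click_next could be lambda: broswer.find_element_by_id('next').click() and delay could be 5, matching the original loop.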