Web Scraping Notes (Part 1)

Published: 2025/3/19 · Programming Q&A · 豆豆

1. Install bs4

pip install bs4

2. Install Selenium

pip install selenium
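Before moving on, it can help to confirm that both packages are importable from the interpreter you plan to use. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is made up for this illustration.

```python
import importlib.util


def missing_packages(import_names):
    """Return the names from import_names that cannot be found for import."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


# An empty list means both installs above are visible to this interpreter.
print(missing_packages(["bs4", "selenium"]))
```

If a name is reported as missing even though pip said it was installed, pip and python probably belong to different Python installations.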

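3. Install a browser driver
Chrome driver (chromedriver) download: Chrome
Firefox driver (geckodriver) download: Firefox

After downloading, put the executable from the archive into Python's Scripts folder (for example C:\Users\Administrator\python39\Scripts, the path used in the code below). Nothing else needs to be set up.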
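To check that the driver ended up somewhere the system can find it, you can look it up on PATH. This is a small sketch that assumes the Scripts folder is on PATH, which is usually the case on Windows when Python was added to PATH during installation; `locate_driver` is a hypothetical helper name.

```python
import shutil


def locate_driver(*names):
    """Return the full path of the first driver found on PATH, else None."""
    for name in names:
        path = shutil.which(name)  # on Windows this also tries name + ".exe"
        if path:
            return path
    return None


print("geckodriver:", locate_driver("geckodriver"))
print("chromedriver:", locate_driver("chromedriver"))
```

A result of None means the driver is not on PATH, in which case the full path to the executable has to be passed to Selenium explicitly.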

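Scraping dynamic page content:

4. Using Selenium

from selenium import webdriver
from time import sleep

# executable_path points to the driver installed above; it can only be left out
# when the driver is on PATH. This is the Selenium 3 API: newer Selenium 4
# releases take a Service object instead.
driver = webdriver.Firefox(executable_path=r'C:\Users\Administrator\python39\Scripts\geckodriver.exe')
driver.get("http://www.santostang.com/2018/07/04/hello-world/")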

5. Scrape the first comment

from selenium import webdriver
from time import sleep

driver = webdriver.Firefox(executable_path=r'C:\Users\Administrator\python39\Scripts\geckodriver.exe')
driver.get("http://www.santostang.com/2018/07/04/hello-world/")
# Switch into the iframe that holds the comments
driver.switch_to.frame(driver.find_element_by_css_selector("iframe[title='livere-comment']"))
# Use a CSS selector to find the div element whose class is 'reply-content'
# (find_element_by_* is the Selenium 3 API)
comment = driver.find_element_by_css_selector('div.reply-content')
# Find the 'p' element inside it by tag name
content = comment.find_element_by_tag_name('p')
print(content.text)

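The comments are rendered inside an iframe, so the driver has to switch into that frame before it can find them. That is what this line is for:
driver.switch_to.frame(driver.find_element_by_css_selector("iframe[title='livere']"))
The title value (title='livere' here, 'livere-comment' in the code above) is taken from the page's own markup, so it can differ from page to page.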
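The find_element_by_* helpers used in these notes were removed in Selenium 4, which uses `find_element(By..., value)` instead. The following is a sketch of the same first-comment lookup in that style, assuming an already-started driver that has loaded the page. The function name `first_comment_text` and the `frame_css` parameter are introduced here for illustration; the selectors are the ones from the notes.

```python
def first_comment_text(driver, frame_css="iframe[title='livere-comment']"):
    """Switch into the comment iframe and return the first comment's text.

    `driver` is an already-started Selenium WebDriver with the page loaded.
    """
    # Imported inside the function so this file can still be loaded on a
    # machine where Selenium is not installed yet.
    from selenium.webdriver.common.by import By

    frame = driver.find_element(By.CSS_SELECTOR, frame_css)
    driver.switch_to.frame(frame)
    try:
        comment = driver.find_element(By.CSS_SELECTOR, "div.reply-content")
        return comment.find_element(By.TAG_NAME, "p").text
    finally:
        # Go back to the top-level document so later lookups start from the page.
        driver.switch_to.default_content()
```

Passing the iframe selector as a parameter keeps the page-specific title in one place, since it may need changing for other pages.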

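6. Scrape all comments on a page (multiple elements)
Add an s after element, that is, call find_elements_... instead of find_element_...; it returns a list of every matching element.

comments = driver.find_elements_by_css_selector('div.reply-content')
for eachcomment in comments:
    content = eachcomment.find_element_by_tag_name('p')
    print(content.text)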
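Because the comments are loaded dynamically, the list can come back empty if it is requested before the page has finished rendering them. One way to handle that is to retry for a few seconds. The helper below is a plain-Python polling sketch, not part of Selenium (Selenium ships its own `WebDriverWait` for this, which is not used here); the name `wait_for_elements` and its defaults are assumptions.

```python
import time


def wait_for_elements(find, timeout=10.0, interval=0.5):
    """Call find() repeatedly until it returns a non-empty list or time runs out.

    Returns whatever the last call to find() returned (possibly an empty list).
    """
    deadline = time.monotonic() + timeout
    while True:
        found = find()
        if found or time.monotonic() >= deadline:
            return found
        time.sleep(interval)
```

With the driver from the notes, it could be used as `comments = wait_for_elements(lambda: driver.find_elements_by_css_selector('div.reply-content'))`.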

7. Another approach

from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
import time

# Launch the blog site through Selenium (Selenium 3 API)
# Copy the defaults so the shared capabilities dict is not modified
caps = webdriver.DesiredCapabilities.FIREFOX.copy()
caps["marionette"] = True  # must not be False
binary = FirefoxBinary(r'C:\Program Files\Mozilla Firefox\firefox.exe')
driver = webdriver.Firefox(firefox_binary=binary, capabilities=caps)
driver.get("http://www.santostang.com/2018/07/04/hello-world/")
driver.switch_to.frame(driver.find_element_by_css_selector("iframe[title='livere-comment']"))
comments = driver.find_elements_by_css_selector('div.reply-content')
for eachcomment in comments:
    content = eachcomment.find_element_by_tag_name('p')
    print(content.text)
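In Selenium 4 the browser location is set on an Options object and the driver location on a Service object, rather than through FirefoxBinary, capabilities, or executable_path. The following is a sketch of what the start-up step might look like there; `start_firefox` is a hypothetical wrapper, and both paths are supplied by the caller.

```python
def start_firefox(firefox_path, driver_path):
    """Start Firefox through Selenium 4 with explicit browser and driver paths."""
    # Imported here so this file can be loaded before Selenium 4 is installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.firefox.service import Service

    options = Options()
    options.binary_location = firefox_path            # where firefox.exe lives
    service = Service(executable_path=driver_path)    # where geckodriver.exe lives
    return webdriver.Firefox(options=options, service=service)
```

With the paths from the notes, the call would be `start_firefox(r'C:\Program Files\Mozilla Firefox\firefox.exe', r'C:\Users\Administrator\python39\Scripts\geckodriver.exe')`.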

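8. Inspecting a page's content
1. Right-click the page and choose Inspect.

(1) Click the arrow in the top-left corner of the panel once, then click the part of the page you want to examine; the panel jumps to the code for that part.
(2) In the Network tab, under All, the first entry is usually the request for the page itself. Its Headers section shows the page's user-agent and host, two important pieces of information.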
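These two header values are typically what a script sends along when it requests the page directly. The following sketch builds such a request with the standard library, without sending it; the User-Agent string is a placeholder to be replaced with the value copied from the Headers panel, and `build_request` is a hypothetical helper.

```python
from urllib.parse import urlsplit
from urllib.request import Request

PAGE_URL = "http://www.santostang.com/2018/07/04/hello-world/"


def build_request(url, user_agent):
    """Build (but do not send) a request carrying User-Agent and Host headers."""
    host = urlsplit(url).netloc  # the host part of the URL
    return Request(url, headers={"User-Agent": user_agent, "Host": host})


# Placeholder value: paste the real user-agent string from the browser here.
req = build_request(PAGE_URL, "Mozilla/5.0 (placeholder)")
print(req.header_items())
```

Passing `req` to `urllib.request.urlopen` would actually send it.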
