Ugly Dimwit's Web Crawler Notes 4: BeautifulSoup4
pip install beautifulsoup4
Demo page used throughout these notes: https://python123.io/ws/demo.html
Usage
BeautifulSoup() takes two arguments: 1. the HTML content, 2. the parser.
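import requests
from bs4 import BeautifulSoup
# Fetch the demo page and keep its HTML as text
r = requests.get('https://python123.io/ws/demo.html')
demo = r.text
# Argument 1: the HTML content; argument 2: the parser to use
soup = BeautifulSoup(demo, 'html.parser')
print(soup.prettify())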
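The same two arguments can be tried without network access. A minimal sketch, assuming a hypothetical inline string `SAMPLE_HTML` shaped loosely like the demo page (it is not the real page content):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the demo page, so the example runs offline.
SAMPLE_HTML = (
    "<html><head><title>Demo</title></head>"
    "<body><p class='title'><b>Hello</b></p></body></html>"
)

# Argument 1: the HTML markup (a str, bytes, or an open file); argument 2: the parser name.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

pretty = soup.prettify()  # re-indented markup, one tag or string per line
print(pretty)
```

"html.parser" ships with Python, so no extra install is needed; other parsers such as lxml must be installed separately.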
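import requests
from bs4 import BeautifulSoup
r = requests.get('https://python123.io/ws/demo.html')
demo = r.text
soup = BeautifulSoup(demo, 'html.parser')
# Evaluate each expression in an interactive shell (or wrap it in print()) to see the result
soup.title                   # the <title> tag
tag = soup.a                 # only the first <a> tag in the document
soup.a.name                  # the tag's name: 'a'
soup.a.parent.name           # name of the <a> tag's parent
soup.a.parent.parent.name    # name of its grandparent
tag.attrs                    # all attributes as a dict
tag.attrs['class']           # class is multi-valued, so this is a list
tag.attrs['href']            # the link target
type(tag.attrs)              # dict
type(tag)                    # bs4.element.Tag
soup.a.string                # text inside the first <a>
soup.p                       # the first <p> tag
soup.p.string                # text inside the first <p>
type(soup.p.string)          # bs4.element.NavigableString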
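To check these Tag properties without fetching the page, here is a small sketch on a hypothetical snippet (the example.com links are made up). It also shows a detail the expressions above don't: `.string` returns None when a tag has more than one child.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical snippet with two links inside one paragraph.
html = (
    '<p class="course">Courses: '
    '<a href="https://example.com/basic" class="py1" id="link1">Basic</a> and '
    '<a href="https://example.com/adv" class="py2" id="link2">Advanced</a>.</p>'
)
soup = BeautifulSoup(html, "html.parser")

tag = soup.a                           # first <a> tag only
print(tag.name)                        # tag name as a str
print(tag.attrs)                       # all attributes as a dict
print(tag["href"], tag.get("title"))   # subscript access; .get() gives None if missing
print(tag.attrs["class"])              # class is multi-valued, so it is a list
print(type(tag.string))                # a tag with exactly one string child

# <p> has five children (text, <a>, text, <a>, text), so .string is None.
print(soup.p.string)
```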
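Note: .string does not treat comments specially. If a tag's only content is an HTML comment, .string still returns the comment's text (without the <!-- --> markers), so check the object's type to tell comments from ordinary text.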
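A minimal sketch of telling the two apart, assuming a hypothetical two-tag snippet. `Comment` is a subclass of `NavigableString`, so `isinstance` against `Comment` is the check that separates them:

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString

# Hypothetical markup: <b> holds a comment, <p> holds ordinary text.
html = "<b><!--This is a comment--></b><p>This is not a comment</p>"
soup = BeautifulSoup(html, "html.parser")

b_text = soup.b.string
p_text = soup.p.string

# Both print as plain text: the <!-- --> markers are gone.
print(b_text)
print(p_text)

# Check the type to find out which one came from a comment.
for s in (b_text, p_text):
    kind = "comment" if isinstance(s, Comment) else "text"
    print(kind, repr(str(s)))
```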
Summary
HTML traversal with the bs4 library
Using .contents
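import requests
from bs4 import BeautifulSoup
r = requests.get('https://python123.io/ws/demo.html')
demo = r.text
soup = BeautifulSoup(demo, 'html.parser')
soup.head                    # the <head> tag
soup.head.contents           # list of <head>'s direct children
soup.body.contents           # list of <body>'s direct children; may include '\n' strings
len(soup.body.contents)      # number of direct children
soup.body.contents[1]        # children are indexed like any list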
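The whitespace between tags is easy to miss when reading `.contents`. A small sketch on a hypothetical body, showing that newline strings count as children and one way to keep only the tags:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical body with newlines between the tags, as in most real pages.
html = "<body>\n<p>first</p>\n<p>second</p>\n</body>"
soup = BeautifulSoup(html, "html.parser")

children = soup.body.contents   # a plain Python list of direct children
print(len(children))            # the '\n' strings count as children too
print(children[1])              # index 0 is the '\n' before the first <p>

# Keep only tag children when the whitespace strings get in the way.
tags_only = [c for c in children if isinstance(c, Tag)]
print([t.get_text() for t in tags_only])
```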
Iterating over child nodes
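A sketch of iterating instead of indexing, on a hypothetical snippet: `.children` yields the direct children only, while `.descendants` walks every node below the tag depth-first, strings included.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical body: the first <p> has a nested <b>.
html = "<body><p>one <b>two</b></p><p>three</p></body>"
soup = BeautifulSoup(html, "html.parser")

# .children: an iterator over direct children (the same items as .contents).
child_names = [c.name for c in soup.body.children if isinstance(c, Tag)]

# .descendants: depth-first walk over every tag and string below <body>.
desc = [c.name if isinstance(c, Tag) else str(c) for c in soup.body.descendants]

print(child_names)
print(desc)
```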
Upward traversal
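import requests
from bs4 import BeautifulSoup
r = requests.get('https://python123.io/ws/demo.html')
demo = r.text
soup = BeautifulSoup(demo, 'html.parser')
soup.title.parent    # the <head> tag that contains <title>
soup.html.parent     # the BeautifulSoup object itself (the document root)
soup.parent          # None: the document root has no parent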
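To walk all the way up instead of chaining `.parent`, `.parents` yields every ancestor in turn. A minimal sketch on a hypothetical snippet; the last ancestor is the BeautifulSoup object, whose name is `'[document]'`:

```python
from bs4 import BeautifulSoup

# Hypothetical nesting: html > body > p > a
html = "<html><body><p><a href='#'>link</a></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# .parents walks upward from the tag and stops after the document root.
ancestor_names = [parent.name for parent in soup.a.parents]
print(ancestor_names)

print(soup.parent)  # the document root has no parent
```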
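Sibling (parallel) traversal
Sibling traversal has a condition: it only moves between nodes that share the same parent in the tag tree.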
import requests
from bs4 import BeautifulSoup
r = requests.get('https://python123.io/ws/demo.html')
demo = r.text
soup = BeautifulSoup(demo, 'html.parser')
soup.a.next_sibling                         # the next node; may be a text string, not a tag
soup.a.next_sibling.next_sibling            # two steps forward
soup.a.previous_sibling                     # the previous node
soup.a.previous_sibling.previous_sibling    # two steps back; None if no such sibling exists
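Because the text between tags is itself a sibling, chaining `.next_sibling` often lands on a string. A sketch on a hypothetical paragraph showing the iterator forms and `find_next_sibling()`, which skips over strings to the next matching tag:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical paragraph: text, <a>, text, <a>, text, all sharing one parent.
html = '<p>Start <a id="link1">A</a> and <a id="link2">B</a>.</p>'
soup = BeautifulSoup(html, "html.parser")

first = soup.a
print(repr(first.next_sibling))          # the text ' and ', not the next tag
print(first.next_sibling.next_sibling)   # the second <a>

# .next_siblings / .previous_siblings iterate over all later / earlier siblings.
later = [s.get("id") for s in first.next_siblings if isinstance(s, Tag)]
earlier = [str(s) for s in first.previous_siblings]
print(later, earlier)

# find_next_sibling() returns the next sibling tag matching the name.
print(first.find_next_sibling("a"))
```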
Summary
Formatting and encoding
.prettify()
import requests
from bs4 import BeautifulSoup
r = requests.get('https://python123.io/ws/demo.html')
demo = r.text
soup = BeautifulSoup(demo, 'html.parser')
print(soup.prettify())      # the whole document, one tag or string per line
print(soup.a.prettify())    # prettify() also works on a single tag
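`prettify()` returns an ordinary str; the line breaks only show up when it is printed. A small sketch on a hypothetical fragment, looking at the returned lines directly:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with one level of nesting.
soup = BeautifulSoup("<p><b>hi</b></p>", "html.parser")

pretty = soup.p.prettify()   # a str: one tag or string per line, indented by depth
lines = pretty.splitlines()
print(lines)                 # repr shows the leading spaces explicitly
print(pretty)                # printing renders the line breaks
```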
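Encoding
Nothing special is needed: Beautiful Soup converts the input to Unicode while parsing, so in Python 3 the parsed text can be used directly.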
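When the page is passed as raw bytes (for example requests' `r.content` instead of `r.text`), Beautiful Soup decodes it itself, and `from_encoding` can override its guess. A minimal sketch, assuming hypothetical GBK-encoded bytes built in the code:

```python
from bs4 import BeautifulSoup

# Hypothetical GBK-encoded bytes, standing in for a downloaded Chinese page.
raw = "<p>中文</p>".encode("gbk")

# Bytes input: Beautiful Soup decodes it; from_encoding overrides its guess.
soup = BeautifulSoup(raw, "html.parser", from_encoding="gbk")

print(soup.p.string)            # already a Unicode str inside Python
print(soup.original_encoding)   # the encoding used to decode the input
print(soup.p.encode())          # serialize back to bytes (UTF-8 by default)
```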
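Summary: the basic elements of BeautifulSoup
Tag: a tag
Name: the tag's name
Attributes: the tag's attributes
NavigableString: the string between tags
Comment: the string inside an HTML comment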