Python Web Crawler Learning Tutorial Series

Published: 2024/7/23 python 28 豆豆

Python Crawlers --- Intermediate and Advanced Crawler Learning Roadmap

https://www.cnblogs.com/Eeyhan/p/14148832.html

If the image is hard to read, save it locally and open it from there.

Python Crawler Learning Tutorial Series

From: https://cuiqingcai.com/1052.html

I. Crawler Basics

1. Python Crawler Basics 1: Overview

2. Python Crawler Basics 2: Understanding Crawler Fundamentals

3. Python Crawler Basics 3: Basic Usage of the Urllib Library

4. Python Crawler Basics 4: Advanced Usage of the Urllib Library

5. Python Crawler Basics 5: Handling URLError Exceptions

6. Python Crawler Basics 6: Using Cookies

7. Python Crawler Basics 7: Regular Expressions

II. Crawlers in Practice

1. Python Crawlers in Practice 1: Scraping Jokes from Qiushibaike

# -*- coding: utf-8 -*-
# Requires Python 3 and the requests library.
import re

import requests


class QSBK(object):
    def __init__(self):
        self.__url = r'https://www.qiushibaike.com'
        self.__head = None
        self.__data = None
        self.__proxy = None

    def drop_n(self, content):
        '''
        Remove newlines and HTML comments.
        :param content: HTML page content
        :return: the page content with newlines and comments removed
        '''
        content = re.sub(r'\n', '', content)
        # Assumed pattern for HTML comments; the original pattern is missing
        # from the source and is inferred from the docstring above.
        content = re.sub(r'<!--.*?-->', '', content)
        return content

    def crawl(self):
        page_num_regex = re.compile(r'<li><span class="current" >(.*?)</span></li>')
        pattern = re.compile(
            r'<div class="article block untagged mb15.*?>'
            r'<div class="author clearfix">'
            r'<a .*?>.*?</a><a.*?web-list-author-text.*?><h2>(.*?)</h2></a>'
            r'.*?<a href="(.*?)".*?web-list-content.*?><div class="content"><span>(.*?)</span>'
        )
        next_page_regex = re.compile(
            r'<ul class="pagination">.*<li><a href="(.*?)".*?><span.*?/span></a></li></ul>'
        )

        # Start at the "hot" listing and follow the next-page link until none is found.
        next_url = '{0}/hot'.format(self.__url)
        while next_url:
            r = requests.get(next_url)
            print('status_code : {0}'.format(r.status_code))
            if r.status_code != 200:
                break  # stop on a failed request rather than looping on the same URL
            print(r.url)
            content = self.drop_n(r.text)  # r.text is str; r.content would be bytes

            # findall() returns an empty list when nothing matches, so guard the [0].
            page_num = page_num_regex.findall(content)
            print('Page {0}'.format(page_num[0] if page_num else '?'))
            for item in pattern.findall(content):
                print(item[0], item[1], item[2])  # author, link, joke text
            input('Press Enter to continue...')

            next_page = next_page_regex.findall(content)
            next_url = '{0}{1}'.format(self.__url, next_page[0]) if next_page else None


if __name__ == '__main__':
    qsbk = QSBK()
    qsbk.crawl()
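The crawler above rests on one idea: flatten the page, then pull fields out with a compiled regular expression. As a minimal sketch of that step alone, assuming a simplified, made-up HTML snippet, the following runs without network access. SAMPLE_HTML, ITEM_PATTERN, clean and extract_items are illustrative names, not part of the original code, and the markup is not the real site's.

```python
import re

# Hypothetical, simplified markup for illustration; not the site's real HTML.
SAMPLE_HTML = """
<div class="article">
  <!-- promo banner -->
  <h2>alice</h2>
  <a href="/article/1"><span>first joke</span></a>
</div>
<div class="article">
  <h2>bob</h2>
  <a href="/article/2"><span>second joke</span></a>
</div>
"""

# Three capture groups: author, link, text (illustrative pattern).
ITEM_PATTERN = re.compile(
    r'<h2>(.*?)</h2>.*?<a href="(.*?)"><span>(.*?)</span>'
)


def clean(html):
    """Drop HTML comments and newlines so '.' can match across former line breaks."""
    html = re.sub(r'<!--.*?-->', '', html, flags=re.S)
    return html.replace('\n', '')


def extract_items(html):
    """Return a list of (author, link, text) tuples found in the page."""
    return ITEM_PATTERN.findall(clean(html))


items = extract_items(SAMPLE_HTML)
for author, link, text in items:
    print(author, link, text)
```

Regex extraction like this tends to break as soon as the markup changes, which is one reason to consider the parsing libraries listed later in this series, such as Beautiful Soup and lxml.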

2. Python Crawlers in Practice 2: Scraping Baidu Tieba Posts

3. Python Crawlers in Practice 3: Automatically Reconnecting to the Shandong University Wireless Network After a Dropout

4. Python Crawlers in Practice 4: Scraping Taobao MM Photos

5. Python Crawlers in Practice 5: Simulating a Taobao Login and Fetching All Orders

6. Python Crawlers in Practice 6: Scraping iAsk (Aiwen Zhishiren) Questions and Saving Them to a Database

7. Python Crawlers in Practice 7: Calculating This Semester's University GPA

8. Python Crawlers in Practice 8: Scraping Taobao Anonymous Wangwang with Selenium

III. Crawler Tools

1. Python Crawler Tools 1: Using the Requests Library

2. Python Crawler Tools 2: Using Beautiful Soup

3. Python Crawler Tools 3: XPath Syntax and Using the lxml Library

4. Python Crawler Tools 4: Using PhantomJS

5. Python Crawler Tools 5: Using Selenium

6. Python Crawler Tools 6: Using PyQuery

IV. Advanced Crawling

1. Advanced Python Crawling 1: Overview of Crawler Frameworks

2. Advanced Python Crawling 2: Installing and Configuring the PySpider Framework

3. Advanced Python Crawling 3: Installing and Configuring the Scrapy Crawler Framework

4. Advanced Python Crawling 4: Using PySpider

5. Advanced Python Crawling 5: Using Multithreading

6. Advanced Python Crawling 6: Using Multiprocessing

7. Advanced Python Crawling 7: Setting Up an ADSL Dial-Up Server Proxy

"A Little Crawler"

"A Little Concurrent Crawler"

"Python and Writing a Simple Web Crawler"

"Writing Crawlers in Python: Fetching Web Pages and Parsing HTML"

"[Python] Web Crawlers (1): What Fetching a Web Page Means and the Basic Structure of a URL"

"[Python] Web Crawlers (2): Fetching Web Page Content from a Given URL with urllib2"

"[Python] Web Crawlers (3): Exception Handling and the Classification of HTTP Status Codes"

"[Python] Web Crawlers (4): Introduction to Opener and Handler with Example Applications"

"[Python] Web Crawlers (5): urllib2 Usage Details and Site-Scraping Tips"

"[Python] Web Crawlers (6): A Simple Little Crawler for Baidu Tieba"

"[Python] Web Crawlers (7): A Tutorial on Regular Expressions in Python"

"[Python] Web Crawlers (8): A Web Crawler for Qiushibaike (v0.2), Source Code and Analysis"

"[Python] Web Crawlers (9): A Web Crawler for Baidu Tieba (v0.4), Source Code and Analysis"

"[Python] Web Crawlers (10): The Whole Process of Building a Crawler (Using Shandong University GPA Calculation as an Example)"

"A Summary of Tips for Scraping Sites with Python Crawlers (repost)"

"Advanced Python Crawler Code"
