11_Jianshu Business Analysis
Table of Contents
- Jianshu structure analysis
- Creating the Jianshu scraper project
- Creating a crawl-template spider
- Configuring the Jianshu download format
Jianshu Structure Analysis
Creating the Jianshu Scraper Project
```shell
C:\Users\Administrator\Desktop>scrapy startproject jianshu
New Scrapy project 'jianshu', using template directory 'd:\anaconda3\lib\site-packages\scrapy\templates\project', created in:
    C:\Users\Administrator\Desktop\jianshu

You can start your first spider with:
    cd jianshu
    scrapy genspider example example.com
```
Creating a Crawl-Template Spider
The spiders created so far all used the basic template. This crawler needs to download Jianshu articles and match their URLs with regular expressions, so the crawl template is the better choice for generating the spider.
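Before wiring the pattern into a crawl rule, it can be sanity-checked against sample URLs with Python's `re` module. This is a minimal sketch; the 12-character `[0-9a-z]` article-ID format is an assumption based on the sample article links shown further below.

```python
import re

# Same pattern later passed to LinkExtractor(allow=...) in the spider
# (assumed: Jianshu article IDs are 12 lowercase hex/alphanumeric chars).
ARTICLE_RE = re.compile(r'https://www\.jianshu\.com/p/[0-9a-z]{12}.*')

print(bool(ARTICLE_RE.match('https://www.jianshu.com/p/df7cad4eb8d8')))       # True: article page
print(bool(ARTICLE_RE.match('https://www.jianshu.com/p/07b0456cbadb?x=1')))   # True: query string allowed by .*
print(bool(ARTICLE_RE.match('https://www.jianshu.com/u/someuser')))           # False: user page, not /p/
```

If the pattern misfires here, it will misfire inside the spider's `Rule` as well, so this is a cheap way to debug the regex in isolation.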
```shell
C:\Users\Administrator\Desktop>cd jianshu

C:\Users\Administrator\Desktop\jianshu>scrapy genspider -t crawl jianshu_spider jianshu.com
Created spider 'jianshu_spider' using template 'crawl' in module:
    jianshu.spiders.jianshu_spider
```
Configuring the Jianshu Download Format
```python
# -*- coding: utf-8 -*-
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class JianshuSpiderSpider(CrawlSpider):
    name = 'jianshu_spider'
    allowed_domains = ['jianshu.com']
    start_urls = ['https://www.jianshu.com/']

    # Crawl rules can be specified with regular expression support, e.g.:
    # https://www.jianshu.com/p/df7cad4eb8d8
    # https://www.jianshu.com/p/07b0456cbadb?*****
    # https://www.jianshu.com/p/.*
    rules = (
        Rule(LinkExtractor(allow=r'https://www.jianshu.com/p/[0-9a-z]{12}.*'),
             callback='parse_item', follow=True),
    )

    # name = title = url = collection = scrapy.Field()

    def parse_item(self, response):
        print(response.text)
```
Summary
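The `parse_item` callback above only prints the raw HTML. In a real spider you would pull fields out with `response.xpath()` or `response.css()`; the extraction idea can be illustrated with the standard library's `html.parser`, assuming (hypothetically) that the article title sits in the page's first `<h1>` tag — real Jianshu markup may differ.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Capture the text of the first <h1> tag (hypothetical title markup)."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Start capturing only for the first <h1> encountered
        if tag == 'h1' and self.title is None:
            self.in_h1 = True

    def handle_data(self, data):
        if self.in_h1:
            self.title = data.strip()
            self.in_h1 = False


sample = '<html><body><h1>Hello Scrapy</h1><p>body text...</p></body></html>'
parser = TitleParser()
parser.feed(sample)
print(parser.title)  # Hello Scrapy
```

Inside the spider itself the equivalent would be a one-liner such as `response.xpath('//h1/text()').get()`, stored into a `scrapy.Item` with the fields hinted at in the commented-out line above (`title`, `url`, etc.).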