Python Crawler Framework Scrapy, Study Notes 6 ------- Basic Commands
1. Some Scrapy commands are only available from inside the root directory of a Scrapy project, e.g. the crawl command.
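For example, once a spider named taobao exists in the project (it is generated in step 2 below), it can be run from the project root with:

   scrapy crawl taobao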
2. scrapy genspider taobao http://detail.tmall.com/item.htm?id=12577759834
This automatically generates taobao.py under the spiders directory. (Note that genspider expects a plain domain; a full URL like the one above is copied verbatim into allowed_domains and start_urls, as the generated file shows.)
    # -*- coding: utf-8 -*-
    import scrapy


    class TaobaoSpider(scrapy.Spider):
        name = "taobao"
        allowed_domains = ["http://detail.tmall.com/item.htm?id=12577759834"]
        start_urls = (
            'http://www.http://detail.tmall.com/item.htm?id=12577759834/',
        )

        def parse(self, response):
            pass
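The generated parse() is just a stub. As a minimal sketch of how it could be filled in (not part of the generated file; the XPath is an assumption about the page), one might log the page title:

    def parse(self, response):
        # illustrative only: extract the <title> text from the downloaded page
        title = response.xpath('//title/text()').extract()
        self.log('page title: %s' % title)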
Other templates are also available, e.g. the crawl template:

   scrapy genspider taobao2 http://detail.tmall.com/item.htm?id=12577759834 --template=crawl
    # -*- coding: utf-8 -*-
    import scrapy
    from scrapy.contrib.linkextractors import LinkExtractor
    from scrapy.contrib.spiders import CrawlSpider, Rule

    from project004.items import Project004Item


    class Taobao2Spider(CrawlSpider):
        name = 'taobao2'
        allowed_domains = ['http://detail.tmall.com/item.htm?id=12577759834']
        start_urls = ['http://www.http://detail.tmall.com/item.htm?id=12577759834/']

        rules = (
            Rule(LinkExtractor(allow=r'Items/'), callback='parse_item', follow=True),
        )

        def parse_item(self, response):
            i = Project004Item()
            #i['domain_id'] = response.xpath('//input[@id="sid"]/@value').extract()
            #i['name'] = response.xpath('//div[@id="name"]').extract()
            #i['description'] = response.xpath('//div[@id="description"]').extract()
            return i

3. List all spiders in the current project: scrapy list
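With the two spiders generated above, the listing should look roughly like this:

    C:\Users\IBM_ADMIN\PycharmProjects\pycrawl\project004>scrapy list
    taobao
    taobao2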
4. The view command opens a web page in the browser, as downloaded by Scrapy:
   scrapy view http://www.example.com/some/page.html
5. Inspect settings:
scrapy settings --get BOT_NAME
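For the project in these notes this simply returns the project/bot name, roughly:

    C:\Users\IBM_ADMIN\PycharmProjects\pycrawl\project004>scrapy settings --get BOT_NAME
    project004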
6.運(yùn)行自包含的spider,不需要?jiǎng)?chuàng)建項(xiàng)目
scrapy runspider <spider_file.py>
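A minimal self-contained spider might look like the following sketch (illustrative, not from the original notes; the file name, item field and URL are assumptions). Saved as standalone.py, it runs with scrapy runspider standalone.py:

    # standalone.py -- everything (item definition and spider) lives in one file
    import scrapy

    class PageItem(scrapy.Item):
        title = scrapy.Field()

    class StandaloneSpider(scrapy.Spider):
        name = "standalone"
        start_urls = ["http://www.example.com/"]

        def parse(self, response):
            # extract the page title and return it as an item
            item = PageItem()
            item["title"] = response.xpath("//title/text()").extract()
            return item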
7. Deploying a Scrapy project: scrapy deploy
Deploying spiders requires a server environment to host them; scrapyd is the usual choice.
Install scrapyd: pip install scrapyd
Documentation: http://scrapyd.readthedocs.org/en/latest/install.html
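The deploy target itself is declared in the project's scrapy.cfg. A rough sketch, assuming a local scrapyd instance listening on its default port 6800 and a target named local (both the target name and URL are assumptions):

    [settings]
    default = project004.settings

    [deploy:local]
    url = http://localhost:6800/
    project = project004

With such a target configured, scrapy deploy packages the project as an egg and uploads it to the scrapyd server.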
8. All available commands:
C:\Users\IBM_ADMIN\PycharmProjects\pycrawl\project004>scrapy
Scrapy 0.24.4 - project: project004
Usage:
  scrapy <command> [options] [args]
Available commands:
  bench         Run quick benchmark test
  check         Check spider contracts
  crawl         Run a spider
  deploy        Deploy project in Scrapyd target
  edit          Edit spider
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  list          List available spiders
  parse         Parse URL (using its spider) and print the results
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy
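As one more illustration (not from the original notes; the URL is arbitrary), the shell command opens an interactive console with the downloaded page already bound to response, which is handy for trying XPath expressions before putting them into a spider:

    scrapy shell http://www.example.com/
    >>> response.xpath('//title/text()').extract()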
轉(zhuǎn)載于:https://blog.51cto.com/dingbo/1600296