Python: passing a file_name argument to a pipeline in Scrapy
I need to grab an argument from the command line (-a FILE_NAME="stuff") and have it applied to the file created by my CsvWriterPipeline in pipelines.py. (I went with pipelines.py because the built-in exporter was repeating data and repeating the header in the output file. Same code, but writing from the pipeline fixed it.)
I tried `from scrapy.utils.project import get_project_settings`, as seen in another answer,
but I couldn't change the file name from the command line.
I also tried implementing @avaleske's solution from that page, since it specifically addresses this problem, but I don't know where in my Scrapy folder the code he talks about is supposed to go.
Help?
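If I understand @avaleske's answer right, the code he shows goes in pipelines.py itself: the pipeline gets a from_crawler classmethod that reads the value out of the crawler settings, and the value is then supplied on the command line with -s rather than -a. A minimal sketch of my reading of that pattern (untested):

import csv

class CsvWriterPipeline(object):

    def __init__(self, file_name):
        self.csvwriter = csv.writer(open(file_name, 'wb'))
        self.csvwriter.writerow(["URL"])

    @classmethod
    def from_crawler(cls, crawler):
        # FILE_NAME falls back to the value in settings.py and can be
        # overridden per run: scrapy crawl internallinkspider -s FILE_NAME=stuff
        return cls(crawler.settings.get('FILE_NAME'))

    def process_item(self, item, spider):
        self.csvwriter.writerow([item['url']])
        return item

My current files are below.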
settings.py:
BOT_NAME = 'internal_links'
SPIDER_MODULES = ['internal_links.spiders']
NEWSPIDER_MODULE = 'internal_links.spiders'
CLOSESPIDER_PAGECOUNT = 100
ITEM_PIPELINES = ['internal_links.pipelines.CsvWriterPipeline']
# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'internal_links (+http://www.mycompany.com)'
FILE_NAME = "mytestfilename"
pipelines.py:
import csv


class CsvWriterPipeline(object):

    def __init__(self, file_name):
        header = ["URL"]
        self.file_name = file_name
        self.csvwriter = csv.writer(open(self.file_name, 'wb'))
        self.csvwriter.writerow(header)

    def process_item(self, item, internallinkspider):
        # build your row to export, then export the row
        row = [item['url']]
        self.csvwriter.writerow(row)
        return item
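An alternative I have been considering (again just a sketch, not verified): since arguments passed with -a end up as attributes on the spider instance, the pipeline could postpone opening the file until open_spider and read the attribute there, with file_name as the assumed argument name:

import csv

class CsvWriterPipeline(object):

    def open_spider(self, spider):
        # scrapy crawl internallinkspider -a file_name=stuff
        # makes the value available as spider.file_name.
        file_name = getattr(spider, 'file_name', 'mytestfilename')
        self.file = open(file_name, 'wb')
        self.csvwriter = csv.writer(self.file)
        self.csvwriter.writerow(["URL"])

    def process_item(self, item, spider):
        self.csvwriter.writerow([item['url']])
        return item

    def close_spider(self, spider):
        self.file.close()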
spider.py:
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule

from internal_links.items import MyItem


class MySpider(CrawlSpider):
    name = 'internallinkspider'
    allowed_domains = ['angieslist.com']
    start_urls = ['http://www.angieslist.com']

    rules = (Rule(SgmlLinkExtractor(), callback='parse_url', follow=True), )

    def parse_url(self, response):
        item = MyItem()
        item['url'] = response.url
        return item
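For reference, the invocation I am aiming for would look like one of these, depending on which variant above is the right one:

scrapy crawl internallinkspider -s FILE_NAME=stuff
scrapy crawl internallinkspider -a file_name=stuff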