Scrapy Study Notes 24: Integrating Elasticsearch
A simple way to integrate Elasticsearch into Scrapy

Use Elasticsearch's Python interface to handle the data
https://github.com/elastic/elasticsearch-dsl-py

Official elasticsearch-dsl-py documentation
http://elasticsearch-dsl.readthedocs.io/en/latest/

Create a DocType class, similar to an Item class
# Example: indexing articles fetched from the jobbole site.
# Written against an older elasticsearch-dsl release that still exposes DocType.
from datetime import datetime

from elasticsearch_dsl import (
    DocType, Date, Nested, Boolean, analyzer, InnerObjectWrapper,
    Completion, Keyword, Text, Integer,
)
from elasticsearch_dsl.connections import connections

# hosts is a list, so connections to several servers are allowed
connections.create_connection(hosts=["localhost"])


class ArticleType(DocType):
    # Jobbole article type.
    # ik_max_word is provided by the IK analysis plugin, which must be
    # installed on the Elasticsearch server.
    title = Text(analyzer="ik_max_word")
    create_date = Date()
    url = Keyword()
    url_object_id = Keyword()
    front_image_url = Keyword()
    front_image_path = Keyword()
    praise_nums = Integer()
    comment_nums = Integer()
    fav_nums = Integer()
    tags = Text(analyzer="ik_max_word")
    content = Text(analyzer="ik_max_word")

    class Meta:
        index = "jobbole"
        doc_type = "article"


if __name__ == "__main__":
    # init() generates the mapping directly from the class definition
    ArticleType.init()
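
To make the effect of init() more concrete, the sketch below builds by hand the kind of mapping body that the field declarations above correspond to. It is only an illustration under assumptions: the layout assumes the type-keyed mapping format of Elasticsearch 5.x, the names FIELDS and build_mapping are hypothetical, and the real request is generated by elasticsearch-dsl itself.

```python
# Hypothetical illustration of the mapping that ArticleType's fields roughly
# correspond to. elasticsearch-dsl generates the real one inside init().

# (field name, Elasticsearch type, analyzer or None), mirroring ArticleType
FIELDS = [
    ("title", "text", "ik_max_word"),
    ("create_date", "date", None),
    ("url", "keyword", None),
    ("url_object_id", "keyword", None),
    ("front_image_url", "keyword", None),
    ("front_image_path", "keyword", None),
    ("praise_nums", "integer", None),
    ("comment_nums", "integer", None),
    ("fav_nums", "integer", None),
    ("tags", "text", "ik_max_word"),
    ("content", "text", "ik_max_word"),
]


def build_mapping(doc_type, fields):
    """Build a mapping body keyed by document type (assumed ES 5.x layout)."""
    properties = {}
    for name, es_type, analyzer in fields:
        spec = {"type": es_type}
        if analyzer is not None:
            # Only analyzed text fields carry an analyzer setting
            spec["analyzer"] = analyzer
        properties[name] = spec
    return {"mappings": {doc_type: {"properties": properties}}}


mapping = build_mapping("article", FIELDS)
```

Keyword fields are stored as single unanalyzed terms, which suits URLs and IDs, while the Text fields are tokenized by the configured analyzer for full-text search.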
Create an Item class to receive the data
import scrapy
# Newer Scrapy releases expose these processors via itemloaders.processors
from scrapy.loader.processors import MapCompose, Join
from w3lib.html import remove_tags

from models.es_types import ArticleType

# date_convert, return_value, get_nums and remove_comment_tags are helper
# functions assumed to be defined elsewhere in the project; they are not
# shown in this post.


class JobBoleArticleItem(scrapy.Item):
    title = scrapy.Field()
    create_date = scrapy.Field(
        input_processor=MapCompose(date_convert),
    )
    url = scrapy.Field()
    url_object_id = scrapy.Field()
    front_image_url = scrapy.Field(
        output_processor=MapCompose(return_value)
    )
    front_image_path = scrapy.Field()
    praise_nums = scrapy.Field(
        input_processor=MapCompose(get_nums)
    )
    comment_nums = scrapy.Field(
        input_processor=MapCompose(get_nums)
    )
    fav_nums = scrapy.Field(
        input_processor=MapCompose(get_nums)
    )
    tags = scrapy.Field(
        input_processor=MapCompose(remove_comment_tags),
        output_processor=Join(",")
    )
    content = scrapy.Field()

    def get_insert_sql(self):
        # On a duplicate key, refresh fav_nums with the newly scraped value
        insert_sql = """
            insert into jobbole_article(title, url, create_date, fav_nums)
            VALUES (%s, %s, %s, %s)
            ON DUPLICATE KEY UPDATE fav_nums=VALUES(fav_nums)
        """
        params = (self["title"], self["url"], self["create_date"], self["fav_nums"])
        return insert_sql, params

    def save_to_es(self):
        article = ArticleType()
        article.title = self["title"]
        article.create_date = self["create_date"]
        # Strip HTML tags so only the plain text is indexed
        article.content = remove_tags(self["content"])
        article.front_image_url = self["front_image_url"]
        # front_image_path exists only if the image was downloaded
        if "front_image_path" in self:
            article.front_image_path = self["front_image_path"]
        article.praise_nums = self["praise_nums"]
        article.fav_nums = self["fav_nums"]
        article.comment_nums = self["comment_nums"]
        article.url = self["url"]
        article.tags = self["tags"]
        # Use the URL hash as the document id so re-crawls overwrite the same document
        article.meta.id = self["url_object_id"]
        article.save()
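
The item relies on helpers such as get_nums and date_convert, and on a url_object_id value, none of which are defined in this post. The following is a minimal sketch of what they might look like, using only the standard library. The function bodies, the "YYYY/MM/DD" date format, and the get_md5 name are all assumptions made for illustration, not the author's code.

```python
import hashlib
import re
from datetime import datetime


def get_md5(url):
    """Hypothetical helper: hash a URL into a fixed-length id for url_object_id."""
    if isinstance(url, str):
        url = url.encode("utf-8")
    return hashlib.md5(url).hexdigest()


def get_nums(value):
    """Extract the first integer from text such as '2 收藏'; return 0 if none."""
    match = re.search(r"(\d+)", value)
    return int(match.group(1)) if match else 0


def date_convert(value):
    """Parse an assumed 'YYYY/MM/DD' string; fall back to today's date."""
    try:
        return datetime.strptime(value.strip(), "%Y/%m/%d").date()
    except ValueError:
        return datetime.now().date()
```

Returning a default instead of raising keeps a single malformed page from aborting the crawl, at the cost of indexing a placeholder value for that field.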
Create a pipeline class that writes the data to Elasticsearch
from models.es_types import ArticleType


class ElasticsearchPipeline(object):
    def process_item(self, item, spider):
        # Let the item write itself to Elasticsearch, then pass it on
        item.save_to_es()
        return item
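
Because the pipeline only calls item.save_to_es() and returns the item, its behavior can be checked without a running Elasticsearch cluster. The sketch below repeats the pipeline class and drives it with StubItem, a hypothetical stand-in for JobBoleArticleItem that counts calls instead of saving anything.

```python
class ElasticsearchPipeline(object):
    def process_item(self, item, spider):
        # Delegate the write to the item, then hand it to the next pipeline
        item.save_to_es()
        return item


class StubItem(dict):
    """Hypothetical test double: records save_to_es() calls, touches no server."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.saved = 0

    def save_to_es(self):
        self.saved += 1


item = StubItem(title="demo", url="http://example.com/1")
# spider is unused by this pipeline, so None is enough for the sketch
result = ElasticsearchPipeline().process_item(item, spider=None)
```

Keeping the write logic on the item means the same pipeline can serve any item class that defines its own save_to_es().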
Configure settings
ITEM_PIPELINES = {
    'ArticleSpider.pipelines.ElasticsearchPipeline': 1,
}
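
The integer assigned to each pipeline sets its place in the chain: Scrapy passes items through pipelines from the lowest value to the highest. The sketch below shows how that ordering would play out if a second pipeline were enabled. ArticleImagePipeline is a hypothetical name for whichever pipeline fills front_image_path, and the sort is only an illustration of the rule, not Scrapy's internal code.

```python
# Hypothetical settings with two pipelines; lower numbers run first.
ITEM_PIPELINES = {
    "ArticleSpider.pipelines.ElasticsearchPipeline": 2,
    "ArticleSpider.pipelines.ArticleImagePipeline": 1,  # hypothetical name
}

# Reproduce the ordering rule: ascending by the assigned integer
run_order = [
    path
    for path, priority in sorted(ITEM_PIPELINES.items(), key=lambda kv: kv[1])
]
```

With this ordering the image pipeline runs before the Elasticsearch pipeline, so front_image_path would already be set when save_to_es() checks for it.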
Reposted from: https://www.cnblogs.com/cq146637/p/9090697.html