Elasticsearch "in" queries: Python Elasticsearch DSL query, filter, and aggregation examples

Published: 2024/1/23

Tech blog: https://github.com/yongxinz/tech-blog

You are also welcome to follow my WeChat official account, AlwaysBeta, for more content.

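Elasticsearch basic concepts

Index: the logical area in which Elasticsearch stores data, similar to the database concept in a relational database. An index can be spread across one or more shards, and each shard can have multiple replicas.

Document: the entity data stored in Elasticsearch, similar to a row of a table in a relational database.

A document consists of multiple fields, and fields with the same name in different documents must have the same type. A field can occur more than once in a document, that is, one field can hold multiple values (multivalued).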

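Document type: to make querying easier, an index may hold several kinds of documents, each of which is a document type. It is similar to the table concept in a relational database. Note, however, that fields with the same name in different documents must still have the same type. Mapping types were deprecated starting with Elasticsearch 6.x and removed in later major versions, so this concept applies to older clusters.

Mapping: similar to the schema definition in a relational database. It stores the mapping information for fields; different document types have different mappings.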
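To make these concepts concrete, here is a minimal sketch that models a mapping and a few documents as plain Python dictionaries. The field names (title, tags, views) and the helper conforms are made up for illustration and are not part of any Elasticsearch API; the sketch only mirrors the rules that same-named fields share one type and that a field may hold several values.

```python
# Hypothetical mapping: field name -> Python type standing in for an ES field type.
mapping = {"title": str, "tags": str, "views": int}

doc_single = {"title": "hello", "tags": "python", "views": 3}
doc_multi = {"title": "world", "tags": ["python", "search"], "views": 7}  # multivalued field


def conforms(doc, mapping):
    """Return True if every value of every field has the type declared in the mapping."""
    for field, value in doc.items():
        expected = mapping.get(field)
        if expected is None:
            return False  # field is not declared in this (strict, hypothetical) mapping
        values = value if isinstance(value, list) else [value]
        if not all(isinstance(v, expected) for v in values):
            return False
    return True


print(conforms(doc_single, mapping))          # True
print(conforms(doc_multi, mapping))           # True
print(conforms({"views": "many"}, mapping))   # False: same field name, different type
```

A real cluster is more lenient than this sketch (for example, it can add new fields dynamically), so treat it as a mental model rather than a description of server behavior.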

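Introduction to Python Elasticsearch DSL

Connect to Elasticsearch:

    import elasticsearch

    # Note: elasticsearch-py 8.x additionally requires a 'scheme' key (e.g. 'http') in each host dict.
    es = elasticsearch.Elasticsearch([{'host': '127.0.0.1', 'port': 9200}])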

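Start with a simple search. q is the search text (extra spaces in q do not affect the results), size sets the number of hits to return, from_ sets the starting offset, and filter_path selects which parts of the response are returned; in this example only _id and _type appear in the final result.

    res_3 = es.search(index="bank", q="Holmes", size=1, from_=1)
    res_4 = es.search(index="bank", q=" 39225 5686 ", size=1000,
                      filter_path=['hits.hits._id', 'hits.hits._type'])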
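The effect of filter_path can be sketched without a cluster. The helper apply_filter_path below is hypothetical: it imitates, on the client side and without wildcard support, how a response is reduced to the dotted paths requested. The sample response is invented for illustration.

```python
def apply_filter_path(data, paths):
    """Keep only the parts of `data` reachable through the dotted `paths` (no wildcards)."""

    def prune(node, parts):
        if not parts:
            return node
        if isinstance(node, list):
            # Keep list positions aligned so results of several paths can be merged.
            kept = [prune(item, parts) for item in node]
            return [{} if item is None else item for item in kept]
        if isinstance(node, dict) and parts[0] in node:
            child = prune(node[parts[0]], parts[1:])
            return None if child is None else {parts[0]: child}
        return None

    def merge(a, b):
        if isinstance(a, dict) and isinstance(b, dict):
            out = dict(a)
            for key, value in b.items():
                out[key] = merge(out[key], value) if key in out else value
            return out
        if isinstance(a, list) and isinstance(b, list):
            return [merge(x, y) for x, y in zip(a, b)]
        return b

    result = {}
    for path in paths:
        pruned = prune(data, path.split("."))
        if pruned is not None:
            result = merge(result, pruned)
    return result


# Invented sample response with the usual hits.hits layout.
response = {
    "took": 2,
    "hits": {
        "total": 2,
        "hits": [
            {"_id": "1", "_type": "account", "_score": 1.0, "_source": {"balance": 39225}},
            {"_id": "2", "_type": "account", "_score": 0.5, "_source": {"balance": 5686}},
        ],
    },
}

filtered = apply_filter_path(response, ["hits.hits._id", "hits.hits._type"])
print(filtered)
# {'hits': {'hits': [{'_id': '1', '_type': 'account'}, {'_id': '2', '_type': 'account'}]}}
```

In practice the server does this filtering, which also reduces the amount of data sent over the network.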

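Query all data in the specified index:

Here index specifies the index. A string denotes a single index; a list denotes multiple indices, e.g. index=["bank", "banner", "country"]; a wildcard pattern denotes all matching indices, e.g. index=["apple*"] matches every index whose name starts with apple.

search can also be given a specific doc-type.

    from elasticsearch_dsl import Search

    s = Search(using=es, index="index-test")
    response = s.execute()
    print(response.to_dict())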
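The three forms of the index argument described above (single name, list of names, wildcard pattern) can be approximated locally with the standard library. The function resolve_indices and the index names other than bank, banner, and country are assumptions made for this sketch; fnmatch also understands ? and [...] patterns, so it is only an approximation of how Elasticsearch resolves index names.

```python
from fnmatch import fnmatchcase


def resolve_indices(index, existing):
    """Expand a name, a list of names, or wildcard patterns against `existing` index names."""
    patterns = [index] if isinstance(index, str) else list(index)
    return [name for name in existing if any(fnmatchcase(name, p) for p in patterns)]


# Invented list of indices present in a cluster.
existing = ["apple-2023", "apple-2024", "bank", "banner", "country"]

print(resolve_indices("bank", existing))                          # ['bank']
print(resolve_indices(["bank", "banner", "country"], existing))   # ['bank', 'banner', 'country']
print(resolve_indices(["apple*"], existing))                      # ['apple-2023', 'apple-2024']
```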

Query by a field; multiple query conditions can be chained:

    s = Search(using=es, index="index-test").query("match", sip="192.168.1.1")
    s = s.query("match", dip="192.168.1.2")
    response = s.execute()

Multi-field query:

    from elasticsearch_dsl.query import MultiMatch

    multi_match = MultiMatch(query='hello', fields=['title', 'content'])
    s = Search(using=es, index="index-test").query(multi_match)
    response = s.execute()
    print(response.to_dict())

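A Q() object can also be used for a multi-field query; fields is a list and query is the value to search for.

    from elasticsearch_dsl import Q

    q = Q("multi_match", query="hello", fields=['title', 'content'])
    s = Search(using=es, index="index-test").query(q)
    response = s.execute()
    print(response.to_dict())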

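The first argument of Q() is the query type, which can also be bool.

    q = Q('bool', must=[Q('match', ), Q('match', content='world')])
    s = Search(using=es, index="index-test").query(q)
    response = s.execute()
    print(response.to_dict())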

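Q() objects can also be combined with operators, which is another way of writing the query above.

    q = Q("match", ) | Q("match", )
    print(q.to_dict())  # {"bool": {"should": [...]}}
    response = Search(using=es, index="index-test").query(q).execute()

    q = Q("match", ) & Q("match", )
    print(q.to_dict())  # {"bool": {"must": [...]}}
    response = Search(using=es, index="index-test").query(q).execute()

    q = ~Q("match", )
    print(q.to_dict())  # {"bool": {"must_not": [...]}}
    response = Search(using=es, index="index-test").query(q).execute()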
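How the |, & and ~ operators map onto bool queries can be illustrated with a small stand-in class. MiniQ is hypothetical and is not the library's implementation (the real Q objects do more, such as merging nested bool queries); the title field and its value are also assumptions, while content='world' comes from the example above.

```python
class MiniQ:
    """Hypothetical stand-in for a query object; wraps a raw query dict."""

    def __init__(self, body):
        self.body = body

    def __or__(self, other):
        # a | b: either clause may match
        return MiniQ({"bool": {"should": [self.body, other.body]}})

    def __and__(self, other):
        # a & b: both clauses must match
        return MiniQ({"bool": {"must": [self.body, other.body]}})

    def __invert__(self):
        # ~a: the clause must not match
        return MiniQ({"bool": {"must_not": [self.body]}})

    def to_dict(self):
        return self.body


title = MiniQ({"match": {"title": "python"}})      # assumed field and value
content = MiniQ({"match": {"content": "world"}})

print((title | content).to_dict())
print((title & content).to_dict())
print((~title).to_dict())
```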

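Filtering. The example here is a range filter: range is the filter type, timestamp is the name of the field to filter on, gte means greater than or equal to, and lt means less than. Set these as needed.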

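On the difference between term and match: term is an exact match against the indexed value, while match analyzes (tokenizes) the query text, matches more loosely, and returns a relevance score. For example, when the indexed terms are lowercase, a term query that contains uppercase letters returns no hits, whereas a match query is case-insensitive and returns the same results either way.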

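    import time

    # Range filter combined with a match query
    s = s.filter("range", timestamp={"gte": 0, "lt": time.time()}).query("match", country="in")
    # Plain filter: terms matches any of the listed values, similar to SQL IN
    res_3 = s.filter("terms", balance_num=["39225", "5686"]).execute()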
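The term versus match behavior described above can be imitated with a toy inverted index. Everything here is a simplification made for illustration: analyze is a crude stand-in for a real analyzer (lowercase, split on whitespace), there is no scoring, and the sample documents are invented.

```python
def analyze(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Map each analyzed token to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in analyze(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def term_query(index, value):
    """Look the value up exactly as given, without analysis."""
    return sorted(index.get(value, set()))


def match_query(index, text):
    """Analyze the query text first, then return documents matching any token."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return sorted(hits)


docs = {1: "Sherlock Holmes", 2: "Mycroft Holmes", 3: "John Watson"}
index = build_index(docs)

print(term_query(index, "Holmes"))          # []       uppercase H is not an indexed term
print(term_query(index, "holmes"))          # [1, 2]
print(match_query(index, "HOLMES watson"))  # [1, 2, 3]
```

This is why term and terms are usually applied to keyword or numeric fields, while match is used for analyzed text.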

Other ways to write filters:

    from elasticsearch_dsl import Search, Q

    s = Search()
    s = s.filter('terms', tags=['search', 'python'])
    print(s.to_dict())
    # {'query': {'bool': {'filter': [{'terms': {'tags': ['search', 'python']}}]}}}

    # Equivalent form using an explicit bool query
    s = Search()
    s = s.query('bool', filter=[Q('terms', tags=['search', 'python'])])
    print(s.to_dict())
    # {'query': {'bool': {'filter': [{'terms': {'tags': ['search', 'python']}}]}}}

    # Excluding instead of filtering
    s = Search()
    s = s.exclude('terms', tags=['search', 'python'])
    # or, equivalently
    s = Search()
    s = s.query('bool', filter=[~Q('terms', tags=['search', 'python'])])
    print(s.to_dict())
    # {'query': {'bool': {'filter': [{'bool': {'must_not': [{'terms': {'tags': ['search', 'python']}}]}}]}}}

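Aggregations can be chained after queries, filters, and other operations; they are added through aggs.

A bucket is a grouping. Its first argument is the name of the group, which you choose yourself; the second is the aggregation type; the third is the field to aggregate on.

metric works the same way. Metric aggregations include sum, avg, max, min, and so on. Two of them, stats and extended_stats, return all of these values at once; extended_stats additionally returns values such as the variance.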

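    from elasticsearch_dsl import A

    # Example 1
    s.aggs.bucket("per_country", "terms", field="timestamp") \
        .metric("sum_click", "stats", field="click") \
        .metric("sum_request", "stats", field="request")
    # Example 2
    s.aggs.bucket("per_age", "terms", field="click.keyword") \
        .metric("sum_click", "stats", field="click")
    # Example 3
    s.aggs.metric("sum_age", "extended_stats", field="impression")
    # Example 4
    s.aggs.bucket("per_age", "terms", field="country.keyword")
    # Example 5: aggregate by ranges (attach it with s.aggs.bucket(<name>, a))
    a = A("range", field="account_number", ranges=[{"to": 10}, {"from": 11, "to": 21}])

    res = s.execute()

Finally, execute() still has to be called. Note that s.aggs modifies the search in place, so its result must not be assigned to a variable (res = s.aggs is wrong). After execution, the aggregation results are available in res, under res.aggregations.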
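What a terms bucket combined with a stats metric computes can be shown in plain Python. The function terms_with_stats is a hypothetical local re-implementation for illustration only, and the three sample documents are invented; the field names country and click follow the examples above.

```python
from collections import defaultdict


def terms_with_stats(docs, bucket_field, metric_field):
    """Group docs by `bucket_field` and compute count/min/max/avg/sum of `metric_field`."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[bucket_field]].append(doc[metric_field])
    result = {}
    for key, values in groups.items():
        result[key] = {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
            "sum": sum(values),
        }
    return result


# Invented sample documents.
docs = [
    {"country": "in", "click": 4},
    {"country": "in", "click": 6},
    {"country": "us", "click": 10},
]

stats = terms_with_stats(docs, "country", "click")
print(stats["in"])  # {'count': 2, 'min': 4, 'max': 6, 'avg': 5.0, 'sum': 10}
print(stats["us"])  # {'count': 1, 'min': 10, 'max': 10, 'avg': 10.0, 'sum': 10}
```

The five keys mirror what a stats aggregation reports per bucket; the server computes them across shards, so nothing like this loop runs on the client.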

Sorting

    s = Search().sort('category', '-title', {"lines": {"order": "asc", "mode": "avg"}})
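The sort arguments mix three notations: a bare field name, a name prefixed with - for descending order, and an explicit dict. A minimal sketch of how the shorthand can be expanded follows; normalize_sort is a hypothetical helper written for this article, not a function of the library.

```python
def normalize_sort(*keys):
    """Expand '-field' shorthand into explicit descending entries; pass other keys through."""
    normalized = []
    for key in keys:
        if isinstance(key, str) and key.startswith("-"):
            normalized.append({key[1:]: {"order": "desc"}})
        else:
            normalized.append(key)
    return normalized


print(normalize_sort("category", "-title", {"lines": {"order": "asc", "mode": "avg"}}))
# ['category', {'title': {'order': 'desc'}}, {'lines': {'order': 'asc', 'mode': 'avg'}}]
```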

Pagination

    s = s[10:20]
    # {"from": 10, "size": 10}
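The slice notation translates into from and size by simple arithmetic: from is the start of the slice and size is stop minus start. The helper slice_to_from_size below is a hypothetical sketch of that conversion, not library code.

```python
def slice_to_from_size(window):
    """Convert a slice such as slice(10, 20) into from/size request parameters."""
    start = window.start or 0
    if window.stop is None:
        raise ValueError("an open-ended slice has no size")
    if start < 0 or window.stop < start:
        raise ValueError("slice bounds must satisfy 0 <= start <= stop")
    return {"from": start, "size": window.stop - start}


print(slice_to_from_size(slice(10, 20)))   # {'from': 10, 'size': 10}
print(slice_to_from_size(slice(None, 5)))  # {'from': 0, 'size': 5}
```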

Some additional methods, for readers who are interested:

    s = Search()
    # Set extra properties with .extra()
    s = s.extra(explain=True)
    # Set request parameters with .params()
    s = s.params(search_type="count")
    # To limit the returned fields, use .source()
    # only return the selected fields
    s = s.source(['title', 'body'])
    # don't return any fields, just the metadata
    s = s.source(False)
    # explicitly include/exclude fields
    s = s.source(includes=["title"], excludes=["user.*"])
    # reset the field selection
    s = s.source(None)
    # Build a search from a dict
    s = Search.from_dict({"query": {"match": {"title": "python"}}})
    # Modify an existing search
    s.update_from_dict({"query": {"match": {"title": "python"}}, "size": 42})
