Advanced Elasticsearch Query Techniques
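Contents
- Preparation
- Use cases
- 1. constant_score query: ignore term-frequency scoring so that documents matching more of the search keywords rank higher
- 2. sort: when scores tie, order by the price field
- 3. Recall with different per-field weights, ignoring TF/IDF
- 4. Different per-field weights ignoring TF/IDF, plus the value of a specified field added into the final score, e.g. title^3 + content^1 + time
- 5. Different weights per brand within the same region, ignoring TF/IDF
- 6. Geo queries: how to search, as on the Ziroom rental site, for listings that are both cheap and nearby without imposing a hard distance cutoff?
- 7. Sorting by a configured parameter value, e.g. making xiaomi phones score highest among all phones
- 8. BM25 similarity tuning: disabling length normalization
- 9. Using query_string
- 10. The "yellow peach, canned" (黃桃、罐頭) bad case: rank products matching both terms first and partial matches after them
- Monitoring
- _stats index monitoring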
Preparation
Index mappings:
{"shop_titled_index": {"mappings": {"properties": {"brand": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"price": {"type": "long"},"region": {"type": "long"},"shopId": {"type": "long"},"skuId": {"type": "long"},"title": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}}}}}}
Sample data:
{"_index": "shop_titled_index","_type": "_doc","_id": "dJAM3HYByj_ONITHr0gq","_score": 1,"_source": {"brand": "iphone","price": 8000,"title": "iphone 12 64G red 5G","skuId": 2020122201,"shopId": 2,"region": 1001}}
{"_index": "shop_titled_index","_type": "_doc","_id": "9ZA6inYByj_ONITHT0bH","_score": 1,"_source": {"brand": "iphone","price": 8000,"title": "iphone 12 64G red 5G","skuId": 2020122201,"shopId": 1,"region": 1001}}
Use cases
1. constant_score query: ignore term-frequency scoring so that documents matching more of the search keywords rank higher
{"query": {"bool": {"should": [{"constant_score": {"filter": {"match": {"title": "iphone"}},"boost": 1}},{"constant_score": {"filter": {"match": {"title": "12"}}}}]}}}
2. sort: when scores tie, order by the price field
{"query": {"bool": {"should": [{"constant_score": {"filter": {"match": {"title": "iphone"}},"boost": 1}},{"constant_score": {"filter": {"match": {"title": "12"}}}}]}},"sort": [{"_score": {"order": "desc"}},{"price": {"order": "asc"}}]}
3. Recall with different per-field weights, ignoring TF/IDF
{"query": {"bool": {"should": [{"constant_score": {"filter": {"match": {"title": "red"}},"boost": 1}},{"constant_score": {"filter": {"match": {"brand": "iphone"}},"boost": 3}}]}},"sort": [{"_score": {"order": "desc"}},{"price": {"order": "asc"}}]}
4. Different per-field weights ignoring TF/IDF, plus the value of a specified field added into the final score, e.g. title^3 + content^1 + time
{"query": {"function_score": {"query": {"bool": {"should": [{"constant_score": {"filter": {"match": {"title": "red"}},"boost": 1}},{"constant_score": {"filter": {"match": {"brand": "iphone"}},"boost": 3}}]}},"field_value_factor": {"field": "shopId"},"boost_mode": "sum"}}}
5. Different weights per brand within the same region, ignoring TF/IDF
Documentation: https://www.elastic.co/guide/cn/elasticsearch/guide/current/function-score-filters.html
{"query": {"function_score": {"query": {"term": {"region":1002}},"boost": "1","functions": [{"filter": {"term": {"brand.keyword": "huawei"}},"weight": 3},{"filter":{"match":{"brand":"xiaomi"}},"weight":1}],"score_mode": "sum","boost_mode": "sum"}}}
Note: the following query has no main query inside function_score, so it matches and returns every document:
{"query": {"function_score": {"functions": [{"filter": {"term": {"brand.keyword": "huawei"}},"weight": 3},{"filter":{"match":{"brand":"xiaomi"}},"weight":1}],"score_mode": "sum","boost_mode": "sum"}}}
6. Geo queries: how to search, as on the Ziroom rental site, for listings that are both cheap and nearby without imposing a hard distance cutoff?
Reference: https://www.cnblogs.com/xiaoxiaoliu/p/11054405.html
3. Prepare the data
{"location":{"lon":"116.488781","lat":"39.950565"},"price":"4000","name":"Chaoyang Park, 2 bedrooms 1 living room, 12m"}
{"location":{"lon":"116.327805","lat":"39.900988"},"price":"2400","name":"Beijing West Railway Station, 3 bedrooms 1 living room, 9m"}
{"location": {"lon": "116.403981","lat": "39.916485"},"price": "88888","name": "Forbidden City, priceless treasure"}
{"location": {"lon": "116.341316","lat": "39.948795"},"price": "3700","name": "Beijing Zoo, 3 bedrooms 1 living room, 19m"}
4. geo_distance: find listings within two kilometers
GET geo_index/_search
{"query": {"constant_score": {"filter": {"geo_distance": {"distance": "2km","location": {"lat": 39.93869837,"lon": 116.48357391}}},"boost": 1.2}}}
Output:
{"took": 2,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 1,"relation": "eq"},"max_score": 1.2,"hits": [{"_index": "geo_index","_type": "_doc","_id": "1JC14HYByj_ONITHikiw","_score": 1.2,"_source": {"location": {"lon": "116.488781","lat": "39.950565"},"price": "4000","name": "Chaoyang Park, 2 bedrooms 1 living room, 12m"}}]}}
5. Find listings and sort them by distance
Documentation: https://www.elastic.co/guide/cn/elasticsearch/guide/current/sorting-by-distance.html
{"query": {"constant_score": {"filter": {"geo_distance": {"distance": "10km","location": {"lat": 39.93869837,"lon": 116.48357391}}},"boost": 1.2}},"sort": {"_geo_distance": {"location": [{"lat": 39.93869837,"lon": 116.48357391}],"unit": "km","distance_type": "arc","order": "asc"}}}
6. Find rentals by proximity and price
I care more about proximity than price, so the distance function is given the higher weight.
Reference: https://www.elastic.co/guide/cn/elasticsearch/guide/current/decay-functions.html#CO119-4
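The request body for this step is not shown here. A minimal sketch of what such a request could look like, assuming `gauss` decay functions on `location` and `price` inside a `function_score` query: the origin coordinates are reused from the earlier geo queries, while the `offset`, `scale`, price origin and `weight` values are illustrative guesses rather than the author's settings. It also assumes `price` is mapped as a numeric field, since decay functions do not work on text fields.

```python
import json


def build_rental_query(lat, lon, price_origin, location_weight=2.0):
    """Build a function_score body that favors nearby, cheap listings.

    All numeric tuning values below are hypothetical examples.
    """
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": [
                    {
                        # Score decays smoothly with distance instead of
                        # cutting off at a fixed radius.
                        "gauss": {
                            "location": {
                                "origin": {"lat": lat, "lon": lon},
                                "offset": "1km",
                                "scale": "2km",
                            }
                        },
                        # Distance matters more than price here.
                        "weight": location_weight,
                    },
                    {
                        "gauss": {
                            "price": {
                                "origin": price_origin,
                                "offset": 500,
                                "scale": 1000,
                            }
                        }
                    },
                ],
                "score_mode": "multiply",
                "boost_mode": "replace",
            }
        }
    }


body = build_rental_query(39.93869837, 116.48357391, price_origin=3000)
print(json.dumps(body, indent=2))
```

Because the decay functions never exclude a document, far-away listings still come back, only with scores close to zero.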
Result:
{"took": 5,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 4,"relation": "eq"},"max_score": 0.7460326,"hits": [{"_index": "geo_index","_type": "_doc","_id": "95A14XYByj_ONITHg0if","_score": 0.7460326,"_source": {"location": {"lon": "116.47155762","lat": "39.9523853"},"price": "3500","name": "Liangmaqiao, 2 bedrooms 1 living room, 12m"}},{"_index": "geo_index","_type": "_doc","_id": "1JC14HYByj_ONITHikiw","_score": 0.36586136,"_source": {"location": {"lon": "116.488781","lat": "39.950565"},"price": "4000","name": "Chaoyang Park, 2 bedrooms 1 living room, 12m"}},{"_index": "geo_index","_type": "_doc","_id": "1ZC34HYByj_ONITHRkht","_score": 5.823735e-39,"_source": {"location": {"lon": "116.341316","lat": "39.948795"},"price": "3700","name": "Beijing Zoo, 3 bedrooms 1 living room, 19m"}},{"_index": "geo_index","_type": "_doc","_id": "1pC44HYByj_ONITHAkgJ","_score": 0,"_source": {"location": {"lon": "116.327805","lat": "39.900988"},"price": "2400","name": "Beijing West Railway Station, 3 bedrooms 1 living room, 9m"}}]}}
7. Sorting by a configured parameter value, e.g. making xiaomi phones score highest among all phones
Sort with function_score combined with script_score:
{"query": {"function_score": {"query": {"match_all":{}},"functions": [{"script_score": {"script": {"lang": "painless","params": {"brand": "xiaomi"},"source": "if(doc['brand.keyword'].size() == 0)return 0f; String brandStr = doc['brand.keyword'].value ?: new String();if(params.brand.compareTo(brandStr) == 0){return 1f}return 0"}}}],"score_mode":"sum","boost_mode":"replace"}}}
score_mode defines how the scores of the individual functions are merged into one combined score; boost_mode defines how that combined score is applied to the score produced by the original query.
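To make the two settings concrete, here is a small local model of the combination step. It is a simplified sketch, not Elasticsearch code: it covers only a subset of the modes, leaves out `avg` for `score_mode` (which Elasticsearch computes as a weighted average), and assumes at least one function applies to the document. The name `final_score` is made up for this illustration.

```python
from functools import reduce

# How the per-function scores are merged into one value (score_mode).
SCORE_MODES = {
    "multiply": lambda scores: reduce(lambda a, b: a * b, scores, 1.0),
    "sum": lambda scores: sum(scores),
    "first": lambda scores: scores[0],
    "max": lambda scores: max(scores),
    "min": lambda scores: min(scores),
}

# How the merged function score is applied to the query score (boost_mode).
BOOST_MODES = {
    "multiply": lambda q, f: q * f,
    "replace": lambda q, f: f,
    "sum": lambda q, f: q + f,
    "avg": lambda q, f: (q + f) / 2.0,
    "max": lambda q, f: max(q, f),
    "min": lambda q, f: min(q, f),
}


def final_score(query_score, function_scores, score_mode, boost_mode):
    """Combine function scores, then apply them to the query score."""
    combined = SCORE_MODES[score_mode](function_scores)
    return BOOST_MODES[boost_mode](query_score, combined)


# The script above returns 1 for xiaomi and 0 otherwise; with
# boost_mode "replace" the match_all score is discarded.
xiaomi = final_score(1.0, [1.0], "sum", "replace")
other = final_score(1.0, [0.0], "sum", "replace")
print(xiaomi, other)

# A huawei document in the brand-weight query: only the weight-3 filter
# applies, and the query score is assumed to be 1.0.
print(final_score(1.0, [3.0], "sum", "sum"))
```

With `replace`, only the script result decides the order, which is why every xiaomi document ends up ahead of every other brand.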
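8. BM25 similarity tuning: disabling length normalization
BM25 exposes two tuning parameters:
k1: controls how quickly the term-frequency contribution rises toward saturation. The default is 1.2. Smaller values saturate faster; larger values saturate more slowly. The term-frequency saturation chart in the official documentation plots score against term frequency, and k1 controls the "tf of BM25" curve in that chart.
b: controls how much field-length normalization applies. 0.0 disables normalization and 1.0 applies full normalization. The default is 0.75.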
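The effect of both parameters can be seen in the term-frequency part of the textbook BM25 formula. The sketch below uses that textbook form; Lucene's actual implementation may differ in constant factors, and the function name and sample lengths are made up for illustration.

```python
def bm25_tf_component(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Term-frequency part of textbook BM25.

    tf          -- occurrences of the term in the field
    doc_len     -- length of this field in terms
    avg_doc_len -- average field length across the index
    """
    # b scales how strongly the field length changes the denominator.
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + k1 * length_norm)


# With b = 0 the field length drops out entirely.
short_no_norm = bm25_tf_component(3, doc_len=10, avg_doc_len=20, b=0.0)
long_no_norm = bm25_tf_component(3, doc_len=100, avg_doc_len=20, b=0.0)

# With the default b = 0.75 the shorter field scores higher.
short_default = bm25_tf_component(3, doc_len=10, avg_doc_len=20)
long_default = bm25_tf_component(3, doc_len=100, avg_doc_len=20)

print(short_no_norm, long_no_norm)
print(short_default, long_default)

# k1 controls saturation: the value approaches k1 + 1 as tf grows.
for tf in (1, 2, 5, 50):
    print(tf, bm25_tf_component(tf, 20, 20, k1=1.2))
```

Lowering k1 makes the curve flatten after fewer occurrences, so repeated terms add less to the score.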
The title field uses a BM25 similarity with length normalization disabled, so its scores do not depend on document length.
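The index definition itself is not reproduced here. A minimal sketch of how such an index could be declared, assuming a custom similarity registered under `settings.index.similarity` and referenced from the `title` mapping: the similarity name `bm25_no_length_norm` is a hypothetical label, and leaving `body` on the default BM25 is an assumption as well.

```python
import json

# Hypothetical name for the custom similarity.
SIMILARITY_NAME = "bm25_no_length_norm"

index_definition = {
    "settings": {
        "index": {
            "similarity": {
                SIMILARITY_NAME: {
                    "type": "BM25",
                    "b": 0,  # 0 disables field-length normalization
                }
            }
        }
    },
    "mappings": {
        "properties": {
            # title ignores field length when scoring.
            "title": {"type": "text", "similarity": SIMILARITY_NAME},
            # body keeps the default BM25 (b = 0.75).
            "body": {"type": "text"},
        }
    },
}

# Assumed search against the title field.
title_query = {"query": {"match": {"title": "similarity"}}}

print(json.dumps(index_definition, indent=2))
print(json.dumps(title_query))
```

The dictionary would be sent as the body of the index creation request, for example `PUT my_sim_index`.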
Output:
{"took": 1,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 3,"relation": "eq"},"max_score": 0.20983505,"hits": [{"_index": "my_sim_index","_type": "_doc","_id": "nZBO5nYByj_ONITHhknJ","_score": 0.20983505,"_source": {"title": "Elasticsearch allows you to configure a scoring algorithm or similarity per field. The similarity setting provides a simple way of choosing a similarity algorithm other than the default BM25, such as TF/IDF.","body": "Elasticsearch allows you to configure a scoring algorithm or similarity per field. The similarity setting provides a simple way of choosing a similarity algorithm other than the default BM25, such as TF/IDF."}},{"_index": "my_sim_index","_type": "_doc","_id": "oZBW5nYByj_ONITHkEli","_score": 0.20983505,"_source": {"title": "or similarity per field. The similarity setting provides a simple way of choosing a similarity","body": "or similarity per field. The similarity setting provides a simple way of choosing a similarity"}},{"_index": "my_sim_index","_type": "_doc","_id": "npBP5nYByj_ONITHK0mo","_score": 0.18360566,"_source": {"title": "A simple boolean similarity, which is used when full-text ranking is not needed and the score should only be based on whether the query terms match or not. Boolean similarity gives terms a score equal to their query boost.","body": "A simple boolean similarity, which is used when full-text ranking is not needed and the score should only be based on whether the query terms match or not. Boolean similarity gives terms a score equal to their query boost."}}]}}
The first two hits share the score 0.20983505 even though the documents differ in length.
Searching on body instead:
GET my_sim_index/_search
{"query":{"match":{"body":"similarity"}}}
Here, even documents that match "similarity" the same number of times score differently, because scoring on body is still affected by document length.
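9. Using query_string
{"query":{"query_string":{"query":"(title:red)^1.0 AND (brand:iphone)"}}}
10. The "yellow peach, canned" (黃桃、罐頭) bad case: rank products matching both terms first and partial matches after them
Option 1: use constant_score
Add a query clause that ignores the TF/IDF score and carries a custom score, so that products matching every term are pushed to the top.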
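One possible shape for such a clause, as a sketch only: a regular `match` keeps partial matches in the result set, and a `constant_score` clause whose filter requires all terms (`"operator": "and"`) adds a fixed bonus for full matches. The field name `title`, the boost value and the helper name are assumptions for illustration.

```python
import json


def build_full_match_first_query(text, field="title", full_match_boost=10.0):
    """Rank documents containing every query term above partial matches."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Recall: any document matching at least one term.
                    {"match": {field: text}},
                    # Bonus: fixed score when all terms are present,
                    # independent of TF/IDF.
                    {
                        "constant_score": {
                            "filter": {
                                "match": {
                                    field: {"query": text, "operator": "and"}
                                }
                            },
                            "boost": full_match_boost,
                        }
                    },
                ]
            }
        }
    }


# "黃桃 罐頭" is "yellow peach, canned".
body = build_full_match_first_query("黃桃 罐頭")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The boost should be larger than any relevance score the plain `match` clause is likely to produce, otherwise a strong partial match could still outrank a full match.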
Option 2
In the existing function_score query, add a filter with a weight to its functions list.
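A sketch of this second option, assuming the existing request already has the `query.function_score` structure used earlier in this article. The helper name, the `title` field and the weight are hypothetical.

```python
import copy
import json


def add_full_match_function(body, text, field="title", weight=10.0):
    """Return a copy of a function_score request with one extra function
    that rewards documents containing every query term."""
    updated = copy.deepcopy(body)
    function_score = updated["query"]["function_score"]
    function_score.setdefault("functions", []).append(
        {
            "filter": {"match": {field: {"query": text, "operator": "and"}}},
            "weight": weight,
        }
    )
    return updated


# Assumed starting point: an existing function_score request.
original = {
    "query": {
        "function_score": {
            "query": {"match": {"title": "黃桃 罐頭"}},
            "functions": [],
            "score_mode": "sum",
            "boost_mode": "sum",
        }
    }
}

updated = add_full_match_function(original, "黃桃 罐頭")
print(json.dumps(updated, ensure_ascii=False, indent=2))
```

With `score_mode` and `boost_mode` both set to `sum`, the weight is simply added to the relevance score of every full match.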
Monitoring
_stats index monitoring
Elasticsearch Index Monitoring: the Index Stats API in detail
Request:
Field reference:
{
  "_nodes": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "cluster_name": "ELKTEST",
  "nodes": {
    "lnlHC8yERCKXCuAc_2DPCQ": {
      "timestamp": 1534242595995,
      "name": "OPS01-ES01",
      "transport_address": "10.9.125.148:9300",
      "host": "10.9.125.148",
      "ip": "10.9.125.148:9300",
      "roles": [
        "master",
        "data",
        "ingest"
      ],
      "attributes": {
        "ml.machine_memory": "8203104256",
        "xpack.installed": "true",
        "ml.max_open_jobs": "20",
        "ml.enabled": "true"
      },
      "indices": {
        "docs": {
          "count": 8111612,  # number of documents on this node
          "deleted": 16604   # deleted documents not yet purged from their segments
        },
        "store": {
          "size_in_bytes": 2959876263  # physical storage consumed by this node
        },
        # indexing: how many times documents have been indexed, kept as a
        # cumulative counter. It does not go down when documents are deleted;
        # it only ever increases, and it is incremented on every internal
        # index operation, including updates.
        "indexing": {
          "index_total": 17703152,
          "index_time_in_millis": 2801934,
          "index_current": 0,
          "index_failed": 0,
          "delete_total": 46242,
          "delete_time_in_millis": 2130,
          "delete_current": 0,
          "noop_update_total": 0,
          "is_throttled": false,
          "throttle_time_in_millis": 0  # a high value means the disk throttling limit is set too low
        },
        "get": {
          "total": 185179,
          "time_in_millis": 22341,
          "exists_total": 185178,
          "exists_time_in_millis": 22337,
          "missing_total": 1,
          "missing_time_in_millis": 4,
          "current": 0
        },
        "search": {
          "open_contexts": 0,  # number of searches currently active
          "query_total": 495447,  # total number of queries
          # Total time spent on queries since the node started.
          # query_time_in_millis / query_total is a rough indicator of query
          # efficiency: the larger the ratio, the longer each query takes and
          # the more reason to tune or optimize.
          "query_time_in_millis": 298344,
          "query_current": 0,
          # The fetch statistics describe the second phase of a search (the
          # fetch in query_then_fetch). If fetch takes more time than query,
          # the disks are slow, too many documents are being fetched, or the
          # paging parameters are too large (for example size = 10000).
          "fetch_total": 130194,
          "fetch_time_in_millis": 51211,
          "fetch_current": 0,
          "scroll_total": 22,
          "scroll_time_in_millis": 2196665,
          "scroll_current": 0,
          "suggest_total": 0,
          "suggest_time_in_millis": 0,
          "suggest_current": 0
        },
        # merges: information about Lucene segment merges. It reports how many
        # merges are in progress, how many documents are involved, the total
        # size of the segments being merged, and the total time spent merging.
        # These figures matter on write-heavy clusters: merging consumes a lot
        # of disk I/O and CPU, and heavy indexing produces many merges.
        "merges": {
          "current": 0,
          "current_docs": 0,
          "current_size_in_bytes": 0,
          ..
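The rough efficiency indicator described for the search section can be computed directly from the sample figures. This is a small sketch; the helper name is made up, and the numbers are the ones shown in the response.

```python
# Figures copied from the "search" section of the sample response.
search_stats = {
    "query_total": 495447,
    "query_time_in_millis": 298344,
    "fetch_total": 130194,
    "fetch_time_in_millis": 51211,
}


def average_millis(total_time_ms, total_count):
    """Average time per operation, guarding against a zero count."""
    return total_time_ms / total_count if total_count else 0.0


avg_query_ms = average_millis(
    search_stats["query_time_in_millis"], search_stats["query_total"]
)
avg_fetch_ms = average_millis(
    search_stats["fetch_time_in_millis"], search_stats["fetch_total"]
)

print(f"average query time: {avg_query_ms:.3f} ms")  # roughly 0.60 ms
print(f"average fetch time: {avg_fetch_ms:.3f} ms")  # roughly 0.39 ms
```

Because the counters are cumulative since node start, comparing two samples taken some minutes apart gives a better picture of current behavior than the lifetime average.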