Es聚合查询
Es分頁(yè)查詢:from+size
{ "query": { "bool": { "must": [ { "term": { "architect.keyword": { "value": "郭鋒" } } }, { "range": { "NRunTime": { "lte": 100 } } } ] } }, "size": 10, "from": 100 }
from 相當(dāng)于offset,size相當(dāng)于每頁(yè)多少個(gè),上邊例子中代表從第100個(gè)數(shù)據(jù)開始(第11頁(yè)),查詢出10條數(shù)據(jù)
?
Es多字段查詢? multi_match
{
"query": {
"bool": {
"must": {
"multi_match" : {
"query" : "search_key",
"type" : "best_fields",
"fields" : ["column1", "column2"],? ?//字段column1、column2模糊匹配search_key
"analyzer" : "ik_smart"? ? ?//漢字按ik_smart分詞
}
},
"filter": {? ? //filter range lte(小于) hte(大于)
"range":{
"column3":{
"lte":1//小于
}
}
}
}
},
"stored_fields": ["column1", "column2", "column3", "column4","column5"],
"highlight" : {//高亮顯示
"fields" : {
"column1" : {},
"column2" : {}
}
}
}
?
Es match match_phrase查詢
?match:會(huì)將查詢字段分隔,比如查詢java spark,采用match會(huì)分詞 java/spark,將es中包含java、spark、以及java***spark的查詢出來(lái)
?match_phrase:不會(huì)講查詢字段分隔,比如查詢java spark,采用match_phrase會(huì)將es中包含 ***java spark***的內(nèi)容查詢出來(lái)
?match提高查詢召回率,match_phrase提高查詢精度
match查詢例子:
1、match查詢 GET /forum/article/_search {"query": {"bool": {"must": [{"match": {"content": "java spark"}}]}} }查詢結(jié)果 {"took": 54,"timed_out": false,"_shards": {"total": 5,"successful": 5,"failed": 0},"hits": {"total": 2,"max_score": 0.68640786,"hits": [{"_index": "forum","_type": "article","_id": "2","_score": 0.68640786,"_source": {"articleID": "KDKE-B-9947-#kL5","userID": 1,"hidden": false,"postDate": "2017-01-02","tag": ["java"],"tag_cnt": 1,"view_cnt": 50,"title": "this is java blog","content": "i think java is the best programming language","sub_title": "learned a lot of course","author_first_name": "Smith","author_last_name": "Williams","new_author_last_name": "Williams","new_author_first_name": "Smith"}},{"_index": "forum","_type": "article","_id": "5","_score": 0.68324494,"_source": {"articleID": "DHJK-B-1395-#Ky5","userID": 3,"hidden": false,"postDate": "2017-03-01","tag": ["elasticsearch"],"tag_cnt": 1,"view_cnt": 10,"title": "this is spark blog","content": "spark is best big data solution based on scala ,an programming language similar to java spark","sub_title": "haha, hello world","author_first_name": "Tonny","author_last_name": "Peter Smith","new_author_last_name": "Peter Smith","new_author_first_name": "Tonny"}}]} } View Code?
match_phrase查詢例子:
GET /forum/article/_search {"query": {"match_phrase": {"content": {"query": "java spark","slop" : 50}}} } 結(jié)果: {"took": 3,"timed_out": false,"_shards": {"total": 5,"successful": 5,"failed": 0},"hits": {"total": 1,"max_score": 0.5753642,"hits": [{"_index": "forum","_type": "article","_id": "5","_score": 0.5753642,"_source": {"articleID": "DHJK-B-1395-#Ky5","userID": 3,"hidden": false,"postDate": "2017-03-01","tag": ["elasticsearch"],"tag_cnt": 1,"view_cnt": 10,"title": "this is spark blog","content": "spark is best big data solution based on scala ,an programming language similar to java spark","sub_title": "haha, hello world","author_first_name": "Tonny","author_last_name": "Peter Smith","new_author_last_name": "Peter Smith","new_author_first_name": "Tonny"}}]} } View Code?
結(jié)果發(fā)現(xiàn)match_phrase只會(huì)返回既包含java又包含spark的數(shù)據(jù),召回率降低
參考:https://www.jianshu.com/p/18d80cafe145
分詞器
?通過(guò)analyzer指定分詞器類型
?系統(tǒng)默認(rèn)分詞器:
? ① standard分詞器:單詞切分,將詞匯轉(zhuǎn)為小寫、去掉標(biāo)點(diǎn)符號(hào) ------按單詞分
POST _analyze {"analyzer": "standard","text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone." }text分詞結(jié)果: [ the, 2, quick, brown, foxes, jumped, over, the, lazy, dog's, bone ] View Code?
② whitespace分詞器:空格分詞,對(duì)字符沒有l(wèi)owcase化? ?-----按空格分
POST _analyze {"analyzer": "whitespace","text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone." } 分詞結(jié)果: [ The, 2, QUICK, Brown-Foxes, jumped, over, the, lazy, dog's, bone. ] View Code?
③simple分詞器:通過(guò)非字母字符來(lái)分隔文本信息,有l(wèi)owcase化,該分詞器去掉數(shù)字類型字符? ----按非首字母分??
POST _analyze {"analyzer": "simple","text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone." } 分詞結(jié)果: [ the, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ] View Code?
④stop分詞器:通過(guò)非字母字符來(lái)分隔文本信息,同時(shí)去掉英文中a、an、the等常用字符,通過(guò)stopwords也可以自己設(shè)置需要過(guò)濾掉的單詞, 該分詞器去掉數(shù)字類型字符? ----按非首字母分 ,去a、an、the
PUT my_index {"settings": {"analysis": {"analyzer": {"my_stop_analyzer": {"type": "stop","stopwords": ["the", "over"]}}}} }POST my_index/_analyze {"analyzer": "my_stop_analyzer","text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone." } 結(jié)果: [ quick, brown, foxes, jumped, lazy, dog, s, bone ]?
?
中文分詞器:
① ik-max-world:會(huì)將文本做最細(xì)粒度的拆分;盡可能多的拆分出詞語(yǔ)
ik-smart:做最粗粒度的拆分;已被拆出詞語(yǔ)將不會(huì)再次被其他詞語(yǔ)占用
ik分詞器熱刺更新配置:
修改 IK 的配置文件 :ES 目錄/plugins/ik/config/ik/IKAnalyzer.cfg.xml
ik-max-world VS ik-smart例子;
curl -XGET 'http://localhost:9200/_analyze?pretty&analyzer=ik_max_word' -d '聯(lián)想是全球最大的筆記本廠商' #返回{"tokens" : [{"token" : "聯(lián)想","start_offset" : 0,"end_offset" : 2,"type" : "CN_WORD","position" : 0},{"token" : "是","start_offset" : 2,"end_offset" : 3,"type" : "CN_CHAR","position" : 1},{"token" : "全球","start_offset" : 3,"end_offset" : 5,"type" : "CN_WORD","position" : 2},{"token" : "最大","start_offset" : 5,"end_offset" : 7,"type" : "CN_WORD","position" : 3},{"token" : "的","start_offset" : 7,"end_offset" : 8,"type" : "CN_CHAR","position" : 4},{"token" : "筆記本","start_offset" : 8,"end_offset" : 11,"type" : "CN_WORD","position" : 5},{"token" : "筆記","start_offset" : 8,"end_offset" : 10,"type" : "CN_WORD","position" : 6},{"token" : "本廠","start_offset" : 10,"end_offset" : 12,"type" : "CN_WORD","position" : 7},{"token" : "廠商","start_offset" : 11,"end_offset" : 13,"type" : "CN_WORD","position" : 8}] }# ik_smartcurl -XGET 'http://localhost:9200/_analyze?pretty&analyzer=ik_smart' -d '聯(lián)想是全球最大的筆記本廠商'# 返回{"tokens" : [{"token" : "聯(lián)想","start_offset" : 0,"end_offset" : 2,"type" : "CN_WORD","position" : 0},{"token" : "是","start_offset" : 2,"end_offset" : 3,"type" : "CN_CHAR","position" : 1},{"token" : "全球","start_offset" : 3,"end_offset" : 5,"type" : "CN_WORD","position" : 2},{"token" : "最大","start_offset" : 5,"end_offset" : 7,"type" : "CN_WORD","position" : 3},{"token" : "的","start_offset" : 7,"end_offset" : 8,"type" : "CN_CHAR","position" : 4},{"token" : "筆記本","start_offset" : 8,"end_offset" : 11,"type" : "CN_WORD","position" : 5},{"token" : "廠商","start_offset" : 11,"end_offset" : 13,"type" : "CN_WORD","position" : 6}] } View Code?
?
參考:https://segmentfault.com/a/1190000012553894
es聚合查詢
aggs
"aggs": {"NAME": {# 指定結(jié)果的名稱"AGG_TYPE": {# 指定具體的聚合方法, TODO: # 聚合體內(nèi)制定具體的聚合字段 } } TODO: # 該處可以嵌套聚合 }例子:
{"size": 0,"aggs": {"sum_install": {"date_histogram": {"field": "yyyymmdd","interval": "day"},"aggs": {"types": {"terms": {"field": "type.keyword","size": 10},"aggs": {"install": {"sum": {"field": "install"}}}}}}} } View Code?
作用查詢每天,不同type對(duì)應(yīng)install總量
range用法 && must should
range :field:lte gte
must:and操作? should:or操作
{
? "query": {
? ? "bool": {
? ? ? "must": [
? ? ? ? {"range": {
? ? ? ? ? "recive_time": {
? ? ? ? ? ? "gte": "2017-12-25T01:00:00.000Z",
? ? ? ? ? ? "lte": "2017-12-25T02:10:00.000Z"
? ? ? ? ? }
? ? ? ? }},
? ? ? ? {
? ? ? ? ? "bool": {
? ? ? ? ? ? "should": [
? ? ? ? ? ? ? {"range": {
? ? ? ? ? ? ? ? "live_delay": {
? ? ? ? ? ? ? ? ? "gte": 1500
? ? ? ? ? ? ? ? }
? ? ? ? ? ? ? }},
? ? ? ? ? ? ? {
? ? ? ? ? ? ? ? "range": {
? ? ? ? ? ? ? ? ? "stream_break_count.keyword": {
? ? ? ? ? ? ? ? ? ? "gte": 1
? ? ? ? ? ? ? ? ? }
? ? ? ? ? ? ? ? }
? ? ? ? ? ? ? }
? ? ? ? ? ? ]
? ? ? ? ? }
? ? ? ? }
? ? ? ]
? ? }
? }
}
must的兩個(gè)條件都必須滿足,should中的兩個(gè)條件至少滿足一個(gè)就可以
?
Es 原來(lái)索引中添加字段
PUT /my_index/_mapping/my_type
{ "properties": { "new_field_name": { "type": "string" } } }
?
給新添加字段賦值
POST my_index/_update_by_query? ? //批量更新用_update_by_query語(yǔ)法
{ "script": { "lang": "painless", "inline": "ctx._source.new_field_name= '02'" } }? ?//通過(guò)painless更新對(duì)象值
?
參考:https://blog.csdn.net/qq_36330643/article/details/79771652
雜亂查詢語(yǔ)法:
aggs+avg+max+min+order+cardinality(等價(jià)于count(distinct(a)))+filter+sort
terms相當(dāng)于sql中的groupby
aggs:基于搜索查詢,可以嵌套組合復(fù)雜查詢語(yǔ)法
"aggs": {"NAME": {# 指定結(jié)果的名稱"AGG_TYPE": {# 指定具體的聚合方法, TODO: # 聚合體內(nèi)制定具體的聚合字段 } } TODO: # 該處可以嵌套聚合 }?
一、聚合起步
1、創(chuàng)建索引
1.1 創(chuàng)建索引DSL實(shí)現(xiàn)
put cars
POST /cars/transactions/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
1.2 創(chuàng)建mysql庫(kù)表sql實(shí)現(xiàn)
CREATE TABLE `cars` (
`id` int(11) NOT NULL,
`price` int(11) DEFAULT NULL,
`color` varchar(255) DEFAULT NULL,
`make` varchar(255) DEFAULT NULL,
`sold` date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
2、統(tǒng)計(jì)不同顏色車的數(shù)目
2.1 統(tǒng)計(jì)不同顏色車的DSL實(shí)現(xiàn)
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"popular_colors": {
"terms": {
"field": "color.keyword"
}
}
}
}
返回結(jié)果:
lve
2.2 統(tǒng)計(jì)不同顏色的mysql實(shí)現(xiàn)
select color, count(color) as cnt from cars group by color order by cnt desc;
返回結(jié)果:
red 4
green 2
blue 2
3、統(tǒng)計(jì)不同顏色車的平均價(jià)格
3.1 統(tǒng)計(jì)不同顏色車的平均價(jià)格DSL實(shí)現(xiàn):
terms 相當(dāng)于groupby
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"colors": {
"terms": {
"field": "color.keyword"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
返回聚合結(jié)果:
lve
3.2 統(tǒng)計(jì)不同顏色車的平均價(jià)格sql實(shí)現(xiàn):
select color, count(color) as cnt, avg(price) as avg_price from cars group by color order by cnt desc;
color cnt avg_price
red 4 32500.0000
green 2 21000.0000
blue 2 20000.0000
4、每種顏色汽車制造商的分布
4.1 統(tǒng)計(jì)每種顏色汽車制造商的分布dsl實(shí)現(xiàn)
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"colors": {
"terms": {
"field": "color.keyword"
},
"aggs": {
"make": {
"terms": {
"field": "make.keyword"
}
}
}
}
}
}
返回結(jié)果:
4.2 統(tǒng)計(jì)每種顏色汽車制造商的分布sql實(shí)現(xiàn)
說(shuō)明:和dsl的實(shí)現(xiàn)不嚴(yán)格對(duì)應(yīng)
select color, make from cars order by color;
color make
blue toyota
blue ford
green ford
green toyota
red bmw
red honda
red honda
red honda
5、統(tǒng)計(jì)每個(gè)制造商的最低價(jià)格、最高價(jià)格
5.1 統(tǒng)計(jì)每個(gè)制造商的最低、最高價(jià)格的DSL實(shí)現(xiàn)
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"make_class": {
"terms": {
"field": "make.keyword"
},
"aggs": {
"min_price": {
"min": {
"field": "price"
}
},
"max_price": {
"max": {
"field": "price"
}
}
}
}
}
}
聚合結(jié)果:
5.2 統(tǒng)計(jì)每個(gè)制造商的最低、最高價(jià)格的sql實(shí)現(xiàn)
select make, min(price) as min_price, max(price) as max_price from cars group by make;
make min_price max_price
bmw 80000 80000
ford 25000 30000
honda 10000 20000
toyota 12000 15000
二、聚合進(jìn)階
1、條形圖聚合
1.1 分段統(tǒng)計(jì)每個(gè)區(qū)間的汽車銷售價(jià)格總和
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"price": {
"histogram": {
"field": "price",
"interval": 20000
},
"aggs": {
"revenue": {
"sum": {
"field": "price"
}
}
}
}
}
}
汽車銷售價(jià)格區(qū)間:定義為20000;
分段統(tǒng)計(jì)price和用sum統(tǒng)計(jì)。
1.2 多維度度量不同制造商的汽車指標(biāo)
GET /cars/transactions/_search
{
"size" : 0,
"aggs": {
"makes": {
"terms": {
"field": "make.keyword",
"size": 10
},
"aggs": {
"stats": {
"extended_stats": {
"field": "price"
}
}
}
}
}
}
輸出截取片段:
{
"key": "ford",
"doc_count": 2,
"stats": {
"count": 2,
"min": 25000,
"max": 30000,
"avg": 27500,
"sum": 55000,
"sum_of_squares": 1525000000,
"variance": 6250000,
"std_deviation": 2500,
"std_deviation_bounds": {
"upper": 32500,
"lower": 22500
}
}
}
2、按時(shí)間統(tǒng)計(jì)聚合
2.1 按月份統(tǒng)計(jì)制造商汽車銷量dsl實(shí)現(xiàn)
GET /cars/transactions/_search
{
"size" : 0,
"aggs": {
"sales":{
"date_histogram":{
"field":"sold",
"interval":"month",
"format":"yyyy-MM-dd"
}
}
}
}
返回結(jié)果:
2.2 按月份統(tǒng)計(jì)制造商汽車銷量sql實(shí)現(xiàn)
SELECT make, count(make) as cnt, CONCAT(YEAR(sold),',',MONTH(sold)) AS data_time
FROM `cars`
GROUP BY YEAR(sold) DESC,MONTH(sold)
查詢結(jié)果如下:
make cnt data_time
bmw 1 2014,1
ford 1 2014,2
ford 1 2014,5
toyota 1 2014,7
toyota 1 2014,8
honda 1 2014,10
honda 2 2014,11
2.3 包含12月份的處理DSL實(shí)現(xiàn)
以上2.1 中沒有12月份的統(tǒng)計(jì)結(jié)果顯示。
GET /cars/transactions/_search
{
"size" : 0,
"aggs": {
"sales":{
"date_histogram":{
"field":"sold",
"interval":"month",
"format":"yyyy-MM-dd",
"min_doc_count": 0,
"extended_bounds":{
"min":"2014-01-01",
"max":"2014-12-31"
}
}
}
}
}
2.4 以季度為單位統(tǒng)計(jì)DSL實(shí)現(xiàn)
GET /cars/transactions/_search
{
"size" : 0,
"aggs": {
"sales":{
"date_histogram":{
"field":"sold",
"interval":"quarter",
"format":"yyyy-MM-dd",
"min_doc_count": 0,
"extended_bounds":{
"min":"2014-01-01",
"max":"2014-12-31"
}
},
"aggs":{
"per_make_sum":{
"terms":{
"field": "make.keyword"
},
"aggs":{
"sum_price":{
"sum":{ "field": "price"}
}
}
},
"top_sum": {
"sum": {"field":"price"}
}
}
}
}
}
2.5 基于搜索的(范圍限定)聚合操作
2.5.1 基礎(chǔ)查詢聚合
GET /cars/transactions/_search
{
"query" : {
"match" : {
"make.keyword" : "ford"
}
},
"aggs" : {
"colors" : {
"terms" : {
"field" : "color.keyword"
}
}
}
}
對(duì)應(yīng)的sql實(shí)現(xiàn):
select make, color from cars
where make = "ford";
結(jié)果返回如下:
make color
ford green
ford blue
三、過(guò)濾聚合
1. 過(guò)濾操作
統(tǒng)計(jì)全部汽車的平均價(jià)錢以及單品平均價(jià)錢;
GET /cars/transactions/_search
{
"size" : 0,
"query" : {
"match" : {
"make.keyword" : "ford"
}
},
"aggs" : {
"single_avg_price": {
"avg" : { "field" : "price" }
},
"all": {
"global" : {},
"aggs" : {
"avg_price": {
"avg" : { "field" : "price" }
}
}
}
}
}
等價(jià)于:
select make, color, avg(price) from cars
where make = "ford" ;
select avg(price) from cars;
2、范圍限定過(guò)濾(過(guò)濾桶)
我們可以指定一個(gè)過(guò)濾桶,當(dāng)文檔滿足過(guò)濾桶的條件時(shí),我們將其加入到桶內(nèi)。
GET /cars/transactions/_search
{
"size" : 0,
"query":{
"match": {
"make": "ford"
}
},
"aggs":{
"recent_sales": {
"filter": {
"range": {
"sold": {
"from": "now-100M"
}
}
},
"aggs": {
"average_price":{
"avg": {
"field": "price"
}
}
}
}
}
}
mysql的實(shí)現(xiàn)如下:
select *, avg(price) from cars where period_diff(date_format(now() , '%Y%m') , date_format(sold, '%Y%m')) > 30
and make = "ford";
mysql查詢結(jié)果如下:
id price color make sold avg
3 30000 green ford 2014-05-18 27500.0000
3、后過(guò)濾器
只過(guò)濾搜索結(jié)果,不過(guò)濾聚合結(jié)果——post_filter實(shí)現(xiàn)
GET /cars/transactions/_search
{
"query": {
"match": {
"make": "ford"
}
},
"post_filter": {
"term" : {
"color.keyword" : "green"
}
},
"aggs" : {
"all_colors": {
"terms" : { "field" : "color.keyword" }
}
}
}
post_filter 會(huì)過(guò)濾搜索結(jié)果,只展示綠色 ford 汽車。這在查詢執(zhí)行過(guò) 后 發(fā)生,所以聚合不受影響。
小結(jié)
選擇合適類型的過(guò)濾(如:搜索命中、聚合或兩者兼有)通常和我們期望如何表現(xiàn)用戶交互有關(guān)。選擇合適的過(guò)濾器(或組合)取決于我們期望如何將結(jié)果呈現(xiàn)給用戶。
在 filter 過(guò)濾中的 non-scoring 查詢,同時(shí)影響搜索結(jié)果和聚合結(jié)果。
filter 桶影響聚合。
post_filter 只影響搜索結(jié)果。
四、多桶排序
4.1 內(nèi)置排序
GET /cars/transactions/_search
{
"size" : 0,
"aggs" : {
"colors" : {
"terms" : {
"field" : "color.keyword",
"order": {
"_count" : "asc"
}
}
}
}
}
4.2 按照度量排序
以下是按照汽車平均售價(jià)的升序進(jìn)行排序。
過(guò)濾條件:汽車顏色;
聚合條件:平均價(jià)格;
排序條件:汽車的平均價(jià)格升序。
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"colors": {
"terms": {
"field": "color.keyword",
"order": {
"avg_price": "asc"
}
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
多條件聚合后排序如下所示:
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"colors": {
"terms": {
"field": "color.keyword",
"order": {
"stats.variance": "asc"
}
},
"aggs": {
"stats": {
"extended_stats": {
"field": "price"
}
}
}
}
}
}
4.3 基于“深度”的度量排序
太復(fù)雜,不推薦!
五、近似聚合
cardinality的含義是“基數(shù)”;
5.1 統(tǒng)計(jì)去重后的數(shù)量
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"distinct_colors": {
"cardinality": {
"field": "color.keyword"
}
}
}
}
類似于:
SELECT COUNT(DISTINCT color) FROM cars;
以下:
以月為周期統(tǒng)計(jì);
GET /cars/transactions/_search
{
"size": 0,
"aggs": {
"months": {
"date_histogram": {
"field": "sold",
"interval": "month"
},
"aggs": {
"distinct_colors": {
"cardinality": {
"field": "color.keyword"
}
}
}
}
}
}
轉(zhuǎn)載于:https://www.cnblogs.com/enhance/p/11263859.html
總結(jié)
- 上一篇: 手写一个MVVM
- 下一篇: SDNU 1330.Max Sum(最大