Aggregations

Published: 2024/4/13

Elasticsearch aggregations

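Aggregations make it very easy to compute statistics and run analyses over your data. For example:

Which phone brand is the most popular?
What are the average, highest, and lowest prices of these phones?
How do these phones sell month by month?

Implementing these statistics is often more convenient than writing the equivalent database SQL, and the queries are fast enough to give near-real-time results.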

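Basic concepts

Elasticsearch offers many kinds of aggregations. The two most commonly used are buckets and metrics:

Buckets

A bucket groups documents according to some rule; each group is called a bucket in ES. For example, grouping people by nationality yields a China bucket, a UK bucket, a Japan bucket, and so on. Grouping people by age range yields buckets such as 0~10, 10~20, 20~30, and 30~40.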

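Elasticsearch provides many ways to divide documents into buckets:

Date Histogram Aggregation: groups by a fixed date interval; for example, with an interval of one week, each week automatically becomes one group
Histogram Aggregation: groups by a fixed numeric interval, similar to the date version
Terms Aggregation: groups by term; documents whose term matches exactly fall into the same group
Range Aggregation: groups by numeric or date ranges; you specify the start and end of each range, and documents are grouped by range
...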
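To make the bucketing styles above concrete, here is a minimal Python sketch that mimics three of them on an in-memory list. It does not call Elasticsearch: the sample records and the function names (terms_buckets, histogram_buckets, range_buckets) are invented for illustration, and the rounding and range rules are a simplified reading of how these aggregations behave.

```python
from collections import defaultdict

# Hypothetical sample records, invented for this illustration.
people = [
    {"name": "Li", "country": "China", "age": 8},
    {"name": "Tom", "country": "UK", "age": 15},
    {"name": "Aiko", "country": "Japan", "age": 25},
    {"name": "Wang", "country": "China", "age": 27},
]


def terms_buckets(docs, field):
    """Terms-style bucketing: one bucket per distinct field value."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[field]].append(doc["name"])
    return dict(buckets)


def histogram_buckets(docs, field, interval):
    """Histogram-style bucketing: the bucket key is the value rounded
    down to the nearest multiple of `interval`."""
    buckets = defaultdict(list)
    for doc in docs:
        key = (doc[field] // interval) * interval
        buckets[key].append(doc["name"])
    return dict(buckets)


def range_buckets(docs, field, ranges):
    """Range-style bucketing: explicit (start, end) pairs, with the
    start included and the end excluded."""
    buckets = {bounds: [] for bounds in ranges}
    for doc in docs:
        for start, end in ranges:
            if start <= doc[field] < end:
                buckets[(start, end)].append(doc["name"])
    return buckets


print(terms_buckets(people, "country"))
print(histogram_buckets(people, "age", 10))
print(range_buckets(people, "age", [(0, 18), (18, 40)]))
```

A date histogram follows the same idea as histogram_buckets, with a calendar interval (week, month) in place of the numeric step.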

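Bucket aggregations only group the data; apart from counting the documents in each bucket, they do not compute anything. For that reason, a bucket usually has another kind of aggregation nested inside it: metrics aggregations.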

Metrics

After grouping, we usually run an aggregate calculation over the documents in each group, such as the average, maximum, minimum, or sum. In ES these are called metrics.

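Some commonly used metrics aggregations:

Avg Aggregation: computes the average
Max Aggregation: computes the maximum
Min Aggregation: computes the minimum
Percentiles Aggregation: computes percentiles
Stats Aggregation: returns avg, max, min, sum, and count together
Sum Aggregation: computes the sum
Top hits Aggregation: returns the top matching documents
Value Count Aggregation: counts the number of values
...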
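As a rough illustration of what a metric computes, the sketch below reproduces the five numbers a Stats aggregation reports (count, min, max, avg, sum) for a plain Python list. The stats function, the sample prices, and the handling of an empty input are assumptions made for this example; nothing here talks to Elasticsearch.

```python
def stats(values):
    """Summarize a list of numbers with the same five fields that a
    Stats aggregation returns: count, min, max, avg, and sum."""
    values = list(values)
    if not values:
        # Empty-input convention chosen for this sketch only.
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0}
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


# Hypothetical prices, invented for illustration.
prices = [10, 20, 30, 40]
summary = stats(prices)
print(summary)
```

Avg, Max, Min, Sum, and Value Count each correspond to a single field of this summary.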

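To test aggregations, we first bulk-import some data.

Create the index (the mappings block below names a mapping type, transactions, which is pre-7.0 Elasticsearch syntax):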

PUT /cars
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "transactions": {
      "properties": {
        "color": { "type": "keyword" },
        "make": { "type": "keyword" }
      }
    }
  }
}

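Note: in ES, a field used for aggregation, sorting, or exact-match filtering is handled differently from a full-text field and must not be analyzed (tokenized). Here the two string fields color and make are mapped as keyword, a type that is not analyzed, so they can be used in aggregations.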
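The following sketch hints at why tokenization gets in the way of grouping. It compares bucketing on whole values (keyword-like) with bucketing on tokens (analyzed-like). The rough_analyze function is a crude stand-in written for this example, not the analyzer Elasticsearch uses, and the multi-word make values are hypothetical.

```python
from collections import Counter

# Hypothetical multi-word values, invented for illustration.
makes = ["Land Rover", "Land Rover", "Alfa Romeo"]


def rough_analyze(value):
    """Crude stand-in for an analyzer: lowercase and split on whitespace."""
    return value.lower().split()


# Keyword-like: the whole value is a single term, so each make is one bucket.
keyword_buckets = Counter(makes)

# Analyzed-like: every token becomes its own term, counted once per document.
analyzed_buckets = Counter(
    token for make in makes for token in set(rough_analyze(make))
)

print(dict(keyword_buckets))
print(dict(analyzed_buckets))
```

With whole values there is one bucket per manufacturer; with tokens, "land" and "rover" become separate buckets that no longer correspond to a manufacturer.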

Import the data:

POST /cars/transactions/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
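The bulk body is newline-delimited JSON: an action line followed by a document line for each record, with a newline after the last line. If the records live in a program rather than a console, a body of this shape could be assembled as in the sketch below. The build_bulk_body helper is a hypothetical name, and sending the body over HTTP is left out.

```python
import json

# The eight car records from the bulk request above.
cars = [
    {"price": 10000, "color": "red", "make": "honda", "sold": "2014-10-28"},
    {"price": 20000, "color": "red", "make": "honda", "sold": "2014-11-05"},
    {"price": 30000, "color": "green", "make": "ford", "sold": "2014-05-18"},
    {"price": 15000, "color": "blue", "make": "toyota", "sold": "2014-07-02"},
    {"price": 12000, "color": "green", "make": "toyota", "sold": "2014-08-19"},
    {"price": 20000, "color": "red", "make": "honda", "sold": "2014-11-05"},
    {"price": 80000, "color": "red", "make": "bmw", "sold": "2014-01-01"},
    {"price": 25000, "color": "blue", "make": "ford", "sold": "2014-02-12"},
]


def build_bulk_body(docs):
    """Return an NDJSON string: one {"index": {}} action line followed by
    one document line per record, ending with a trailing newline."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


body = build_bulk_body(cars)
print(body)
```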

Aggregating into buckets

First, we divide the cars into buckets by color:

GET /cars/_search
{
  "size": 0,
  "aggs": {
    "popular_colors": {
      "terms": {
        "field": "color"
      }
    }
  }
}

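size: the number of hits to return. It is set to 0 here because we only care about the aggregation results, not the matching documents; skipping the hits makes the request more efficient
aggs: declares that this is an aggregation query; short for aggregations
- popular_colors: a name for this aggregation; any name will do
  - terms: how to divide the buckets; here, by term
    - field: the field to bucket on

Result: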

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 8,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "popular_colors": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "red", "doc_count": 4 },
        { "key": "blue", "doc_count": 2 },
        { "key": "green", "doc_count": 2 }
      ]
    }
  }
}

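hits: the hits list is empty because we set size to 0
aggregations: the aggregation results
popular_colors: the aggregation name we defined
buckets: the buckets that were found; each distinct value of the color field forms one bucket
- key: the value of the color field for this bucket
- doc_count: the number of documents in this bucket

The results show that red cars are currently the best sellers: 4 of the 8 cars sold are red.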
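The bucket counts can be checked by hand. The sketch below recomputes them in plain Python from the colors of the eight imported cars. The terms_counts helper is a hypothetical name, and its ordering (highest count first, ties broken by key) was chosen to match the response shown above.

```python
from collections import Counter

# Colors of the eight imported cars, in import order.
colors = ["red", "red", "green", "blue", "green", "red", "red", "blue"]


def terms_counts(values):
    """Count documents per distinct value and return bucket-like dicts,
    ordered by count descending, then by key ascending."""
    counts = Counter(values)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


buckets = terms_counts(colors)
for bucket in buckets:
    print(bucket)
```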

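Metrics inside buckets

The previous example tells us how many documents each bucket contains, which is useful. Usually, though, an application needs richer metrics over the documents. For example, what is the average price of the cars of each color?

To get that, we tell Elasticsearch which field to use and which metric to compute. This information is nested inside the bucket, and the metric is computed over the documents in that bucket.

Now we add an average-price metric to the previous aggregation: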

GET /cars/_search
{
  "size": 0,
  "aggs": {
    "popular_colors": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

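aggs: we add a new aggs inside the previous aggregation (popular_colors). As this shows, a metric is itself an aggregation
avg_price: the name of the aggregation
avg: the metric type; here, the average
field: the field the metric is computed on

Result: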

...
"aggregations": {
  "popular_colors": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      { "key": "red", "doc_count": 4, "avg_price": { "value": 32500 } },
      { "key": "blue", "doc_count": 2, "avg_price": { "value": 20000 } },
      { "key": "green", "doc_count": 2, "avg_price": { "value": 21000 } }
    ]
  }
}
...

Each bucket now has its own avg_price field, which holds the result of the metric aggregation.
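The averages follow directly from the imported prices. As a sanity check, this sketch groups the (color, price) pairs of the eight cars and averages each group in plain Python. The avg_price_by_color function is a hypothetical helper for this example, not part of any Elasticsearch client.

```python
from collections import defaultdict

# (color, price) for each of the eight imported cars.
sales = [
    ("red", 10000), ("red", 20000), ("green", 30000), ("blue", 15000),
    ("green", 12000), ("red", 20000), ("red", 80000), ("blue", 25000),
]


def avg_price_by_color(rows):
    """Bucket prices by color, then compute the average of each bucket."""
    prices = defaultdict(list)
    for color, price in rows:
        prices[color].append(price)
    return {color: sum(values) / len(values) for color, values in prices.items()}


averages = avg_price_by_color(sales)
print(averages)
```

For red, (10000 + 20000 + 20000 + 80000) / 4 = 32500, which matches the avg_price value in the red bucket.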

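Buckets inside buckets

In the previous example we nested a metric inside a bucket. Buckets can nest not only metrics but also other buckets; that is, each group can be divided into further groups.

For example, to find out which manufacturers the cars of each color come from, we bucket again by the make field: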

GET /cars/_search
{
  "size": 0,
  "aggs": {
    "popular_colors": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        },
        "maker": {
          "terms": {
            "field": "make"
          }
        }
      }
    }
  }
}

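The original color bucket and avg calculation are unchanged
maker: a new bucket aggregation, named maker, added under the nested aggs
terms: the bucketing type is still terms
field: here the buckets are divided by the make field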
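Because the request body is plain JSON, nested aggregations can also be composed programmatically. The sketch below builds the same body from two small helpers; terms_agg and avg_agg are hypothetical names invented here, not functions of an Elasticsearch client library.

```python
import json


def terms_agg(field, sub_aggs=None):
    """Build a terms bucket aggregation, optionally with nested aggs."""
    agg = {"terms": {"field": field}}
    if sub_aggs:
        agg["aggs"] = sub_aggs
    return agg


def avg_agg(field):
    """Build an avg metric aggregation on the given field."""
    return {"avg": {"field": field}}


body = {
    "size": 0,
    "aggs": {
        "popular_colors": terms_agg(
            "color",
            {
                "avg_price": avg_agg("price"),
                "maker": terms_agg("make"),
            },
        )
    },
}

print(json.dumps(body, indent=2))
```

Both the metric and the inner bucket sit side by side in the aggs of popular_colors, which is why each color bucket in the response carries both an avg_price and a maker entry.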

Partial result:

...
{
  "aggregations": {
    "popular_colors": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "red",
          "doc_count": 4,
          "maker": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "honda", "doc_count": 3 },
              { "key": "bmw", "doc_count": 1 }
            ]
          },
          "avg_price": { "value": 32500 }
        },
        {
          "key": "blue",
          "doc_count": 2,
          "maker": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "ford", "doc_count": 1 },
              { "key": "toyota", "doc_count": 1 }
            ]
          },
          "avg_price": { "value": 20000 }
        },
        {
          "key": "green",
          "doc_count": 2,
          "maker": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              { "key": "ford", "doc_count": 1 },
              { "key": "toyota", "doc_count": 1 }
            ]
          },
          "avg_price": { "value": 21000 }
        }
      ]
    }
  }
}
...

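As we can see, the new maker aggregation is nested inside each of the original color buckets.
Under each color, the documents are grouped by the make field.
From this we can read off:
- There are 4 red cars in total.
- The average price of the red cars is $32,500.
- 3 of them were made by Honda and 1 by BMW.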
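To see how the nested result is put together, the sketch below rebuilds it in plain Python: it groups the eight cars by color, then, inside each group, averages the price and counts cars per make. The color_report function and the shape of its return value are assumptions made for this illustration.

```python
from collections import defaultdict

# price, color, and make of the eight imported cars.
cars = [
    {"price": 10000, "color": "red", "make": "honda"},
    {"price": 20000, "color": "red", "make": "honda"},
    {"price": 30000, "color": "green", "make": "ford"},
    {"price": 15000, "color": "blue", "make": "toyota"},
    {"price": 12000, "color": "green", "make": "toyota"},
    {"price": 20000, "color": "red", "make": "honda"},
    {"price": 80000, "color": "red", "make": "bmw"},
    {"price": 25000, "color": "blue", "make": "ford"},
]


def color_report(docs):
    """Outer bucket by color; inside each, an avg metric on price and an
    inner bucket by make that counts documents."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["color"]].append(doc)

    report = {}
    for color, members in groups.items():
        makers = defaultdict(int)
        for doc in members:
            makers[doc["make"]] += 1
        report[color] = {
            "doc_count": len(members),
            "avg_price": sum(d["price"] for d in members) / len(members),
            "maker": dict(makers),
        }
    return report


report = color_report(cars)
print(report["red"])
```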
