MongoDB group won't work for your stats? Try mapReduce instead
Problem recap
Today a colleague, Xiao Zhang, pinged me: he had spent a whole day building a pending-items statistics feature on top of MongoDB, and it kept throwing errors!
I spent about half an hour going over his requirements and code, and roughly figured out that the problem lay in how the group and aggregate functions were being called when computing statistics over the pending-items collection.
Grouping (group) code for the pending-items collection:
```java
GroupByResults<PendingEntity> groupByResults = mongoTemplate.group(
        new Criteria().andOperator(criteriaArray),
        mongoTemplate.getCollectionName(PendingEntity.class),
        groupBy, PendingEntity.class);
long resultCount = ((List) groupByResults.getRawResults().get("retval")).size();
```

Aggregation (aggregate) code for the pending-items collection:
```java
AggregationResults<PendingEntity> results = mongoTemplate.aggregate(
        aggregation, "studentScore", PendingEntity.class);
double totleScore = results.getUniqueMappedResult().getCollect();
```

Problem diagnosis
The exception message:
Buried in the exception, I noticed the errmsg field: "can't do command: group on sharded collection" — roughly, a sharded collection cannot be used with the group function.
Suspecting the sharded collection was the culprit, I checked a few tech blogs and the official MongoDB docs for the limitations of the group function, roughly:
- group cannot be used on a sharded collection
- group does not process more than 20,000 unique keys (when the group-by key is under a uniqueness constraint)
Clearly, the restriction that sharded collections cannot be grouped confirmed my initial guess.
I then asked a colleague on the ops team, who confirmed that when our MongoDB collections are created, their documents are sharded across different servers — presumably out of concern for MongoDB's stability.
Solution
Since sharded collections can't use group, how do we solve the grouped-statistics problem?
The answer: use mapReduce.
What does that remind you of?
It is very much like the Map-Reduce idea in Hadoop:
The core idea of MapReduce is divide and conquer: break a large, complex task into a number of small tasks, execute them in parallel, then merge the results. It suits scenarios with large volumes of complex tasks and large-scale data processing.
Map handles the "divide": it splits a complex task into several "simple tasks" to be processed in parallel. The prerequisite is that these small tasks can be computed in parallel, with almost no dependencies between them.
Reduce handles the "merge": it performs a global aggregation over the results of the map phase.
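As a minimal illustration of that divide-and-merge idea (plain Java, no Hadoop or MongoDB involved — the word-count task is just a stand-in), the map step emits a (word, 1) pair per token and the reduce step sums the counts per key:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordCount {
    public static Map<String, Integer> mapReduce(List<String> lines) {
        return lines.stream()
                // map phase: split each line and emit a (word, 1) pair per token
                .flatMap(line -> Stream.of(line.split("\\s+")))
                .map(word -> new SimpleEntry<>(word, 1))
                // reduce phase: group the pairs by key and sum the values
                .collect(Collectors.groupingBy(Entry::getKey,
                        Collectors.summingInt(Entry::getValue)));
    }

    public static void main(String[] args) {
        // "a" appears twice, "b" twice, "c" once
        System.out.println(mapReduce(List.of("a b a", "b c")));
    }
}
```

Each line can be mapped independently (no dependencies between the small tasks), which is exactly what makes the map phase parallelizable.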
[Figure: Map-Reduce execution flow in Hadoop (image source: the internet)]
Consulting the official MongoDB docs, mapReduce is described as follows:
Map-reduce supports operations on sharded collections, both as an input and as an output. This section describes the behaviors of mapReduce specific to sharded collections.
However, starting in version 4.2, MongoDB deprecates the map-reduce option to create a new sharded collection as well as the use of the sharded option for map-reduce. To output to a sharded collection, create the sharded collection first. MongoDB 4.2 also deprecates the replacement of an existing sharded collection.
Sharded Collection as Input
When using sharded collection as the input for a map-reduce operation, mongos will automatically dispatch the map-reduce job to each shard in parallel. There is no special option required. mongos will wait for jobs on all shards to finish.
Sharded Collection as Output
If the out field for mapReduce has the sharded value, MongoDB shards the output collection using the _id field as the shard key.
In short, mapReduce supports input and output operations on sharded collections, with the behavior on sharded collections as described above.
The mapReduce syntax:

```
db.collection.mapReduce(
    <map>,
    <reduce>,
    {
        out: <collection>,
        query: <document>,
        sort: <document>,
        limit: <number>,
        finalize: <function>,
        scope: <document>,
        jsMode: <boolean>,
        verbose: <boolean>,
        bypassDocumentValidation: <boolean>
    }
)
```

Parameter descriptions:
- map: the map function (emits a sequence of key-value pairs that are fed to the reduce function)
- reduce: the aggregation function
- query: filters the target documents
- sort: sorts the target documents
- limit: caps the number of target documents
- out: the collection that stores the results (if omitted, a temporary collection is used and dropped when the client disconnects)
- finalize: a final-processing function (applies last-step adjustments to the reduce output before it is stored in the result collection)
- scope: imports external variables into map, reduce, and finalize
- jsMode: when false, data flows BSON → JS → map → BSON → JS → reduce → BSON, which can handle very large map-reduce jobs; when true, it flows BSON → JS → map → reduce → BSON
- verbose: includes detailed timing statistics in the result
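To make the roles of those parameters concrete, here is an in-memory sketch in plain Java (no MongoDB involved; the Doc record, the filter/sort criteria, and the finalize step are all illustrative assumptions): query filters, sort orders, limit caps, map emits key-value pairs, reduce merges values per key, and finalize post-processes each reduced value:

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

public class MapReduceSketch {
    // illustrative document shape: a category and a count
    record Doc(String category, int count) {}

    public static Map<String, Integer> run(List<Doc> docs) {
        // query: keep only documents with a positive count
        // sort: order by count, descending
        // limit: process at most 100 documents
        List<Doc> selected = docs.stream()
                .filter(d -> d.count() > 0)
                .sorted(Comparator.comparingInt(Doc::count).reversed())
                .limit(100)
                .toList();

        // map: emit a (category, count) pair per document
        // reduce: merge values that share a key by summing them
        Map<String, Integer> reduced = new LinkedHashMap<>();
        for (Doc d : selected) {
            reduced.merge(d.category(), d.count(), Integer::sum);
        }

        // finalize: post-process each reduced value (doubling is a stand-in)
        BiFunction<String, Integer, Integer> finalize = (key, value) -> value * 2;
        reduced.replaceAll(finalize::apply);
        return reduced;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(new Doc("a", 3), new Doc("b", 1),
                new Doc("a", 2), new Doc("c", 0));
        // "c" is filtered out; "a" reduces to 5, "b" to 1; finalize doubles both
        System.out.println(run(docs));
    }
}
```

The order of the stages mirrors the parameter list above; in real MongoDB the map and reduce bodies are JavaScript functions executed server-side.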
So I had Xiao Zhang swap group for mapReduce, and the problem was solved!
```java
// reduce function: sum the per-key counts emitted by the map function
// (the loop body was truncated in the original post; the accumulation below
// is a reconstruction consistent with the total.count initializer)
String reducef = "function(key, values) {"
        + "  var total = {count: 0};"
        + "  for (var i = 0; i < values.length; i++) { total.count += values[i].count; }"
        + "  return total;"
        + "}";
MapReduceResults<BasicDBObject> mrr = readMongoTemplate.mapReduce(query,
        readMongoTemplate.getCollectionName(ReadingEntity.class),
        map, reducef, BasicDBObject.class);
```

Problem summary
Sometimes the answer sits right in the most conspicuous part of the error message; it just takes a careful reader to notice it.
Also, most problems have a documented solution on the official site — if you can fight your way through the English...
References
https://docs.mongodb.com/manual/aggregation/
https://docs.mongodb.com/manual/core/map-reduce-sharded-collections/
https://www.cnblogs.com/chenpingzhao/p/7913247.html
https://blog.csdn.net/weixin_42582592/article/details/83080900
https://blog.csdn.net/iteye_19607/article/details/82644559