Java implementation of paginated MongoDB queries over hundred-million-record collections
1. Prepare the environment
1.1 Download MongoDB
1.2 Start MongoDB
C:\mongodb\bin\mongod --dbpath D:\mongodb\data
1.3 Download Robo 3T, a GUI client for MongoDB
2. Prepare the data
<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>mongo-java-driver</artifactId>
    <version>3.6.1</version>
</dependency>
Then run the following Java code to insert the test data:
import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;
import com.mongodb.MongoException;

public class InsertData {
    public static void main(String[] args) {
        // Connect to MongoDB (MongoClient is the entry point since driver 2.10.0;
        // in driver 3.x its constructor no longer throws UnknownHostException)
        MongoClient mongo = new MongoClient("localhost", 27017);
        try {
            // Get the database; MongoDB creates it on first write if it does not exist
            DB db = mongo.getDB("www");
            // Get the "person" collection; it is also created on first write
            DBCollection table = db.getCollection("person");
            // Insert 100 million test documents, one at a time
            for (int i = 0; i < 100000000; i++) {
                BasicDBObject document = new BasicDBObject();
                document.put("name", "mkyong" + i);
                document.put("age", 30);
                document.put("sex", "f");
                table.insert(document);
            }
            System.out.println("Done");
        } catch (MongoException e) {
            e.printStackTrace();
        } finally {
            mongo.close();
        }
    }
}
3. Paginated queries
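Traditional offset paging (skip plus limit) gets slower as the offset grows, because the server has to walk past every skipped entry before it returns any results, so it is a poor fit for a collection this large. Instead, I followed the approach of logstash-input-mongodb [1], which remembers the last _id it has read and fetches the next batch of documents whose _id is greater than it: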
def get_cursor_for_collection(mongodb, mongo_collection_name, last_id_object, batch_size)
  collection = mongodb.collection(mongo_collection_name)
  # Need to make this sort by date in object id then get the first of the series
  # db.events_20150320.find().limit(1).sort({ts:1})
  return collection.find({:_id => {:$gt => last_id_object}}).limit(batch_size)
end

collection_name = collection[:name]
@logger.debug("collection_data is: #{@collection_data}")
last_id = @collection_data[index][:last_id]
#@logger.debug("last_id is #{last_id}", :index => index, :collection => collection_name)
# get batch of events starting at the last_place if it is set
last_id_object = last_id
if since_type == 'id'
  last_id_object = BSON::ObjectId(last_id)
elsif since_type == 'time'
  if last_id != ''
    last_id_object = Time.at(last_id)
  end
end
cursor = get_cursor_for_collection(@mongodb, collection_name, last_id_object, batch_size)
Implemented in Java:
import java.util.List;

import org.bson.types.ObjectId;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;
import com.mongodb.MongoException;

public class Test {
    public static void main(String[] args) {
        int pageSize = 50000;
        MongoClient mongo = new MongoClient("localhost", 27017);
        try {
            DB db = mongo.getDB("www");
            DBCollection table = db.getCollection("person");
            long cnt = table.count();
            long pages = getPageCount(cnt, pageSize);
            // null means "start from the first document". The original hard-coded the
            // first document's id ("5bda8f66ef2ed979bab041aa"), which $gt then skipped.
            ObjectId lastIdObject = null;
            for (long i = 0; i < pages; i++) {
                long start = System.currentTimeMillis();
                DBCursor dbObjects = getCursorForCollection(table, lastIdObject, pageSize);
                // find() is lazy: the query only runs when the cursor is iterated,
                // so the timing must include toArray()
                List<DBObject> objs = dbObjects.toArray();
                System.out.println("Query " + (i + 1) + " took "
                        + (System.currentTimeMillis() - start) + " ms");
                if (objs.isEmpty()) {
                    break; // no more documents
                }
                // The last _id of this page is the exclusive lower bound of the next page
                lastIdObject = (ObjectId) objs.get(objs.size() - 1).get("_id");
            }
        } catch (MongoException e) {
            e.printStackTrace();
        } finally {
            mongo.close();
        }
    }

    // One page: _id > lastIdObject, sorted by _id ascending, at most pageSize documents
    public static DBCursor getCursorForCollection(DBCollection collection, ObjectId lastIdObject, int pageSize) {
        BasicDBObject query = new BasicDBObject();
        if (lastIdObject != null) {
            query.append("_id", new BasicDBObject("$gt", lastIdObject));
        }
        BasicDBObject sort = new BasicDBObject("_id", 1);
        return collection.find(query).sort(sort).limit(pageSize);
    }

    // Number of pages = ceil(cnt / pageSize)
    public static long getPageCount(long cnt, int pageSize) {
        return cnt % pageSize == 0 ? cnt / pageSize : cnt / pageSize + 1;
    }
}
4. Lessons learned
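1. I accidentally left out a $ sign, so the query returned no data, and it took some time to track down why. Without the $, {_id: {gt: ...}} is an equality match against the embedded document {gt: ...}, which no ObjectId ever equals. The correct form is: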
query.append("_id", new BasicDBObject("$gt", lastIdObject));
2. Create indexes
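Create an ordinary single-field index: db.collection.ensureIndex({field: 1/-1}); where 1 means ascending and -1 means descending. (Since MongoDB 3.0, ensureIndex is deprecated in favor of db.collection.createIndex, which takes the same key document. The _id field is indexed automatically, so the _id-based pagination above needs no extra index.)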
Example: db.articles.ensureIndex({title:1}) // in the shell this is equivalent to {"title":1}, since both are valid JavaScript object literals
View the current indexes: db.collection.getIndexes();
Example:
db.articles.getIndexes();
Drop a single index: db.collection.dropIndex({field: 1/-1});
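Because the _id index keeps its keys in ascending order, each page in the Java program above is a range scan that starts just after the previous page's last _id. The following stdlib-only sketch simulates that, assuming a TreeSet of 24-digit hex strings as a stand-in for the _id index (no MongoDB driver is involved). It shows that carrying the last key forward visits every document exactly once. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical illustration: a TreeSet stands in for MongoDB's ascending _id index.
class KeysetPagination {
    // Up to pageSize keys strictly greater than lastKey (or from the start when lastKey is null),
    // mimicking find({_id: {$gt: lastKey}}).sort({_id: 1}).limit(pageSize).
    static List<String> nextPage(NavigableSet<String> index, String lastKey, int pageSize) {
        NavigableSet<String> tail = (lastKey == null) ? index : index.tailSet(lastKey, false);
        List<String> page = new ArrayList<>(pageSize);
        for (String key : tail) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    // Walks all pages, using the last key of each page as the exclusive lower bound of the next.
    static List<List<String>> allPages(NavigableSet<String> index, int pageSize) {
        List<List<String>> pages = new ArrayList<>();
        String lastKey = null;
        while (true) {
            List<String> page = nextPage(index, lastKey, pageSize);
            if (page.isEmpty()) {
                break;
            }
            pages.add(page);
            lastKey = page.get(page.size() - 1);
        }
        return pages;
    }

    public static void main(String[] args) {
        NavigableSet<String> ids = new TreeSet<>();
        for (int i = 0; i < 10; i++) {
            // Equal-length lowercase hex strings sort in the same order as the bytes they encode
            ids.add(String.format("5bda8f66%016x", i));
        }
        for (List<String> page : allPages(ids, 3)) {
            System.out.println(page.size() + " ids, last = " + page.get(page.size() - 1));
        }
    }
}
```

With 10 ids and a page size of 3, it produces four pages of 3, 3, 3 and 1 ids, with no gaps and no duplicates. The same invariant is what makes the null starting bound in the Java program safe.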
3. Execution plan
db.student.find({"name":"dd1"}).explain()
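Running explain("executionStats") reports fields such as totalKeysExamined and totalDocsExamined, and this is where offset paging and _id-range paging differ. The sketch below is a rough model that estimates the total entries walked to page through a whole collection. It assumes the server walks every skipped entry once, while a range page seeks directly to its lower bound and reads only its own entries. The class and method names are hypothetical, and real numbers should come from explain().

```java
// Hypothetical cost model: entries walked to read a whole collection page by page.
class PagingCost {
    // ceil(total / pageSize), the same formula as the page-count helper in the Java program
    static long pageCount(long total, long pageSize) {
        return (total + pageSize - 1) / pageSize;
    }

    // Offset paging: page k (0-based) skips k * pageSize entries, then reads up to pageSize more
    static long offsetEntriesWalked(long total, long pageSize) {
        long walked = 0;
        for (long k = 0; k < pageCount(total, pageSize); k++) {
            long skip = k * pageSize;
            walked += skip + Math.min(pageSize, total - skip);
        }
        return walked;
    }

    // Range paging on _id: each page seeks past the last _id and reads only its own entries
    static long keysetEntriesWalked(long total, long pageSize) {
        long walked = 0;
        for (long k = 0; k < pageCount(total, pageSize); k++) {
            walked += Math.min(pageSize, total - k * pageSize);
        }
        return walked;
    }

    public static void main(String[] args) {
        long total = 100_000_000L;
        long pageSize = 50_000L;
        System.out.println("pages:  " + pageCount(total, pageSize));
        System.out.println("offset: " + offsetEntriesWalked(total, pageSize));
        System.out.println("keyset: " + keysetEntriesWalked(total, pageSize));
    }
}
```

Under this model, 100 million documents in 50,000-document pages take 2,000 pages. Skip/limit walks 100,050,000,000 entries in total, while _id ranges walk 100,000,000, roughly a thousandfold difference.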
References:
[1] https://github.com/phutchins/logstash-input-mongodb/blob/master/lib/logstash/inputs/mongodb.rb
[2] https://www.cnblogs.com/yxlblogs/p/4930308.html
[3] https://docs.mongodb.com/manual/reference/method/db.collection.ensureIndex/
Reposted from: https://www.cnblogs.com/davidwang456/p/9890377.html