Elasticsearch Paginated Queries: From & Size vs. Scroll
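In Elasticsearch, an ordinary query goes through the following steps:
- 1 The client sends the request to one node
- 2 That node forwards the request to every shard, and each shard finds its own top 10 hits
- 3 The results come back to the node, which merges them and extracts the overall top 10
- 4 The node returns the result to the client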
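The four steps above can be sketched in plain Java without any Elasticsearch dependency. This is only an illustration of the scatter-gather idea: the class name `ScatterGatherSketch`, its methods, and the use of bare integer scores in place of real hits are all assumptions made for the example, not part of the Elasticsearch API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of the query flow: shards are lists of scores.
public class ScatterGatherSketch {

    // Step 2: each shard sorts its own scores and returns only its local top n.
    public static List<Integer> shardTopN(List<Integer> shardScores, int n) {
        List<Integer> sorted = new ArrayList<>(shardScores);
        sorted.sort(Collections.reverseOrder());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    // Step 3: the coordinating node merges the per-shard candidates
    // and keeps the global top n.
    public static List<Integer> coordinatorTopN(List<List<Integer>> shards, int n) {
        List<Integer> candidates = new ArrayList<>();
        for (List<Integer> shard : shards) {
            candidates.addAll(shardTopN(shard, n));
        }
        candidates.sort(Collections.reverseOrder());
        return new ArrayList<>(candidates.subList(0, Math.min(n, candidates.size())));
    }

    public static void main(String[] args) {
        List<List<Integer>> shards = Arrays.asList(
                Arrays.asList(9, 1, 5),
                Arrays.asList(7, 8, 2),
                Arrays.asList(3, 6, 4));
        // Step 4: the merged result goes back to the client.
        System.out.println(coordinatorTopN(shards, 2));
    }
}
```

With the three sample shards and n = 2, the shards return [9, 5], [8, 7] and [6, 4], and the coordinator prints [9, 8]. The point is that every shard has to supply n candidates even though only n survive overall.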
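At this point the response can tell you the total number of matching documents, but it only returns the default 10, so you need to think about paginated queries.
As for data volume: From & Size worked fine for me at 8 million documents. One of my jobs, however, had to read roughly 170 million documents, and there From & Size started failing at around 20 million. I later found that From & Size performance degrades sharply as the offset grows, because every shard has to collect and sort from + size hits for each page, and that is what leads to errors like these. My recommendation: use scroll whenever you can.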
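To get a feel for why deep pages hurt, here is a rough back-of-the-envelope model in Java. It assumes the simplified rule that each shard hands from + size candidates to the coordinating node for a From & Size page, while a scan-type scroll batch yields at most size documents per shard. The class `DeepPagingCost`, its method names, and the shard count of 5 are assumptions for illustration only.

```java
// Hypothetical cost model; not a measurement and not an Elasticsearch API.
public class DeepPagingCost {

    // Candidates the coordinating node must merge for ONE From & Size page:
    // every shard contributes (from + size) entries.
    public static long fromSizeCandidates(long from, int size, int shards) {
        return shards * (from + size);
    }

    // Upper bound on documents returned by ONE scan-type scroll batch:
    // size applies per shard, so the batch holds up to shards * size documents.
    public static long scanBatchSize(int size, int shards) {
        return (long) shards * size;
    }

    public static void main(String[] args) {
        int shards = 5; // assumed shard count
        System.out.println("from=0, size=10:           "
                + fromSizeCandidates(0L, 10, shards));
        System.out.println("from=20,000,000, size=50k: "
                + fromSizeCandidates(20_000_000L, 50_000, shards));
        System.out.println("scan batch, size=10k:      "
                + scanBatchSize(10_000, shards));
    }
}
```

Under these assumptions, a single page at offset 20 million means the coordinating node has to merge about 100 million candidates, whereas a scroll batch stays at the same 50,000 documents no matter how far into the result set you are.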
Below is the Java code for both approaches.
First, add the Elasticsearch jar to your Java project, for example with Maven:
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
        <version>2.3.2</version>
    </dependency>

Then initialize a client object:
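    import java.net.InetAddress;
    import java.net.UnknownHostException;
    import org.elasticsearch.client.transport.TransportClient;
    import org.elasticsearch.common.settings.Settings;
    import org.elasticsearch.common.transport.InetSocketTransportAddress;

    private static TransportClient client;
    private static String INDEX = "index_name";
    private static String TYPE = "type_name";

    public static TransportClient init() throws UnknownHostException {
        // With the 2.x client, settings come from Settings.settingsBuilder()
        // and the client is created through TransportClient.builder().
        Settings settings = Settings.settingsBuilder()
                .put("client.transport.sniff", true)
                .put("cluster.name", "cluster_name")
                .build();
        client = TransportClient.builder().settings(settings).build()
                .addTransportAddress(new InetSocketTransportAddress(
                        InetAddress.getByName("localhost"), 9300));
        return client;
    }

    public static void main(String[] args) throws UnknownHostException {
        TransportClient client = init();
        // client can now be used to run queries
    }

Next come the two query routines. Here is the code for from-size pagination: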
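    System.out.println("from-size mode started!");
    Date begin = new Date();
    long count = client.prepareCount(INDEX).setTypes(TYPE).execute().actionGet().getCount();
    SearchRequestBuilder requestBuilder = client.prepareSearch(INDEX).setTypes(TYPE)
            .setQuery(QueryBuilders.matchAllQuery());
    // Since 2.1, from + size may not exceed index.max_result_window (default 10000).
    int pageSize = 50000;
    for (int i = 0, sum = 0; sum < count; i++) {
        // from is a document offset, not a page number
        SearchResponse response = requestBuilder.setFrom(i * pageSize).setSize(pageSize)
                .execute().actionGet();
        int hits = response.getHits().hits().length;
        if (hits == 0) {
            break; // nothing more to read; avoids looping forever
        }
        sum += hits;
        System.out.println("total " + count + ", fetched so far " + sum);
    }
    Date end = new Date();
    System.out.println("elapsed (ms): " + (end.getTime() - begin.getTime()));

Below is the code for scroll pagination. Note that with the scan search type used here, size applies to each shard, so a batch actually returns up to (number of shards * size) documents.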
    System.out.println("scroll mode started!");
    begin = new Date();
    SearchResponse scrollResponse = client.prepareSearch(INDEX)
            .setSearchType(SearchType.SCAN).setSize(10000)
            .setScroll(TimeValue.timeValueMinutes(1))
            .execute().actionGet();
    count = scrollResponse.getHits().getTotalHits(); // the first response carries no hits
    for (int i = 0, sum = 0; sum < count; i++) {
        // each call returns the next batch and renews the scroll context
        scrollResponse = client.prepareSearchScroll(scrollResponse.getScrollId())
                .setScroll(TimeValue.timeValueMinutes(8))
                .execute().actionGet();
        int hits = scrollResponse.getHits().hits().length;
        if (hits == 0) {
            break; // scroll exhausted
        }
        sum += hits;
        System.out.println("total " + count + ", fetched so far " + sum);
    }
    end = new Date();
    System.out.println("elapsed (ms): " + (end.getTime() - begin.getTime()));

One more thing worth mentioning: CRUD operations in ES are generally inefficient when you apply them one document at a time to a large amount of data, so use bulk operations instead. For example:
    // jsonBuilder() is the static import org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder
    public static void updateHourByScroll(String type) throws IOException {
        System.out.println("scroll mode started!");
        Date begin = new Date();
        SearchResponse scrollResponse = client.prepareSearch(INDEX).setTypes(TYPE)
                .setSearchType(SearchType.SCAN).setSize(5000)
                .setScroll(TimeValue.timeValueMinutes(1))
                .execute().actionGet();
        long count = scrollResponse.getHits().getTotalHits(); // the first response carries no hits
        for (int i = 0, sum = 0; sum < count; i++) {
            scrollResponse = client.prepareSearchScroll(scrollResponse.getScrollId())
                    .setScroll(TimeValue.timeValueMinutes(8))
                    .execute().actionGet();
            SearchHits searchHits = scrollResponse.getHits();
            if (searchHits.hits().length == 0) {
                break; // scroll exhausted
            }
            sum += searchHits.hits().length;

            // Build one update request per hit
            List<UpdateRequest> list = new ArrayList<UpdateRequest>();
            for (SearchHit hit : searchHits) {
                String id = hit.getId();
                Map<String, Object> source = hit.getSource();
                Integer year = Integer.valueOf(source.get("Year").toString());
                Integer month = Integer.valueOf(source.get("Mon").toString());
                Integer day = Integer.valueOf(source.get("Day").toString());
                Integer hour = Integer.valueOf(source.get("Hour").toString());
                // getyear_month_day_hour is a helper defined elsewhere (not shown here)
                String time = getyear_month_day_hour(year, month, day, hour);
                System.out.println(time);
                UpdateRequest uRequest = new UpdateRequest()
                        .index(INDEX)
                        .type(type)
                        .id(id)
                        .doc(jsonBuilder().startObject().field("TimeFormat", time).endObject());
                list.add(uRequest);
            }

            // Execute the whole batch as a single bulk request
            BulkRequestBuilder bulkRequest = client.prepareBulk();
            for (UpdateRequest uprequest : list) {
                bulkRequest.add(uprequest);
            }
            if (bulkRequest.numberOfActions() > 0) {
                BulkResponse bulkResponse = bulkRequest.execute().actionGet();
                if (bulkResponse.hasFailures()) {
                    System.out.println("bulk error: " + bulkResponse.buildFailureMessage());
                }
            }
            System.out.println("total " + count + ", fetched so far " + sum);
        }
        Date end = new Date();
        System.out.println("elapsed (ms): " + (end.getTime() - begin.getTime()));
    }