Elasticsearch Paginated Queries: From & Size vs. Scroll

Published: 2024/1/23

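In Elasticsearch (ES), a typical search request is processed as follows:

1 The client sends the request to one node.
2 That node forwards the request to every shard, and each shard finds its own top 10 hits.
3 The shards return their results to the node, which merges them and keeps the overall top 10.
4 The node returns the result to the client.

The response reports the total number of matching documents, but only the default 10 hits are returned, so retrieving more than that calls for a paginated query.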
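The four steps above can be sketched as a small simulation. This is only an illustration in plain Java, not Elasticsearch code: the class and method names are hypothetical, each shard is modeled as a list of scores, and the separate fetch phase that a real cluster performs is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical simulation of the scatter/gather flow; not part of any ES API.
final class ScatterGatherSketch {

    // Step 2: a shard returns its own top n scores, highest first.
    static List<Double> shardTop(List<Double> shardScores, int n) {
        List<Double> sorted = new ArrayList<>(shardScores);
        sorted.sort(Collections.reverseOrder());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    // Step 3: the coordinating node gathers every shard's top n
    // and keeps only the global top n.
    static List<Double> coordinatorTop(List<List<Double>> shards, int n) {
        List<Double> gathered = new ArrayList<>();
        for (List<Double> shard : shards) {
            gathered.addAll(shardTop(shard, n));
        }
        return shardTop(gathered, n);
    }

    public static void main(String[] args) {
        List<List<Double>> shards = Arrays.asList(
                Arrays.asList(0.9, 0.2, 0.5),
                Arrays.asList(0.8, 0.7),
                Arrays.asList(0.1, 0.95));
        // Step 4: the merged top 3 goes back to the client.
        System.out.println(coordinatorTop(shards, 3)); // prints [0.95, 0.9, 0.8]
    }
}
```

Note that every shard has to contribute a full top-n list even though most of those entries are discarded after the merge.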

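As for data volume: From & Size worked without problems for me on an index of 8 million documents. One job, however, had to read roughly 170 million documents, and From & Size started failing at around 20 million. After some research I found that From & Size performance degrades sharply at deep offsets on large data sets, which is what caused the errors. My recommendation: use scroll whenever you can.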
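A rough way to see why deep offsets hurt: to serve a page starting at offset from, every shard has to produce its own top (from + size) entries, and the coordinating node has to merge all of them. The sketch below only does that arithmetic. The class name is hypothetical, and the shard count of 5 is an assumption, since the post does not state how many shards the index had.

```java
// Hypothetical helper that estimates the merge workload of one from/size page.
final class DeepPagingCost {

    // Entries the coordinating node must gather and sort for a single page:
    // each shard contributes its local top (from + size).
    static long coordinatorEntries(int shards, long from, int size) {
        return shards * (from + size);
    }

    public static void main(String[] args) {
        int shards = 5; // assumed shard count, not taken from the post
        System.out.println(coordinatorEntries(shards, 0, 10));              // 50
        System.out.println(coordinatorEntries(shards, 20_000_000, 50_000)); // 100250000
    }
}
```

Under these assumptions, a single page at offset 20 million forces the node to handle about 100 million entries in order to return 50,000 of them.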

Below is Java code for both approaches.

First, add the Elasticsearch jar to the Java project, for example with Maven:

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>2.3.2</version>
</dependency>

Then initialize a client object:

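import java.net.InetAddress;
import java.net.UnknownHostException;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

private static TransportClient client;
private static String INDEX = "index_name";
private static String TYPE = "type_name";

// Builds the transport client with the 2.x client API, matching the 2.3.2 dependency.
public static TransportClient init() throws UnknownHostException {
    Settings settings = Settings.settingsBuilder()
            .put("client.transport.sniff", true)
            .put("cluster.name", "cluster_name")
            .build();
    client = TransportClient.builder().settings(settings).build()
            .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost"), 9300));
    return client;
}

public static void main(String[] args) throws UnknownHostException {
    TransportClient client = init();
    // the client can now be used to run queries
}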

Next come the two query routines. Here is the from-size pagination code:

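System.out.println("from-size mode started!");
Date begin = new Date();
long count = client.prepareCount(INDEX).setTypes(TYPE).execute().actionGet().getCount();
SearchRequestBuilder requestBuilder = client.prepareSearch(INDEX).setTypes(TYPE)
        .setQuery(QueryBuilders.matchAllQuery());
int pageSize = 50000;
for (int i = 0, sum = 0; sum < count; i++) {
    // from is a document offset, not a page index, so advance it by a full page each time
    SearchResponse response = requestBuilder.setFrom(i * pageSize).setSize(pageSize)
            .execute().actionGet();
    int fetched = response.getHits().hits().length;
    if (fetched == 0) {
        break; // nothing left; avoids looping forever if the total changed meanwhile
    }
    sum += fetched;
    System.out.println("total " + count + ", fetched so far " + sum);
}
Date end = new Date();
System.out.println("elapsed (ms): " + (end.getTime() - begin.getTime()));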
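In a from-size loop, from is an offset counted in documents, so page k starts at k * size rather than at k. A minimal sketch of that arithmetic, using hypothetical helper names that are not part of the Elasticsearch client:

```java
// Hypothetical helpers for from/size pagination arithmetic.
final class PageOffsets {

    // Offset of the first document on a zero-based page.
    static long fromOffset(long pageIndex, int pageSize) {
        return pageIndex * pageSize;
    }

    // Number of pages needed to cover totalHits documents (ceiling division).
    static long pageCount(long totalHits, int pageSize) {
        return (totalHits + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        System.out.println(fromOffset(3, 50_000));          // 150000
        System.out.println(pageCount(170_000_000, 50_000)); // 3400
    }
}
```

With the figures from this post (about 170 million documents, 50,000 per page) a full pass takes 3,400 requests, each one starting deeper than the last.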

Below is the scroll pagination code. Note that with the scan search type used here, size applies to each shard, so a single batch can return up to (number of shards * size) hits.

System.out.println("scroll mode started!");
begin = new Date();
SearchResponse scrollResponse = client.prepareSearch(INDEX)
        .setSearchType(SearchType.SCAN)
        .setSize(10000)
        .setScroll(TimeValue.timeValueMinutes(1))
        .execute().actionGet();
count = scrollResponse.getHits().getTotalHits(); // the first (scan) response carries no hits, only the total
for (int i = 0, sum = 0; sum < count; i++) {
    scrollResponse = client.prepareSearchScroll(scrollResponse.getScrollId())
            .setScroll(TimeValue.timeValueMinutes(8))
            .execute().actionGet();
    int fetched = scrollResponse.getHits().hits().length;
    if (fetched == 0) {
        break; // scroll exhausted
    }
    sum += fetched;
    System.out.println("total " + count + ", fetched so far " + sum);
}
end = new Date();
System.out.println("elapsed (ms): " + (end.getTime() - begin.getTime()));
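Because size is applied per shard in scan mode, the batch size seen by the client depends on the shard count. The sketch below works out the largest possible batch and the minimum number of scroll round trips. The names are hypothetical and the shard count of 5 is an assumption; shards that run out of documents early make batches smaller, so the real number of round trips can be higher.

```java
// Hypothetical arithmetic for scan-type scrolling, where size is per shard.
final class ScanBatchMath {

    // Largest number of hits a single scroll round trip can return.
    static long maxHitsPerBatch(int shards, int sizePerShard) {
        return (long) shards * sizePerShard;
    }

    // Lower bound on the round trips needed to read totalHits documents.
    static long minRoundTrips(long totalHits, int shards, int sizePerShard) {
        long batch = maxHitsPerBatch(shards, sizePerShard);
        return (totalHits + batch - 1) / batch;
    }

    public static void main(String[] args) {
        int shards = 5; // assumed shard count, not taken from the post
        System.out.println(maxHitsPerBatch(shards, 10_000));            // 50000
        System.out.println(minRoundTrips(170_000_000, shards, 10_000)); // 3400
    }
}
```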

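One more point worth mentioning: ES CRUD operations are generally slow when a large amount of data is processed one document at a time, so use the bulk API instead, as in the following example: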

// Requires: import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;
public static void updateHourByScroll(String Type) throws IOException {
    System.out.println("scroll mode started!");
    Date begin = new Date();
    SearchResponse scrollResponse = client.prepareSearch(INDEX).setTypes(TYPE)
            .setSearchType(SearchType.SCAN)
            .setSize(5000)
            .setScroll(TimeValue.timeValueMinutes(1))
            .execute().actionGet();
    long count = scrollResponse.getHits().getTotalHits(); // the first (scan) response carries no hits
    for (int i = 0, sum = 0; sum < count; i++) {
        scrollResponse = client.prepareSearchScroll(scrollResponse.getScrollId())
                .setScroll(TimeValue.timeValueMinutes(8))
                .execute().actionGet();
        SearchHits searchHits = scrollResponse.getHits();
        if (searchHits.hits().length == 0) {
            break; // scroll exhausted; an empty bulk request would be rejected anyway
        }
        sum += searchHits.hits().length;
        List<UpdateRequest> list = new ArrayList<UpdateRequest>();
        for (SearchHit hit : searchHits) {
            String id = hit.getId();
            Map<String, Object> source = hit.getSource();
            Integer year = Integer.valueOf(source.get("Year").toString());
            Integer month = Integer.valueOf(source.get("Mon").toString());
            Integer day = Integer.valueOf(source.get("Day").toString());
            Integer hour = Integer.valueOf(source.get("Hour").toString());
            // getyear_month_day_hour is the author's own helper; its code is not shown in this post
            String time = getyear_month_day_hour(year, month, day, hour);
            System.out.println(time);
            UpdateRequest uRequest = new UpdateRequest()
                    .index(INDEX)
                    .type(Type)
                    .id(id)
                    .doc(jsonBuilder().startObject().field("TimeFormat", time).endObject());
            list.add(uRequest);
        }
        // send all updates for this scroll batch in one bulk request
        BulkRequestBuilder bulkRequest = client.prepareBulk();
        for (UpdateRequest uprequest : list) {
            bulkRequest.add(uprequest);
        }
        BulkResponse bulkResponse = bulkRequest.execute().actionGet();
        if (bulkResponse.hasFailures()) {
            System.out.println("bulk request had failures!");
        }
        System.out.println("total " + count + ", fetched so far " + sum);
    }
    Date end = new Date();
    System.out.println("elapsed (ms): " + (end.getTime() - begin.getTime()));
}
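In the method above, each scroll batch becomes a single bulk request, which in scan mode can hold up to (number of shards * 5000) updates. If that turns out to be too large for one request, the list can be split into smaller chunks first. The following is a minimal sketch of such a splitter in plain Java; the class name and the chunk size used in main are arbitrary assumptions, and each resulting chunk would be sent as its own bulk request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that splits a list of pending requests into bulk-sized chunks.
final class BulkChunks {

    // Returns consecutive sublists of at most chunkSize elements, in order.
    static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> pending = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        // Chunk size 3 is only for demonstration.
        System.out.println(partition(pending, 3)); // prints [[1, 2, 3], [4, 5, 6], [7]]
    }
}
```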
