02. Elasticsearch meta-fields

Published: 2024/2/28


Contents

- 1. meta-fields
- 2. Selected meta-fields in detail
  - 1. _index: the index name
  - 2. _type: the type name
  - 3. _id: the doc's ID
  - 4. _source: the original JSON of the doc
  - 5. _size: the size of _source in bytes
  - 6. _field_names: all non-null fields in the current doc
  - 7. _ignored: the fields that were ignored because of malformed values
  - 8. _routing: the routing value
    - 1. What routing does
    - 2. How to use _routing
    - 3. Making routing required
    - 4. Using a unique ID with custom routing
  - 9. _meta: application-specific metadata

1. meta-fields

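The metadata fields of an Elasticsearch index are mainly the following:

_index: the index name

_type: the type name. Deprecated as of 6.0.

_id: the doc's ID

_source: the original JSON of the doc

_size: the size of _source in bytes

_field_names: the names of all fields in the doc that hold a non-null value

_ignored: the fields that were ignored because of malformed values (see ignore_malformed)

_routing: the routing value

_meta: application-specific metadata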

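2. Selected meta-fields in detail

1. _index: the index name

_index is a virtual field: Elasticsearch exposes it itself, and it is not added to the Lucene index as a real field.
You can use _index in term and terms queries, and in any query that is rewritten to a term query, such as match, query_string, or simple_query_string.
Regexp and fuzzy queries on _index are not supported. Prefix and wildcard queries were also unsupported in older releases, so check the documentation for the version you run.

Example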

GET _search
{
  "size": 10,
  "aggs": {
    "indices": {
      "terms": {
        "field": "_index",
        "size": 20
      }
    }
  },
  "sort": [
    { "_index": { "order": "asc" } }
  ],
  "script_fields": {
    "index_name": {
      "script": {
        "source": "doc['_index']",
        "lang": "painless"
      }
    }
  }
}

2. _type: the type name

Deprecated as of 6.0; mapping types were removed in later major versions.

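3. _id: the doc's ID

_id is a field that Elasticsearch adds to every doc it stores. Only its inverted-index terms are kept; it has no doc values. That makes it a poor fit for aggregations and sorting, because using it that way takes up a large amount of memory.
If you need to aggregate or sort on the ID, copy its value into a separate field and enable doc_values on that field.
_id itself is best suited to looking documents up directly by ID.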

PUT my_index/_doc/1
{
  "text": "Document with ID 1"
}

PUT my_index/_doc/2?refresh=true
{
  "text": "Document with ID 2"
}

GET my_index/_search
{
  "query": {
    "terms": {
      "_id": [ "1", "2" ]
    }
  }
}

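4. _source: the original JSON of the doc

The main purpose of _source is to return a doc's original content given its doc ID, so it only needs to be stored (Store), not indexed. It is the field returned by default by search and get requests.

_source is effectively a virtual stored field named _source.

Elasticsearch relies on _source for the following features:

Update: a partial update reads the original doc from _source and merges it with the fields in the request to form a complete doc. Without _source, partial updates are not possible.

Reindex: the Reindex API (_reindex) rebuilds an index by reading each doc's _source, so there is no need to re-import the full data set from another system. Without _source, the Reindex API cannot be used.

Script: scripts, whether run at index time or at search time, may need the original content held in the stored field. With _source disabled, that functionality is no longer available.

Summary: summary (snippet) information is also derived from _source.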
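To make the Update case concrete, here is a minimal Python sketch of the read-merge-rewrite cycle. It is not Elasticsearch code: `partial_update` and the sample documents are hypothetical names made up for illustration, and the recursive merge is an assumption about how nested objects are combined.

```python
from copy import deepcopy


def partial_update(stored_source, partial_doc):
    """Merge a partial doc into a copy of the stored _source (illustrative only)."""
    merged = deepcopy(stored_source)
    for key, value in partial_doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged key by key.
            merged[key] = partial_update(merged[key], value)
        else:
            # Scalars and lists simply replace the old value.
            merged[key] = value
    return merged


# The full original doc, as it would be read back from _source.
stored = {"title": "hello", "meta": {"views": 1, "lang": "en"}}

# The request only carries the field that changes.
updated = partial_update(stored, {"meta": {"views": 2}})
print(updated)
```

The point of the sketch is the first argument: without the stored original there is nothing to merge the partial doc into, which is why disabling _source rules out partial updates.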

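This raises a question: how do doc_values and store differ?
An answer from the Elasticsearch community puts it like this.
In short, both let you go from a doc ID to the content of a field.
doc_values are designed for fast access and suit sorting and aggregations, because those operations read the value of one field across a large number of docs and therefore have to be fast.
store is designed to optimize how field content is stored. Its access performance is lower than that of doc_values, but its storage is better optimized, so it suits large content. It is read after query matching has finished, to fetch field values just before results are returned to the client. By then the result set has already been narrowed down, so relatively few field reads are needed.
As a rule, avoid running scripts against stored fields inside a query (it is possible, but slow).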
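The difference can be pictured as column-oriented versus row-oriented access. The Python sketch below is only an analogy under that assumption, not Lucene's actual on-disk format; `stored_fields`, `doc_values`, and both sort functions are hypothetical names.

```python
import json

docs = [
    {"user": "alice", "age": 31, "bio": "likes search engines"},
    {"user": "bob", "age": 25, "bio": "writes long blog posts"},
    {"user": "carol", "age": 40, "bio": "runs the cluster"},
]

# Row-oriented, like stored fields: one serialized blob per doc ID.
stored_fields = [json.dumps(doc) for doc in docs]

# Column-oriented, like doc_values: one array per field, indexed by doc ID.
doc_values = {"age": [doc["age"] for doc in docs]}


def sort_with_doc_values(column):
    """Sort doc IDs by reading a single column."""
    return sorted(range(len(column)), key=lambda doc_id: column[doc_id])


def sort_with_stored_fields(blobs, field):
    """Sort doc IDs by decoding every doc's whole blob to reach one field."""
    return sorted(range(len(blobs)), key=lambda doc_id: json.loads(blobs[doc_id])[field])


print(sort_with_doc_values(doc_values["age"]))
print(sort_with_stored_fields(stored_fields, "age"))
```

Both calls return the same order, but the second has to decode each complete doc to read one value, which is the cost the paragraph above warns about when scripts touch stored fields for many docs.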

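That said, _source holds the entire content of each doc, so as long as disk capacity allows, it is best not to disable it. If you do need to disable it, the mapping looks like this: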

PUT tweets
{
  "mappings": {
    "_source": {
      "enabled": false
    }
  }
}

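You can also use includes and excludes to control which fields are kept in _source. Fields left out of _source are still indexed and searchable; they are just not returned.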

PUT logs
{
  "mappings": {
    "_source": {
      "includes": [ "*.count", "meta.*" ],
      "excludes": [ "meta.description", "meta.other.*" ]
    }
  }
}

PUT logs/_doc/1
{
  "requests": {
    "count": 10,
    "foo": "bar"
  },
  "meta": {
    "name": "Some metric",
    "description": "Some metric description",
    "other": {
      "foo": "one",
      "baz": "two"
    }
  }
}

GET logs/_search
{
  "query": {
    "match": {
      "meta.other.foo": "one"
    }
  }
}

Response:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "logs",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "meta" : {
            "other" : { },
            "name" : "Some metric"
          },
          "requests" : {
            "count" : 10
          }
        }
      }
    ]
  }
}
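The includes/excludes behaviour can be approximated in a few lines of Python. This is a sketch under assumptions, not Elasticsearch's implementation: `flatten` and `filter_source` are hypothetical helpers, patterns are matched with `fnmatch` against dotted leaf paths, and a field is kept when it matches some include and no exclude. Unlike the response above, the sketch drops the emptied `other` object entirely.

```python
from fnmatch import fnmatchcase


def flatten(obj, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf value in a nested dict."""
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        else:
            yield path, value


def filter_source(doc, includes, excludes):
    """Keep leaves that match an include pattern and no exclude pattern."""
    kept = {}
    for path, value in flatten(doc):
        included = any(fnmatchcase(path, pattern) for pattern in includes)
        excluded = any(fnmatchcase(path, pattern) for pattern in excludes)
        if included and not excluded:
            # Rebuild the nested structure for the surviving leaf.
            *parents, leaf = path.split(".")
            node = kept
            for part in parents:
                node = node.setdefault(part, {})
            node[leaf] = value
    return kept


doc = {
    "requests": {"count": 10, "foo": "bar"},
    "meta": {
        "name": "Some metric",
        "description": "Some metric description",
        "other": {"foo": "one", "baz": "two"},
    },
}

filtered = filter_source(
    doc,
    includes=["*.count", "meta.*"],
    excludes=["meta.description", "meta.other.*"],
)
print(filtered)
```

The filter only decides what is written to _source. It says nothing about indexing, which is why the search on meta.other.foo above still finds the doc.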

5. _size: the size of _source in bytes

_size stores the size of _source in bytes.
It requires the mapper-size plugin:

sudo bin/elasticsearch-plugin install mapper-size

Example

PUT my_index
{
  "mappings": {
    "_size": {
      "enabled": true
    }
  }
}

PUT my_index/_doc/1
{
  "text": "This is a document"
}

PUT my_index/_doc/2
{
  "text": "This is another document"
}

GET my_index/_search
{
  "query": {
    "range": {
      "_size": { "gt": 10 }
    }
  },
  "aggs": {
    "sizes": {
      "terms": {
        "field": "_size",
        "size": 10
      }
    }
  },
  "sort": [
    { "_size": { "order": "desc" } }
  ],
  "script_fields": {
    "size": {
      "script": "doc['_size']"
    }
  }
}

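6. _field_names: all non-null fields in the current doc

This field exists to support the exists query. For fields that have doc_values or norms enabled, however, the exists query no longer uses _field_names.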
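As a rough illustration of the idea, the Python sketch below records, per doc, the names of fields that hold a non-null value and answers an exists-style lookup from that record. It is a simplified model, not how Elasticsearch stores _field_names; `field_names`, `exists`, and the sample docs are hypothetical.

```python
def field_names(doc, prefix=""):
    """Collect the dotted names of fields that hold a non-null value."""
    names = set()
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            names |= field_names(value, path + ".")
        elif value is not None and value != []:
            # null values and empty arrays count as "no value".
            names.add(path)
    return names


docs = {
    "1": {"title": "first doc", "tags": None},
    "2": {"title": None, "user": {"name": "bob"}},
}

# One set of field names per doc, playing the role of _field_names.
names_by_doc = {doc_id: field_names(doc) for doc_id, doc in docs.items()}


def exists(field):
    """Return the IDs of docs whose recorded field names contain `field`."""
    return sorted(doc_id for doc_id, names in names_by_doc.items() if field in names)


print(exists("title"))
print(exists("user.name"))
```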

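7. _ignored: the fields that were ignored because of malformed values

This field works together with the ignore_malformed mapping parameter.
ignore_malformed lets a doc be written to the index even when a value does not match the type defined in the mapping.
If a field is mapped with "ignore_malformed": true, a value of the wrong type is kept in _source but is not indexed. The doc as a whole is still written normally; only that one field cannot be searched.

When a field in a doc triggers the ignore_malformed rule, the doc is marked accordingly: an _ignored field is added to it, and you can search on that field to find such docs.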

PUT my_index
{
  "mappings" : {
    "properties" : {
      "requests" : {
        "type" : "text"
      },
      "number" : {
        "type" : "integer",
        "ignore_malformed" : true
      }
    }
  }
}

PUT my_index/_doc/1
{
  "requests" : "hahaha",
  "number" : "5"
}

PUT my_index/_doc/2
{
  "requests" : "hahaha",
  "number" : 6
}

PUT my_index/_doc/3
{
  "requests" : "hahaha",
  "number" : "a"
}

None of these three writes fails. In the first one, the string "5" is stored as a number.

You can then run the following query:

GET my_index/_search
{
  "query": {
    "exists": {
      "field": "_ignored"
    }
  }
}

Response:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_ignored" : [ "number" ],
        "_source" : {
          "requests" : "hahaha",
          "number" : "a"
        }
      }
    ]
  }
}

The hit carries an _ignored field.
The field also shows up in match_all results, but only on docs that contain a malformed field.
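The three writes above can be mimicked with a small Python model. It is a sketch of the behaviour described here, not of Elasticsearch internals: `index_document` is a hypothetical function, and Python's `int()` stands in for the integer field's parsing rules.

```python
def index_document(doc, integer_fields):
    """Model ignore_malformed: keep _source intact, skip values that fail to parse."""
    indexed = {}
    ignored = []
    for field, value in doc.items():
        if field in integer_fields:
            try:
                indexed[field] = int(value)  # "5" is accepted and becomes 5
            except (TypeError, ValueError):
                ignored.append(field)  # malformed: not indexed, but still in _source
        else:
            indexed[field] = value
    result = {"_source": doc, "indexed": indexed}
    if ignored:
        # Only docs with at least one malformed field get the marker.
        result["_ignored"] = ignored
    return result


docs = [
    {"requests": "hahaha", "number": "5"},
    {"requests": "hahaha", "number": 6},
    {"requests": "hahaha", "number": "a"},
]
results = [index_document(doc, {"number"}) for doc in docs]

for result in results:
    print(result.get("_ignored", []), result["indexed"])
```

Only the third doc ends up with an `_ignored` entry, and its `number` value survives in `_source` while being absent from the indexed fields.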

8. _routing: the routing value

1. What routing does

This field matters a lot: which shard each doc is sent to is decided by a formula.
The default formula is:

routing_factor = num_routing_shards / num_primary_shards
shard_num = (hash(_routing) % num_routing_shards) / routing_factor

num_routing_shards is the value of the index.number_of_routing_shards index setting.
num_primary_shards is the value of the index.number_of_shards index setting.
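The formula is easy to try out in Python. The sketch below follows the two lines above with integer division; everything else is an assumption for illustration: `toy_hash` is a deliberately simple stand-in (Elasticsearch uses a Murmur3 hash, so real shard numbers will differ), and the shard counts are made-up values.

```python
def toy_hash(value: str) -> int:
    """Stand-in hash: sum of UTF-8 bytes. NOT the Murmur3 hash Elasticsearch uses."""
    return sum(value.encode("utf-8"))


def shard_num(routing: str, num_primary_shards: int, num_routing_shards: int) -> int:
    """Apply the routing formula from the text, using integer arithmetic."""
    if num_routing_shards % num_primary_shards != 0:
        raise ValueError("num_routing_shards must be a multiple of num_primary_shards")
    routing_factor = num_routing_shards // num_primary_shards
    return (toy_hash(routing) % num_routing_shards) // routing_factor


NUM_PRIMARY_SHARDS = 5   # hypothetical index.number_of_shards
NUM_ROUTING_SHARDS = 30  # hypothetical index.number_of_routing_shards

# With no explicit routing value, the doc's _id is what gets hashed.
for routing_value in ["1", "2", "user1", "user2"]:
    shard = shard_num(routing_value, NUM_PRIMARY_SHARDS, NUM_ROUTING_SHARDS)
    print(routing_value, "->", shard)
```

Whatever hash is plugged in, the result always falls in the range 0 to num_primary_shards - 1, and the same routing value always maps to the same shard.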

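The default value of _routing is the doc's _id, so by default incoming docs are distributed across shards by ID.
You can also supply your own routing value.

2. How to use _routing

A routing value can be specified on index, get, update, and delete requests, and on searches.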

PUT my_index/_doc/1?routing=user1&refresh=true
{
  "title": "This is a document"
}

GET my_index/_doc/1?routing=user1

GET my_index/_search?routing=user1,user2
{
  "query": {
    "match": {
      "title": "document"
    }
  }
}

The _routing field can also be used inside a query:

GET my_index/_search
{
  "query": {
    "terms": {
      "_routing": [ "user1" ]
    }
  }
}

3. Making routing required

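PUT my_index2
{
  "mappings": {
    "_routing": {
      "required": true
    }
  }
}

The following index request fails with a routing_missing_exception, because no routing value is provided:

PUT my_index2/_doc/1
{
  "text": "No routing value provided"
}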

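4. Using a unique ID with custom routing

When you index docs with a custom _routing value, the uniqueness of _id is not guaranteed across the whole index, only within a single shard. The same _id indexed with different routing values can end up on different shards, so you have to make sure IDs are unique across the index yourself.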
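A toy model shows how the duplicates arise. The Python sketch below is an illustration under assumptions, not Elasticsearch behaviour verbatim: `toy_hash` is a stand-in for the real Murmur3 hash, the shard is picked with a plain modulo (a routing factor of 1), and `index_doc` is a hypothetical function.

```python
def toy_hash(value: str) -> int:
    """Stand-in hash: sum of UTF-8 bytes. NOT the hash Elasticsearch uses."""
    return sum(value.encode("utf-8"))


NUM_SHARDS = 2
# Each shard keeps its own _id -> doc map, so uniqueness is enforced per shard only.
shards = [{} for _ in range(NUM_SHARDS)]


def index_doc(doc_id, body, routing=None):
    """Store a doc on the shard chosen by its routing value (default: its _id)."""
    routing_value = routing if routing is not None else doc_id
    shard = toy_hash(routing_value) % NUM_SHARDS
    shards[shard][doc_id] = body  # overwrites only a doc with the same _id on this shard
    return shard


first_shard = index_doc("1", {"title": "written with routing=user1"}, routing="user1")
second_shard = index_doc("1", {"title": "written with routing=user2"}, routing="user2")

copies = sum(1 for shard in shards if "1" in shard)
print(first_shard, second_shard, copies)
```

With this toy hash the two writes land on different shards, so the index now holds two docs with _id "1". Neither shard can notice the clash, because each only checks its own map.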

This is a good place to recall the join field: parent and child docs must use the same routing value, because they have to be stored on the same shard.

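9. _meta: application-specific metadata

Elasticsearch itself currently does not use the _meta field anywhere. It is there so that the application using the index can record its own metadata. This data is stored only in the mapping, not in each doc.
For example: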

PUT my_index
{
  "mappings": {
    "_meta": {
      "class": "MyApp::User",
      "version": {
        "min": "1.0",
        "max": "1.3"
      }
    }
  }
}

GET my_index
