
Elasticsearch (Part 3): Basic Operations with elasticsearch in Python

Published: 2023/12/8 python 43 豆豆

Reference article: https://cuiqingcai.com/6214.html

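I. Installing the elasticsearch client library for Python

1. pip install elasticsearch

   Install a client whose major version matches your Elasticsearch server (for example pip install "elasticsearch>=6,<7" for the 6.5.4 server used here); the doc_type arguments in the examples below belong to the 6.x-era API and were removed in the 8.x client.

2. Chinese analyzer plugin:

By default Elasticsearch uses an analyzer designed for English (the standard analyzer splits Chinese text into single characters), so we need to install a Chinese analysis plugin, elasticsearch-analysis-ik (its version must match your Elasticsearch version). After installation, restart Elasticsearch so that it loads the plugin:

elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.5.4/elasticsearch-analysis-ik-6.5.4.zip

(Replace the version number here with your own Elasticsearch version.)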
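After restarting, one way to check that the IK plugin was actually loaded is to send some text through the analyze API and look at the tokens. This is a hedged sketch: it assumes a 6.x/7.x-style elasticsearch-py client where `es.indices.analyze(body=...)` takes an `analyzer` and a `text`, and the helper names `ik_tokens` and `looks_like_ik` are hypothetical.

```python
def ik_tokens(es, text, analyzer="ik_max_word"):
    """Hypothetical helper: run `text` through an analyzer and return its tokens.

    `es` is an Elasticsearch client (6.x/7.x-style API assumed).
    """
    response = es.indices.analyze(body={"analyzer": analyzer, "text": text})
    # The analyze API answers {"tokens": [{"token": ..., "start_offset": ..., ...}, ...]}
    return [item["token"] for item in response["tokens"]]


def looks_like_ik(tokens):
    """Hypothetical heuristic for Chinese-only input: IK emits multi-character
    words, while the default standard analyzer yields single characters."""
    return any(len(token) > 1 for token in tokens)
```

Against a running server you would call `ik_tokens(Elasticsearch(), '中國駐洛杉磯領事館')`. If the plugin is missing, Elasticsearch rejects the unknown analyzer name with an error instead of returning tokens.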

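II. Elasticsearch concepts

Elasticsearch's basic concepts, such as nodes, indices and documents:

1. Node & Cluster: A single running Elasticsearch instance is called a node (Node); a group of nodes forms a cluster (Cluster).
2. Index: The top-level unit of data management in Elasticsearch is called an Index; it corresponds to a database in MySQL, MongoDB and similar systems. Note: every Index (i.e. database) name must be lowercase.
3. Document: A single record inside an Index is called a Document, and Documents are represented as JSON. Documents in the same Index are not required to share the same structure (schema), but keeping them consistent helps search efficiency.
4. Type: Documents can be grouped, and such a group is called a Type. It is a virtual, logical grouping used to filter Documents, similar to a table in MySQL or a Collection in MongoDB; different Types should have similar structures. (Elastic 6.x allows only one Type per Index; 7.x deprecates Types, and 8.x removes them.)
5. Fields: Each Document is a JSON-like structure containing many fields, each with its own value; fields are comparable to the columns of a MySQL table.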
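The lowercase rule for Index names is only one of several naming constraints. The function below is a sketch that checks the rules as listed in the Elasticsearch index documentation (lowercase only; none of \ / * ? " < > | space , #; no leading -, _ or +; not . or ..; at most 255 bytes). It is a client-side convenience under that assumption; the server stays the final authority, and `is_valid_index_name` is a hypothetical name.

```python
# Characters an index name may not contain: \ / * ? " < > | space , #
FORBIDDEN_CHARS = set('\\/*?"<>| ,#')


def is_valid_index_name(name):
    """Hypothetical helper: check `name` against the documented index naming rules."""
    if not name or name in (".", ".."):
        return False
    if name != name.lower():                  # lowercase only
        return False
    if name[0] in "-_+":                      # may not start with -, _ or +
        return False
    if any(ch in FORBIDDEN_CHARS for ch in name):
        return False
    return len(name.encode("utf-8")) <= 255   # at most 255 bytes
```

For example, `is_valid_index_name("news")` passes, while `"News"` fails on the lowercase rule.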


III. Working with Elasticsearch from Python

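1. Create an Index -- es.indices.create(index='news')

from elasticsearch import Elasticsearch

es = Elasticsearch()  # connects to localhost:9200 by default
# ignore=400 suppresses the "index already exists" error, so the script can be rerun
result = es.indices.create(index='news', ignore=400)
print(result)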
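Passing ignore=400 hides the "index already exists" error. An alternative is to check first with `es.indices.exists()`, which the 6.x/7.x clients provide and which returns a boolean. A minimal sketch; `ensure_index` is a hypothetical name.

```python
def ensure_index(es, name, body=None):
    """Hypothetical helper: create index `name` only if it does not exist yet.

    Returns True if this call created the index, False if it was already there.
    """
    if es.indices.exists(index=name):
        return False
    es.indices.create(index=name, body=body)
    return True
```

Check-then-create is not atomic: if several processes may create the same index concurrently, ignore=400 remains the more robust choice.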

2. Delete an Index -- es.indices.delete(index='news')

# ignore=[400, 404] suppresses errors such as deleting an index that does not exist (404)
result = es.indices.delete(index='news', ignore=[400, 404])
print(result)

3. Insert data -- es.create() & es.index()

es.indices.create(index='news', ignore=400)
data = {'title': '美國留給伊拉克的是個爛攤子嗎', 'url': 'http://view.news.qq.com/zt2011/usa_iraq/index.htm'}
# Method 1: es.create() requires you to specify the unique id yourself
result = es.create(index='news', doc_type='politics', id=1, body=data)
print(result)
# Method 2: es.index() generates an id automatically when none is given
es.index(index='news', doc_type='politics', body=data)
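es.create() and es.index() differ when the id is already taken: create() refuses with an HTTP 409 conflict, while index() overwrites the stored document. The sketch below assumes the 6.x client, where an API call accepts ignore=<status> and then returns the error body instead of raising; `create_once` is a hypothetical name.

```python
def create_once(es, index, doc_type, doc_id, doc):
    """Hypothetical helper: insert `doc` under a fixed id, never overwriting.

    Returns True if the document was created, False if the id already existed
    (HTTP 409, returned as an error body because of ignore=409).
    """
    response = es.create(index=index, doc_type=doc_type, id=doc_id,
                         body=doc, ignore=409)
    return response.get("result") == "created"
```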

4. Update data

data = {'title': '美國留給伊拉克的是個爛攤子嗎', 'url': 'http://view.news.qq.com/zt2011/usa_iraq/index.htm', 'date': '2011-12-16'}
# Method 1: update() performs a partial update; the changed fields must be wrapped in a "doc" key
result = es.update(index='news', doc_type='politics', body={'doc': data}, id=1)
print(result)
# Method 2: index() with an explicit id adds the document if it does not exist, or replaces it if it does
es.index(index='news', doc_type='politics', body=data, id=1)
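The two update methods behave differently: index() with an explicit id replaces the whole stored document, so fields missing from data are dropped, while the update API merges only the fields given under "doc". The update API also accepts "doc_as_upsert": true, which inserts the partial document when the id does not exist yet. A minimal sketch, assuming the 6.x-style doc_type argument; `upsert_fields` is a hypothetical name.

```python
def upsert_fields(es, index, doc_type, doc_id, fields):
    """Hypothetical helper: merge `fields` into a document, creating it if missing."""
    body = {
        "doc": fields,           # partial document: only these fields change
        "doc_as_upsert": True,   # if the id does not exist, index `fields` as a new document
    }
    return es.update(index=index, doc_type=doc_type, id=doc_id, body=body)
```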

5. Delete data

# delete() removes the document with the given id
result = es.delete(index='news', doc_type='politics', id=1)
print(result)

6. Query data -- Elasticsearch's main strength is its extremely powerful search capability

Create a new index, specify the field that should be analyzed (tokenized), and update the mapping:

from elasticsearch import Elasticsearch

es = Elasticsearch()
mapping = {
    'properties': {
        'title': {'type': 'text', 'analyzer': 'ik_max_word', 'search_analyzer': 'ik_max_word'}
    }
}
es.indices.delete(index='news', ignore=[400, 404])
es.indices.create(index='news', ignore=400)
# Set the mapping: title has type text, and both the index-time analyzer and the search_analyzer
# are ik_max_word (the IK Chinese analyzer plugin) instead of the default English-oriented analyzer
result = es.indices.put_mapping(index='news', doc_type='politics', body=mapping)
print(result)
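The IK plugin provides two analyzers: ik_max_word splits text into the most fine-grained set of words, ik_smart into the coarsest. The IK project's README pairs ik_max_word for indexing with ik_smart for queries; the mapping above uses ik_max_word for both, which also works. A small builder for either choice (the function name is hypothetical):

```python
def ik_text_mapping(fields, index_analyzer="ik_max_word", search_analyzer="ik_smart"):
    """Hypothetical helper: build a `properties` mapping analyzing each field with IK."""
    return {
        "properties": {
            name: {
                "type": "text",
                "analyzer": index_analyzer,          # applied when documents are indexed
                "search_analyzer": search_analyzer,  # applied to the query string
            }
            for name in fields
        }
    }
```

Passing `search_analyzer="ik_max_word"` reproduces the mapping used above.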

Insert a few new documents:

datas = [
    {'title': '美國留給伊拉克的是個爛攤子嗎', 'url': 'http://view.news.qq.com/zt2011/usa_iraq/index.htm', 'date': '2011-12-16'},
    {'title': '公安部：各地校車將享最高路權', 'url': 'http://www.chinanews.com/gn/2011/12-16/3536077.shtml', 'date': '2011-12-16'},
    {'title': '中韓漁警沖突調查：韓警平均每天扣1艘中國漁船', 'url': 'https://news.qq.com/a/20111216/001044.htm', 'date': '2011-12-17'},
    {'title': '中國駐洛杉磯領事館遭亞裔男子槍擊嫌犯已自首', 'url': 'http://news.ifeng.com/world/detail_2011_12/16/11372558_0.shtml', 'date': '2011-12-18'},
]
for data in datas:
    es.index(index='news', doc_type='politics', body=data)
# New documents become searchable after the next refresh (about 1 s by default);
# force one so that a search run right afterwards sees them
es.indices.refresh(index='news')
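Calling es.index() once per document costs one HTTP request each. For larger batches, elasticsearch-py ships `helpers.bulk(client, actions)`, which sends many actions in one bulk request. The generator below builds action dicts in the shape helpers.bulk accepts; the `_type` key matches the 6.x setup used here and should be dropped on 7.x and later. `bulk_actions` is a hypothetical name.

```python
def bulk_actions(docs, index, doc_type):
    """Hypothetical helper: yield one bulk "index" action per document."""
    for doc in docs:
        yield {
            "_op_type": "index",  # same operation as es.index(); ids are auto-generated
            "_index": index,
            "_type": doc_type,    # 6.x only; remove for 7.x+ clusters
            "_source": doc,
        }
```

Usage against a live cluster: `from elasticsearch import helpers` and then `helpers.bulk(es, bulk_actions(datas, 'news', 'politics'))`, which returns the number of successful actions and a list of errors.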

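Query -- search the index for related content; without a query body, search matches every document:

result = es.search(index='news', doc_type='politics')
print(result)  # all matching documents (only the first 10 are returned by default)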
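The search API pages through hits with from (how many hits to skip) and size (how many to return). A sketch of a page builder, assuming 1-based page numbers; `page_query` is a hypothetical name. By default Elasticsearch rejects requests where from + size exceeds 10,000 (the index.max_result_window setting).

```python
def page_query(page, per_page=10, query=None):
    """Hypothetical helper: build a search body for 1-based page `page`."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    return {
        "query": query if query is not None else {"match_all": {}},
        "from": (page - 1) * per_page,  # number of hits to skip
        "size": per_page,               # number of hits to return
    }
```

For example, `es.search(index='news', doc_type='politics', body=page_query(2, per_page=2))` asks for the third and fourth hits.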

Search -- full-text search

import json
from elasticsearch import Elasticsearch

# Query with the DSL: match performs a full-text search on the title field for "中國領事館"
dsl = {'query': {'match': {'title': '中國領事館'}}}
es = Elasticsearch()
result = es.search(index='news', doc_type='politics', body=dsl)
print(json.dumps(result, indent=2, ensure_ascii=False))
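The raw response is a nested dict; the hits sit under result['hits']['hits'], each with a _score and the stored _source. The helpers below (hypothetical names) extract only the scores and titles, and read the total count, whose shape changed between versions: an integer in 6.x, a {'value': ..., 'relation': ...} object from 7.0 on.

```python
def summarize_hits(result, field="title"):
    """Hypothetical helper: return (score, field value) pairs from an es.search() response.

    Without an explicit sort, Elasticsearch already returns hits ordered by
    _score (highest first), so the order is kept as-is.
    """
    return [(hit["_score"], hit["_source"].get(field))
            for hit in result["hits"]["hits"]]


def total_hits(result):
    """Hypothetical helper: number of matching documents, for 6.x and 7.x+ responses."""
    total = result["hits"]["total"]
    # 6.x: plain int; 7.x+: {"value": n, "relation": "eq" or "gte"}
    return total["value"] if isinstance(total, dict) else total
```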

The search returns two hits: the first has a score of 2.54 and the second 0.28. The first matching document contains both words "中國" and "領事館"; the second does not contain "領事館" but does contain "中國", so it is retrieved as well, only with a lower score.

Results are sorted by their relevance to the search terms, which is the basic skeleton of a search engine.
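The second hit appears because a match query combines the analyzed terms with OR by default, so one shared term is enough. match also accepts an operator parameter; with "and", every term the analyzer produces from the query must be present. A sketch that builds either form (`match_query` is a hypothetical name); which documents survive depends on the exact tokens IK produces for the query.

```python
def match_query(field, text, require_all_terms=False):
    """Hypothetical helper: build a match query body.

    With the default operator "or" any analyzed term may match;
    "and" requires every term produced by the analyzer to be present.
    """
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "operator": "and" if require_all_terms else "or",
                }
            }
        }
    }
```

For instance, `es.search(index='news', doc_type='politics', body=match_query('title', '中國領事館', require_all_terms=True))` would exclude documents that contain only some of the query's terms.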

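====== Going further -- Advanced queries ======

ElasticSearch Search API options:
1. query: conditional queries
   - match: analyzed (tokenized) full-text query; hits are scored by relevance
   - match_phrase: the analyzed terms must appear together as a phrase, in the same order
   - term: exact-term query; the query value is not analyzed
   - bool: compound query that combines clauses with must / should / must_not (and filter)
   - range: the field's value must fall within a given range
2. size: number of hits to return
3. sort: sort the hits by the given fields
4. _source: which fields of each document to return
5. from: starting offset of the hits (together with size it defines a page; there is no separate "to" parameter)
6. aggs: aggregations for complex statistical queries
7. script_fields: fields computed by a script at query time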
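Several of these options can be combined in one request body. The builder below is a sketch, not the author's code: it assumes the news index from earlier, with date dynamically mapped as a date field (Elasticsearch's default date detection recognizes values like 2011-12-16), and `news_search` is a hypothetical name. Clauses under filter restrict the hits without affecting the score, unlike must.

```python
def news_search(keyword, exclude=None, date_from=None, date_to=None, size=10, offset=0):
    """Hypothetical helper: combine bool/match/match_phrase/range with sort, _source and paging."""
    bool_query = {"must": [{"match": {"title": keyword}}]}  # scored full-text clause
    if exclude:
        bool_query["must_not"] = [{"match_phrase": {"title": exclude}}]
    if date_from or date_to:
        date_range = {}
        if date_from:
            date_range["gte"] = date_from
        if date_to:
            date_range["lte"] = date_to
        # filter clauses narrow the hits without changing their scores
        bool_query["filter"] = [{"range": {"date": date_range}}]
    return {
        "query": {"bool": bool_query},
        "sort": [{"_score": "desc"}, {"date": "desc"}],  # relevance first, then newest
        "_source": ["title", "url", "date"],             # only return these fields
        "from": offset,
        "size": size,
    }
```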

.....


Further reading: https://cuiqingcai.com/6255.html

https://elasticsearch-py.readthedocs.io/en/master/

------------- END --------------
