
Syncing MySQL to Elasticsearch with elasticsearch-jdbc: An In-Depth Guide

Published: 2024/1/17 | Category: Database | 豆豆


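1. How do you keep MySQL and Elasticsearch data in sync?

Converting rows to JSON one at a time is clearly impractical, so you need either a third-party tool or your own implementation. The core requirement is that inserts, deletes, and updates made in MySQL are all propagated to Elasticsearch, so that queries on both sides return the same data.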

2. What are the options for syncing MySQL with Elasticsearch, and how do they compare?

The better-known tools in this area are:

1) elasticsearch-jdbc. Strictly speaking it is no longer a plugin; it has become a standalone tool. https://github.com/jprante/elasticsearch-jdbc
2) The elasticsearch-river-mysql plugin. https://github.com/scharron/elasticsearch-river-mysql
3) go-mysql-elasticsearch (by the Chinese developer siddontang). https://github.com/siddontang/go-mysql-elasticsearch

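Comparison of tools 1-3:

go-mysql-elasticsearch is still under development and not yet stable.
Why choose elasticsearch-jdbc over the elasticsearch-river-mysql plugin? (See: http://stackoverflow.com/questions/23658534/using-elasticsearch-river-mysql-to-stream-data-from-mysql-database-to-elasticsea)
1) Generality: elasticsearch-jdbc is more general-purpose, since it works with any database that has a JDBC driver, not only MySQL.
2) Release activity: elasticsearch-jdbc is very active on GitHub. Its latest version, 2.3.3.0 (May 28, 2016), is compatible with Elasticsearch 2.3.3, whereas elasticsearch-river-mysql has not been updated since December 13, 2012.
On balance, elasticsearch-jdbc is the natural choice for syncing MySQL to Elasticsearch.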

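Drawbacks and limitations of elasticsearch-jdbc (as reported by others):

1) siddontang, the author of go-mysql-elasticsearch, writes on his blog:
elasticsearch-river-jdbc is powerful, but it does not handle incremental updates well. It requires the source table to be append-only, which is almost never achievable in a real project.
http://www.jianshu.com/p/05cff717563c
2) The blogger leotse90 points out another drawback of elasticsearch-jdbc in a post: delete operations (physical deletes) are not synced!
http://leotse90.com/2015/11/11/ElasticSearch與MySQL數據同步以及修改表結構/

As of June 16, 2016 I have not tested either claim myself, so I will not comment further.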
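For context on the incremental-update concern, the elasticsearch-jdbc README describes fetching only rows changed since the last run, using a scheduled statement with the `$metrics.lastexecutionstart` parameter. The sketch below writes such a definition to a file. The `updated_at` column and the file name are assumptions made for illustration (the cc table used later in this article has only id and name), and the sketch does nothing about physical deletes.

```shell
#!/bin/sh
# Sketch: write an incremental-import definition for elasticsearch-jdbc.
# ASSUMPTION: table cc has an `updated_at` timestamp column (hypothetical).
# ASSUMPTION: the output file name is arbitrary; override it with DEF_FILE.
def_file="${DEF_FILE:-./incremental_import.json}"

# The quoted 'EOF' keeps the shell from expanding $metrics inside the JSON.
cat > "$def_file" <<'EOF'
{
  "type" : "jdbc",
  "jdbc" : {
    "url" : "jdbc:mysql://10.8.5.101:3306/test",
    "user" : "root",
    "password" : "123456",
    "schedule" : "0 0-59 0-23 ? * *",
    "sql" : [
      {
        "statement" : "select *, id as _id from cc where updated_at > ?",
        "parameter" : [ "$metrics.lastexecutionstart" ]
      }
    ],
    "elasticsearch.cluster" : "my-application",
    "elasticsearch" : { "host" : "10.8.5.101", "port" : 9300 },
    "index" : "myindex",
    "type" : "mytype"
  }
}
EOF
echo "wrote $def_file"
```

Aliasing `id` to `_id` makes the MySQL primary key the document ID, so a re-imported row overwrites its earlier copy instead of creating a duplicate.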

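3. How do you use elasticsearch-jdbc? Does it need to be installed?

3.1 Differences from earlier versions

elasticsearch-jdbc 2.3.2.0 does not need to be installed. The tests below also use Elasticsearch 2.3.2.
Operating system: CentOS release 6.6 (Final)
You may wonder how earlier versions differ. Quite a lot. From the material I collected, the differences are:
1) The early 1.x versions were plugins and had to be installed.
2) The configuration also differs.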

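3.2 Using elasticsearch-jdbc (sync method 1)

Prerequisites:
1) Elasticsearch 2.3.2 is installed and tested.
2) MySQL is installed and supports inserts, deletes, updates, and queries.
The test database is test and the table is cc, with the following contents: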

mysql> select * from cc;
+----+------------+
| id | name       |
+----+------------+
|  1 | laoyang    |
|  2 | dluzhang   |
|  3 | dlulaoyang |
+----+------------+
3 rows in set (0.00 sec)

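Step 1: Download the tool.
URL: http://xbib.org/repository/org/xbib/elasticsearch/importer/elasticsearch-jdbc/2.3.2.0/elasticsearch-jdbc-2.3.2.0-dist.zip
Step 2: Copy it to the CentOS machine. The location is up to you; I put it in the root directory and unpacked it there: unzip elasticsearch-jdbc-2.3.2.0-dist.zip
Step 3: Set the environment variable.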

[root@5b9dbaaa148a /]# vi /etc/profile
export JDBC_IMPORTER_HOME=/elasticsearch-jdbc-2.3.2.0

Apply the environment variable:
[root@5b9dbaaa148a /]# source /etc/profile
Step 4: Configure and run. For details see: https://github.com/jprante/elasticsearch-jdbc
1) Create a directory named odbc_es under the root directory, as follows:

[root@5b9dbaaa148a /]# ll /odbc_es/
drwxr-xr-x 2 root root 4096 Jun 16 03:11 logs
-rwxrwxrwx 1 root root 542 Jun 16 04:03 mysql_import_es.sh

2) Create the script mysql_import_es.sh with the following content:

[root@5b9dbaaa148a odbc_es]# cat mysql_import_es.sh
#!/bin/sh
# Notes on the settings below (kept outside the JSON, which cannot hold comments):
#   elasticsearch.cluster   cluster name; see /usr/local/elasticsearch/config/elasticsearch.yml
#   url, user, password     MySQL JDBC address, user name, and password
#   index, type             the new index and type to create in Elasticsearch
bin=$JDBC_IMPORTER_HOME/bin
lib=$JDBC_IMPORTER_HOME/lib
echo '{
    "type" : "jdbc",
    "jdbc" : {
        "elasticsearch.autodiscover" : true,
        "elasticsearch.cluster" : "my-application",
        "url" : "jdbc:mysql://10.8.5.101:3306/test",
        "user" : "root",
        "password" : "123456",
        "sql" : "select * from cc",
        "elasticsearch" : {
            "host" : "10.8.5.101",
            "port" : 9300
        },
        "index" : "myindex",
        "type" : "mytype"
    }
}' | java \
    -cp "${lib}/*" \
    -Dlog4j.configurationFile=${bin}/log4j2.xml \
    org.xbib.tools.Runner \
    org.xbib.tools.JDBCImporter

3) Make mysql_import_es.sh executable.
[root@5b9dbaaa148a odbc_es]# chmod a+x mysql_import_es.sh
4) Run the script mysql_import_es.sh.
[root@5b9dbaaa148a odbc_es]# ./mysql_import_es.sh
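Since the import script depends on JDBC_IMPORTER_HOME and on the lib and bin/log4j2.xml paths beneath it, a quick pre-flight check can save a confusing Java error. The helper below is a minimal sketch; `check_importer_home` is a hypothetical name and not part of elasticsearch-jdbc.

```shell
#!/bin/sh
# Sketch: sanity-check the importer layout before running the import script.
# check_importer_home is a hypothetical helper, not part of elasticsearch-jdbc.
check_importer_home() {
    home="$1"
    if [ -z "$home" ]; then
        echo "JDBC_IMPORTER_HOME is not set" >&2
        return 1
    fi
    if [ ! -d "$home/lib" ]; then
        echo "missing directory: $home/lib" >&2
        return 1
    fi
    if [ ! -f "$home/bin/log4j2.xml" ]; then
        echo "missing file: $home/bin/log4j2.xml" >&2
        return 1
    fi
    echo "importer layout looks OK: $home"
}

# Usage: check_importer_home "$JDBC_IMPORTER_HOME" && ./mysql_import_es.sh
```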

Step 5: Check that the data was synced.
Query it through Elasticsearch:

[root@5b9dbaaa148a odbc_es]# curl -XGET 'http://10.8.5.101:9200/myindex/mytype/_search?pretty'
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : { "total" : 8, "successful" : 8, "failed" : 0 },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      { "_index" : "myindex", "_type" : "mytype", "_id" : "AVVXKgeEun6ksbtikOWH", "_score" : 1.0, "_source" : { "id" : 1, "name" : "laoyang" } },
      { "_index" : "myindex", "_type" : "mytype", "_id" : "AVVXKgeEun6ksbtikOWI", "_score" : 1.0, "_source" : { "id" : 2, "name" : "dluzhang" } },
      { "_index" : "myindex", "_type" : "mytype", "_id" : "AVVXKgeEun6ksbtikOWJ", "_score" : 1.0, "_source" : { "id" : 3, "name" : "dlulaoyang" } }
    ]
  }
}

If the response contains the MySQL fields and values as shown above, the sync succeeded.
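Rather than reading the response by eye, the hit count can be compared with the row count from MySQL. The helper below is a minimal sketch that extracts hits.total from a _search response on stdin; `hits_total` is a hypothetical name, and the pattern assumes the Elasticsearch 2.x response layout shown above, where "total" is the first key inside "hits".

```shell
#!/bin/sh
# Sketch: pull hits.total out of a _search response read from stdin.
# hits_total is a hypothetical helper; it assumes "total" is the first
# key inside the top-level "hits" object, as in the 2.x output above.
hits_total() {
    tr -d '\n' | sed -n 's/.*"hits" *: *{ *"total" *: *\([0-9][0-9]*\).*/\1/p'
}

# Usage (needs a reachable cluster, so it is left commented out):
# curl -s 'http://10.8.5.101:9200/myindex/mytype/_search?size=0' | hits_total
```

For the three-row cc table, the printed number should match the result of select count(*) from cc.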

4. elasticsearch-jdbc sync, method 2

[root@5b9dbaaa148a odbc_es]# cat mysql_import_es_simple.sh
#!/bin/sh
bin=$JDBC_IMPORTER_HOME/bin
lib=$JDBC_IMPORTER_HOME/lib
java \
    -cp "${lib}/*" \
    -Dlog4j.configurationFile=${bin}/log4j2.xml \
    org.xbib.tools.Runner \
    org.xbib.tools.JDBCImporter statefile.json

[root@5b9dbaaa148a odbc_es]# cat statefile.json
{
    "type" : "jdbc",
    "jdbc" : {
        "elasticsearch.autodiscover" : true,
        "elasticsearch.cluster" : "my-application",
        "url" : "jdbc:mysql://10.8.5.101:3306/test",
        "user" : "root",
        "password" : "123456",
        "sql" : "select * from cc",
        "elasticsearch" : {
            "host" : "10.8.5.101",
            "port" : 9300
        },
        "index" : "myindex_2",
        "type" : "mytype_2"
    }
}

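Here the script and the JSON definition are kept in separate files, and the JSON file is loaded when the script runs.
To run it, execute the script directly: ./mysql_import_es_simple.sh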

5. Equivalent queries in MySQL and Elasticsearch

Goal: look up the name for id=3 in table cc.
1) SQL query in MySQL:

mysql> select * from cc where id=3;
+----+------------+
| id | name       |
+----+------------+
|  3 | dlulaoyang |
+----+------------+
1 row in set (0.00 sec)

2) Elasticsearch search:

[root@5b9dbaaa148a odbc_es]# curl 'http://10.8.5.101:9200/myindex/mytype/_search?pretty' -d '
{
  "filter" : {
    "term" : { "id" : "3" }
  }
}'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : { "total" : 8, "successful" : 8, "failed" : 0 },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      { "_index" : "myindex", "_type" : "mytype", "_id" : "AVVXKgeEun6ksbtikOWJ", "_score" : 1.0, "_source" : { "id" : 3, "name" : "dlulaoyang" } }
    ]
  }
}
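In Elasticsearch 2.x the same condition can also be written as a bool query with a filter clause, which matches documents without scoring them. The helper below is a minimal sketch that builds such a request body; `term_filter_body` is a hypothetical name, and it assumes a plain field name and a numeric value, with no escaping.

```shell
#!/bin/sh
# Sketch: build a bool/filter term query body for a numeric field.
# term_filter_body is a hypothetical helper; $1 is the field name and
# $2 a numeric value. Neither argument is escaped, so pass trusted input only.
term_filter_body() {
    printf '{ "query" : { "bool" : { "filter" : { "term" : { "%s" : %s } } } } }\n' "$1" "$2"
}

# Usage (needs a reachable cluster, so it is left commented out):
# curl 'http://10.8.5.101:9200/myindex/mytype/_search?pretty' -d "$(term_filter_body id 3)"
```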

Common errors:

Error log location: /odbc_es/logs
Log contents:
[root@5b9dbaaa148a logs]# tail -f jdbc.log
[04:03:39,570][INFO ][org.xbib.elasticsearch.helper.client.BaseTransportClient][pool-3-thread-1] after auto-discovery connected to [{5b9dbaaa148a}{aksn2ErNRlWjUECnp_8JmA}{10.8.5.101}{10.8.5.101:9300}{master=true}]

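Bug 1: [02:46:23,894][ERROR][importer.jdbc ][pool-3-thread-1] error while processing request: cluster state is RED and not YELLOW, from here on, everything will fail!
Cause (quoting the reference at the end of this section):
You created an index with replicas but you had only one node in the cluster. One way to solve this problem is by allocating them on a second node. Another way is by turning replicas off.
Strictly speaking, unassigned replicas alone leave a cluster YELLOW; RED means at least one primary shard is unassigned, so also check that all primaries are allocated.

Solutions:
Option 1: Add a second node so the replicas can be allocated to it.
Option 2: Turn replicas off (very practical). For example: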

curl -XPUT 'localhost:9200/_settings' -d '
{"index" : {"number_of_replicas" : 0} }
'
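After changing the replica setting, the cluster health API (_cluster/health) reports whether the cluster has left the RED state. The helper below is a minimal sketch that extracts the status field from that response; `health_status` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: extract the "status" value (green, yellow, or red) from a
# _cluster/health response read from stdin.
# health_status is a hypothetical helper, not an Elasticsearch tool.
health_status() {
    tr -d '\n' | sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p'
}

# Usage (needs a reachable cluster, so it is left commented out):
# curl -s 'http://localhost:9200/_cluster/health' | health_status
```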

Bug 2: [13:00:37,137][ERROR][importer.jdbc ][pool-3-thread-1] error while processing request: no cluster nodes available, check settings {autodiscover=false, client.transport.ignore_cluster_name=false, client.transport.nodes_sampler_interval=5s, client.transport.ping_timeout=5s, cluster.name=elasticsearch,
org.elasticsearch.client.transport.NoNodeAvailableException: no cluster nodes available, check
Solution:
Add the following setting to the definition, as in the script above:
"elasticsearch.cluster":"my-application"
The cluster name must match the one in /usr/local/elasticsearch/config/elasticsearch.yml.
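The log above shows the importer falling back to cluster.name=elasticsearch while the cluster itself is named my-application. To avoid typing the name by hand, it can be read from elasticsearch.yml. The helper below is a minimal sketch; `cluster_name_from_yml` is a hypothetical name, and it assumes an unquoted value on a single cluster.name: line.

```shell
#!/bin/sh
# Sketch: print the cluster.name value from an elasticsearch.yml file.
# cluster_name_from_yml is a hypothetical helper. It skips commented-out
# lines and assumes the value is unquoted with no trailing whitespace.
cluster_name_from_yml() {
    sed -n 's/^[[:space:]]*cluster\.name:[[:space:]]*//p' "$1" | head -n 1
}

# Usage: cluster_name_from_yml /usr/local/elasticsearch/config/elasticsearch.yml
```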

Reference:
http://stackoverflow.com/questions/11944915/getting-an-elasticsearch-cluster-to-green-cluster-setup-on-os-x
