Getting Started with DataX
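DataX is the open-source version of Alibaba Cloud DataWorks Data Integration, an offline data synchronization tool/platform widely used within Alibaba Group. It provides efficient data synchronization between heterogeneous data sources, including MySQL, Oracle, OceanBase, SqlServer, PostgreSQL, HDFS, Hive, ADS, HBase, TableStore (OTS), MaxCompute (ODPS), Hologres, and DRDS.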
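1. DataX requires a Python environment, so install Python first
Open the download center on the official site: https://www.python.org/downloads/windows/
Version 2.6.5 is downloaded and installed here.
After installation, run python -V to check that Python was installed successfully.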
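If you want the same check from inside a script, the sketch below reports the interpreter version in the "Python X.Y.Z" form that python -V prints, plus a note based on this article's statement that the stock bin/datax.py targets Python 2. The function name describe_interpreter and the wording of its notes are illustrative assumptions, not part of DataX.

```python
import sys


def describe_interpreter(version_info=sys.version_info):
    """Return e.g. "Python 2.6.5: ..." with a note on bin/datax.py.

    Assumption (from this article): the stock datax.py targets Python 2,
    and running it on Python 3 may require modifying the script.
    """
    major, minor, micro = version_info[0], version_info[1], version_info[2]
    label = "Python %d.%d.%d" % (major, minor, micro)
    if major == 2:
        return label + ": stock bin/datax.py expected to work"
    return label + ": bin/datax.py may need adapting for Python 3"


if __name__ == "__main__":
    print(describe_interpreter())
```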
2. Download DataX
Method 1: download the DataX toolkit directly. DataX download address:
http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
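After downloading, extract it to a local directory and go into the bin directory; you can then run a sync job:
$ cd {YOUR_DATAX_HOME}/bin
$ python datax.py {YOUR_JOB.json}
Method 2: download the DataX source code and build it yourself. DataX source code: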
https://github.com/alibaba/DataX
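The Method 1 commands can also be driven from Python, for example from a scheduler script. This is a minimal sketch assuming the bin/datax.py layout described above and a plain subprocess call; build_datax_command and run_datax are hypothetical helper names, and treating a non-zero exit code as a failed job is an assumption rather than documented DataX behavior.

```python
import os
import subprocess


def build_datax_command(datax_home, job_file, python_exe="python"):
    """Return the argv list equivalent to:
        cd {YOUR_DATAX_HOME}/bin && python datax.py {YOUR_JOB.json}
    The job path is made absolute so it still resolves after changing
    the working directory to bin/."""
    script = os.path.join(datax_home, "bin", "datax.py")
    return [python_exe, script, os.path.abspath(job_file)]


def run_datax(datax_home, job_file, python_exe="python"):
    """Run the job and return the process exit code
    (assumed: 0 for success, non-zero for a failed job)."""
    cmd = build_datax_command(datax_home, job_file, python_exe)
    # cwd = bin directory, mirroring the `cd .../bin` step above
    return subprocess.call(cmd, cwd=os.path.join(datax_home, "bin"))
```

For example, with a hypothetical install path, run_datax("/opt/datax", "/opt/datax/job/job.json") runs the sample job that ships in the job directory.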
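DataX directory structure
bin: Python scripts, mainly used to run job files (by default they depend on a Python 2 environment, but they can be modified to run on Python 3).
conf: configuration files.
job: contains a sample job file for testing (temporary job files generated by datax-web are not stored here; datax-web has its own configurable storage directory).
lib: the dependency jar packages.
log: execution logs of job files.
plugin: the Reader (read) and Writer (write) plugins for the different data sources.
If the Reader or Writer you need is not in the plugin directory, you have to install it manually (for example, the Elasticsearch Reader and Writer).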
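To see which Readers and Writers an installation already has before deciding whether something must be installed manually, you can list the folders under plugin/reader and plugin/writer. A sketch, assuming the plugin/<reader|writer>/<plugin name> layout (the mysqlwriter path used later in this article follows it); list_plugins and missing_plugins are hypothetical helpers.

```python
import os


def list_plugins(datax_home):
    """Return {"reader": [...], "writer": [...]} with the plugin folder
    names found under {datax_home}/plugin, e.g. "mysqlwriter"."""
    result = {}
    for kind in ("reader", "writer"):
        kind_dir = os.path.join(datax_home, "plugin", kind)
        names = []
        if os.path.isdir(kind_dir):
            # Each plugin is a sub-directory; ignore stray files
            names = [n for n in os.listdir(kind_dir)
                     if os.path.isdir(os.path.join(kind_dir, n))]
        result[kind] = sorted(names)
    return result


def missing_plugins(datax_home, wanted):
    """Return the plugin names in `wanted` that are not installed."""
    installed = list_plugins(datax_home)
    have = set(installed["reader"]) | set(installed["writer"])
    return [name for name in wanted if name not in have]
```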
Running a job file with DataX
python datax.py <job file>
The job file template for transferring a txt file to MySQL is as follows (using MySQL as the example):
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "txtfilereader",
          "parameter": {
            "column": [
              {"index": 0, "type": "long"},
              {"index": 1, "type": "string"},
              {"index": 2, "type": "string"},
              {"index": 3, "type": "string"},
              {"index": 4, "type": "string"},
              {"index": 5, "type": "string"},
              {"index": 6, "type": "string"},
              {"index": 7, "type": "string"},
              {"index": 8, "type": "string"},
              {"index": 9, "type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
              {"index": 10, "type": "string"},
              {"index": 11, "type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
              {"index": 12, "type": "long"}
            ],
            "encoding": "UTF-8",
            "fieldDelimiter": ",",
            "path": ["C:/Users/jxk/Desktop/tst.txt"]
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "column": [
              "id", "project_type", "attach_type", "attach_name", "attach_url",
              "attach_key", "attach_hash", "attach_size", "created_by",
              "created_date", "last_updated_by", "last_updated_date", "version"
            ],
            "connection": [
              {
                "jdbcUrl": "jdbc:mysql://8.68.24.3:3306/testkettle?characterEncoding=utf-8&serverTimezone=Asia/Shanghai",
                "table": ["comm_attachment"]
              }
            ],
            "password": "274100",
            "preSql": ["delete from comm_attachment"],
            "session": [],
            "username": "root",
            "writeMode": "insert"
          }
        }
      }
    ],
    "setting": {
      "speed": {"channel": "5"}
    }
  }
}
The content of C:/Users/jxk/Desktop/tst.txt is as follows:
1,sunnyDay,image/png,ttt.png,http://qyn6nlamm.hd-bkt.clouddn.com/Frv7wnlpCWpjlUq-qWFPrjQdm1A, tst,Frv7wnlpCWpjlUq-qWFPrjQdm1AI,44kb,anonymous,2021-09-16 16:52:38,anonymous,2021-09-16 16:52:38,0
2,sunnyDay,image/png,ttb.png,http://qyn6nlamm.hd-bkt.clouddn.com/Frv7wnlpCWpjlUq-qWFPrjQdm1A, tsb,Frv7wnlpCWpjlUq-qWFPrjQdm1AI,44kb,anonymous,2021-09-16 16:52:38,anonymous,2021-09-16 16:52:38,0
The table creation script is as follows:
CREATE TABLE `comm_attachment` (
  `id` int NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `project_type` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'project name: which project the attachment belongs to',
  `attach_type` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'attachment type',
  `attach_name` varchar(200) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'attachment name',
  `attach_url` varchar(500) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'attachment download URL',
  `attach_key` varchar(500) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'attachment key',
  `attach_hash` varchar(500) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'attachment hash',
  `attach_size` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'attachment size',
  `created_by` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'created by',
  `created_date` timestamp NULL DEFAULT NULL COMMENT 'creation time',
  `last_updated_by` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'last modified by',
  `last_updated_date` timestamp NULL DEFAULT NULL COMMENT 'last modification time',
  `version` int DEFAULT NULL COMMENT 'optimistic lock: version number',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8mb3 COLLATE=utf8_unicode_ci COMMENT='attachment table';
Python command to run the job:
python datax.py C:\Users\jxk\Desktop\abc.json
Execution result:
Checking the data in the database:
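A common cause of failed txt-to-MySQL jobs is a mismatch between the number of reader columns, writer columns, and fields per line. The sketch below checks only those counts for a job shaped like the one above (a single content entry with an explicit reader column list). It splits lines naively on fieldDelimiter, so quoted fields that contain the delimiter are not handled; check_txt_to_mysql_job and load_job are hypothetical names.

```python
import io
import json


def load_job(path):
    """Read a DataX job file (UTF-8 JSON) into a dict."""
    with io.open(path, encoding="utf-8") as f:
        return json.load(f)


def check_txt_to_mysql_job(job, sample_lines):
    """Return a list of problems; an empty list means the counts agree.

    Assumes one entry in job["job"]["content"] and an explicit reader
    "column" list (not ["*"]). Types and connection settings are not checked.
    """
    problems = []
    content = job["job"]["content"][0]
    reader = content["reader"]["parameter"]
    writer = content["writer"]["parameter"]
    n_reader = len(reader["column"])
    n_writer = len(writer["column"])
    if n_reader != n_writer:
        problems.append("reader has %d columns, writer has %d" % (n_reader, n_writer))
    # Use the job's fieldDelimiter, falling back to "," if the key is absent
    delimiter = reader.get("fieldDelimiter", ",")
    for lineno, line in enumerate(sample_lines, 1):
        text = line.rstrip("\r\n")
        if not text.strip():
            continue  # ignore blank lines
        n_fields = len(text.split(delimiter))
        if n_fields != n_reader:
            problems.append("line %d has %d fields, expected %d"
                            % (lineno, n_fields, n_reader))
    return problems
```

Usage: pass load_job(r"C:\Users\jxk\Desktop\abc.json") and the open tst.txt file object (for example from io.open(..., encoding="utf-8")); the result lists each count mismatch as a string.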
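Problems you may run into while running a job:
Problem description: when using DataX to import data from Hive into MySQL, writing to MySQL fails with the error: Could not retrieve transaction read-only status from server
Cause: the MySQL driver version used by DataX does not match the database version. Fix: match the driver version to the database version.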
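- Check the MySQL version (for example with SELECT VERSION();):
- Check the version of the MySQL driver bundled with the DataX plugin:
/datax/plugin/writer/mysqlwriter/libs$ ls mysql-connector*
mysql-connector-java-5.1.34.jar
Download the MySQL driver version that matches your database, for example https://static.runoob.com/download/mysql-connector-java-8.0.16.jar, and use it to replace the old jar in this libs directory.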
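The two version checks above can be scripted. This sketch reads the version out of the mysql-connector jar name in plugin/writer/mysqlwriter/libs and applies a rule of thumb taken from this case (a MySQL 8.x server paired with a 5.x Connector/J jar is flagged); it is not an official compatibility matrix. The server version string is assumed to come from SELECT VERSION(), and the helper names are hypothetical.

```python
import glob
import os
import re


def bundled_driver_versions(datax_home):
    """Return the versions of the mysql-connector jars in mysqlwriter's libs
    folder, e.g. ["5.1.34"] for mysql-connector-java-5.1.34.jar."""
    pattern = os.path.join(datax_home, "plugin", "writer", "mysqlwriter",
                           "libs", "mysql-connector*.jar")
    versions = []
    for path in sorted(glob.glob(pattern)):
        # Take the dotted number right before ".jar" in the file name
        match = re.search(r"(\d+(?:\.\d+)+)\.jar$", os.path.basename(path))
        if match:
            versions.append(match.group(1))
    return versions


def needs_newer_driver(driver_version, server_version):
    """Rule of thumb from the case above: flag a MySQL 8.x server paired
    with a Connector/J jar older than 8.x. Not an official matrix."""
    driver_major = int(driver_version.split(".")[0])
    server_major = int(server_version.split(".")[0])
    return server_major >= 8 and driver_major < 8
```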
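Error: Illegal/unsupported escape sequence near index 3
Pay attention to how paths are written in the JSON file. The backslash in C:\Users is most likely read as the start of the escape sequence \U (index 3 of the path is the U), which Java's regular expressions do not accept, so write the path with forward slashes.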
Correct (parsed successfully):
C:/Users/jxk/Desktop/tst.txt
Wrong:
C:\\Users\\jxk\\Desktop\\tst.txt
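When job files are generated by a script, normalizing Windows paths to forward slashes avoids this escape problem. Note that json.dumps on a raw Windows path produces exactly the C:\\Users\\... form shown as wrong above. A minimal sketch; to_datax_path and reader_path_entry are hypothetical helpers.

```python
import json


def to_datax_path(path):
    r"""Convert a Windows path such as C:\Users\jxk\Desktop\tst.txt into the
    forward-slash form used in the working job (C:/Users/jxk/Desktop/tst.txt)."""
    return path.replace("\\", "/")


def reader_path_entry(paths):
    """Serialize a txtfilereader "path" list with normalized separators."""
    return json.dumps({"path": [to_datax_path(p) for p in paths]})
```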