Flink Integration with Hudi in Practice
This article walks through building a Hudi data lake on Flink 1.13.6 as an entry-level exercise. A basic Hadoop environment is assumed to be installed already.
1. Component Versions
| Component | Version |
| --- | --- |
| Flink | 1.13.6 (Scala 2.12) |
| Hudi | 0.12.0 |
| Hadoop | 3.1.0 |
| MySQL | 5.7.33 |
| Kafka | 2.12-2.7.0 |
2. Flink Environment Configuration
1) Extract the Flink installation package
```shell
tar -zxvf flink-1.13.6-bin-scala_2.12.tgz -C /home/opt/
```

2) Edit the flink-conf.yaml configuration
```shell
vim /home/opt/flink-1.13.6/conf/flink-conf.yaml
```

```yaml
classloader.check-leaked-classloader: false
taskmanager.numberOfTaskSlots: 4
state.backend: rocksdb
execution.checkpointing.interval: 30000
state.checkpoints.dir: hdfs://node01:9000/ckps
state.backend.incremental: true
```

3) Copy the compiled Hudi bundle into Flink's lib directory
```shell
cp /opt/software/hudi-0.12.0/packaging/hudi-flink-bundle/target/hudi-flink1.13-bundle_2.12-0.12.0.jar /home/opt/flink-1.13.6/lib/

# Copy the guava jar to resolve a dependency conflict
cp /opt/module/hadoop-3.1.3/share/hadoop/common/lib/guava-27.0-jre.jar /home/opt/flink-1.13.6/lib/
```

3. Using the sql-client
3.1 Local mode: start Flink
```shell
/home/opt/flink-1.13.6/bin/start-cluster.sh
/home/opt/flink-1.13.6/bin/sql-client.sh embedded
```

3.2 yarn-session mode
```shell
# Start a yarn-session
/home/opt/flink-1.13.6/bin/yarn-session.sh -d

# Start the sql-client attached to the yarn-session
/home/opt/flink-1.13.6/bin/sql-client.sh embedded -s yarn-session
```

Note: the hadoop-mapreduce-client-core-3.1.3.jar must also be copied into Flink's lib directory to resolve a dependency issue.
```shell
cp /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-client-core-3.1.3.jar /home/opt/flink-1.13.6/lib/
```

4. Example
Insert data
```sql
set sql-client.execution.result-mode=tableau;

-- Create a Hudi table
CREATE TABLE t1(
  uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
  name VARCHAR(10),
  age INT,
  ts TIMESTAMP(3),
  `partition` VARCHAR(20)
)
PARTITIONED BY (`partition`)
WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://node01:8020/tmp/hudi_flink/t1',
  'table.type' = 'MERGE_ON_READ'  -- the default is COPY_ON_WRITE
);

-- Insert data
INSERT INTO t1 VALUES
  ('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'),
  ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),
  ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),
  ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),
  ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),
  ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),
  ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),
  ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4');
```

Query data
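A minimal sketch of reading the table back in the same sql-client session; it assumes the t1 table and the rows inserted above:

```sql
-- Snapshot query: returns the latest committed rows
SELECT * FROM t1;

-- Filter on the partition column (backticks are needed because partition is a reserved word)
SELECT uuid, name, age FROM t1 WHERE `partition` = 'par1';
```

With result-mode set to tableau, the rows are printed directly in the sql-client terminal.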
更新數據
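Hudi upserts on the primary key, so updating a row is just another INSERT with an existing key. A minimal sketch against the t1 table created above:

```sql
-- Re-inserting uuid 'id1' with a new age upserts the existing row
INSERT INTO t1 VALUES
  ('id1','Danny',27,TIMESTAMP '1970-01-01 00:00:01','par1');

-- Read it back: the snapshot query now shows age = 27 for id1
SELECT * FROM t1 WHERE uuid = 'id1';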