From 0 to 1 Learning Flink: Reading Kafka Data with Flink and Batch Writing It to MySQL

Published: 2025/5/22 · Category: Database

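Preface

In an earlier article, "From 0 to 1 Learning Flink: How to Customize a Data Sink?", I already covered writing data to MySQL, but some of the configuration there was hard-coded and not reusable. Recently a friend in my Knowledge Planet (知识星球) group asked me to write an example that reads data from Kafka, pre-aggregates it in Flink, then creates a database connection pool and writes the data to MySQL in batches.

That request led to this article. If you have more questions, or articles you would like me to write, ask me in Knowledge Planet; I will answer promptly and revise the articles where I can.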

Preparation

Add the MySQL driver dependency to pom.xml:

<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.34</version>
</dependency>

Reading Kafka data

I am reusing the Student class from the earlier articles. I started Kafka locally and generated some test data. In this test the producer sleeps 10 s after sending each record, so it sends 6 records to Kafka per minute.

package com.zhisheng.connectors.mysql.utils;

import com.zhisheng.common.utils.GsonUtil;
import com.zhisheng.connectors.mysql.model.Student;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

/**
 * Desc: Writes data to Kafka; run this main method for testing
 * Created by zhisheng on 2019-02-17
 * Blog: http://www.54tianzhisheng.cn/tags/Flink/
 */
public class KafkaUtil {
    public static final String broker_list = "localhost:9092";
    public static final String topic = "student"; // the Kafka topic must be the same one the Flink job reads

    public static void writeToKafka() throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", broker_list);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        for (int i = 1; i <= 100; i++) {
            Student student = new Student(i, "zhisheng" + i, "password" + i, 18 + i);
            ProducerRecord<String, String> record =
                    new ProducerRecord<String, String>(topic, null, null, GsonUtil.toJson(student));
            producer.send(record);
            System.out.println("Sent record: " + GsonUtil.toJson(student));
            Thread.sleep(10 * 1000); // sleep 10 s after each record, i.e. 6 records per minute
        }
        producer.flush();
        producer.close(); // release the producer's network resources
    }

    public static void main(String[] args) throws InterruptedException {
        writeToKafka();
    }
}
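The producer above depends on a Student class that this article does not show. The following is only a minimal sketch inferred from the four-argument constructor call above and from the getters the sink uses later (getId, getName, getPassword, getAge). The field types are assumptions, and the real class in the repository may differ.

```java
// Hypothetical sketch of the Student model; field types are assumed from how
// the class is used in this article, not copied from the repository.
final class Student {
    private int id;
    private String name;
    private String password;
    private int age;

    // No-argument constructor, convenient for JSON libraries that build objects reflectively.
    Student() {
    }

    Student(int id, String name, String password, int age) {
        this.id = id;
        this.name = name;
        this.password = password;
        this.age = age;
    }

    int getId() {
        return id;
    }

    String getName() {
        return name;
    }

    String getPassword() {
        return password;
    }

    int getAge() {
        return age;
    }

    @Override
    public String toString() {
        return "Student{id=" + id + ", name='" + name + "', age=" + age + "}";
    }
}
```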

Read the data from Kafka, then deserialize each record into a Student object.

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("zookeeper.connect", "localhost:2181");
props.put("group.id", "metric-group");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("auto.offset.reset", "latest");

SingleOutputStreamOperator<Student> student = env.addSource(new FlinkKafkaConsumer011<>(
        "student", // this Kafka topic must match the topic in the utility class above
        new SimpleStringSchema(),
        props)).setParallelism(1)
        .map(string -> GsonUtil.fromJson(string, Student.class)); // parse the JSON string into a Student object
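The preface notes that hard-coded configuration makes the example hard to reuse. One possible way to avoid that for the consumer settings above is to build the Properties from parameters, falling back to the article's values. This is only a sketch: the class name KafkaProps and the system property names kafka.brokers and kafka.group.id are hypothetical, and zookeeper.connect is left out for brevity.

```java
import java.util.Properties;

// Hypothetical helper: builds the consumer properties used in this article
// from parameters instead of hard-coding them at the call site.
final class KafkaProps {
    private KafkaProps() {
    }

    /** Builds consumer properties with the same keys the article sets by hand. */
    static Properties build(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", groupId);
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("auto.offset.reset", "latest");
        return props;
    }

    /** Reads the two variable settings from JVM system properties (assumed names), with the article's defaults. */
    static Properties fromSystemProperties() {
        return build(
                System.getProperty("kafka.brokers", "localhost:9092"),
                System.getProperty("kafka.group.id", "metric-group"));
    }
}
```

With this in place, the job could be started with, for example, -Dkafka.brokers=host:9092 and pass KafkaProps.fromSystemProperties() to the consumer constructor.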

RichSinkFunction calls invoke() once for every element that reaches the sink, so to write in batches it is best to aggregate the data before the sink. Here we open a one-minute window to collect the Student records.

student.timeWindowAll(Time.minutes(1)).apply(new AllWindowFunction<Student, List<Student>, TimeWindow>() {
    @Override
    public void apply(TimeWindow window, Iterable<Student> values, Collector<List<Student>> out) throws Exception {
        ArrayList<Student> students = Lists.newArrayList(values);
        if (students.size() > 0) {
            System.out.println("Number of student records collected in 1 minute: " + students.size());
            out.collect(students);
        }
    }
});
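To make the windowing step concrete, here is a plain-Java illustration of how fixed one-minute windows bucket records by time. It is not Flink's implementation, only a sketch of the idea under the assumption of non-negative millisecond timestamps; the class and method names are hypothetical. With one record every 10 s, as in the test producer, each window ends up with 6 records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: groups timestamps into fixed-size (tumbling) windows.
final class TumblingWindowSketch {
    private TumblingWindowSketch() {
    }

    /** Start of the fixed-size window that contains a non-negative timestamp. */
    static long windowStart(long timestampMillis, long windowSizeMillis) {
        return timestampMillis - (timestampMillis % windowSizeMillis);
    }

    /** Groups timestamps by window start; the map is ordered by window start. */
    static Map<Long, List<Long>> assign(List<Long> timestampsMillis, long windowSizeMillis) {
        Map<Long, List<Long>> windows = new TreeMap<>();
        for (long ts : timestampsMillis) {
            windows.computeIfAbsent(windowStart(ts, windowSizeMillis), k -> new ArrayList<>()).add(ts);
        }
        return windows;
    }

    public static void main(String[] args) {
        // One record every 10 s for two minutes, mirroring the KafkaUtil test producer.
        List<Long> timestamps = new ArrayList<>();
        for (long ts = 0; ts < 120_000; ts += 10_000) {
            timestamps.add(ts);
        }
        // Prints:
        // window starting at 0 ms holds 6 records
        // window starting at 60000 ms holds 6 records
        assign(timestamps, 60_000).forEach((start, batch) ->
                System.out.println("window starting at " + start + " ms holds " + batch.size() + " records"));
    }
}
```

In the real job the windows follow the wall clock and records arrive with some jitter, so a window may occasionally hold one record more or fewer.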

Writing to the database

Here we use a DBCP connection pool to connect to MySQL. Add the dependency to pom.xml:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-dbcp2</artifactId>
    <version>2.1.1</version>
</dependency>

If you want to use a different connection pool, add its dependency instead.

To write the data to MySQL we again extend RichSinkFunction, as in the earlier article, and override its methods:

package com.zhisheng.connectors.mysql.sinks;

import com.zhisheng.connectors.mysql.model.Student;
import org.apache.commons.dbcp2.BasicDataSource;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;

/**
 * Desc: Batch-sinks data to MySQL
 * Created by zhisheng_tian on 2019-02-17
 * Blog: http://www.54tianzhisheng.cn/tags/Flink/
 */
public class SinkToMySQL extends RichSinkFunction<List<Student>> {
    PreparedStatement ps;
    BasicDataSource dataSource;
    private Connection connection;

    /**
     * Establishes the connection in open(), so that invoke() does not have to
     * open and release a connection on every call.
     *
     * @param parameters
     * @throws Exception
     */
    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        dataSource = new BasicDataSource();
        connection = getConnection(dataSource);
        String sql = "insert into Student(id, name, password, age) values(?, ?, ?, ?);";
        ps = this.connection.prepareStatement(sql);
    }

    @Override
    public void close() throws Exception {
        super.close();
        // Release resources: the statement first, then the connection, then the pool
        if (ps != null) {
            ps.close();
        }
        if (connection != null) {
            connection.close();
        }
        if (dataSource != null) {
            dataSource.close();
        }
    }

    /**
     * invoke() is called once per element; here each element is one window's List<Student>.
     *
     * @param value
     * @param context
     * @throws Exception
     */
    @Override
    public void invoke(List<Student> value, Context context) throws Exception {
        // Add every student in the list to the batch
        for (Student student : value) {
            ps.setInt(1, student.getId());
            ps.setString(2, student.getName());
            ps.setString(3, student.getPassword());
            ps.setInt(4, student.getAge());
            ps.addBatch();
        }
        int[] count = ps.executeBatch(); // execute the whole batch
        System.out.println("Successfully inserted " + count.length + " rows");
    }

    private static Connection getConnection(BasicDataSource dataSource) {
        dataSource.setDriverClassName("com.mysql.jdbc.Driver");
        // Note: replace with your own local MySQL address, user name and password
        dataSource.setUrl("jdbc:mysql://localhost:3306/test");
        dataSource.setUsername("root");
        dataSource.setPassword("root123456");
        // Connection pool settings
        dataSource.setInitialSize(10);
        dataSource.setMaxTotal(50);
        dataSource.setMinIdle(2);

        Connection con = null;
        try {
            con = dataSource.getConnection();
            System.out.println("Created connection pool: " + con);
        } catch (Exception e) {
            System.out.println("-----------mysql get connection has exception , msg = " + e.getMessage());
        }
        return con;
    }
}
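invoke() above sends a whole window's list in a single executeBatch() call. If one window can hold tens of thousands of rows, one option (not something the original code does) is to split the list into fixed-size chunks and execute one batch per chunk. A minimal sketch of such a helper follows; the name BatchChunks and the chunk size are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a list into consecutive chunks of at most chunkSize elements.
final class BatchChunks {
    private BatchChunks() {
    }

    static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < items.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, items.size());
            // Copy the view so each chunk stays valid if the source list changes later.
            chunks.add(new ArrayList<>(items.subList(from, to)));
        }
        return chunks;
    }
}
```

Inside invoke(), the loop would then iterate over BatchChunks.partition(value, 1000), calling addBatch() for each student in a chunk and executeBatch() once per chunk.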

Core class Main

The core program is as follows:

public class Main {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("zookeeper.connect", "localhost:2181");
        props.put("group.id", "metric-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("auto.offset.reset", "latest");

        SingleOutputStreamOperator<Student> student = env.addSource(new FlinkKafkaConsumer011<>(
                "student", // this Kafka topic must match the topic in the utility class above
                new SimpleStringSchema(),
                props)).setParallelism(1)
                .map(string -> GsonUtil.fromJson(string, Student.class)); // parse the JSON string into a Student object

        // Collect one minute of records, then hand each non-empty list to the MySQL sink
        student.timeWindowAll(Time.minutes(1)).apply(new AllWindowFunction<Student, List<Student>, TimeWindow>() {
            @Override
            public void apply(TimeWindow window, Iterable<Student> values, Collector<List<Student>> out) throws Exception {
                ArrayList<Student> students = Lists.newArrayList(values);
                if (students.size() > 0) {
                    System.out.println("Number of student records collected in 1 minute: " + students.size());
                    out.collect(students);
                }
            }
        }).addSink(new SinkToMySQL());

        env.execute("flink learning connectors kafka");
    }
}

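Running the project

Run the Main class first, then run KafkaUtil.java.

[Figure: the data sent to Kafka]

[Figure: log output from running the Main class] Four connection pools are created because the default parallelism in this run is 4, and every parallel sink instance calls open() and builds its own pool. If you set the parallelism of the addSink operator to 1, only one connection pool is created.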
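Because every parallel sink instance builds its own pool, the pool settings in SinkToMySQL multiply with the sink parallelism. The following back-of-the-envelope sketch uses the values from the sink code (initial size 10, max total 50) and assumes each pool opens its initial connections when it is first used; the class and method names are hypothetical.

```java
// Illustration only: how per-pool settings scale with sink parallelism.
final class PoolSizing {
    private PoolSizing() {
    }

    /** Connections opened up front if every sink instance creates its own pool. */
    static int connectionsAtStartup(int sinkParallelism, int initialSizePerPool) {
        return sinkParallelism * initialSizePerPool;
    }

    /** Upper bound on connections across all pools. */
    static int maxConnections(int sinkParallelism, int maxTotalPerPool) {
        return sinkParallelism * maxTotalPerPool;
    }

    public static void main(String[] args) {
        int initialSize = 10; // dataSource.setInitialSize(10) in SinkToMySQL
        int maxTotal = 50;    // dataSource.setMaxTotal(50) in SinkToMySQL
        // Prints:
        // parallelism 4: 40 connections at startup, up to 200
        // parallelism 1: 10 connections at startup, up to 50
        for (int parallelism : new int[] {4, 1}) {
            System.out.println("parallelism " + parallelism + ": "
                    + connectionsAtStartup(parallelism, initialSize) + " connections at startup, up to "
                    + maxConnections(parallelism, maxTotal));
        }
    }
}
```

Since each sink instance borrows only a single connection in open(), smaller pool settings may be enough here, and the totals should stay below the MySQL server's connection limit.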

[Figure: the result of the batch insert into the database]

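Summary

This article grew out of a question from a friend in Knowledge Planet, and it should cover everything he asked for (batching, a database connection pool, writing to MySQL). Many examples online are simple demos that create a database connection and insert into MySQL for every single record; with a large data volume, that puts heavy pressure on MySQL writes. I stressed the same point about writing to ES in "From 0 to 1 Learning Flink: Writing Data to ElasticSearch with Flink": to improve performance you have to write in batches. Taking this article as an example, if the volume is large and one minute of aggregated data reaches tens of thousands of records, writing them as a batch will perform far better than writing them one record at a time.

The original address of this article is: http://www.54tianzhisheng.cn/2019/01/15/Flink-MySQL-sink/ . Reproduction without permission is prohibited.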

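Follow me

WeChat official account: zhisheng

I have also put together some Flink learning materials, all of which are now available through the WeChat official account. You can add me on WeChat (zhisheng_tian) and reply with the keyword Flink to get them, no strings attached.

For more private materials, join Knowledge Planet!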

GitHub code repository

https://github.com/zhisheng17/flink-learning/

From now on all the code for this project will live in this repository, including my own Flink learning demos and blog posts.

The project code for this article is at https://github.com/zhisheng17/flink-learning/tree/master/flink-learning-connectors/flink-learning-connectors-mysql
