當前位置：首頁 > 编程资源 > 编程问答 >内容正文

编程问答

kafka学习_Kafka学习笔记下

發布時間：2023/12/19 编程问答 29 豆豆

生活随笔收集整理的這篇文章主要介紹了 kafka学习_Kafka学习笔记下小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

4 Kafka API實戰

4.1 環境準備

1)啟動zk和kafka集群，在kafka集群中打開一個消費者

[atguigu@hadoop102 kafka]$ bin/kafka-console-consumer.sh \--zookeeper hadoop102:2181 --topic first

2)導入pom依賴

<dependencies> ? ?<dependency> ? ?<groupId>org.apache.kafkagroupId> ? ?<artifactId>kafka-clientsartifactId> ? ?<version>0.11.0.0version> ?dependency> ? ?<dependency> ? ?<groupId>org.apache.kafkagroupId> ? ?<artifactId>kafka_2.12artifactId> ? ?<version>0.11.0.0version> ?dependency>dependencies>

4.2 Kafka生產者Java API

4.2.1 創建生產者(過時的API)

package com.atguigu.kafka;import java.util.Properties;import kafka.javaapi.producer.Producer;import kafka.producer.KeyedMessage;import kafka.producer.ProducerConfig; public class OldProducer { ?@SuppressWarnings("deprecation") ?public static void main(String[] args) { ? ? ? ?Properties properties = new Properties(); ? ?properties.put("metadata.broker.list", "hadoop102:9092"); ? ?properties.put("request.required.acks", "1"); ? ?properties.put("serializer.class", "kafka.serializer.StringEncoder"); ? ? ? ?Producer<Integer, String> producer = new Producer<Integer,String>(new ProducerConfig(properties)); ? ? ? ?KeyedMessage<Integer, String> message = new KeyedMessage<Integer, String>("first", "hello world"); ? ?producer.send(message ); }}

4.2.2 創建生產者(新API)

package com.atguigu.kafka;import java.util.Properties;import org.apache.kafka.clients.producer.KafkaProducer;import org.apache.kafka.clients.producer.Producer;import org.apache.kafka.clients.producer.ProducerRecord; public class NewProducer { ?public static void main(String[] args) { ? ? ? ?Properties props = new Properties(); ? ?// Kafka服務端的主機名和端口號 ? ?props.put("bootstrap.servers", "hadoop103:9092"); ? ?// 等待所有副本節點的應答 ? ?props.put("acks", "all"); ? ?// 消息發送最大嘗試次數 ? ?props.put("retries", 0); ? ?// 一批消息處理大小 ? ?props.put("batch.size", 16384); ? ?// 請求延時 ? ?props.put("linger.ms", 1); ? ?// 發送緩存區內存大小 ? ?props.put("buffer.memory", 33554432); ? ?// key序列化 ? ?props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer"); ? ?// value序列化 ? ?props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer"); ? ?Producer<String, String> producer = new KafkaProducer<>(props); ? ?for (int i = 0; i < 50; i++) { ? ? ? producer.send(new ProducerRecord<String, String>("first", Integer.toString(i), "hello world-" + i)); ? } ? ?producer.close(); }}

4.2.3 創建生產者帶回調函數(新API)

package com.atguigu.kafka;import java.util.Properties;import org.apache.kafka.clients.producer.Callback;import org.apache.kafka.clients.producer.KafkaProducer;import org.apache.kafka.clients.producer.ProducerRecord;import org.apache.kafka.clients.producer.RecordMetadata; public class CallBackProducer { ?public static void main(String[] args) { Properties props = new Properties(); ? ?// Kafka服務端的主機名和端口號 ? ?props.put("bootstrap.servers", "hadoop103:9092"); ? ?// 等待所有副本節點的應答 ? ?props.put("acks", "all"); ? ?// 消息發送最大嘗試次數 ? ?props.put("retries", 0); ? ?// 一批消息處理大小 ? ?props.put("batch.size", 16384); ? ?// 增加服務端請求延時 ? ?props.put("linger.ms", 1);// 發送緩存區內存大小 ? ?props.put("buffer.memory", 33554432); ? ?// key序列化 ? ?props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer"); ? ?// value序列化 ? ?props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer"); ? ?KafkaProducer<String, String> kafkaProducer = new KafkaProducer<>(props); ? ?for (int i = 0; i < 50; i++) { ? ? ? kafkaProducer.send(new ProducerRecord<String, String>("first", "hello" + i), new Callback() { ? ? ? ? @Override ? ? ? ? public void onCompletion(RecordMetadata metadata, Exception exception) { ? ? ? ? ? if (metadata != null) { ? ? ? ? ? ? System.err.println(metadata.partition() + "---" + metadata.offset()); ? ? ? ? ? } ? ? ? ? } ? ? ? }); ? } ? ?kafkaProducer.close(); }}

4.2.4 自定義分區生產者

0)需求：將所有數據存儲到topic的第0號分區上

1)定義一個類實現Partitioner接口，重寫里面的方法(過時API)

package com.atguigu.kafka;import java.util.Map;import kafka.producer.Partitioner; public class CustomPartitioner implements Partitioner { ?public CustomPartitioner() { ? ?super(); } ?@Override ?public int partition(Object key, int numPartitions) { ? ?// 控制分區 ? ?return 0; }}

2)自定義分區(新API)

package com.atguigu.kafka;import java.util.Map;import org.apache.kafka.clients.producer.Partitioner;import org.apache.kafka.common.Cluster; public class CustomPartitioner implements Partitioner { ?@Override ?public void configure(Map<String, ?> configs) { ? ? } ?@Override ?public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) { ? ?// 控制分區 ? ?return 0; } ?@Override ?public void close() { ? ? }}

3)在代碼中調用

package com.atguigu.kafka;import java.util.Properties;import org.apache.kafka.clients.producer.KafkaProducer;import org.apache.kafka.clients.producer.Producer;import org.apache.kafka.clients.producer.ProducerRecord; public class PartitionerProducer { ?public static void main(String[] args) { ? ? ? ?Properties props = new Properties(); ? ?// Kafka服務端的主機名和端口號 ? ?props.put("bootstrap.servers", "hadoop103:9092"); ? ?// 等待所有副本節點的應答 ? ?props.put("acks", "all"); ? ?// 消息發送最大嘗試次數 ? ?props.put("retries", 0); ? ?// 一批消息處理大小 ? ?props.put("batch.size", 16384); ? ?// 增加服務端請求延時 ? ?props.put("linger.ms", 1); ? ?// 發送緩存區內存大小 ? ?props.put("buffer.memory", 33554432); ? ?// key序列化 ? ?props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer"); ? ?// value序列化 ? ?props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer"); ? ?// 自定義分區 ? ?props.put("partitioner.class", "com.atguigu.kafka.CustomPartitioner"); ? ?Producer<String, String> producer = new KafkaProducer<>(props); ? ?producer.send(new ProducerRecord<String, String>("first", "1", "atguigu")); ? ?producer.close(); }}

4)測試

? ?(1)在hadoop102上監控/opt/module/kafka/logs/目錄下first主題3個分區的log日志動態變化情況

[atguigu@hadoop102 first-0]$ tail -f 00000000000000000000.log

[atguigu@hadoop102 first-1]$ tail -f 00000000000000000000.log

[atguigu@hadoop102 first-2]$ tail -f 00000000000000000000.log

? ?(2)發現數據都存儲到指定的分區了。

4.3 Kafka消費者Java API

4.3.1 高級API

0)在控制臺創建發送者

[atguigu@hadoop104 kafka]$ bin/kafka-console-producer.sh \--broker-list hadoop102:9092 --topic first>hello world

1)創建消費者(過時API)

package com.atguigu.kafka.consume;import java.util.HashMap;import java.util.List;import java.util.Map;import java.util.Properties;import kafka.consumer.Consumer;import kafka.consumer.ConsumerConfig;import kafka.consumer.ConsumerIterator;import kafka.consumer.KafkaStream;import kafka.javaapi.consumer.ConsumerConnector;public class CustomConsumer { ?@SuppressWarnings("deprecation") ?public static void main(String[] args) { ? ?Properties properties = new Properties(); ? ? ? ?properties.put("zookeeper.connect", "hadoop102:2181"); ? ?properties.put("group.id", "g1"); ? ?properties.put("zookeeper.session.timeout.ms", "500"); ? ?properties.put("zookeeper.sync.time.ms", "250"); ? ?properties.put("auto.commit.interval.ms", "1000"); ? ? ? ?// 創建消費者連接器 ? ?ConsumerConnector consumer = Consumer.createJavaConsumerConnector(new ConsumerConfig(properties)); ? ? ? ?HashMap<String, Integer> topicCount = new HashMap<>(); ? ?topicCount.put("first", 1); ? ? ? ?Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap = consumer.createMessageStreams(topicCount); ? ? ? ?KafkaStream<byte[], byte[]> stream = consumerMap.get("first").get(0); ? ? ? ?ConsumerIterator<byte[], byte[]> it = stream.iterator(); ? ? ? ?while (it.hasNext()) { ? ? ? System.out.println(new String(it.next().message())); ? } }}

2)官方提供案例(自動維護消費情況)(新API)

package com.atguigu.kafka.consume;import java.util.Arrays;import java.util.Properties;import org.apache.kafka.clients.consumer.ConsumerRecord;import org.apache.kafka.clients.consumer.ConsumerRecords;import org.apache.kafka.clients.consumer.KafkaConsumer; public class CustomNewConsumer { ?public static void main(String[] args) { ? ?Properties props = new Properties(); ? ?// 定義kakfa 服務的地址，不需要將所有broker指定上 ? ?props.put("bootstrap.servers", "hadoop102:9092"); ? ?// 制定consumer group ? ?props.put("group.id", "test"); ? ?// 是否自動確認offset ? ?props.put("enable.auto.commit", "true"); ? ?// 自動確認offset的時間間隔 ? ?props.put("auto.commit.interval.ms", "1000"); ? ?// key的序列化類 ? ?props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer"); ? ?// value的序列化類 ? ?props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer"); ? ?// 定義consumer ? ?KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props); ? ? ? ?// 消費者訂閱的topic, 可同時訂閱多個 ? ?consumer.subscribe(Arrays.asList("first", "second","third")); ? ?while (true) { ? ? ? // 讀取數據，讀取超時時間為100ms ? ? ? ConsumerRecords<String, String> records = consumer.poll(100); ? ? ? for (ConsumerRecord<String, String> record : records) ? ? ? ? System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value()); ? } }}

4.3.2 低級API

實現使用低級API讀取指定topic，指定partition,指定offset的數據。

1)消費者使用低級API 的主要步驟：

?步驟 ? ?主要工作 ?

?1 ?	?根據指定的分區從主題元數據中找到主副本 ?
?2 ?	?獲取分區最新的消費進度 ?
?3 ?	?從主副本拉取分區的消息 ?
?4 ?	?識別主副本的變化，重試 ?

2)方法描述：

?findLeader() ? ?客戶端向種子節點發送主題元數據，將副本集加入備用節點 ?

?getLastOffset() ?	?消費者客戶端發送偏移量請求，獲取分區最近的偏移量 ?
?run() ?	?消費者低級AP I拉取消息的主要方法 ?
?findNewLeader() ?	?當分區的主副本節點發生故障，客戶將要找出新的主副本 ?

3)代碼：

package com.atguigu;import java.nio.ByteBuffer;import java.util.ArrayList;import java.util.Collections;import java.util.HashMap;import java.util.List;import java.util.Map;import kafka.api.FetchRequest;import kafka.api.FetchRequestBuilder;import kafka.api.PartitionOffsetRequestInfo;import kafka.cluster.BrokerEndPoint;import kafka.common.ErrorMapping;import kafka.common.TopicAndPartition;import kafka.javaapi.FetchResponse;import kafka.javaapi.OffsetResponse;import kafka.javaapi.PartitionMetadata;import kafka.javaapi.TopicMetadata;import kafka.javaapi.TopicMetadataRequest;import kafka.javaapi.consumer.SimpleConsumer;import kafka.message.MessageAndOffset; public class SimpleExample { private List m_replicaBrokers = new ArrayList<>(); public SimpleExample() { ? m_replicaBrokers = new ArrayList<>(); } public static void main(String args[]) { ? SimpleExample example = new SimpleExample(); ? // 最大讀取消息數量 ? long maxReads = Long.parseLong("3"); ? // 要訂閱的topic ? String topic = "test1"; ? // 要查找的分區 ? int partition = Integer.parseInt("0"); ? // broker節點的ip ? List seeds = new ArrayList<>(); ? seeds.add("192.168.9.102"); ? seeds.add("192.168.9.103"); ? seeds.add("192.168.9.104"); ? // 端口 ? int port = Integer.parseInt("9092"); ? try { ? ? example.run(maxReads, topic, partition, seeds, port); ? } catch (Exception e) { ? ? System.out.println("Oops:" + e); ? ? e.printStackTrace(); ? } } public void run(long a_maxReads, String a_topic, int a_partition, List a_seedBrokers, int a_port) throws Exception { ? // 獲取指定Topic partition的元數據 ? PartitionMetadata metadata = findLeader(a_seedBrokers, a_port, a_topic, a_partition); ? if (metadata == null) { ? ? System.out.println("Can't find metadata for Topic and Partition. Exiting"); ? ? return; ? } ? if (metadata.leader() == null) { ? ? System.out.println("Can't find Leader for Topic and Partition. Exiting"); ? ? return; ? } ? String leadBroker = metadata.leader().host(); ? String clientName = "Client_" + a_topic + "_" + a_partition; ? SimpleConsumer consumer = new SimpleConsumer(leadBroker, a_port, 100000, 64 * 1024, clientName); ? long readOffset = getLastOffset(consumer, a_topic, a_partition, kafka.api.OffsetRequest.EarliestTime(), clientName); ? int numErrors = 0; ? while (a_maxReads > 0) { ? ? if (consumer == null) { ? ? ? consumer = new SimpleConsumer(leadBroker, a_port, 100000, 64 * 1024, clientName); ? ? } ? ? FetchRequest req = new FetchRequestBuilder().clientId(clientName).addFetch(a_topic, a_partition, readOffset, 100000).build(); ? ? FetchResponse fetchResponse = consumer.fetch(req); ? ? if (fetchResponse.hasError()) { ? ? ? numErrors++; ? ? ? // Something went wrong! ? ? ? short code = fetchResponse.errorCode(a_topic, a_partition); ? ? ? System.out.println("Error fetching data from the Broker:" + leadBroker + " Reason: " + code); ? ? ? if (numErrors > 5) ? ? ? ? break; ? ? ? if (code == ErrorMapping.OffsetOutOfRangeCode()) { ? ? ? ? // We asked for an invalid offset. For simple case ask for ? ? ? ? // the last element to reset ? ? ? ? readOffset = getLastOffset(consumer, a_topic, a_partition, kafka.api.OffsetRequest.LatestTime(), clientName); ? ? ? ? continue; ? ? ? } ? ? ? consumer.close(); ? ? ? consumer = null; ? ? ? leadBroker = findNewLeader(leadBroker, a_topic, a_partition, a_port); ? ? ? continue; ? ? } ? ? numErrors = 0; ? ? long numRead = 0; ? ? for (MessageAndOffset messageAndOffset : fetchResponse.messageSet(a_topic, a_partition)) { ? ? ? long currentOffset = messageAndOffset.offset(); ? ? ? if (currentOffset < readOffset) { ? ? ? ? System.out.println("Found an old offset: " + currentOffset + " Expecting: " + readOffset); ? ? ? ? continue; ? ? ? } ? ? ? readOffset = messageAndOffset.nextOffset(); ? ? ? ByteBuffer payload = messageAndOffset.message().payload(); ? ? ? byte[] bytes = new byte[payload.limit()]; ? ? ? payload.get(bytes); ? ? ? System.out.println(String.valueOf(messageAndOffset.offset()) + ": " + new String(bytes, "UTF-8")); ? ? ? numRead++; ? ? ? a_maxReads--; ? ? } ? ? if (numRead == 0) { ? ? ? try { ? ? ? ? Thread.sleep(1000); ? ? ? } catch (InterruptedException ie) { ? ? ? } ? ? } ? } ? if (consumer != null) ? ? consumer.close(); } public static long getLastOffset(SimpleConsumer consumer, String topic, int partition, long whichTime, String clientName) { ? TopicAndPartition topicAndPartition = new TopicAndPartition(topic, partition); ? Map requestInfo = new HashMap(); ? requestInfo.put(topicAndPartition, new PartitionOffsetRequestInfo(whichTime, 1)); ? kafka.javaapi.OffsetRequest request = new kafka.javaapi.OffsetRequest(requestInfo, kafka.api.OffsetRequest.CurrentVersion(), clientName); ? OffsetResponse response = consumer.getOffsetsBefore(request); ? if (response.hasError()) { ? ? System.out.println("Error fetching data Offset Data the Broker. Reason: " + response.errorCode(topic, partition)); ? ? return 0; ? } ? long[] offsets = response.offsets(topic, partition); ? return offsets[0]; } private String findNewLeader(String a_oldLeader, String a_topic, int a_partition, int a_port) throws Exception { ? for (int i = 0; i < 3; i++) { ? ? boolean goToSleep = false; ? ? PartitionMetadata metadata = findLeader(m_replicaBrokers, a_port, a_topic, a_partition); ? ? if (metadata == null) { ? ? ? goToSleep = true; ? ? } else if (metadata.leader() == null) { ? ? ? goToSleep = true; ? ? } else if (a_oldLeader.equalsIgnoreCase(metadata.leader().host()) && i == 0) { ? ? ? // first time through if the leader hasn't changed give ? ? ? // ZooKeeper a second to recover ? ? ? // second time, assume the broker did recover before failover, ? ? ? // or it was a non-Broker issue ? ? ? // ? ? ? goToSleep = true; ? ? } else { ? ? ? return metadata.leader().host(); ? ? } ? ? if (goToSleep) { ? ? ? ? ? Thread.sleep(1000); ? ? } ? } ? System.out.println("Unable to find new leader after Broker failure. Exiting"); ? throw new Exception("Unable to find new leader after Broker failure. Exiting"); } private PartitionMetadata findLeader(List a_seedBrokers, int a_port, String a_topic, int a_partition) { ? PartitionMetadata returnMetaData = null; ? loop: ? for (String seed : a_seedBrokers) { ? ? SimpleConsumer consumer = null; ? ? try { ? ? ? consumer = new SimpleConsumer(seed, a_port, 100000, 64 * 1024, "leaderLookup"); ? ? ? List topics = Collections.singletonList(a_topic); ? ? ? TopicMetadataRequest req = new TopicMetadataRequest(topics); ? ? ? kafka.javaapi.TopicMetadataResponse resp = consumer.send(req); ? ? ? List metaData = resp.topicsMetadata(); ? ? ? for (TopicMetadata item : metaData) { ? ? ? ? for (PartitionMetadata part : item.partitionsMetadata()) { ? ? ? ? ? if (part.partitionId() == a_partition) { ? ? ? ? ? ? returnMetaData = part; ? ? ? ? ? ? ? break loop; ? ? ? ? ? } ? ? ? ? } ? ? ? } ? ? } catch (Exception e) { ? ? ? System.out.println("Error communicating with Broker [" + seed + "] to find Leader for [" + a_topic + ", " + a_partition + "] Reason: " + e); ? ? } finally { ? ? ? if (consumer != null) ? ? ? ? consumer.close(); ? ? } ? } ? if (returnMetaData != null) { ? ? m_replicaBrokers.clear(); ? ? for (BrokerEndPoint replica : returnMetaData.replicas()) { ? ? ? m_replicaBrokers.add(replica.host()); ? ? } ? } ? return returnMetaData; }}

5 Kafka producer攔截器(interceptor)

5.1 攔截器原理

Producer攔截器(interceptor)是在Kafka 0.10版本被引入的，主要用于實現clients端的定制化控制邏輯。

對于producer而言，interceptor使得用戶在消息發送前以及producer回調邏輯前有機會對消息做一些定制化需求，比如修改消息等。同時，producer允許用戶指定多個interceptor按序作用于同一條消息從而形成一個攔截鏈(interceptor chain)。Intercetpor的實現接口是org.apache.kafka.clients.producer.ProducerInterceptor，其定義的方法包括：

(1)configure(configs)

獲取配置信息和初始化數據時調用。

(2)onSend(ProducerRecord)：

該方法封裝進KafkaProducer.send方法中，即它運行在用戶主線程中。Producer確保在消息被序列化以及計算分區前調用該方法。用戶可以在該方法中對消息做任何操作，但最好保證不要修改消息所屬的topic和分區，否則會影響目標分區的計算

(3)onAcknowledgement(RecordMetadata, Exception)：

該方法會在消息被應答或消息發送失敗時調用，并且通常都是在producer回調邏輯觸發之前。onAcknowledgement運行在producer的IO線程中，因此不要在該方法中放入很重的邏輯，否則會拖慢producer的消息發送效率

(4)close：

關閉interceptor，主要用于執行一些資源清理工作

如前所述，interceptor可能被運行在多個線程中，因此在具體實現時用戶需要自行確保線程安全。另外倘若指定了多個interceptor，則producer將按照指定順序調用它們，并僅僅是捕獲每個interceptor可能拋出的異常記錄到錯誤日志中而非在向上傳遞。這在使用過程中要特別留意。

5.2 攔截器案例

1)需求：

實現一個簡單的雙interceptor組成的攔截鏈。第一個interceptor會在消息發送前將時間戳信息加到消息value的最前部；第二個interceptor會在消息發送后更新成功發送消息數或失敗發送消息數。

2)案例實操

(1)增加時間戳攔截器

package com.atguigu.kafka.interceptor;import java.util.Map;import org.apache.kafka.clients.producer.ProducerInterceptor;import org.apache.kafka.clients.producer.ProducerRecord;import org.apache.kafka.clients.producer.RecordMetadata; public class TimeInterceptor implements ProducerInterceptor<String, String> { ?@Override ?public void configure(Map<String, ?> configs) { } ?@Override ?public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) { ? ?// 創建一個新的record，把時間戳寫入消息體的最前部 ? ?return new ProducerRecord(record.topic(), record.partition(), record.timestamp(), record.key(), ? ? ? ? System.currentTimeMillis() + "," + record.value().toString()); } ?@Override ?public void onAcknowledgement(RecordMetadata metadata, Exception exception) { } ?@Override ?public void close() { }}

(2)統計發送消息成功和發送失敗消息數，并在producer關閉時打印這兩個計數器

package com.atguigu.kafka.interceptor;import java.util.Map;import org.apache.kafka.clients.producer.ProducerInterceptor;import org.apache.kafka.clients.producer.ProducerRecord;import org.apache.kafka.clients.producer.RecordMetadata; public class CounterInterceptor implements ProducerInterceptor<String, String>{ ?private int errorCounter = 0; ?private int successCounter = 0; ?@Override ?public void configure(Map<String, ?> configs) { ? ? } ?@Override ?public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) { ? ? return record; } ?@Override ?public void onAcknowledgement(RecordMetadata metadata, Exception exception) { ? ?// 統計成功和失敗的次數 ? ?if (exception == null) { ? ? ?successCounter++; ? } else { ? ? ?errorCounter++; ? ? } } ?@Override ?public void close() { ? ?// 保存結果 ? ?System.out.println("Successful sent: " + successCounter); ? ?System.out.println("Failed sent: " + errorCounter); }}

(3)producer主程序

package com.atguigu.kafka.interceptor;import java.util.ArrayList;import java.util.List;import java.util.Properties;import org.apache.kafka.clients.producer.KafkaProducer;import org.apache.kafka.clients.producer.Producer;import org.apache.kafka.clients.producer.ProducerConfig;import org.apache.kafka.clients.producer.ProducerRecord; public class InterceptorProducer { ?public static void main(String[] args) throws Exception { ? ?// 1 設置配置信息 ? ?Properties props = new Properties(); ? ?props.put("bootstrap.servers", "hadoop102:9092"); ? ?props.put("acks", "all"); ? ?props.put("retries", 0); ? ?props.put("batch.size", 16384); ? ?props.put("linger.ms", 1); ? ?props.put("buffer.memory", 33554432); ? ?props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer"); ? ?props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer"); ? ? ? ?// 2 構建攔截鏈 ? ?List<String> interceptors = new ArrayList<>(); ? interceptors.add("com.atguigu.kafka.interceptor.TimeInterceptor"); interceptors.add("com.atguigu.kafka.interceptor.CounterInterceptor"); ? ?props.put(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG, interceptors); ? ? ? ?String topic = "first"; ? ?Producer<String, String> producer = new KafkaProducer<>(props); ? ? ? ?// 3 發送消息 ? ?for (int i = 0; i < 10; i++) { ? ? ? ? ? ?ProducerRecord<String, String> record = new ProducerRecord<>(topic, "message" + i); ? ? ?producer.send(record); ? } ? ? ? ?// 4 一定要關閉producer，這樣才會調用interceptor的close方法 ? ?producer.close(); }}

3)測試

(1)在kafka上啟動消費者，然后運行客戶端java程序。

[atguigu@hadoop102 kafka]$ bin/kafka-console-consumer.sh \--zookeeper hadoop102:2181 --from-beginning --topic first1501904047034,message01501904047225,message11501904047230,message21501904047234,message31501904047236,message41501904047240,message51501904047243,message61501904047246,message71501904047249,message81501904047252,message9

(2)觀察java平臺控制臺輸出數據如下：

Successful sent: 10Failed sent: 0

6 Kafka Streams

6.1 概述

6.1.1 Kafka Streams

Kafka Streams。Apache Kafka開源項目的一個組成部分。是一個功能強大，易于使用的庫。用于在Kafka上構建高可分布式、拓展性，容錯的應用程序。

6.1.2 Kafka Streams特點

1)功能強大

高擴展性，彈性，容錯

2)輕量級

無需專門的集群

一個庫，而不是框架

3)完全集成

100%的Kafka 0.10.0版本兼容

易于集成到現有的應用程序

4)實時性

毫秒級延遲

并非微批處理

窗口允許亂序數據

允許遲到數據

6.1.3 為什么要有Kafka Stream

當前已經有非常多的流式處理系統，最知名且應用最多的開源流式處理系統有Spark Streaming和Apache Storm。Apache Storm發展多年，應用廣泛，提供記錄級別的處理能力，當前也支持SQL on Stream。而Spark Streaming基于Apache Spark，可以非常方便與圖計算，SQL處理等集成，功能強大，對于熟悉其它Spark應用開發的用戶而言使用門檻低。另外，目前主流的Hadoop發行版，如Cloudera和Hortonworks，都集成了Apache Storm和Apache Spark，使得部署更容易。

既然Apache Spark與Apache Storm擁用如此多的優勢，那為何還需要Kafka Stream呢？主要有如下原因。

第一，Spark和Storm都是流式處理框架，而Kafka Stream提供的是一個基于Kafka的流式處理類庫?？蚣芤箝_發者按照特定的方式去開發邏輯部分，供框架調用。開發者很難了解框架的具體運行方式，從而使得調試成本高，并且使用受限。而Kafka Stream作為流式處理類庫，直接提供具體的類給開發者調用，整個應用的運行方式主要由開發者控制，方便使用和調試。

第二，雖然Cloudera與Hortonworks方便了Storm和Spark的部署，但是這些框架的部署仍然相對復雜。而Kafka Stream作為類庫，可以非常方便的嵌入應用程序中，它對應用的打包和部署基本沒有任何要求。

第三，就流式處理系統而言，基本都支持Kafka作為數據源。例如Storm具有專門的kafka-spout，而Spark也提供專門的spark-streaming-kafka模塊。事實上，Kafka基本上是主流的流式處理系統的標準數據源。換言之，大部分流式系統中都已部署了Kafka，此時使用Kafka Stream的成本非常低。

第四，使用Storm或Spark Streaming時，需要為框架本身的進程預留資源，如Storm的supervisor和Spark on YARN的node manager。即使對于應用實例而言，框架本身也會占用部分資源，如Spark Streaming需要為shuffle和storage預留內存。但是Kafka作為類庫不占用系統資源。

第五，由于Kafka本身提供數據持久化，因此Kafka Stream提供滾動部署和滾動升級以及重新計算的能力。

第六，由于Kafka Consumer Rebalance機制，Kafka Stream可以在線動態調整并行度。

6.2 Kafka Stream數據清洗案例

0)需求：

? 實時處理單詞帶有”>>>”前綴的內容。例如輸入”atguigu>>>ximenqing”，最終處理成“ximenqing”

1)需求分析：

2)案例實操

(1)創建一個工程，并添加jar包

(2)創建主類

package com.atguigu.kafka.stream;import java.util.Properties;import org.apache.kafka.streams.KafkaStreams;import org.apache.kafka.streams.StreamsConfig;import org.apache.kafka.streams.processor.Processor;import org.apache.kafka.streams.processor.ProcessorSupplier;import org.apache.kafka.streams.processor.TopologyBuilder; public class Application { ?public static void main(String[] args) { ? ?// 定義輸入的topic ? ?String from = "first"; ? ?// 定義輸出的topic ? ?String to = "second"; ? ?// 設置參數 ? ?Properties settings = new Properties(); ? ?settings.put(StreamsConfig.APPLICATION_ID_CONFIG, "logFilter"); ? ?settings.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "hadoop102:9092"); ? ?StreamsConfig config = new StreamsConfig(settings); ? ?// 構建拓撲 ? ?TopologyBuilder builder = new TopologyBuilder(); ? ?builder.addSource("SOURCE", from) ? ? ? .addProcessor("PROCESS", new ProcessorSupplier<byte[], byte[]>() { ? ? ? ? @Override ? ? ? ? ? public Processor<byte[], byte[]> get() { ? ? ? ? ? ? // 具體分析處理 ? ? ? ? ? ? return new LogProcessor(); ? ? ? ? ? } ? ? ? ? }, "SOURCE") ? ? ? .addSink("SINK", to, "PROCESS"); ? ?// 創建kafka stream ? ?KafkaStreams streams = new KafkaStreams(builder, config); ? ?streams.start(); }}

(3)具體業務處理

package com.atguigu.kafka.stream;import org.apache.kafka.streams.processor.Processor;import org.apache.kafka.streams.processor.ProcessorContext;public class LogProcessor implements Processor<byte[], byte[]> { ?private ProcessorContext context; ? ?@Override ?public void init(ProcessorContext context) { ? ?this.context = context; } ?@Override ?public void process(byte[] key, byte[] value) { ? ?String input = new String(value); ? ? ? ?// 如果包含“>>>”則只保留該標記后面的內容 ? ?if (input.contains(">>>")) { ? ? ? input = input.split(">>>")[1].trim(); ? ? ? // 輸出到下一個topic ? ? ? context.forward("logProcessor".getBytes(), input.getBytes()); ? }else{ ? ? ? context.forward("logProcessor".getBytes(), input.getBytes()); ? } } ?@Override ?public void punctuate(long timestamp) { } ?@Override ?public void close() { ? }}

(4)運行程序

(5)在hadoop104上啟動生產者

[atguigu@hadoop104 kafka]$ bin/kafka-console-producer.sh \--broker-list hadoop102:9092 --topic first >hello>>>world>h>>>atguigu>hahaha

(6)在hadoop103上啟動消費者

[atguigu@hadoop103 kafka]$ bin/kafka-console-consumer.sh \--zookeeper hadoop102:2181 --from-beginning --topic second worldatguiguhahaha

7.1 kafka一直在rebalance問題

關于Kafka的三個配置：

session.timeout.ms=10000 // 單位：毫秒，kafka會有一個心跳線程來同步服務端，告訴服務端自己是正常可用的，默認是3秒發送一次心跳，超過session.timeout.ms(默認10秒)服務端沒有收到心跳就會認為當前消費者失效。max.poll.interval.ms=300000 // 單位：毫秒，決定了獲取消息后提交偏移量的最大時間，超過設定的時間(默認5分鐘)，服務端也會認為該消費者失效。max.poll.records=500 //合理設置每次poll的消息消費的數量，默認是500，如果數量過多，導致一次poll的操作返回的消息記錄無法在指定時間內完成，則會出現rebalance，即kafka服務端認為消費者失效，會重新分配分區，導致偏移量沒有提交，從而會導致重復消費。

有朋友遇到了一個問題，發現kafka一直在rebalance，(當rebalance后，之前該consumer擁有的分區和offset信息就失效了，同時導致不斷的報auto offset commit failed。)通過改變如下兩個配置解決了問題，

例如：確保獲取300條消息在400秒之內消費完成，提交偏移量offset，并再次去poll()。這樣就不會出現這個問題。

#把這個調改大些，默認是300秒max.poll.interval.ms=400000#把這個改小些，默認是500條消息max.poll.records=300

好了，本次內容就是這些，學無止境，關注我，我們一起學習進步。如果覺得內容還可以，幫忙點個贊，點個在看唄，謝謝~我們下期見。

資料獲取：關注公眾號【良辰】，回復關鍵字Kafka獲取，可獲取筆記和學習視頻。

創作挑戰賽新人創作獎勵來咯，堅持創作打卡瓜分現金大獎

總結

以上是生活随笔為你收集整理的kafka学习_Kafka学习笔记下的全部內容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯，歡迎將生活随笔推薦給好友。

上一篇：成都购车优惠15万？经销商解释：属实但
下一篇：微软痛斥索尼：阻止收购动视暴雪是“自私尝