Installing, Deploying, and Using Flume 1.5.0 (Pseudo-Distributed Setup, with Hadoop 2.2.0 and HBase 0.96 Examples)

Published: 2025/3/15

Original article: http://www.cnblogs.com/lion.net/p/3903197.html

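Contents:
1. What is Flume?
    1) Features of Flume
    2) Reliability of Flume
    3) Recoverability of Flume
    4) Core concepts of Flume
2. Where is the official Flume website?
3. Where to download?
4. How to install?
5. Flume examples
    1) Case 1: Avro
    2) Case 2: Spool
    3) Case 3: Exec
    4) Case 4: Syslogtcp
    5) Case 5: JSONHandler
    6) Case 6: Hadoop sink
    7) Case 7: File Roll Sink
    8) Case 8: Replicating Channel Selector
    9) Case 9: Multiplexing Channel Selector
    10) Case 10: Flume Sink Processors
    11) Case 11: Load balancing Sink Processor
    12) Case 12: HBase sink

1. What is Flume?

Flume is a real-time log collection system developed by Cloudera that has been widely recognized and adopted in industry. The initial releases are now collectively called Flume OG (original generation) and belonged to Cloudera. As Flume's feature set grew, the weaknesses of Flume OG became apparent: a bloated code base, poorly designed core components, and non-standard core configuration. In the last Flume OG release, 0.9.4, unstable log transport was especially severe. To solve these problems, Cloudera completed FLUME-728 on October 22, 2011, a milestone change that refactored the core components, the core configuration, and the code architecture. The refactored versions are collectively called Flume NG (next generation). The other reason for the change was to bring Flume under the Apache umbrella: Cloudera Flume was renamed Apache Flume.

Features of Flume:

Flume is a distributed, reliable, and highly available system for collecting, aggregating, and transporting massive volumes of log data. It supports custom data senders inside a logging system for collecting data, and it can apply simple processing to the data and write it to a variety of receivers (for example text files, HDFS, or HBase).

Flume's data flow is built around events. An event is Flume's basic unit of data: it carries the log data (as a byte array) together with header information. Events originate from a data source outside the Agent. When a Source captures an event, it applies source-specific formatting and then pushes the event into one or more Channels. You can think of a Channel as a buffer that holds the event until a Sink has finished processing it. The Sink is responsible for persisting the log or forwarding the event to another Source.

Reliability of Flume:

When a node fails, logs can be delivered to other nodes without being lost. Flume provides three levels of reliability, from strongest to weakest:
    end-to-end: the receiving agent first writes the event to disk and deletes it only after the data has been delivered successfully; if delivery fails, the data can be resent.
    store on failure: the strategy Scribe also uses; when the receiver crashes, data is written locally and sending resumes once the receiver recovers.
    best effort: data is sent to the receiver without any acknowledgement.

Recoverability of Flume:

Recoverability also depends on the Channel. FileChannel is recommended: events are persisted to the local file system, at the cost of lower performance.

Core concepts of Flume: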

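Agent: runs Flume inside a JVM. Each machine runs one agent, but a single agent can contain multiple sources and sinks.

Client: produces the data and runs in its own thread.

Source: collects data from the Client and passes it to a Channel.

Sink: collects data from a Channel and runs in its own thread.

Channel: connects sources and sinks, and behaves somewhat like a queue.

Events: can be log records, Avro objects, and so on.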

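The agent is Flume's smallest independently running unit, and one agent is one JVM. A single agent is made up of three components: Source, Channel, and Sink.

[Figure: a single agent with its Source, Channel, and Sink]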
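The relationship between the three components can be sketched in a few lines of Python. This is only a toy model under simplifying assumptions, not Flume's implementation: the `Event` class, `source_put`, and `sink_drain` are hypothetical names, and a standard-library queue stands in for a memory channel.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class Event:
    """Toy stand-in for a Flume event: a byte-array body plus string headers."""
    body: bytes
    headers: dict = field(default_factory=dict)


def source_put(channel, lines):
    """Source role: wrap each raw line in an Event and push it into the channel."""
    for line in lines:
        channel.put(Event(body=line.encode("utf-8")))


def sink_drain(channel):
    """Sink role: take events out of the channel until it is empty."""
    delivered = []
    while True:
        try:
            event = channel.get_nowait()
        except queue.Empty:
            break
        delivered.append(event.body.decode("utf-8"))
    return delivered


channel = queue.Queue(maxsize=1000)   # plays the part of a memory channel
source_put(channel, ["hello world", "second line"])
print(sink_drain(channel))
```

The channel decouples the two sides: the source only needs room in the buffer, and the sink only needs something to read.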

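It is worth noting that Flume ships with a large number of built-in Source, Channel, and Sink types, and the different types can be combined freely. The combination is driven by a user-supplied configuration file, which makes it very flexible. For example, a Channel can hold events in memory or persist them to the local disk, and a Sink can write logs to HDFS, to HBase, or even to another Source. Flume also lets users build multi-hop flows: several agents can work together, with support for fan-in, fan-out, contextual routing, and backup routes. This is where Flume really shines.

[Figure: multiple agents cooperating in a multi-hop flow]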

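2. Where is the official Flume website?
http://flume.apache.org/

3. Where to download?

http://www.apache.org/dyn/closer.cgi/flume/1.5.0/apache-flume-1.5.0-bin.tar.gz

4. How to install?
    1) Extract the downloaded Flume package into /home/hadoop. With that, half of the work is already done.

    2) Edit the flume-env.sh configuration file. The main change is setting the JAVA_HOME variable.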

root@m1:/home/hadoop/flume-1.5.0-bin# cp conf/flume-env.sh.template conf/flume-env.sh
root@m1:/home/hadoop/flume-1.5.0-bin# vi conf/flume-env.sh
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# If this file is placed at FLUME_CONF_DIR/flume-env.sh, it will be sourced
# during Flume startup.

# Enviroment variables can be set here.

JAVA_HOME=/usr/lib/jvm/java-7-oracle

# Give Flume more memory and pre-allocate, enable remote monitoring via JMX
#JAVA_OPTS="-Xms100m -Xmx200m -Dcom.sun.management.jmxremote"

# Note that the Flume conf directory is always included in the classpath.
#FLUME_CLASSPATH=""

    3) Verify that the installation succeeded

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng version
Flume 1.5.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: 8633220df808c4cd0c13d1cf0320454a94f1ea97
Compiled by hshreedharan on Wed May  7 14:49:18 PDT 2014
From source with checksum a01fe726e4380ba0c9f7a7d222db961f
root@m1:/home/hadoop#

    If you see the output above, the installation succeeded.

5. Flume examples

    1) Case 1: Avro

    The avro-client can send a given file to Flume. The Avro source receives it over the Avro RPC mechanism.

a) Create the agent configuration file

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/avro.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
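Every configuration in this article follows the same pattern: declare sources, sinks, and channels, then wire each source and sink to a channel. A small sanity check for that wiring might look like the sketch below. It is not part of Flume; `parse_flume_properties` and `check_channel_wiring` are hypothetical helpers, and the parser assumes plain `key = value` lines only (no line continuations or `:` separators, which Java properties files also allow).

```python
def parse_flume_properties(text):
    """Parse 'key = value' lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_channel_wiring(props, agent):
    """Return a list of problems: components that reference undeclared channels."""
    declared = set(props.get(f"{agent}.channels", "").split())
    problems = []
    for source in props.get(f"{agent}.sources", "").split():
        used = props.get(f"{agent}.sources.{source}.channels", "").split()
        if not used:
            problems.append(f"source {source} has no channels")
        problems += [f"source {source} -> unknown channel {c}"
                     for c in used if c not in declared]
    for sink in props.get(f"{agent}.sinks", "").split():
        used = props.get(f"{agent}.sinks.{sink}.channel", "")
        if used not in declared:
            problems.append(f"sink {sink} -> unknown channel {used!r}")
    return problems


AVRO_CONF = """
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
"""

print(check_channel_wiring(parse_flume_properties(AVRO_CONF), "a1"))  # prints []
```

Note the asymmetry the check relies on: a source uses the plural key `channels` (it may feed several), while a sink uses the singular key `channel`.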

b) Start Flume agent a1

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/avro.conf -n a1 -Dflume.root.logger=INFO,console

c) Create the file to send

root@m1:/home/hadoop# echo "hello world" > /home/hadoop/flume-1.5.0-bin/log.00

d) Send the file with avro-client

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng avro-client -c . -H m1 -p 4141 -F /home/hadoop/flume-1.5.0-bin/log.00

e) On the m1 console you should see output like the following. Note the last line:

root@m1:/home/hadoop/flume-1.5.0-bin/conf# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/avro.conf -n a1 -Dflume.root.logger=INFO,console
Info: Sourcing environment configuration script /home/hadoop/flume-1.5.0-bin/conf/flume-env.sh
Info: Including Hadoop libraries found via (/home/hadoop/hadoop-2.2.0/bin/hadoop) for HDFS access
Info: Excluding /home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar from classpath
...
2014-08-10 10:43:25,112 (New I/O  worker #1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171)] [id: 0x92464c4f, /192.168.1.50:59850 :> /192.168.1.50:4141] UNBOUND
2014-08-10 10:43:25,112 (New I/O  worker #1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:171)] [id: 0x92464c4f, /192.168.1.50:59850 :> /192.168.1.50:4141] CLOSED
2014-08-10 10:43:25,112 (New I/O  worker #1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.channelClosed(NettyServer.java:209)] Connection to /192.168.1.50:59850 disconnected.
2014-08-10 10:43:26,718 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 68 65 6C 6C 6F 20 77 6F 72 6C 64                hello world }
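The logger sink prints each event body as space-separated hex bytes followed by a text rendering. When checking such output by hand, a tiny helper can decode the hex part. The function name `decode_logger_body` is made up for this sketch, and it assumes the body is UTF-8 text.

```python
def decode_logger_body(hex_dump):
    """Turn the space-separated hex bytes printed by the logger sink into text."""
    return bytes.fromhex(hex_dump.replace(" ", "")).decode("utf-8", errors="replace")


print(decode_logger_body("68 65 6C 6C 6F 20 77 6F 72 6C 64"))  # prints: hello world
```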

    2) Case 2: Spool

    The spooling directory source watches the configured directory for new files and reads the data out of them. Two caveats:
    1) A file copied into the spool directory must not be opened and edited afterward.
    2) The spool directory must not contain subdirectories.

a) Create the agent configuration file

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/spool.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.channels = c1
a1.sources.r1.spoolDir = /home/hadoop/flume-1.5.0-bin/logs
a1.sources.r1.fileHeader = true

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

b) Start Flume agent a1

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/spool.conf -n a1 -Dflume.root.logger=INFO,console

c) Add a file to the /home/hadoop/flume-1.5.0-bin/logs directory

root@m1:/home/hadoop# echo "spool test1" > /home/hadoop/flume-1.5.0-bin/logs/spool_text.log

d) On the m1 console you should see output like the following:

14/08/10 11:37:13 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:13 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:14 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /home/hadoop/flume-1.5.0-bin/logs/spool_text.log to /home/hadoop/flume-1.5.0-bin/logs/spool_text.log.COMPLETED
14/08/10 11:37:14 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:14 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:14 INFO sink.LoggerSink: Event: { headers:{file=/home/hadoop/flume-1.5.0-bin/logs/spool_text.log} body: 73 70 6F 6F 6C 20 74 65 73 74 31                spool test1 }
14/08/10 11:37:15 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:15 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:16 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:16 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
14/08/10 11:37:17 INFO source.SpoolDirectorySource: Spooling Directory Source runner has shutdown.
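Because files must not change once they are in the spool directory, one common way to respect that rule is to write the file somewhere else first and then move it in with a single rename. The sketch below illustrates the idea; `drop_into_spool` and the staging directory are assumptions of this example, not something the original setup uses, and the rename is only atomic when both directories are on the same filesystem.

```python
import os
import tempfile


def drop_into_spool(spool_dir, staging_dir, name, text):
    """Write the file completely in staging_dir, then move it into spool_dir.

    os.replace is an atomic rename when both directories are on the same
    filesystem, so the spooling source never sees a half-written file.
    """
    staged = os.path.join(staging_dir, name)
    with open(staged, "w", encoding="utf-8") as handle:
        handle.write(text)
    final = os.path.join(spool_dir, name)
    os.replace(staged, final)
    return final


# Demo with throwaway directories; on the tutorial host spool_dir would be
# /home/hadoop/flume-1.5.0-bin/logs and staging_dir a sibling directory.
with tempfile.TemporaryDirectory() as root:
    spool = os.path.join(root, "logs")
    staging = os.path.join(root, "staging")
    os.mkdir(spool)
    os.mkdir(staging)
    path = drop_into_spool(spool, staging, "spool_text.log", "spool test1\n")
    print(os.path.basename(path), os.listdir(staging))
```

The staging directory should sit outside the spool directory, since the spool directory must not contain subdirectories.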

    3) Case 3: Exec

    The exec source runs a given command and uses its output as the data source. If you use the tail command, the file must contain enough content before any output shows up.

a) Create the agent configuration file

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/exec_tail.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.channels = c1
a1.sources.r1.command = tail -F /home/hadoop/flume-1.5.0-bin/log_exec_tail

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

b) Start Flume agent a1

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/exec_tail.conf -n a1 -Dflume.root.logger=INFO,console

c) Write enough content into the file

root@m1:/home/hadoop# for i in {1..100};do echo "exec tail$i" >> /home/hadoop/flume-1.5.0-bin/log_exec_tail;echo $i;sleep 0.1;done

d) On the m1 console you should see output like the following:

2014-08-10 10:59:25,513 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 20 74 65 73 74       exec tail test }
2014-08-10 10:59:34,535 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 20 74 65 73 74       exec tail test }
2014-08-10 11:01:40,557 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 31                   exec tail1 }
2014-08-10 11:01:41,180 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 32                   exec tail2 }
2014-08-10 11:01:41,180 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 33                   exec tail3 }
2014-08-10 11:01:41,181 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 34                   exec tail4 }
2014-08-10 11:01:41,181 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 35                   exec tail5 }
2014-08-10 11:01:41,181 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 36                   exec tail6 }
....
....
....
2014-08-10 11:01:51,550 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 39 36                exec tail96 }
2014-08-10 11:01:51,550 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 39 37                exec tail97 }
2014-08-10 11:01:51,551 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 39 38                exec tail98 }
2014-08-10 11:01:51,551 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 39 39                exec tail99 }
2014-08-10 11:01:51,551 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 65 78 65 63 20 74 61 69 6C 31 30 30             exec tail100 }

    4) Case 4: Syslogtcp

    The syslogtcp source listens on a TCP port and uses what it receives as the data source.

a) Create the agent configuration file

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/syslog_tcp.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

b) Start Flume agent a1

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/syslog_tcp.conf -n a1 -Dflume.root.logger=INFO,console

c) Generate a test syslog message

root@m1:/home/hadoop# echo "hello idoall.org syslog" | nc localhost 5140

d) On the m1 console you should see output like the following:

14/08/10 11:41:45 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/home/hadoop/flume-1.5.0-bin/conf/syslog_tcp.conf
14/08/10 11:41:45 INFO conf.FlumeConfiguration: Added sinks: k1 Agent: a1
14/08/10 11:41:45 INFO conf.FlumeConfiguration: Processing:k1
14/08/10 11:41:45 INFO conf.FlumeConfiguration: Processing:k1
14/08/10 11:41:45 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [a1]
14/08/10 11:41:45 INFO node.AbstractConfigurationProvider: Creating channels
14/08/10 11:41:45 INFO channel.DefaultChannelFactory: Creating instance of channel c1 type memory
14/08/10 11:41:45 INFO node.AbstractConfigurationProvider: Created channel c1
14/08/10 11:41:45 INFO source.DefaultSourceFactory: Creating instance of source r1, type syslogtcp
14/08/10 11:41:45 INFO sink.DefaultSinkFactory: Creating instance of sink: k1, type: logger
14/08/10 11:41:45 INFO node.AbstractConfigurationProvider: Channel c1 connected to [r1, k1]
14/08/10 11:41:45 INFO node.Application: Starting new configuration:{ sourceRunners:{r1=EventDrivenSourceRunner: { source:org.apache.flume.source.SyslogTcpSource{name:r1,state:IDLE} }} sinkRunners:{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@6538b14 counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} }
14/08/10 11:41:45 INFO node.Application: Starting Channel c1
14/08/10 11:41:45 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
14/08/10 11:41:45 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
14/08/10 11:41:45 INFO node.Application: Starting Sink k1
14/08/10 11:41:45 INFO node.Application: Starting Source r1
14/08/10 11:41:45 INFO source.SyslogTcpSource: Syslog TCP Source starting...
14/08/10 11:42:15 WARN source.SyslogUtils: Event created from Invalid Syslog data.
14/08/10 11:42:15 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 68 65 6C 6C 6F 20 69 64 6F 61 6C 6C 2E 6F 72 67 hello idoall.org }
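Two details in this output are worth a closer look. The WARN line and the `flume.syslog.status=Invalid` header most likely appear because the `echo` payload is plain text with no syslog priority prefix, and the logged body stops at "hello idoall.org" because the logger sink appears to print only the first 16 bytes of each body. The sketch below builds a line in the classic RFC 3164 layout (`<PRI>Mmm dd hh:mm:ss host message`), where PRI is facility × 8 + severity. The helper names are hypothetical, and whether the source then reports a valid status was not verified against this Flume version.

```python
import socket
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def build_syslog_line(facility, severity, when, host, message):
    """Build an RFC 3164 style line: <PRI>Mmm dd hh:mm:ss host message."""
    pri = facility * 8 + severity          # PRI encodes both values
    stamp = f"{MONTHS[when.month - 1]} {when.day:2d} {when:%H:%M:%S}"
    return f"<{pri}>{stamp} {host} {message}\n"


def send_line(line, host="localhost", port=5140):
    """Send one line over TCP to the syslogtcp source (port from the config)."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(line.encode("utf-8"))


line = build_syslog_line(1, 6, datetime(2014, 8, 10, 11, 42, 15), "m1",
                         "hello idoall.org syslog")
print(line, end="")   # <14>Aug 10 11:42:15 m1 hello idoall.org syslog
# send_line(line)     # uncomment when the agent from step b) is running
```

Facility 1 (user-level) and severity 6 (informational) are arbitrary choices for the demonstration.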

    5) Case 5: JSONHandler

a) Create the agent configuration file

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/post_json.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
a1.sources.r1.port = 8888
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

b) Start Flume agent a1

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/post_json.conf -n a1 -Dflume.root.logger=INFO,console

c) Send a POST request in JSON format

root@m1:/home/hadoop# curl -X POST -d '[{ "headers" :{"a" : "a1","b" : "b1"},"body" : "idoall.org_body"}]' http://localhost:8888

d) On the m1 console you should see output like the following:

14/08/10 11:49:59 INFO node.Application: Starting Channel c1
14/08/10 11:49:59 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
14/08/10 11:49:59 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
14/08/10 11:49:59 INFO node.Application: Starting Sink k1
14/08/10 11:49:59 INFO node.Application: Starting Source r1
14/08/10 11:49:59 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
14/08/10 11:49:59 INFO mortbay.log: jetty-6.1.26
14/08/10 11:50:00 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:8888
14/08/10 11:50:00 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
14/08/10 11:50:00 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
14/08/10 12:14:32 INFO sink.LoggerSink: Event: { headers:{b=b1, a=a1} body: 69 64 6F 61 6C 6C 2E 6F 72 67 5F 62 6F 64 79    idoall.org_body }
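The same request can be sent from a program instead of curl. The sketch below uses only the Python standard library; `build_payload` and `post_events` are names invented for this example, and the payload shape (a JSON array of objects with `headers` and `body`) is simply copied from the curl command above. The actual POST is left commented out because it needs the agent from step b) to be running.

```python
import json
import urllib.request


def build_payload(events):
    """Encode (headers, body) pairs as the JSON array used in the curl example."""
    return json.dumps(
        [{"headers": headers, "body": body} for headers, body in events]
    ).encode("utf-8")


def post_events(url, events):
    """POST the events to the HTTP source and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=build_payload(events),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


payload = build_payload([({"a": "a1", "b": "b1"}, "idoall.org_body")])
print(payload.decode("utf-8"))
# post_events("http://localhost:8888", [({"a": "a1", "b": "b1"}, "idoall.org_body")])
```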

    6) Case 6: Hadoop sink

    For installing and deploying Hadoop 2.2.0, see the article "ubuntu12.04+hadoop2.2.0+zookeeper3.4.5+hbase0.96.2+hive0.13.1 distributed environment deployment".

a) Create the agent configuration file

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/hdfs_sink.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://m1:9000/user/flume/syslogtcp
a1.sinks.k1.hdfs.filePrefix = Syslog
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

b) Start Flume agent a1

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/hdfs_sink.conf -n a1 -Dflume.root.logger=INFO,console

c) Generate a test syslog message

root@m1:/home/hadoop# echo "hello idoall flume -> hadoop testing one" | nc localhost 5140

d) On the m1 console you should see output like the following:

14/08/10 12:20:39 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
14/08/10 12:20:39 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
14/08/10 12:20:39 INFO node.Application: Starting Sink k1
14/08/10 12:20:39 INFO node.Application: Starting Source r1
14/08/10 12:20:39 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: k1: Successfully registered new MBean.
14/08/10 12:20:39 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started
14/08/10 12:20:39 INFO source.SyslogTcpSource: Syslog TCP Source starting...
14/08/10 12:21:46 WARN source.SyslogUtils: Event created from Invalid Syslog data.
14/08/10 12:21:49 INFO hdfs.HDFSSequenceFile: writeFormat = Writable, UseRawLocalFileSystem = false
14/08/10 12:21:49 INFO hdfs.BucketWriter: Creating hdfs://m1:9000/user/flume/syslogtcp//Syslog.1407644509504.tmp
14/08/10 12:22:20 INFO hdfs.BucketWriter: Closing hdfs://m1:9000/user/flume/syslogtcp//Syslog.1407644509504.tmp
14/08/10 12:22:20 INFO hdfs.BucketWriter: Close tries incremented
14/08/10 12:22:20 INFO hdfs.BucketWriter: Renaming hdfs://m1:9000/user/flume/syslogtcp/Syslog.1407644509504.tmp to hdfs://m1:9000/user/flume/syslogtcp/Syslog.1407644509504
14/08/10 12:22:20 INFO hdfs.HDFSEventSink: Writer callback called.

e) Open another window on m1 and check in Hadoop whether the file was created

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/bin/hadoop fs -ls /user/flume/syslogtcp
Found 1 items
-rw-r--r--   3 root supergroup        155 2014-08-10 12:22 /user/flume/syslogtcp/Syslog.1407644509504
root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/bin/hadoop fs -cat /user/flume/syslogtcp/Syslog.1407644509504
SEQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritable^;>Gv$hello idoall flume -> hadoop testing one
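The numeric suffix in `Syslog.1407644509504` looks like a Unix timestamp in milliseconds. Treating it that way (an assumption of this sketch, with `suffix_to_utc` as a made-up helper name) gives 04:21:49 UTC, which lines up with the 12:21:49 "Creating" log line if the host clock runs at UTC+8.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def suffix_to_utc(file_name):
    """Read the numeric suffix of a name like 'Syslog.1407644509504' as epoch millis."""
    millis = int(file_name.rsplit(".", 1)[1])
    return EPOCH + timedelta(milliseconds=millis)


created = suffix_to_utc("Syslog.1407644509504")
print(created.isoformat())   # 2014-08-10T04:21:49.504000+00:00
```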

    7) Case 7: File Roll Sink

a) Create the agent configuration file

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/file_roll.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5555
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /home/hadoop/flume-1.5.0-bin/logs

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

b) Start Flume agent a1

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/file_roll.conf -n a1 -Dflume.root.logger=INFO,console

c) Generate test log messages

root@m1:/home/hadoop# echo "hello idoall.org syslog" | nc localhost 5555
root@m1:/home/hadoop# echo "hello idoall.org syslog 2" | nc localhost 5555

d) Check whether files were generated under /home/hadoop/flume-1.5.0-bin/logs. By default a new file is created every 30 seconds.

root@m1:/home/hadoop# ll /home/hadoop/flume-1.5.0-bin/logs
total 272
drwxr-xr-x 3 root root   4096 Aug 10 12:50 ./
drwxr-xr-x 9 root root   4096 Aug 10 10:59 ../
-rw-r--r-- 1 root root     50 Aug 10 12:49 1407646164782-1
-rw-r--r-- 1 root root      0 Aug 10 12:49 1407646164782-2
-rw-r--r-- 1 root root      0 Aug 10 12:50 1407646164782-3
root@m1:/home/hadoop# cat /home/hadoop/flume-1.5.0-bin/logs/1407646164782-1 /home/hadoop/flume-1.5.0-bin/logs/1407646164782-2
hello idoall.org syslog
hello idoall.org syslog 2

    8) Case 8: Replicating Channel Selector

    Flume supports fanning out the flow from one source to multiple channels. There are two fan-out modes: replicating and multiplexing. With replicating, each event is sent to all configured channels. With multiplexing, an event is sent to only a subset of the available channels. A fan-out flow requires specifying the source and the rules for fanning out to the channels.

    This time we need two machines, m1 and m2.

a) On m1, create the replicating_Channel_Selector configuration file

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/replicating_Channel_Selector.conf

a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = replicating

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = m1
a1.sinks.k1.port = 5555

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
a1.sinks.k2.hostname = m2
a1.sinks.k2.port = 5555

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

b) On m1, create the replicating_Channel_Selector_avro configuration file

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/replicating_Channel_Selector_avro.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 5555

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

c) On m1, copy both configuration files to m2

root@m1:/home/hadoop/flume-1.5.0-bin# scp -r /home/hadoop/flume-1.5.0-bin/conf/replicating_Channel_Selector.conf root@m2:/home/hadoop/flume-1.5.0-bin/conf/replicating_Channel_Selector.conf
root@m1:/home/hadoop/flume-1.5.0-bin# scp -r /home/hadoop/flume-1.5.0-bin/conf/replicating_Channel_Selector_avro.conf root@m2:/home/hadoop/flume-1.5.0-bin/conf/replicating_Channel_Selector_avro.conf

d) Open four windows and start both Flume agents on m1 and on m2

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/replicating_Channel_Selector_avro.conf -n a1 -Dflume.root.logger=INFO,console
root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/replicating_Channel_Selector.conf -n a1 -Dflume.root.logger=INFO,console

e) Then, on either m1 or m2, generate a test syslog message

root@m1:/home/hadoop# echo "hello idoall.org syslog" | nc localhost 5140

f) In the sink windows on both m1 and m2 you should see output like the following, which shows that the message was replicated to both:

14/08/10 14:08:18 INFO ipc.NettyServer: Connection to /192.168.1.51:46844 disconnected.
14/08/10 14:08:52 INFO ipc.NettyServer: [id: 0x90f8fe1f, /192.168.1.50:35873 => /192.168.1.50:5555] OPEN
14/08/10 14:08:52 INFO ipc.NettyServer: [id: 0x90f8fe1f, /192.168.1.50:35873 => /192.168.1.50:5555] BOUND: /192.168.1.50:5555
14/08/10 14:08:52 INFO ipc.NettyServer: [id: 0x90f8fe1f, /192.168.1.50:35873 => /192.168.1.50:5555] CONNECTED: /192.168.1.50:35873
14/08/10 14:08:59 INFO ipc.NettyServer: [id: 0xd6318635, /192.168.1.51:46858 => /192.168.1.50:5555] OPEN
14/08/10 14:08:59 INFO ipc.NettyServer: [id: 0xd6318635, /192.168.1.51:46858 => /192.168.1.50:5555] BOUND: /192.168.1.50:5555
14/08/10 14:08:59 INFO ipc.NettyServer: [id: 0xd6318635, /192.168.1.51:46858 => /192.168.1.50:5555] CONNECTED: /192.168.1.51:46858
14/08/10 14:09:20 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 68 65 6C 6C 6F 20 69 64 6F 61 6C 6C 2E 6F 72 67 hello idoall.org }

    9) Case 9: Multiplexing Channel Selector

a) On m1, create the Multiplexing_Channel_Selector configuration file

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/Multiplexing_Channel_Selector.conf

a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# Describe/configure the source
a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = multiplexing

a1.sources.r1.selector.header = type
# Mappings may overlap: the same channel can appear under several values. The default may list any number of channels.
a1.sources.r1.selector.mapping.baidu = c1
a1.sources.r1.selector.mapping.ali = c2
a1.sources.r1.selector.default = c1

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = m1
a1.sinks.k1.port = 5555

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
a1.sinks.k2.hostname = m2
a1.sinks.k2.port = 5555

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100
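The routing rule this configuration expresses is small enough to sketch directly: look up the value of the `type` header in the mapping and fall back to the default channel when there is no match. The function below is an illustration of that rule only, not Flume's selector code, and `select_channels` is a hypothetical name.

```python
def select_channels(headers, header_name, mapping, default):
    """Return the channels an event goes to under a multiplexing selector."""
    value = headers.get(header_name)
    return mapping.get(value, default)


MAPPING = {"baidu": ["c1"], "ali": ["c2"]}   # selector.mapping.* from the config
DEFAULT = ["c1"]                             # selector.default

for event_type in ("baidu", "ali", "qq"):
    print(event_type, select_channels({"type": event_type}, "type", MAPPING, DEFAULT))
```

An event whose `type` is neither `baidu` nor `ali` (such as `qq`) falls through to the default channel c1, which feeds the sink pointing at m1.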

b) On m1, create the Multiplexing_Channel_Selector_avro configuration file

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/Multiplexing_Channel_Selector_avro.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 5555

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

c)將2個配置文件復制到m2上一份

1 2

root@m1:/home/hadoop/flume-1.5.0-bin# scp -r /home/hadoop/flume-1.5.0-bin/conf/Multiplexing_Channel_Selector.conf? root@m2:/home/hadoop/flume-1.5.0-bin/conf/Multiplexing_Channel_Selector.conf root@m1:/home/hadoop/flume-1.5.0-bin# scp -r /home/hadoop/flume-1.5.0-bin/conf/Multiplexing_Channel_Selector_avro.conf root@m2:/home/hadoop/flume-1.5.0-bin/conf/Multiplexing_Channel_Selector_avro.conf

d)打開4個窗口，在m1和m2上同時啟動兩個flume agent

1 2

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/Multiplexing_Channel_Selector_avro.conf -n a1 -Dflume.root.logger=INFO,console root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/Multiplexing_Channel_Selector.conf -n a1 -Dflume.root.logger=INFO,console

e)然后在m1或m2的任意一臺機器上，測試產生syslog

root@m1:/home/hadoop# curl -X POST -d '[{ "headers" :{"type" : "baidu"},"body" : "idoall_TEST1"}]' http://localhost:5140 && curl -X POST -d '[{ "headers" :{"type" : "ali"},"body" : "idoall_TEST2"}]' http://localhost:5140 && curl -X POST -d '[{ "headers" :{"type" : "qq"},"body" : "idoall_TEST3"}]' http://localhost:5140

f)在m1的sink窗口，可以看到以下信息：

1 2 3 4 5 6 7 8 9 10 11 12 13 14

14/08/10?14:32:21 INFO node.Application: Starting Sink k1 14/08/10?14:32:21 INFO node.Application: Starting Source r1 14/08/10?14:32:21 INFO source.AvroSource: Starting Avro source?r1: { bindAddress: 0.0.0.0, port: 5555 }... 14/08/10?14:32:21 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for?type: SOURCE, name: r1: Successfully registered new MBean. 14/08/10?14:32:21 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started 14/08/10?14:32:21 INFO source.AvroSource: Avro source?r1 started. 14/08/10?14:32:36 INFO ipc.NettyServer: [id: 0xcf00eea6, /192.168.1.50:35916 => /192.168.1.50:5555] OPEN 14/08/10?14:32:36 INFO ipc.NettyServer: [id: 0xcf00eea6, /192.168.1.50:35916 => /192.168.1.50:5555] BOUND: /192.168.1.50:5555 14/08/10?14:32:36 INFO ipc.NettyServer: [id: 0xcf00eea6, /192.168.1.50:35916 => /192.168.1.50:5555] CONNECTED: /192.168.1.50:35916 14/08/10?14:32:44 INFO ipc.NettyServer: [id: 0x432f5468, /192.168.1.51:46945 => /192.168.1.50:5555] OPEN 14/08/10?14:32:44 INFO ipc.NettyServer: [id: 0x432f5468, /192.168.1.51:46945 => /192.168.1.50:5555] BOUND: /192.168.1.50:5555 14/08/10?14:32:44 INFO ipc.NettyServer: [id: 0x432f5468, /192.168.1.51:46945 => /192.168.1.50:5555] CONNECTED: /192.168.1.51:46945 14/08/10?14:34:11 INFO sink.LoggerSink: Event: { headers:{type=baidu} body: 69 64 6F 61 6C 6C 5F 54 45 53 54 31???????????? idoall_TEST1 } 14/08/10?14:34:57 INFO sink.LoggerSink: Event: { headers:{type=qq} body: 69 64 6F 61 6C 6C 5F 54 45 53 54 33???????????? idoall_TEST3 }

g) In the sink window on m2, the following output appears:

14/08/10 14:32:27 INFO node.Application: Starting Sink k1
14/08/10 14:32:27 INFO node.Application: Starting Source r1
14/08/10 14:32:27 INFO source.AvroSource: Starting Avro source r1: { bindAddress: 0.0.0.0, port: 5555 }...
14/08/10 14:32:27 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
14/08/10 14:32:27 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
14/08/10 14:32:27 INFO source.AvroSource: Avro source r1 started.
14/08/10 14:32:36 INFO ipc.NettyServer: [id: 0x7c2f0aec, /192.168.1.50:38104 => /192.168.1.51:5555] OPEN
14/08/10 14:32:36 INFO ipc.NettyServer: [id: 0x7c2f0aec, /192.168.1.50:38104 => /192.168.1.51:5555] BOUND: /192.168.1.51:5555
14/08/10 14:32:36 INFO ipc.NettyServer: [id: 0x7c2f0aec, /192.168.1.50:38104 => /192.168.1.51:5555] CONNECTED: /192.168.1.50:38104
14/08/10 14:32:44 INFO ipc.NettyServer: [id: 0x3d36f553, /192.168.1.51:48599 => /192.168.1.51:5555] OPEN
14/08/10 14:32:44 INFO ipc.NettyServer: [id: 0x3d36f553, /192.168.1.51:48599 => /192.168.1.51:5555] BOUND: /192.168.1.51:5555
14/08/10 14:32:44 INFO ipc.NettyServer: [id: 0x3d36f553, /192.168.1.51:48599 => /192.168.1.51:5555] CONNECTED: /192.168.1.51:48599
14/08/10 14:34:33 INFO sink.LoggerSink: Event: { headers:{type=ali} body: 69 64 6F 61 6C 6C 5F 54 45 53 54 32 idoall_TEST2 }
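The routing seen in the two sink windows can be modelled as a simple lookup on one header. The sketch below is an illustration only: the mapping (baidu to c1, ali to c2, everything else to c1) is an assumption inferred from the output above, where baidu and qq arrive on m1 and ali arrives on m2, and the function name `select_channels` is hypothetical.

```python
def select_channels(headers, header_name, mapping, default):
    """Return the channels an event is routed to, based on a single header value.

    Events whose header value has no entry in `mapping` go to `default`.
    """
    return mapping.get(headers.get(header_name), default)


# Hypothetical mapping inferred from the observed output, not copied from the config.
MAPPING = {"baidu": ["c1"], "ali": ["c2"]}
DEFAULT = ["c1"]

for event_type in ("baidu", "ali", "qq"):
    channels = select_channels({"type": event_type}, "type", MAPPING, DEFAULT)
    print(event_type, "->", channels)
```

Under these assumptions, an event with no matching header value (such as qq) falls through to the default channel, which matches what the m1 window shows.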

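As the two sink windows show, events are routed to different channels according to the value of their header.

10) Case 10: Flume Sink Processors

With the failover mechanism, events are always sent to one sink; when that sink becomes unavailable, they are automatically sent to the next sink.

a) Create the Flume_Sink_Processors configuration file on m1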

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/Flume_Sink_Processors.conf

a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

# The key to configuring failover: a sink group is required
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
# The processor type is failover
a1.sinkgroups.g1.processor.type = failover
# Priorities: the larger the number, the higher the priority; each sink must have a distinct priority
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
# Set to 10 seconds (the value is in milliseconds); make it shorter or longer to suit your situation
a1.sinkgroups.g1.processor.maxpenalty = 10000

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = replicating

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = m1
a1.sinks.k1.port = 5555

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
a1.sinks.k2.hostname = m2
a1.sinks.k2.port = 5555

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100
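The failover behaviour configured above can be sketched as follows. This is a minimal illustration and not Flume's implementation: the class `FailoverProcessor`, its method names, and the doubling back-off that starts at one second are assumptions made for the example. Only the priorities (k1 = 5, k2 = 10) and the 10000 ms cap come from the configuration.

```python
class FailoverProcessor:
    """Toy model of a failover sink group: use the highest-priority sink that is not penalised."""

    def __init__(self, priorities, max_penalty_ms=10000):
        self.priorities = dict(priorities)   # sink name -> priority (higher wins)
        self.max_penalty_ms = max_penalty_ms
        self.retry_at = {}                   # sink name -> time (ms) when it may be retried
        self.failures = {}                   # sink name -> consecutive failure count

    def pick(self, now_ms):
        """Return the highest-priority sink that is currently allowed, or None."""
        live = [s for s in self.priorities if self.retry_at.get(s, 0) <= now_ms]
        if not live:
            return None
        return max(live, key=self.priorities.get)

    def report_failure(self, sink, now_ms):
        """Penalise a failed sink; the penalty doubles per failure (assumed schedule), capped."""
        count = self.failures.get(sink, 0) + 1
        self.failures[sink] = count
        penalty = min(self.max_penalty_ms, 1000 * 2 ** (count - 1))
        self.retry_at[sink] = now_ms + penalty

    def report_success(self, sink):
        """Clear the penalty state once a sink works again."""
        self.failures.pop(sink, None)
        self.retry_at.pop(sink, None)


processor = FailoverProcessor({"k1": 5, "k2": 10}, max_penalty_ms=10000)
print(processor.pick(now_ms=0))      # k2 has the higher priority
processor.report_failure("k2", now_ms=0)
print(processor.pick(now_ms=500))    # k2 is penalised, so k1 is used
```

The cap matters because without it a sink that stays down for a long time would be retried less and less often; with `maxpenalty` it is retried at least once every 10 seconds in this configuration.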

b) Create the Flume_Sink_Processors_avro configuration file on m1

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/Flume_Sink_Processors_avro.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 5555

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

c) Copy both configuration files to m2

root@m1:/home/hadoop/flume-1.5.0-bin# scp -r /home/hadoop/flume-1.5.0-bin/conf/Flume_Sink_Processors.conf root@m2:/home/hadoop/flume-1.5.0-bin/conf/Flume_Sink_Processors.conf
root@m1:/home/hadoop/flume-1.5.0-bin# scp -r /home/hadoop/flume-1.5.0-bin/conf/Flume_Sink_Processors_avro.conf root@m2:/home/hadoop/flume-1.5.0-bin/conf/Flume_Sink_Processors_avro.conf

d) Open four windows and start both Flume agents on m1 and on m2 at the same time

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/Flume_Sink_Processors_avro.conf -n a1 -Dflume.root.logger=INFO,console
root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/Flume_Sink_Processors.conf -n a1 -Dflume.root.logger=INFO,console

e) Then, on either m1 or m2, generate a test log message

root@m1:/home/hadoop# echo "idoall.org test1 failover" | nc localhost 5140
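The same test message can be sent from a script instead of `echo ... | nc`. The sketch below is a minimal equivalent using a plain TCP socket; the helper name `send_line` is hypothetical, and it assumes the syslogtcp source is listening on localhost:5140 as configured above.

```python
import socket


def send_line(host, port, text, timeout=5.0):
    """Open a TCP connection, send one newline-terminated line, and close.

    Returns the number of bytes sent (text plus the trailing newline).
    """
    data = (text + "\n").encode("utf-8")
    with socket.create_connection((host, port), timeout=timeout) as connection:
        connection.sendall(data)
    return len(data)
```

With the agent running, `send_line("localhost", 5140, "idoall.org test1 failover")` sends the same bytes as the shell command above.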

f) Because m2 has the higher priority, the following appears in the sink window on m2, while nothing appears on m1:

14/08/10 15:02:46 INFO ipc.NettyServer: Connection to /192.168.1.51:48692 disconnected.
14/08/10 15:03:12 INFO ipc.NettyServer: [id: 0x09a14036, /192.168.1.51:48704 => /192.168.1.51:5555] OPEN
14/08/10 15:03:12 INFO ipc.NettyServer: [id: 0x09a14036, /192.168.1.51:48704 => /192.168.1.51:5555] BOUND: /192.168.1.51:5555
14/08/10 15:03:12 INFO ipc.NettyServer: [id: 0x09a14036, /192.168.1.51:48704 => /192.168.1.51:5555] CONNECTED: /192.168.1.51:48704
14/08/10 15:03:26 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 31 idoall.org test1 }
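Note that the message sent was "idoall.org test1 failover", but the logger line above shows only "idoall.org test1": the dump covers the first 16 bytes of the body. The sketch below reproduces that hex-plus-text dump. It is an illustration only; the function `dump_body` is hypothetical, and the 16-byte limit is taken from what the log shows rather than from Flume's source.

```python
def dump_body(body, max_bytes=16):
    """Render the first `max_bytes` of a body as hex pairs followed by printable text."""
    shown = body[:max_bytes]
    hex_part = " ".join(f"{byte:02X}" for byte in shown)
    # Replace non-printable bytes with a dot so the text column stays readable.
    text_part = "".join(chr(byte) if 32 <= byte < 127 else "." for byte in shown)
    return f"{hex_part} {text_part}"


print(dump_body(b"idoall.org test1 failover"))
```

So the trailing " failover" is not lost from the event; it is only cut from the log display.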

g) Now stop the sink agent on m2 (Ctrl+C) and send test data again:

root@m1:/home/hadoop# echo "idoall.org test2 failover" | nc localhost 5140

h) The sink window on m1 now shows both test messages sent so far:

14/08/10 15:02:46 INFO ipc.NettyServer: Connection to /192.168.1.51:47036 disconnected.
14/08/10 15:03:12 INFO ipc.NettyServer: [id: 0xbcf79851, /192.168.1.51:47048 => /192.168.1.50:5555] OPEN
14/08/10 15:03:12 INFO ipc.NettyServer: [id: 0xbcf79851, /192.168.1.51:47048 => /192.168.1.50:5555] BOUND: /192.168.1.50:5555
14/08/10 15:03:12 INFO ipc.NettyServer: [id: 0xbcf79851, /192.168.1.51:47048 => /192.168.1.50:5555] CONNECTED: /192.168.1.51:47048
14/08/10 15:07:56 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 31 idoall.org test1 }
14/08/10 15:07:56 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 32 idoall.org test2 }

i) Now restart the sink agent in the sink window on m2:

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/Flume_Sink_Processors_avro.conf -n a1 -Dflume.root.logger=INFO,console

j) Send two more test messages:

root@m1:/home/hadoop# echo "idoall.org test3 failover" | nc localhost 5140 && echo "idoall.org test4 failover" | nc localhost 5140

k) In the sink window on m2, the following appears; because of the priority setting, log messages land on m2 again:

14/08/10 15:09:47 INFO node.Application: Starting Sink k1
14/08/10 15:09:47 INFO node.Application: Starting Source r1
14/08/10 15:09:47 INFO source.AvroSource: Starting Avro source r1: { bindAddress: 0.0.0.0, port: 5555 }...
14/08/10 15:09:47 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
14/08/10 15:09:47 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
14/08/10 15:09:47 INFO source.AvroSource: Avro source r1 started.
14/08/10 15:09:54 INFO ipc.NettyServer: [id: 0x96615732, /192.168.1.51:48741 => /192.168.1.51:5555] OPEN
14/08/10 15:09:54 INFO ipc.NettyServer: [id: 0x96615732, /192.168.1.51:48741 => /192.168.1.51:5555] BOUND: /192.168.1.51:5555
14/08/10 15:09:54 INFO ipc.NettyServer: [id: 0x96615732, /192.168.1.51:48741 => /192.168.1.51:5555] CONNECTED: /192.168.1.51:48741
14/08/10 15:09:57 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 32 idoall.org test2 }
14/08/10 15:10:43 INFO ipc.NettyServer: [id: 0x12621f9a, /192.168.1.50:38166 => /192.168.1.51:5555] OPEN
14/08/10 15:10:43 INFO ipc.NettyServer: [id: 0x12621f9a, /192.168.1.50:38166 => /192.168.1.51:5555] BOUND: /192.168.1.51:5555
14/08/10 15:10:43 INFO ipc.NettyServer: [id: 0x12621f9a, /192.168.1.50:38166 => /192.168.1.51:5555] CONNECTED: /192.168.1.50:38166
14/08/10 15:10:43 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 33 idoall.org test3 }
14/08/10 15:10:43 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 34 idoall.org test4 }
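Every event in this case carries Severity=0, Facility=0 and flume.syslog.status=Invalid, because the test lines sent with echo are plain text without a syslog priority prefix. In the syslog format, a message starts with `<PRI>`, where PRI is facility * 8 + severity. The sketch below builds such a line; the helper name `syslog_line` is hypothetical, and whether the source then reports a valid status is an assumption that the original walkthrough does not test.

```python
def syslog_line(facility, severity, message):
    """Prefix a message with a syslog priority value: PRI = facility * 8 + severity."""
    if not (0 <= facility <= 23 and 0 <= severity <= 7):
        raise ValueError("facility must be 0..23 and severity 0..7")
    priority = facility * 8 + severity
    return f"<{priority}>{message}\n"


# Facility 1 (user-level) with severity 5 (notice) gives priority 13.
print(syslog_line(1, 5, "idoall.org test1 failover"), end="")
```

The resulting line could be sent to port 5140 in place of the bare text to compare the headers the logger sink prints.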

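11) Case 11: Load balancing Sink Processor

The load_balance type differs from failover in that it offers two selection modes: round robin and random. In both modes, if the selected sink is unavailable, the processor automatically tries the next available sink.

a) Create the Load_balancing_Sink_Processors configuration file on m1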

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/Load_balancing_Sink_Processors.conf

a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1

# The key to configuring load balancing: a sink group is required
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = round_robin

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = m1
a1.sinks.k1.port = 5555

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c1
a1.sinks.k2.hostname = m2
a1.sinks.k2.port = 5555

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
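The round-robin selection with fall-through described for this case can be sketched as follows. This is a toy model under stated assumptions, not Flume's code: the names `RoundRobinSelector` and `deliver` are hypothetical, availability is passed in as a simple callback, and the back-off enabled by `processor.backoff = true` is not modelled.

```python
class RoundRobinSelector:
    """Rotate the starting sink on every call so that load is spread evenly."""

    def __init__(self, sinks):
        self.sinks = list(sinks)
        self.next_index = 0

    def order(self):
        """Return all sinks in the order they should be tried for the next delivery."""
        count = len(self.sinks)
        start = self.next_index
        self.next_index = (start + 1) % count
        return [self.sinks[(start + offset) % count] for offset in range(count)]


def deliver(selector, is_up):
    """Try sinks in selector order and return the first one that is available."""
    for sink in selector.order():
        if is_up(sink):
            return sink
    raise RuntimeError("all sinks are unavailable")


selector = RoundRobinSelector(["k1", "k2"])
print([deliver(selector, lambda sink: True) for _ in range(4)])
```

With both sinks up the deliveries alternate between k1 and k2; if one sink is reported down, every delivery falls through to the other one.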

b) Create the Load_balancing_Sink_Processors_avro configuration file on m1

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/Load_balancing_Sink_Processors_avro.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 5555

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

c) Copy both configuration files to m2

root@m1:/home/hadoop/flume-1.5.0-bin# scp -r /home/hadoop/flume-1.5.0-bin/conf/Load_balancing_Sink_Processors.conf root@m2:/home/hadoop/flume-1.5.0-bin/conf/Load_balancing_Sink_Processors.conf
root@m1:/home/hadoop/flume-1.5.0-bin# scp -r /home/hadoop/flume-1.5.0-bin/conf/Load_balancing_Sink_Processors_avro.conf root@m2:/home/hadoop/flume-1.5.0-bin/conf/Load_balancing_Sink_Processors_avro.conf

d) Open four windows and start both Flume agents on m1 and on m2 at the same time

root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/Load_balancing_Sink_Processors_avro.conf -n a1 -Dflume.root.logger=INFO,console
root@m1:/home/hadoop# /home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/Load_balancing_Sink_Processors.conf -n a1 -Dflume.root.logger=INFO,console

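e) Then, on either m1 or m2, generate test log messages. Enter them one line at a time: if they are entered too quickly, they tend to land on the same machine.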

root@m1:/home/hadoop# echo "idoall.org test1" | nc localhost 5140
root@m1:/home/hadoop# echo "idoall.org test2" | nc localhost 5140
root@m1:/home/hadoop# echo "idoall.org test3" | nc localhost 5140
root@m1:/home/hadoop# echo "idoall.org test4" | nc localhost 5140

f) In the sink window on m1, the following output appears:

14/08/10 15:35:29 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 32 idoall.org test2 }
14/08/10 15:35:33 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 34 idoall.org test4 }

g) In the sink window on m2, the following output appears:

14/08/10 15:35:27 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 31 idoall.org test1 }
14/08/10 15:35:29 INFO sink.LoggerSink: Event: { headers:{Severity=0, flume.syslog.status=Invalid, Facility=0} body: 69 64 6F 61 6C 6C 2E 6F 72 67 20 74 65 73 74 33 idoall.org test3 }

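This shows that round-robin mode is working.

12) Case 12: HBase sink

a) Before testing, start HBase by following the article "ubuntu12.04+hadoop2.2.0+zookeeper3.4.5+hbase0.96.2+hive0.13.1 distributed environment deployment".

b) Then copy the following files into Flume: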

cp /home/hadoop/hbase-0.96.2-hadoop2/lib/protobuf-java-2.5.0.jar /home/hadoop/flume-1.5.0-bin/lib
cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-client-0.96.2-hadoop2.jar /home/hadoop/flume-1.5.0-bin/lib
cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-common-0.96.2-hadoop2.jar /home/hadoop/flume-1.5.0-bin/lib
cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-protocol-0.96.2-hadoop2.jar /home/hadoop/flume-1.5.0-bin/lib
cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-server-0.96.2-hadoop2.jar /home/hadoop/flume-1.5.0-bin/lib
cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-hadoop2-compat-0.96.2-hadoop2.jar /home/hadoop/flume-1.5.0-bin/lib
cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-hadoop-compat-0.96.2-hadoop2.jar /home/hadoop/flume-1.5.0-bin/lib
cp /home/hadoop/hbase-0.96.2-hadoop2/lib/htrace-core-2.04.jar /home/hadoop/flume-1.5.0-bin/lib

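c) Make sure the test_idoall_org table already exists in HBase. For the table layout and columns, see the HBase table-creation code in "ubuntu12.04+hadoop2.2.0+zookeeper3.4.5+hbase0.96.2+hive0.13.1 distributed environment deployment".

d) Create the hbase_simple configuration file on m1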

root@m1:/home/hadoop# vi /home/hadoop/flume-1.5.0-bin/conf/hbase_simple.conf

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1

# Describe the sink
a1.sinks.k1.type = hbase
a1.sinks.k1.table = test_idoall_org
a1.sinks.k1.columnFamily = name
a1.sinks.k1.column = idoall
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

e) Start the Flume agent

/home/hadoop/flume-1.5.0-bin/bin/flume-ng agent -c . -f /home/hadoop/flume-1.5.0-bin/conf/hbase_simple.conf -n a1 -Dflume.root.logger=INFO,console

f) Generate a test syslog message

root@m1:/home/hadoop# echo "hello idoall.org from flume" | nc localhost 5140

g) Now log in to HBase; the new data has been inserted

root@m1:/home/hadoop# /home/hadoop/hbase-0.96.2-hadoop2/bin/hbase shell
2014-08-10 16:09:48,984 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.96.2-hadoop2, r1581096, Mon Mar 24 16:03:18 PDT 2014

hbase(main):001:0> list
TABLE
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.96.2-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
hbase2hive_idoall
hive2hbase_idoall
test_idoall_org
3 row(s) in 2.6880 seconds

=> ["hbase2hive_idoall", "hive2hbase_idoall", "test_idoall_org"]
hbase(main):002:0> scan "test_idoall_org"
ROW                           COLUMN+CELL
 10086                        column=name:idoall, timestamp=1406424831473, value=idoallvalue
1 row(s) in 0.0550 seconds

hbase(main):003:0> scan "test_idoall_org"
ROW                           COLUMN+CELL
 10086                        column=name:idoall, timestamp=1406424831473, value=idoallvalue
 1407658495588-XbQCOZrKK8-0   column=name:payload, timestamp=1407658498203, value=hello idoall.org from flume
2 row(s) in 0.0200 seconds

hbase(main):004:0> quit
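In the scan output above, the event body ends up in the column name:payload rather than name:idoall. That is consistent with a regex-based serializer that, when no pattern or column names are configured, matches the whole body with `(.*)` and stores it under a single column called payload. The sketch below illustrates that mapping. It is a simplified model under those assumptions, not the serializer's actual code; the function `to_cells` is hypothetical and row-key generation is not modelled.

```python
import re


def to_cells(body, regex="(.*)", col_names=("payload",)):
    """Map the capture groups of `regex` over the whole body to column names.

    Returns an empty dict when the body does not match the pattern.
    """
    match = re.fullmatch(regex, body)
    if match is None:
        return {}
    return dict(zip(col_names, match.groups()))


# Default pattern: the whole body becomes one cell named "payload".
print(to_cells("hello idoall.org from flume"))

# A pattern with two groups splits the body into two named cells.
print(to_cells("10.0.0.1 GET", r"(\S+) (\S+)", ("ip", "method")))
```

Splitting a log line into several columns would then be a matter of supplying a pattern with more capture groups and a matching list of column names.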

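After working through all of these Flume examples, you will find that Flume is genuinely powerful and that its components can be combined in many ways to do the job you need. As the saying goes, the master opens the door, but the practice is up to you. How best to apply Flume to your own product and business is for you to work out, so go and try it hands-on.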

Reposted from: https://www.cnblogs.com/AloneSword/p/4875126.html
