Chapter 4: Installing and Configuring Flume
Installing Flume
yum install flume
Create a flume directory in HDFS to hold the log files uploaded from the local machine (/user/flume is the directory that hdfs.path in flume.conf points to)
hadoop fs -mkdir /user/flume
Create a log or txt file locally (for example, create a.txt under /tmp and save any content in it)
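As a minimal sketch, the sample file could be created from the shell as shown below. SAMPLE_DIR is an illustrative variable (not a Flume setting) that defaults to /tmp, the directory used in this chapter, and the two lines of text are arbitrary.

```shell
# Create a small sample file for Flume to pick up later.
# SAMPLE_DIR is a hypothetical variable; it defaults to /tmp as in the text.
SAMPLE_DIR="${SAMPLE_DIR:-/tmp}"

# Write two arbitrary lines of content into a.txt.
printf 'hello flume\nsecond line\n' > "$SAMPLE_DIR/a.txt"

# Show what was written.
cat "$SAMPLE_DIR/a.txt"
```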
Go to Flume's default configuration directory and edit flume.conf
cd /usr/lib/flume/conf
vi flume.conf
1. Monitoring a directory:
# Name the components on this agent
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = ch1

# Describe/configure the source. spoolDir must be the local directory that holds
# the log or txt files; Flume uploads every log or txt file in it to HDFS!
agent1.sources.source1.channels = ch1
agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /tmp
agent1.sources.source1.ignorePattern = .*dat.*
agent1.sources.source1.fileHeader = true
agent1.sources.source1.deserializer.outputCharset = UTF-8

# Describe the sink. The path below must point to the Active NameNode!
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://<Active Name Node IP>:8020/user/flume/
agent1.sinks.sink1.hdfs.rollInterval = 60
agent1.sinks.sink1.hdfs.rollSize = 1024
agent1.sinks.sink1.channel = ch1

# Use a channel which buffers events on disk
agent1.channels.ch1.type = file
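A frequent cause of startup problems with a configuration like this is a source or sink that is not bound to the declared channel. The sketch below is one rough way to check that wiring before starting the agent. check_flume_wiring is a hypothetical helper, not part of Flume, and it assumes one property per line and a single channel per agent.

```shell
# Sanity-check a Flume properties file: the channel named in
# <agent>.channels should be referenced by a source and by a sink.
# check_flume_wiring is a hypothetical helper, not a Flume command.
check_flume_wiring() {
  conf="$1"
  agent="$2"

  # Read the declared channel name, e.g. "ch1" from "agent1.channels = ch1".
  channel=$(sed -n "s/^${agent}\.channels[[:space:]]*=[[:space:]]*//p" "$conf" \
    | head -n 1 | tr -d '[:space:]')
  [ -n "$channel" ] || { echo "no channel declared for $agent"; return 1; }

  # A source must list the channel in its "channels" property.
  grep -Eq "^${agent}\.sources\.[^.]+\.channels[[:space:]]*=[[:space:]]*${channel}[[:space:]]*$" "$conf" \
    || { echo "source not bound to $channel"; return 1; }

  # A sink must name the channel in its "channel" property.
  grep -Eq "^${agent}\.sinks\.[^.]+\.channel[[:space:]]*=[[:space:]]*${channel}[[:space:]]*$" "$conf" \
    || { echo "sink not bound to $channel"; return 1; }

  echo "ok: $agent uses channel $channel"
}

# Example call (path and agent name taken from this chapter):
# check_flume_wiring /usr/lib/flume/conf/flume.conf agent1
```

Note the asymmetry the check relies on: a source takes the plural channels property, while a sink takes the singular channel.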
Return to the /usr/lib/flume directory and run the following Flume upload command
bin/flume-ng agent -n agent1 -c conf -f conf/flume.conf -Dflume.root.logger=INFO,console
This step may fail with an error indicating that flume.conf is in the wrong location. Copying the 4 configuration files from /usr/lib/flume/conf into /usr/lib/flume/bin/conf resolves it.
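The copy step could be scripted roughly as follows. The text does not name the four files, so this sketch simply copies everything found in conf. copy_flume_conf is a hypothetical helper name, and its argument defaults to /usr/lib/flume, the install path used in this chapter.

```shell
# Copy every file from <flume_home>/conf into <flume_home>/bin/conf.
# copy_flume_conf is a hypothetical helper; the default path is the
# install location assumed throughout this chapter.
copy_flume_conf() {
  flume_home="${1:-/usr/lib/flume}"

  # Create the target directory if it does not exist yet.
  mkdir -p "$flume_home/bin/conf"

  # Copy all configuration files (the source mentions four of them).
  cp "$flume_home"/conf/* "$flume_home/bin/conf/"
}

# Example call using the default path:
# copy_flume_conf
```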
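Another possible error is Error: Could not find or load main class org.apache.flume.tools.GetJavaProperty. It means that the contents of the flume-ng script do not match the current class files; the fix is to overwrite the original flume-ng script with matching contents.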
Check whether the a.txt file that was just uploaded now appears in the HDFS directory /user/flume
hadoop fs -ls /user/flume
hadoop fs -cat /user/flume/*
2. Monitoring a file:
# a.conf: A single-node Flume configuration

# Name the components on this agent
a2.sources=r2
a2.sinks=k2
a2.channels=c2

# Describe/configure the source
a2.sources.r2.type=exec
a2.sources.r2.command=tail -f /root/flume/1.txt

# Describe the sink
a2.sinks.k2.type=hdfs
a2.sinks.k2.hdfs.path=hdfs://<Active Name Node IP>:8020/user/flume/file
a2.sinks.k2.hdfs.filePrefix=data1
a2.sinks.k2.hdfs.round=true
a2.sinks.k2.hdfs.rollSize=0
a2.sinks.k2.hdfs.rollCount=0
a2.sinks.k2.hdfs.batchSize=1000
a2.sinks.k2.hdfs.roundValue=1
a2.sinks.k2.hdfs.fileType=DataStream

# Use a channel which buffers events in memory
a2.channels.c2.type=memory
a2.channels.c2.capacity=100000
a2.channels.c2.transactionCapacity=1000

# Bind the source and sink to the channel
a2.sources.r2.channels=c2
a2.sinks.k2.channel=c2
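Return to the /usr/lib/flume directory and run the following Flume upload command (the name passed to -n must match the a2 prefix used in the configuration)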
bin/flume-ng agent -n a2 -c conf -f conf/flume.conf -Dflume.root.logger=INFO,console
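Because the exec source runs tail -f, the monitored file has to exist before the agent starts, and new events only appear when lines are appended to it. One way to generate test data is sketched below. append_test_line is a hypothetical helper, and its default target is the /root/flume/1.txt path from the configuration above.

```shell
# Append a timestamped test line to the file watched by the exec source.
# append_test_line is a hypothetical helper, not part of Flume.
append_test_line() {
  target="${1:-/root/flume/1.txt}"

  # Make sure the parent directory exists so the append cannot fail on it.
  mkdir -p "$(dirname "$target")"

  # Append (never truncate) so that tail -f sees the new line.
  printf '%s test event\n' "$(date '+%Y-%m-%d %H:%M:%S')" >> "$target"
}

# Example call using the default path:
# append_test_line
```

After appending a few lines while the agent is running, listing /user/flume/file with hadoop fs -ls should show files with the data1 prefix, assuming the sink can reach HDFS.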