---
title: Flume
updated: 2022-05-24 18:29:31Z
created: 2021-05-04 14:58:11Z
---

# Apache Flume

Streams data into a cluster.

Developed with Hadoop in mind:

- Built-in sinks for HDFS and HBase
- Originally made to handle log aggregation

***Flume buffers data before delivering it to the cluster.***

## Anatomy of a Flume Agent and Flow

![Flume Agent](https://flume.apache.org/_images/DevGuide_image00.png?)

### Three components of a Flume Agent:

- Source
  - Where the data is coming from
  - Optionally has Channel Selectors and Interceptors
    - Selectors:
      - based on a selection rule, the data is sent to a particular Channel
    - Interceptors:
      - can add to or reshape the data in flight
- Channel
  - How the data is transferred between Source and Sink (via memory or files)
- Sink
  - Where the data is going
  - An agent can have multiple Sinks, and they can be organized into Sink Groups
  - A Sink can connect to only ***one*** Channel
  - The Channel is notified to delete a message once the Sink has processed it

### Built-in Source Types:

- Spooling directory, Avro (specific Hadoop format), Kafka, Exec (command-line), Thrift, Netcat (TCP/IP), HTTP, Custom, etc.

### Built-in Sink Types:

- HDFS, Hive, HBase, Avro, Thrift, Elasticsearch, Kafka, Custom

## Flume Example

![example](../_resources/FlumeExample.png)

The first layer of agents sits close to the source and processes the data, e.g. in a local datacenter. The second layer collects from the first layer and ingests the data into the sink. Between the first and second layers, an Avro Sink on each first-layer agent sends to an Avro Source on the second-layer agent, which transfers the data very efficiently.
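
A minimal configuration sketch of the two-layer setup described above, written as a Flume properties file. The agent names (`edge`, `collector`), component names, host name, port, and paths are all hypothetical placeholders, not values from these notes; the source, channel, and sink types are the built-in ones listed earlier.

```properties
# ---- Layer 1: agent running close to the data (name "edge" is hypothetical) ----
edge.sources = logs
edge.channels = mem
edge.sinks = toCollector

# Spooling directory source: picks up files dropped into a directory (placeholder path)
edge.sources.logs.type = spooldir
edge.sources.logs.spoolDir = /var/log/app/spool
edge.sources.logs.channels = mem

# Memory channel: buffers events between source and sink
edge.channels.mem.type = memory
edge.channels.mem.capacity = 10000
edge.channels.mem.transactionCapacity = 1000

# Avro sink: forwards events to the layer-2 agent (placeholder host and port)
edge.sinks.toCollector.type = avro
edge.sinks.toCollector.hostname = collector.example.com
edge.sinks.toCollector.port = 4545
edge.sinks.toCollector.channel = mem

# ---- Layer 2: agent that collects and writes to HDFS (name "collector" is hypothetical) ----
collector.sources = fromEdge
collector.channels = disk
collector.sinks = hdfsOut

# Avro source: listens for events sent by the layer-1 Avro sinks
collector.sources.fromEdge.type = avro
collector.sources.fromEdge.bind = 0.0.0.0
collector.sources.fromEdge.port = 4545
collector.sources.fromEdge.channels = disk

# File channel: buffers events on disk instead of in memory
collector.channels.disk.type = file

# HDFS sink: writes events into the cluster (placeholder path)
collector.sinks.hdfsOut.type = hdfs
collector.sinks.hdfsOut.hdfs.path = /flume/events
collector.sinks.hdfsOut.channel = disk
```

Note that a source is configured with `channels` (plural) while a sink is configured with `channel` (singular), which reflects the rule that a Sink connects to only one Channel. Assuming both agent definitions are saved in one file, each agent would be started separately with something like `flume-ng agent --conf-file <file> --name edge` on the first-layer host and `--name collector` on the second-layer host.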