title | updated | created |
---|---|---|
NiFi | 2022-05-24 18:29:11Z | 2022-05-21 13:19:51Z |
What is Apache NiFi used for?
- reliable and secure transfer of data between systems
- delivery of data from sources to analytics platforms => top use case
- enrichment and preparation of data:
  - conversion between formats => one thing at a time (e.g. JSON => CSV)
  - extraction/parsing
  - routing decisions => read the value of a JSON field and route on it: send the JSON to system A if it matches, otherwise to system B
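
The routing decision described above can be sketched in plain Python. This is only an illustration of the idea, not how NiFi implements it; the function name `route` and the destination labels `system_a` / `system_b` are made up for the example.

```python
import json


def route(flowfile_content: str, field: str, expected) -> str:
    """Pick a destination for one JSON document based on a single field."""
    record = json.loads(flowfile_content)
    # A missing field yields None, which only matches if expected is None.
    if record.get(field) == expected:
        return "system_a"  # hypothetical label for "system A"
    return "system_b"      # hypothetical label for "system B"


if __name__ == "__main__":
    print(route('{"level": "error", "msg": "disk full"}', "level", "error"))
    print(route('{"level": "info", "msg": "started"}', "level", "error"))
```

In a real flow the same decision would be configured on a routing processor rather than written as code.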
What is Apache NiFi NOT used for?
- distributed computation
- complex event processing
- joins / complex rolling window operations
Hadoop ecosystem integration examples
HDFS ingest
- MergeContent
  - merges into appropriately sized files for HDFS
  - based on size, number of messages, and time
- UpdateAttribute
  - sets the HDFS directory and filename
  - use expression language to dynamically bin by date
- PutHDFS
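
To make the merge and date-binning steps concrete, here is a rough Python sketch. It assumes messages are byte strings and covers only the size and count limits (the time limit is left out). The names `bin_by_size_and_count` and `hdfs_directory_for` are invented for this example. In NiFi itself the directory would be an Expression Language expression set in UpdateAttribute, something along the lines of `${now():format('yyyy/MM/dd')}`.

```python
from datetime import datetime


def bin_by_size_and_count(messages, max_bytes, max_count):
    """Group messages into bins, closing a bin before a limit is exceeded."""
    bins, current, current_bytes = [], [], 0
    for msg in messages:
        too_big = current_bytes + len(msg) > max_bytes
        too_many = len(current) >= max_count
        # Never close an empty bin: an oversized message gets a bin of its own.
        if current and (too_big or too_many):
            bins.append(current)
            current, current_bytes = [], 0
        current.append(msg)
        current_bytes += len(msg)
    if current:
        bins.append(current)
    return bins


def hdfs_directory_for(base_dir, when):
    """Build a date-binned directory path: <base_dir>/<year>/<month>/<day>."""
    return f"{base_dir.rstrip('/')}/{when:%Y/%m/%d}"


if __name__ == "__main__":
    merged = bin_by_size_and_count([b"aaaa", b"bbbb", b"cc"], max_bytes=8, max_count=3)
    print(merged)
    print(hdfs_directory_for("/data/events", datetime(2022, 5, 24)))
```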
HDFS Retrieval
- ListHDFS
  - periodically performs a listing on an HDFS directory
  - produces a FlowFile per HDFS file
  - the FlowFile only contains the HDFS path & filename
- FetchHDFS
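
The list-then-fetch pattern can be illustrated with a small Python sketch that uses a local directory as a stand-in for HDFS. Everything here is an assumption made for illustration: the function names, the attribute keys `path` and `filename`, and the `already_listed` set used to avoid emitting the same file twice.

```python
from pathlib import Path


def list_new_files(directory, already_listed):
    """Return one attribute dict per file not seen in an earlier listing."""
    flowfiles = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.name not in already_listed:
            already_listed.add(path.name)
            # Only the location is recorded here, not the file content.
            flowfiles.append({"path": str(path.parent), "filename": path.name})
    return flowfiles


def fetch(flowfile):
    """Read the content of the file that a listing entry points at."""
    return (Path(flowfile["path"]) / flowfile["filename"]).read_bytes()


if __name__ == "__main__":
    seen = set()
    for entry in list_new_files(".", seen):
        print(entry["filename"], len(fetch(entry)), "bytes")
```

Separating the cheap listing step from the expensive fetch step means the listing entries can be distributed before any content is read.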
HBase integration
- HBase ingest - single cell => table, row ID, column family, and column qualifier
  - FlowFile content becomes the cell value
- HBase ingest - full row
  - Row ID can be a field in the JSON or a FlowFile attribute
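
The two ingest shapes can be sketched as plain data mappings in Python. This is a sketch under assumptions, not the HBase client API. It assumes a flat JSON document, that the remaining JSON fields become column qualifiers in a single column family, and that the row ID field is not stored as a column. Both function names are invented.

```python
import json


def single_cell(table, row_id, family, qualifier, content):
    """Single-cell ingest: the whole FlowFile content is the cell value."""
    return {"table": table, "row": row_id, "cells": {(family, qualifier): content}}


def json_to_row(table, content, row_id_field, family):
    """Full-row ingest: one JSON field is the row ID, the rest become cells."""
    record = json.loads(content)
    cells = {
        (family, qualifier): str(value)
        for qualifier, value in record.items()
        if qualifier != row_id_field  # assumption: row ID is not a column
    }
    return {"table": table, "row": str(record[row_id_field]), "cells": cells}


if __name__ == "__main__":
    print(single_cell("events", "row-1", "cf", "payload", b"raw bytes"))
    print(json_to_row("users", '{"id": 7, "name": "ann"}', "id", "cf"))
```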
Kafka integration
- PutKafka
  - provide the broker and topic name
  - publishes FlowFile content as one or more messages
  - can send large delimited content, split into messages by NiFi
- GetKafka
  - provide the ZK (ZooKeeper) connection string and topic name
  - produces a FlowFile for each message consumed
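
Splitting delimited content into individual messages, as PutKafka is described as doing, might look roughly like this in Python. The `send` callback is a hypothetical stand-in for a real Kafka producer, and skipping empty records is an assumption of this sketch.

```python
def split_into_messages(content: bytes, delimiter: bytes = b"\n"):
    """Split FlowFile content into one message per delimited record."""
    # Empty records (e.g. from a trailing delimiter) are skipped.
    return [part for part in content.split(delimiter) if part]


def publish(content: bytes, topic: str, send, delimiter: bytes = b"\n") -> int:
    """Send each record to the topic via `send` and return the message count."""
    messages = split_into_messages(content, delimiter)
    for message in messages:
        send(topic, message)  # hypothetical producer callback
    return len(messages)


if __name__ == "__main__":
    sent = []
    count = publish(b"a\nb\n\nc\n", "events", lambda t, m: sent.append((t, m)))
    print(count, sent)
```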
Stream Processing Integration
- Spark Streaming - NiFi Spark Receiver
- Storm - NiFi Spout
- Flink - NiFi Source & Sink
- Apex - NiFi Input Operators & Output Operators
- and many more integrations available