---
title: NiFi
updated: 2022-05-24 18:29:11Z
created: 2022-05-21 13:19:51Z
---
## What is Apache NiFi used for?
- reliable and secure transfer of data between systems
- delivery of data from sources to analytics platforms => top use case
- enrichment and preparation of data:
    - conversion between formats => one thing at a time (e.g. JSON => CSV)
    - extraction/parsing
    - routing decisions => read the value of a JSON field and route on it: send the JSON to system A if it matches, otherwise to system B
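
The conversion and routing steps above can be sketched in plain Python. This is not NiFi code: in NiFi the routing idea is usually built from processors such as EvaluateJsonPath and RouteOnAttribute. The field name `priority`, the value `high`, and the destination names `system_a` / `system_b` are assumptions made up for illustration.

```python
import csv
import io
import json


def json_to_csv(content: str) -> str:
    """Convert a JSON array of flat objects into CSV text (one format change, nothing else)."""
    records = json.loads(content)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(records[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


def route(content: str, field: str = "priority", expected: str = "high") -> str:
    """Read one JSON field and pick a destination based on its value."""
    record = json.loads(content)
    # Hypothetical destinations: system A on a match, system B otherwise.
    return "system_a" if record.get(field) == expected else "system_b"


print(route('{"id": 1, "priority": "high"}'))
print(json_to_csv('[{"id": 1, "priority": "high"}]'))
```

Keeping conversion and routing as two separate functions mirrors the "one thing at a time" idea: each step does a single job and can be chained with others.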
## What is Apache NiFi **NOT** used for?
- distributed computation
- complex event processing
- joins / complex rolling window operations
## Hadoop ecosystem integration examples
### HDFS ingest
- MergeContent
    - merges into appropriately sized files for HDFS
    - based on size, number of messages, and time
- UpdateAttribute
    - sets the HDFS directory and filename
    - uses Expression Language to dynamically bin by date
- PutHDFS
    - writes FlowFile content to HDFS
    - supports a conflict resolution strategy and Kerberos authentication
![c45b3dcdac107122793b14d8bdd76a0f.png](../_resources/c45b3dcdac107122793b14d8bdd76a0f.png)
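
The merge-then-name logic of this ingest flow can be sketched roughly as follows. This is a simplified stand-in, not how MergeContent is implemented: the class `Merger`, the helper `hdfs_target`, the newline used to join messages, and all thresholds are assumptions for illustration.

```python
from datetime import datetime


class Merger:
    """Collects small messages and emits one merged payload when a limit is hit."""

    def __init__(self, max_bytes: int, max_count: int, max_age_s: float):
        self.max_bytes = max_bytes
        self.max_count = max_count
        self.max_age_s = max_age_s
        self.messages = []
        self.size = 0
        self.started = None

    def add(self, payload: bytes, now: float):
        """Add one message; return the merged payload if size or count is reached, else None."""
        if not self.messages:
            self.started = now
        self.messages.append(payload)
        self.size += len(payload)
        if self.size >= self.max_bytes or len(self.messages) >= self.max_count:
            return self.flush()
        return None

    def tick(self, now: float):
        """Flush a non-empty bin that has been open for at least max_age_s seconds."""
        if self.messages and now - self.started >= self.max_age_s:
            return self.flush()
        return None

    def flush(self) -> bytes:
        merged = b"\n".join(self.messages)  # assumed separator
        self.messages, self.size, self.started = [], 0, None
        return merged


def hdfs_target(base_dir: str, when: datetime, filename: str) -> str:
    """Build a date-binned target path, similar in spirit to a date expression in UpdateAttribute."""
    return f"{base_dir}/{when:%Y/%m/%d}/{filename}"


print(hdfs_target("/data/events", datetime(2022, 5, 21), "merged.json"))
```

The time limit matters because a slow source might never fill a bin by size or count alone; `tick` makes sure partially filled bins still reach HDFS.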
### HDFS Retrieval
- ListHDFS
    - periodically performs a listing on an HDFS directory
    - produces a FlowFile per HDFS file
    - each FlowFile only contains the HDFS path & filename
- FetchHDFS
    - retrieves a file from HDFS
    - uses incoming FlowFiles to dynamically fetch
![a6ea2a07d58fac8a6739c7379c1b92f6.png](../_resources/a6ea2a07d58fac8a6739c7379c1b92f6.png)
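
The list-then-fetch pattern can be sketched with a local directory standing in for HDFS. The functions `list_new_files` and `fetch`, the dictionary used as a "FlowFile", and the `seen` set used to avoid listing a file twice are all assumptions of this sketch, not NiFi APIs.

```python
import tempfile
from pathlib import Path


def list_new_files(directory: Path, seen: set) -> list:
    """One listing pass: emit an attributes-only 'FlowFile' per file not listed before."""
    flowfiles = []
    for path in sorted(directory.iterdir()):
        if path.is_file() and path not in seen:
            seen.add(path)
            # Only the location is carried, not the file content.
            flowfiles.append({"path": str(path.parent), "filename": path.name})
    return flowfiles


def fetch(flowfile: dict) -> bytes:
    """Use the attributes of an incoming 'FlowFile' to read the actual content."""
    return (Path(flowfile["path"]) / flowfile["filename"]).read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_bytes(b"hello")
    seen = set()
    first_pass = list_new_files(root, seen)   # lists a.txt
    second_pass = list_new_files(root, seen)  # nothing new
    contents = [fetch(ff) for ff in first_pass]

print(len(first_pass), len(second_pass), contents)
```

Splitting listing from fetching keeps the listing step cheap, and the lightweight path-only items can be distributed so that the heavier fetch work is spread out.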
### HBase integration
- HBase ingest - single cell => table, row id, column family and column qualifier
    - FlowFile content becomes the cell value
- HBase ingest - full row
    - row id can be a field in the JSON or a FlowFile attribute
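
The two ingest modes can be illustrated by showing how content maps onto HBase cell coordinates. This sketch does not talk to HBase; the `Cell` tuple and both functions are hypothetical, and treating each remaining JSON field as a column qualifier in full-row mode is an assumption of the sketch.

```python
import json
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    table: str
    row_id: str
    family: str
    qualifier: str
    value: bytes


def single_cell(table: str, row_id: str, family: str, qualifier: str, content: bytes) -> Cell:
    """Single-cell mode: the whole FlowFile content becomes one cell value."""
    return Cell(table, row_id, family, qualifier, content)


def full_row(table: str, family: str, content: bytes,
             row_id_field: Optional[str] = None,
             row_id_attr: Optional[str] = None) -> list:
    """Full-row mode: take the row id from a JSON field or from an attribute value."""
    doc = json.loads(content)
    if row_id_field is not None:
        row_id = str(doc.pop(row_id_field))
    else:
        row_id = row_id_attr
    # Assumption: every remaining JSON field becomes a qualifier/value pair.
    return [Cell(table, row_id, family, q, str(v).encode()) for q, v in doc.items()]


for cell in full_row("users", "d", b'{"id": "u1", "name": "Ann"}', row_id_field="id"):
    print(cell)
```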
## Kafka integration
- PutKafka
    - provide broker and topic name
    - publishes FlowFile content as one or more messages
    - ability to send large delimited content, split into messages by NiFi
- GetKafka
    - provide ZK (ZooKeeper) connection string and topic name
    - produces a FlowFile for each message consumed
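
Splitting large delimited content into individual messages can be sketched as below. No Kafka client is used: `send` is a stand-in callback for a real producer call, and the function names, the default newline delimiter, and the choice to drop empty pieces are assumptions of this sketch.

```python
def split_messages(content: bytes, delimiter: bytes = b"\n") -> list:
    """Split content on the delimiter, dropping empty pieces."""
    return [part for part in content.split(delimiter) if part]


def publish(content: bytes, send, delimiter: bytes = b"\n") -> int:
    """Pass each piece to `send` as its own message and return how many were sent."""
    count = 0
    for message in split_messages(content, delimiter):
        send(message)
        count += 1
    return count


outbox = []
sent_count = publish(b"event-1\nevent-2\nevent-3\n", outbox.append)
print(sent_count, outbox)
```

On the consuming side the relationship is reversed: each message read from the topic becomes its own FlowFile.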
## Stream Processing Integration
![1ce08014a43470c07e5314f1d69c6771.png](../_resources/1ce08014a43470c07e5314f1d69c6771.png)
- Spark Streaming - NiFi Spark Receiver
- Storm - NiFi Spout
- Flink - NiFi Source & Sink
- Apex - NiFi Input Operators & Output Operators
- and many more integrations available

[NiFi Videos](https://nifi.apache.org/videos.html)