---
title: NiFi
updated: 2022-05-24 18:29:11Z
created: 2022-05-21 13:19:51Z
---

## What is Apache NiFi used for?

- reliable and secure transfer of data between systems
- delivery of data from sources to analytics platforms => top use case
- enrichment and preparation of data:
    - conversion between formats => one thing at a time (JSON => CSV)
    - extraction/parsing
    - routing decisions => read the value of a JSON field and decide on that value: send the JSON to system A, otherwise to system B

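The format conversion and routing decision above can be sketched in plain Python. This is only an illustration of the logic, not NiFi code: in NiFi the same result is configured with processors (for example EvaluateJsonPath feeding RouteOnAttribute). The field name `priority`, the value `high`, and the destination names are assumptions made up for the example.

```python
import csv
import io
import json


def json_to_csv(record: dict) -> str:
    """Convert one flat JSON object into a header row plus one CSV row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(record), lineterminator="\n")
    writer.writeheader()
    writer.writerow(record)
    return buffer.getvalue()


def route(payload: str, field: str = "priority", match: str = "high") -> str:
    """Return the destination chosen from the value of one JSON field."""
    record = json.loads(payload)
    # Hypothetical rule: matching records go to system A, everything else to B.
    return "system_a" if record.get(field) == match else "system_b"


if __name__ == "__main__":
    message = '{"id": 7, "priority": "high"}'
    print(route(message))
    print(json_to_csv(json.loads(message)), end="")
```

Each function does one thing, which mirrors the "one thing at a time" note: conversion and routing would be separate steps in a flow.
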
## What is Apache NiFi **NOT** used for?

- distributed computation
- complex event processing
- joins / complex rolling window operations

## Hadoop ecosystem integration examples

### HDFS ingest

- MergeContent
    - merges FlowFiles into appropriately sized files for HDFS
    - based on size, number of messages, and time
- UpdateAttribute
    - sets the HDFS directory and filename
    - uses expression language to dynamically bin by date
- PutHDFS
    - writes FlowFile content to HDFS
    - supports a conflict resolution strategy and Kerberos authentication

![c45b3dcdac107122793b14d8bdd76a0f.png](../_resources/c45b3dcdac107122793b14d8bdd76a0f.png)

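The merge-then-name steps of the ingest flow can be sketched as follows. This is a plain-Python illustration, not NiFi's implementation: the `Bin` class, its thresholds, and `hdfs_target` are hypothetical names. In NiFi the date-binned directory would be written in Expression Language, something like `${now():format('yyyy/MM/dd')}`, rather than in Python.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Bin:
    """Accumulates small messages until one of three thresholds is reached."""

    max_bytes: int
    max_entries: int
    max_age_seconds: float
    opened_at: float
    entries: list = field(default_factory=list)
    size: int = 0

    def add(self, message: bytes) -> None:
        self.entries.append(message)
        self.size += len(message)

    def is_full(self, now: float) -> bool:
        # Flush on whichever limit is hit first: size, count, or age.
        return (
            self.size >= self.max_bytes
            or len(self.entries) >= self.max_entries
            or now - self.opened_at >= self.max_age_seconds
        )

    def merged(self) -> bytes:
        # Join the buffered messages into one larger payload.
        return b"\n".join(self.entries)


def hdfs_target(base_dir: str, name: str, when: datetime) -> str:
    """Build a date-binned path: <base_dir>/<yyyy>/<mm>/<dd>/<name>."""
    return f"{base_dir}/{when:%Y/%m/%d}/{name}"


if __name__ == "__main__":
    bin_ = Bin(max_bytes=20, max_entries=100, max_age_seconds=60.0, opened_at=0.0)
    for message in (b"first event", b"second event"):
        bin_.add(message)
    if bin_.is_full(now=1.0):
        print(hdfs_target("/data/events", "merged.txt", datetime(2022, 5, 24)))
        print(bin_.merged())
```

Merging before writing matters because HDFS handles a few large files far better than many small ones; the time threshold keeps slow streams from waiting indefinitely for a bin to fill.
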
### HDFS Retrieval

- ListHDFS
    - periodically performs a listing on an HDFS directory
    - produces a FlowFile per HDFS file
    - each FlowFile only contains the HDFS path & filename
- FetchHDFS
    - retrieves a file from HDFS
    - uses incoming FlowFiles to dynamically fetch

![a6ea2a07d58fac8a6739c7379c1b92f6.png](../_resources/a6ea2a07d58fac8a6739c7379c1b92f6.png)

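The list-then-fetch pattern can be sketched with a local directory standing in for HDFS. This is an assumption-laden illustration: `list_new_files`, `fetch`, and the `seen` set are hypothetical names, and the set only mimics the idea that a lister remembers what it has already emitted.

```python
import os
import tempfile


def list_new_files(directory: str, seen: set) -> list:
    """Return one lightweight record per file not seen in earlier listings."""
    records = []
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        if entry.is_file() and entry.path not in seen:
            seen.add(entry.path)
            # Only location metadata is emitted; no file content is read here.
            records.append({"path": directory, "filename": entry.name})
    return records


def fetch(record: dict) -> bytes:
    """Read the content of the file that a listing record points at."""
    with open(os.path.join(record["path"], record["filename"]), "rb") as handle:
        return handle.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        with open(os.path.join(directory, "a.txt"), "wb") as handle:
            handle.write(b"hello")
        seen = set()
        for record in list_new_files(directory, seen):
            print(record["filename"], fetch(record))
        # A second listing finds nothing new.
        print(list_new_files(directory, seen))
```

Splitting listing from fetching keeps the listing step cheap, and the fetch step can be driven by whatever records arrive.
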
### HBase integration

- HBase ingest - single cell => table, row id, column family, and column qualifier
    - FlowFile content becomes the cell value
- HBase ingest - full row
    - row id can be a field in the JSON or a FlowFile attribute

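The two ingest shapes can be sketched as plain data transformations. Nothing here talks to HBase: the functions, the `row.id` attribute name, and the returned dictionary layout are hypothetical, and dropping the row id field from the written cells is a choice made for this sketch.

```python
import json


def single_cell(table: str, row_id: str, family: str, qualifier: str,
                content: bytes) -> dict:
    """Single-cell ingest: the whole content becomes one cell value."""
    return {"table": table, "row": row_id,
            "cells": {f"{family}:{qualifier}": content}}


def full_row(table: str, family: str, payload: str,
             row_id_field: str = None, attributes: dict = None) -> dict:
    """Full-row ingest: each top-level JSON field becomes one cell."""
    document = json.loads(payload)
    if row_id_field is not None:
        # Row id taken from a field inside the JSON document.
        row_id = str(document.pop(row_id_field))
    else:
        # Row id taken from an attribute (hypothetical attribute name).
        row_id = attributes["row.id"]
    cells = {f"{family}:{name}": str(value).encode()
             for name, value in document.items()}
    return {"table": table, "row": row_id, "cells": cells}


if __name__ == "__main__":
    print(single_cell("events", "r1", "cf", "raw", b"abc"))
    print(full_row("events", "cf", '{"id": 42, "name": "pump"}', row_id_field="id"))
```
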
## Kafka integration

- PutKafka
    - provide the broker and topic name
    - publishes FlowFile content as one or more messages
    - can send large delimited content, split into messages by NiFi
- GetKafka
    - provide the ZK (ZooKeeper) connection string and topic name
    - produces a FlowFile for each message consumed

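The split-on-publish and one-FlowFile-per-message behaviour can be sketched without a broker. This is a minimal illustration only: no Kafka client is involved, and the function names and the `index` attribute are assumptions invented for the example.

```python
def split_into_messages(content: bytes, delimiter: bytes = b"\n") -> list:
    """Split delimited content into individual message payloads."""
    # Empty segments (for example after a trailing delimiter) are skipped.
    return [part for part in content.split(delimiter) if part]


def to_flow_files(messages: list) -> list:
    """Wrap each consumed message in its own FlowFile-like record."""
    return [
        {"content": message, "attributes": {"index": str(index)}}
        for index, message in enumerate(messages)
    ]


if __name__ == "__main__":
    payload = b"event one\nevent two\nevent three\n"
    messages = split_into_messages(payload)
    print(len(messages))
    for flow_file in to_flow_files(messages):
        print(flow_file)
```

Splitting on the publishing side lets one large delimited file become many small messages, while the consuming side does the reverse mapping of one message to one unit of work.
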
## Stream Processing Integration

![1ce08014a43470c07e5314f1d69c6771.png](../_resources/1ce08014a43470c07e5314f1d69c6771.png)

- Spark Streaming - NiFi Spark Receiver
- Storm - NiFi Spout
- Flink - NiFi Source & Sink
- Apex - NiFi Input Operators & Output Operators
- and many more integrations available

[NiFi Videos](https://nifi.apache.org/videos.html)