83 lines
3.3 KiB
Markdown
83 lines
3.3 KiB
Markdown
|
---
|
||
|
title: ELTvsETL
|
||
|
updated: 2021-05-04 14:58:11Z
|
||
|
created: 2021-05-04 14:58:11Z
|
||
|
---
|
||
|
|
||
|
# **ETL**
|
||
|
- Extraction (from source to staging area)
|
||
|
- Transformation (reformatted for datawarehouse purpose)
|
||
|
- Loaded (from staging area into datawarehous)
|
||
|
|
||
|
![ETL process](images/ETL.png)
|
||
|
## ETL == Pipeline approach
|
||
|
|
||
|
Many tools available: DataStage, Informatica, or SQL Server Integration Services (SSIS) ==> mostly for transforming
|
||
|
All work in a similair way: read from source, perform changes, write to target
|
||
|
ETL steps can be performed multiple times for a particulair load.
|
||
|
Transformation step can add business logic
|
||
|
Also everything is done in one single step.
|
||
|
|
||
|
An ETL process should have data flowing steadily through it. Risk of running out of memory and/or disk space. (sorting is a classic example - holds entire dataset - if already sorted then rightaway in datawarehouse) Should have the possibility of buffering in pipeline.
|
||
|
|
||
|
Many ETL tools facilitate parallel execution == multiple pipelines.
|
||
|
ETL can be better performing, but needs more training and developement cost.
|
||
|
When ETL is used:
|
||
|
- source and target differ in data types
|
||
|
- volumes of data are small
|
||
|
- transformations are compute-intensive
|
||
|
- data is structured
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
# **ELT**
|
||
|
- Extract
|
||
|
- Load
|
||
|
- Transform (so data is transformed **AFTER** loading)
|
||
|
Target system is performing the transformation.
|
||
|
![ELT process](images/ELT.png)
|
||
|
## ELT == NO transformation engine
|
||
|
|
||
|
When ELT is used:
|
||
|
- source and target share same data types (ie DB2 source and target)
|
||
|
- large volumes
|
||
|
- target database is able to handle large data volumes
|
||
|
- data is unstructured
|
||
|
|
||
|
|
||
|
ETL or ELT?? Depends on priorities.
|
||
|
|
||
|
ELT is requires a powerful system in place as target. More such systems available because of analytics.
|
||
|
|
||
|
ie a perfect platform is Hadoop but needs carefully planning
|
||
|
|
||
|
| | ETL | ELT |
|
||
|
| --------------- | ------------------------------------------ | --------------------------------------------- |
|
||
|
| **Maturity** | for 20 years avaliable | not as well adapted |
|
||
|
| | expertise | works not well with structured data |
|
||
|
| **Flexibility** | Older ETL not suited for unstructured data | can handle structured and unstructured data |
|
||
|
| | Remap data => reload all preveous data! | In target data more flexible |
|
||
|
| **Hardware** | mostly own engine hardware | takes cumpute power from existing hardware |
|
||
|
| | modern ETL tools run in cloud | |
|
||
|
| **Better for** | - structured data | - Unstructured data |
|
||
|
| | - Smaller volumes and complex | - large volumes and less complex computations |
|
||
|
| | - On-premise relational databases | - Cloud environment |
|
||
|
| | | - Data lake |
|
||
|
|
||
|
## **Merge ETL and ELT approach**
|
||
|
Extract, Transform, Load, Transform
|
||
|
|
||
|
|
||
|
## **Tools**
|
||
|
1. https://www.getdbt.com/
|
||
|
2. https://fivetran.com/
|
||
|
3. https://www.stitchdata.com/
|
||
|
|
||
|
|
||
|
|
||
|
|
||
|
Sources:
|
||
|
https://dzone.com/articles/etl-vs-elt-differences-explained
|