DataFusion is an extensible query planning, optimization, and execution
framework, written in Rust, that uses Apache Arrow as its in-memory format.

Features:
- SQL query planner with support for multiple SQL dialects
- DataFrame API
- Native support for the Parquet, CSV, JSON, and Avro file formats. Custom
  file formats can be supported by implementing the `TableProvider` trait.
- Support for popular object stores, including AWS S3, Azure Blob
  Storage, and Google Cloud Storage, with extension points for implementing
  custom object stores.

Use Cases:
DataFusion is modular in design, with many extension points. It can be
used without modification as an embedded query engine, or serve as
a foundation for building new systems. Here are some example use cases:
- DataFusion can be used as a SQL query planner and query optimizer, providing
  optimized logical plans that can then be mapped to other execution engines.
- DataFusion can be used to build modern, fast, and efficient data
  pipelines, ETL processes, and database systems that need the
  performance of Rust and Apache Arrow and want to offer their users
  the convenience of a SQL interface or a DataFrame API.