DataFusion is an extensible query planning, optimization, and execution
framework, written in Rust, that uses Apache Arrow as its in-memory format.
Features:
- SQL query planner with support for multiple SQL dialects
- DataFrame API
- Native support for Parquet, CSV, JSON, and Avro file formats. Custom file
formats can be supported by implementing the `TableProvider` trait.
- Support for popular object stores, including AWS S3, Azure Blob Storage,
and Google Cloud Storage, with extension points for custom object stores.
Use Cases:
DataFusion has a modular design with many extension points. It can be used
without modification as an embedded query engine, or as a foundation for
building new systems. Example use cases include:
- DataFusion can be used as a SQL query planner and query optimizer, providing
optimized logical plans that can then be mapped to other execution engines.
- DataFusion is used to build modern, fast, and efficient data pipelines,
ETL processes, and database systems that need the performance of Rust and
Apache Arrow and want to offer their users the convenience of a SQL
interface or a DataFrame API.