---
title: Hive
updated: 2022-05-24 18:43:47Z
created: 2022-05-24 18:35:26Z
---
The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. A table structure can be projected onto data that is already in storage (schema-on-read), without first loading or converting it. A command-line tool and a JDBC driver are provided to connect users to Hive.
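
Projecting structure onto existing data is often called schema-on-read: the files are left untouched, and column names and types are applied only when rows are read. The Python sketch below illustrates the idea on delimited text lines. It assumes Hive's default text-table conventions (fields separated by the control character `\x01`, NULL written as `\N`) and mimics Hive returning NULL for missing or unparseable values rather than rejecting the row. The function `project` and the sample data are hypothetical and are not Hive's SerDe implementation.

```python
from typing import Callable, Optional

FIELD_DELIM = "\x01"  # Hive's default field delimiter for text tables (Ctrl-A)
NULL_MARKER = "\\N"   # Hive's default textual representation of NULL

# A schema is an ordered list of (column name, parser) pairs.
Schema = list[tuple[str, Callable[[str], object]]]


def project(line: str, schema: Schema) -> dict[str, Optional[object]]:
    """Hypothetical helper: apply a schema to one raw line at read time.

    Missing trailing fields, NULL markers, and values that fail to parse
    all become None. Extra fields beyond the schema are ignored.
    """
    raw = line.split(FIELD_DELIM)
    row: dict[str, Optional[object]] = {}
    for i, (name, parse) in enumerate(schema):
        value = raw[i] if i < len(raw) else None
        if value is None or value == NULL_MARKER:
            row[name] = None
            continue
        try:
            row[name] = parse(value)
        except ValueError:
            row[name] = None
    return row


# The stored data is plain text; the schema exists only on the reading side.
schema = [("id", int), ("name", str), ("age", int)]
stored_lines = [
    "1\x01alice\x0130",
    "2\x01bob\x01unknown",  # "unknown" cannot be parsed as int -> None
    "3\x01carol",           # trailing field missing -> None
]
rows = [project(line, schema) for line in stored_lines]
```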
Built on top of Apache Hadoop™, Hive provides the following features:
- Tools that provide easy access to data via SQL, enabling data warehousing tasks such as extract/transform/load (ETL), reporting, and data analysis.
- A mechanism to impose structure on a variety of data formats.
- Access to files stored either directly in **Apache HDFS™** or in other data storage systems such as **Apache HBase™**.
- Query execution via **Apache Tez™, Apache Spark™**, or **MapReduce**.
- Procedural language support with HPL-SQL.
- Sub-second query retrieval via Hive LLAP, Apache YARN, and Apache Slider.
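
To make the MapReduce execution path more concrete, the following toy, single-process Python model shows how an aggregation such as `SELECT dept, COUNT(*) FROM employees GROUP BY dept` can be split into map, shuffle, and reduce phases. The table, column, and function names are hypothetical, and this is not Hive's planner or runtime: real Hive plans add optimizations such as map-side partial aggregation, and the Tez and Spark engines run plans as DAGs of stages rather than as chained MapReduce jobs.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(rows: Iterable[dict]) -> Iterator[tuple[str, int]]:
    # Mapper: emit (group key, 1) for every input row.
    for row in rows:
        yield row["dept"], 1


def shuffle(pairs: Iterable[tuple[str, int]], num_reducers: int) -> list[dict[str, list[int]]]:
    # Shuffle: hash-partition by key so all values for one key reach the same reducer.
    partitions: list[dict[str, list[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[hash(key) % num_reducers][key].append(value)
    return partitions


def reduce_phase(partitions: list[dict[str, list[int]]]) -> dict[str, int]:
    # Reducer: sum the values for each key; each key lives in exactly one partition.
    result: dict[str, int] = {}
    for partition in partitions:
        for key, values in partition.items():
            result[key] = sum(values)
    return result


# Roughly: SELECT dept, COUNT(*) FROM employees GROUP BY dept
employees = [{"dept": "eng"}, {"dept": "ops"}, {"dept": "eng"}]
counts = reduce_phase(shuffle(map_phase(employees), num_reducers=2))
```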

Hive's SQL can also be extended with user code via user-defined functions (UDFs), user-defined aggregates (UDAFs), and user-defined table functions (UDTFs).
Hive is not designed for online transaction processing (OLTP) workloads. It is best suited to traditional data warehousing tasks.
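
The three kinds of user code differ mainly in how many rows go in and come out: a UDF maps one row to one value (like the built-in `upper`), a UDAF folds many rows into one value (like `avg`), and a UDTF turns one row into zero or more rows (like the built-in `explode`). The plain Python functions below are only a conceptual analogy of those shapes. Real Hive extensions are Java classes (for example, subclasses of `GenericUDF` or `GenericUDTF`) registered with `CREATE FUNCTION`, and the `GenericUDAFEvaluator` API additionally splits aggregation into partial and final steps so it can run across many tasks. All function names here are hypothetical.

```python
from typing import Iterable, Iterator, Optional


def udf_upper(value: str) -> str:
    # UDF shape: one input row -> one output value.
    return value.upper()


def udaf_avg(values: Iterable[float]) -> Optional[float]:
    # UDAF shape: many input rows -> one aggregated value.
    # Returns None for empty input, as Hive's AVG returns NULL.
    total, count = 0.0, 0
    for v in values:
        total += v
        count += 1
    return total / count if count else None


def udtf_explode(items: list[str]) -> Iterator[str]:
    # UDTF shape: one input row (here, an array) -> zero or more output rows.
    yield from items


upper_names = [udf_upper(n) for n in ["ada", "grace"]]
mean = udaf_avg([2.0, 4.0, 6.0])
exploded = list(udtf_explode(["a", "b", "c"]))
```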