---
title: Hive
updated: 2022-05-24 18:43:47Z
created: 2022-05-24 18:35:26Z
---

The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command-line tool and JDBC driver are provided to connect users to Hive.

Built on top of Apache Hadoop™, Hive provides the following features:

- Tools for easy access to data via SQL, supporting data warehousing tasks such as extract/transform/load (ETL), reporting, and data analysis.
- A mechanism to impose structure on a variety of data formats.
- Access to files stored either directly in **Apache HDFS™** or in other data storage systems such as **Apache HBase™**.
- Query execution via **Apache Tez™**, **Apache Spark™**, or **MapReduce**.
- Procedural language with HPL-SQL.
- Sub-second query retrieval via Hive LLAP, Apache YARN, and Apache Slider.
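
The idea of imposing structure on data that is already in storage (often called schema-on-read) can be illustrated with a minimal plain-Python sketch. This is not Hive code and uses no Hive API: `RAW_DATA`, `SCHEMA`, and `project_schema` are hypothetical names chosen for the illustration, and the choice to turn unparseable fields into `None` is an assumption modelled on Hive returning `NULL` for values that do not match the declared column type.

```python
import csv
import io

# Hypothetical raw contents, standing in for a delimited text file that
# already sits in distributed storage. Nothing here is Hive-specific.
RAW_DATA = "1,alice,34.5\n2,bob,12.0\n3,carol,not_a_number\n"

# Hypothetical schema: column names paired with the type applied at read time.
SCHEMA = [("id", int), ("name", str), ("amount", float)]


def project_schema(raw_text, schema):
    """Yield one dict per row, casting fields only when the data is read."""
    for fields in csv.reader(io.StringIO(raw_text)):
        row = {}
        for (column, cast), value in zip(schema, fields):
            try:
                row[column] = cast(value)
            except ValueError:
                # Assumption: a value that does not fit the declared type
                # becomes None, by analogy with a NULL result.
                row[column] = None
        yield row


# The stored text is never rewritten; structure exists only in the reader.
rows = list(project_schema(RAW_DATA, SCHEMA))
```

The stored bytes stay untouched, so a different schema can be applied to the same data later without reloading it.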

Hive's SQL can also be extended with user code via user-defined functions (UDFs), user-defined aggregates (UDAFs), and user-defined table functions (UDTFs).
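
The three extension types differ mainly in how many rows go in and how many come out. The following is a conceptual sketch in plain Python, not Hive's extension API (real Hive extensions are typically written in Java against Hive's own interfaces). The function names `to_upper`, `total`, and `split_words` and the sample data are hypothetical.

```python
def to_upper(value):
    # UDF shape: one input value in, one output value out.
    return value.upper()


def total(values):
    # UDAF shape: many input rows in, a single aggregated value out.
    return sum(values)


def split_words(value):
    # UDTF shape: one input row in, zero or more output rows out.
    return value.split()


# Hypothetical single-column table of names.
names = ["ada lovelace", "alan turing"]

# Row-by-row transformation (UDF-like).
upper_names = [to_upper(n) for n in names]

# Each input row expands into several output rows (UDTF-like).
word_rows = [w for n in names for w in split_words(n)]

# All rows collapse into one value (UDAF-like).
total_length = total(len(n) for n in names)
```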

Hive is not designed for online transaction processing (OLTP) workloads. It is best used for traditional data warehousing tasks.