Summaries/Apache/Apache Hive/Hive.md

1.2 KiB

title updated created
Hive 2022-05-24 18:43:47Z 2022-05-24 18:35:26Z

The Apache Hive ™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.

Built on top of Apache Hadoop™, Hive provides the following features:

  • Tools to enable easy access to data via SQL, thus enabling data warehousing tasks such as extract/transform/load (ETL), reporting, and data analysis.
  • A mechanism to impose structure on a variety of data formats
  • Access to files stored either directly in Apache HDFS™ or in other data storage systems such as Apache HBase™
  • Query execution via Apache Tez™, Apache Spark™, or MapReduce
  • Procedural language with HPL-SQL
  • Sub-second query retrieval via Hive LLAP, Apache YARN and Apache Slider.

Hive's SQL can also be extended with user code via user defined functions (UDFs), user defined aggregates (UDAFs), and user defined table functions (UDTFs).

Hive is not designed for online transaction processing (OLTP) workloads. It is best used for traditional data warehousing tasks.