title	updated	created
Hive	2022-05-24 18:43:47Z	2022-05-24 18:35:26Z

The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command-line tool and a JDBC driver are provided to connect users to Hive.

Built on top of Apache Hadoop™, Hive provides the following features:

Tools for easy access to data via SQL, supporting data warehousing tasks such as extract/transform/load (ETL), reporting, and data analysis
A mechanism to impose structure on a variety of data formats
Access to files stored either directly in Apache HDFS™ or in other data storage systems such as Apache HBase™
Query execution via Apache Tez™, Apache Spark™, or MapReduce
Procedural language with HPL-SQL
Sub-second query retrieval via Hive LLAP, Apache YARN, and Apache Slider
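
The idea of imposing structure on data that already sits in storage can be illustrated with a minimal Python sketch. This is not how Hive implements it; it only shows the principle that the stored bytes stay untouched and a separately declared schema is applied when rows are read. The names `RAW_DATA`, `SCHEMA`, and `read_with_schema` are hypothetical, and mapping an unparseable value to `None` is an assumption of this sketch.

```python
import csv
import io

# Hypothetical raw file contents, as they might already sit in storage.
RAW_DATA = "1,alice,34.5\n2,bob,17.0\n3,carol,not_a_number\n"

# Hypothetical schema, declared separately from the data: (column name, type).
SCHEMA = [("id", int), ("name", str), ("score", float)]


def read_with_schema(raw_text, schema):
    """Yield one dict per line, casting fields only at read time."""
    for fields in csv.reader(io.StringIO(raw_text)):
        row = {}
        for (name, cast), value in zip(schema, fields):
            try:
                row[name] = cast(value)
            except ValueError:
                # Assumption of this sketch: a value that does not fit the
                # declared type becomes None instead of failing the read.
                row[name] = None
        yield row


rows = list(read_with_schema(RAW_DATA, SCHEMA))
```

Because the schema lives outside the data, the same raw text could be read again with a different schema without rewriting anything in storage.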

Hive's SQL can also be extended with user code via user defined functions (UDFs), user defined aggregates (UDAFs), and user defined table functions (UDTFs).
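
The three extension points differ mainly in how many rows go in and how many come out. The following plain-Python sketch illustrates that difference only; it does not use Hive's actual Java extension interfaces, and the sample rows and function names (`upper_name`, `total_amount`, `explode_tags`) are hypothetical.

```python
# Hypothetical input rows: (user, comma-separated tags, amount).
ROWS = [
    ("alice", "a,b", 10.0),
    ("bob", "c", 5.5),
    ("alice", "d", 2.5),
]


def upper_name(name):
    """UDF-style: one input value in, one output value out."""
    return name.upper()


def total_amount(amounts):
    """UDAF-style: many input values in, one aggregated value out."""
    total = 0.0
    for amount in amounts:
        total += amount
    return total


def explode_tags(tags):
    """UDTF-style: one input value in, zero or more output rows out."""
    for tag in tags.split(","):
        yield (tag,)


# One output per input row.
udf_result = [upper_name(user) for user, _, _ in ROWS]
# One output for the whole set of rows.
udaf_result = total_amount(amount for _, _, amount in ROWS)
# Possibly several output rows per input row.
udtf_result = [row for _, tags, _ in ROWS for row in explode_tags(tags)]
```

Here the UDF-style call keeps the row count unchanged, the UDAF-style call collapses all rows into a single value, and the UDTF-style call turns three input rows into four output rows.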

Hive is not designed for online transaction processing (OLTP) workloads. It is best used for traditional data warehousing tasks.
