# Select

Updated: 2022-04-07 19:29:20Z
Created: 2022-04-03 13:25:15Z

select name, -- regular column
work_place[0], -- array
gender_age.gender, -- struct
skills_score['DB'], -- map
depart_title['Product'][0] -- map whose values are arrays
from employee;
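
The query above only works against a table with complex-typed columns. A minimal sketch of such a table follows. The column types are an assumption inferred from how each column is accessed in these notes, and the delimiters are purely illustrative.

```sql
-- Assumed schema for the employee table used throughout these notes.
-- Types are inferred from the access patterns; delimiters are illustrative.
CREATE TABLE IF NOT EXISTS employee (
  name         STRING,
  work_place   ARRAY<STRING>,                  -- accessed by index: work_place[0]
  gender_age   STRUCT<gender:STRING, age:INT>, -- accessed by field: gender_age.gender
  skills_score MAP<STRING, INT>,               -- accessed by key: skills_score['DB']
  depart_title MAP<STRING, ARRAY<STRING>>      -- key first, then index
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
  COLLECTION ITEMS TERMINATED BY ','
  MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE;

-- Inspecting the complex columns with built-in functions.
SELECT
  name,
  size(work_place)           AS place_count,  -- number of array elements
  map_keys(skills_score)     AS skills,       -- array of the map's keys
  depart_title['Product'][0] AS first_product_title
FROM employee;
```

An array index past the end, or a map key that is not present, yields NULL rather than an error, so these lookups are safe on sparse data.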
select name, work_place,
cities
from employee
LATERAL VIEW explode(work_place) C AS cities;
select name, work_place,
depart_title['Product'],
jobs
from employee
LATERAL VIEW explode(depart_title['Product']) C AS jobs;
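
Two variations on the LATERAL VIEW queries above are worth keeping nearby: exploding a map, and keeping rows whose collection is empty. This is a sketch against the same employee table. The aliases s, c, p and the output column names are made up for illustration.

```sql
-- explode() on a map yields one row per entry, as two columns (key, value).
SELECT name, skill, score
FROM employee
LATERAL VIEW explode(skills_score) s AS skill, score;

-- Without OUTER, a row whose array is empty or NULL disappears from the result.
-- With OUTER, the row is kept and the generated column is NULL.
SELECT name, city
FROM employee
LATERAL VIEW OUTER explode(work_place) c AS city;

-- posexplode() also returns the element's position in the array (starting at 0).
SELECT name, pos, city
FROM employee
LATERAL VIEW posexplode(work_place) p AS pos, city;
```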
SELECT
name,
dept_num as deptno,
salary,
count(*) OVER (PARTITION BY dept_num) as cnt,
count(distinct dept_num) OVER (PARTITION BY dept_num) as dcnt,
sum(salary) OVER(PARTITION BY dept_num ORDER BY dept_num) as sum1,
sum(salary) OVER(ORDER BY dept_num) as sum2,
sum(salary) OVER(ORDER BY dept_num, name) as sum3
FROM employee_contract
ORDER BY deptno, name;
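
The sums in the query above differ only in their window definitions. The sketch below makes the frames explicit on the same employee_contract table; the output column names are illustrative. When ORDER BY is given without a frame, Hive defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows that tie on the ordering key are peers and share one total. That is why sum1 above (partitioned and ordered by the same column) equals the whole department total.

```sql
SELECT
  name,
  dept_num,
  salary,
  -- Default RANGE frame: all rows with the same dept_num get the same value,
  -- the cumulative total up to and including that department.
  sum(salary) OVER (ORDER BY dept_num) AS range_total,
  -- Explicit ROWS frame: a row-by-row running total inside each department.
  sum(salary) OVER (
    PARTITION BY dept_num
    ORDER BY name
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total,
  -- Rank rows inside each department by salary, highest first.
  row_number() OVER (PARTITION BY dept_num ORDER BY salary DESC) AS rn
FROM employee_contract
ORDER BY dept_num, name;
```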
with r1 as (select name from employee),
r2 as (select name from employee)
select * from r1
union all
select * from r2;
SELECT
CASE WHEN gender_age.gender = 'Female' THEN 'Ms.'
ELSE 'Mr.' END as title,
name,
IF(array_contains(work_place, 'New York'), 'US', 'CA') as country
FROM employee;
SELECT
name, gender_age.gender as gender
FROM (
SELECT * FROM employee WHERE gender_age.gender = 'Male'
) t1; -- the alias t1 is mandatory for a subquery in FROM
SELECT name, gender_age FROM employee WHERE gender_age.age in (27, 30);
SELECT
name, gender_age
FROM employee
WHERE (gender_age.gender, gender_age.age) IN
(('Female', 27), ('Male', 27 + 3)); -- expression support requires a version > v2.1.0
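
On a Hive release where the multi-column IN form is unavailable, the same filter can be spelled out with plain boolean logic. This is only an equivalent rewrite of the query above, not a different technique.

```sql
-- Each (gender, age) pair from the IN list becomes one AND group.
SELECT name, gender_age
FROM employee
WHERE (gender_age.gender = 'Female' AND gender_age.age = 27)
   OR (gender_age.gender = 'Male'   AND gender_age.age = 30);
```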
| Join type | Logic | Rows returned |
| --- | --- | --- |
| table_m JOIN table_n | Returns only the rows that match in both tables. | m ∩ n |
| table_m LEFT JOIN table_n | Returns all rows from the left table plus the matching rows from the right table. Where the right table has no match, its columns are NULL. | m |
| table_m RIGHT JOIN table_n | Returns all rows from the right table plus the matching rows from the left table. Where the left table has no match, its columns are NULL. | n |
| table_m FULL JOIN table_n | Returns all rows from both tables. Where either side has no match, that side's columns are NULL. | m + n - m ∩ n |
| table_m CROSS JOIN table_n | Returns every combination of rows from the two tables (the Cartesian product). | m * n |
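
The join types above can be tried on the employee and employee_hr tables used elsewhere in these notes. This is a sketch that assumes name is the join key and that employee_hr has name and sin_number columns. The row counts in the table hold when each key matches at most one row on the other side; duplicate keys multiply the matched rows.

```sql
-- Inner join: only names present in both tables.
SELECT emp.name, emph.sin_number
FROM employee emp
JOIN employee_hr emph ON emp.name = emph.name;

-- Left join: every employee; sin_number is NULL when there is no HR record.
SELECT emp.name, emph.sin_number
FROM employee emp
LEFT JOIN employee_hr emph ON emp.name = emph.name;

-- Full join: rows from both sides; the unmatched side's columns are NULL.
SELECT emp.name AS emp_name, emph.name AS hr_name
FROM employee emp
FULL JOIN employee_hr emph ON emp.name = emph.name;

-- Cross join: every pairing of rows, with no join condition.
SELECT emp.name AS emp_name, emph.name AS hr_name
FROM employee emp
CROSS JOIN employee_hr emph;
```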

Special joins for HiveQL

  • MAPJOIN: A MapJoin reads all the data from the small table into memory and broadcasts it to every mapper. During the map phase, the join is performed by comparing each row of the big table with the small tables against the join conditions. Because no reduce phase is needed, this kind of join usually performs better. Newer versions of Hive automatically convert a join to a MapJoin at runtime when possible. You can also specify the broadcast table manually with a join hint, /*+ MAPJOIN(table_name) */. The MapJoin operation does not support the following:
      • Using MapJoin after UNION ALL, LATERAL VIEW, GROUP BY/JOIN/SORT BY/CLUSTER BY/DISTRIBUTE BY
      • Using MapJoin before UNION, JOIN, and another MapJoin
SELECT
/*+ MAPJOIN(emp) */ emp.name, emph.sin_number -- the hint names the alias used in the query
FROM employee emp
CROSS JOIN employee_hr emph
WHERE emp.name <> emph.name;
  • LEFT SEMI JOIN: The LEFT SEMI JOIN statement is also a type of MapJoin. It returns the same result as a subquery with IN/EXISTS, which Hive supports since v0.13.0. However, it is not recommended because it is not part of standard SQL.
SELECT a.name FROM employee a
LEFT SEMI JOIN employee_id b ON a.name = b.name;
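
Two points from the list above can be sketched in HiveQL: the settings that control automatic MapJoin conversion, and the IN/EXISTS forms of the LEFT SEMI JOIN query. The property names are real Hive configuration properties. The values shown are the commonly documented defaults and should be treated as assumptions for any particular release.

```sql
-- Let Hive convert eligible joins to MapJoins on its own.
SET hive.auto.convert.join=true;
-- Size threshold, in bytes, below which a table counts as "small".
SET hive.mapjoin.smalltable.filesize=25000000;
-- Hints are ignored by default in newer releases; set to false to honor
-- a /*+ MAPJOIN(...) */ hint.
SET hive.ignore.mapjoin.hint=false;

-- Same result as the LEFT SEMI JOIN above, written with IN.
SELECT a.name
FROM employee a
WHERE a.name IN (SELECT b.name FROM employee_id b);

-- Same result, written with a correlated EXISTS.
SELECT a.name
FROM employee a
WHERE EXISTS (SELECT b.name FROM employee_id b WHERE a.name = b.name);
```

As with LEFT SEMI JOIN, only columns from the left table (a) can appear in the SELECT list of these queries.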