---
title: '# Select'
updated: 2022-04-07 19:29:20Z
created: 2022-04-03 13:25:15Z
---

```hive
select
  name,                       -- regular column
  work_place[0],              -- array
  gender_age.gender,          -- struct
  skills_score['DB'],         -- map
  depart_title['Product'][0]  -- map with array values
from employee;
```

```hive
select name, work_place, cities
from employee
LATERAL VIEW explode(work_place) C AS cities;
```

```hive
select name, work_place, depart_title['Product'], jobs
from employee
LATERAL VIEW explode(depart_title['Product']) C AS jobs;
```

```hive
SELECT
  name,
  dept_num as deptno,
  salary,
  count(*) OVER (PARTITION BY dept_num) as cnt,
  count(distinct dept_num) OVER (PARTITION BY dept_num) as dcnt,
  sum(salary) OVER (PARTITION BY dept_num ORDER BY dept_num) as sum1,
  sum(salary) OVER (ORDER BY dept_num) as sum2,
  sum(salary) OVER (ORDER BY dept_num, name) as sum3
FROM employee_contract
ORDER BY deptno, name;
```

```hive
with
  r1 as (select name from employee),
  r2 as (select name from employee)
select * from r1
union all
select * from r2;
```

```hive
SELECT
  CASE WHEN gender_age.gender = 'Female' THEN 'Ms.' ELSE 'Mr.' END as title,
  name,
  IF(array_contains(work_place, 'New York'), 'US', 'CA') as country
FROM employee;
```

```hive
-- The subquery alias (t1) is mandatory in Hive
SELECT name, gender_age.gender as gender
FROM (
  SELECT * FROM employee
  WHERE gender_age.gender = 'Male'
) t1;
```

```hive
SELECT name, gender_age
FROM employee
WHERE gender_age.age in (27, 30);
```

```hive
-- Expression support requires version > v2.1.0
SELECT name, gender_age
FROM employee
WHERE (gender_age.gender, gender_age.age) IN (('Female', 27), ('Male', 27 + 3));
```

| Join type | Logic | Rows returned |
|---|---|---|
| table_m JOIN table_n | Returns all rows matched in both tables. | m ∩ n |
| table_m LEFT JOIN table_n | Returns all rows in the left table and the matched rows in the right table. If there is no match in the right table, NULL is returned for the right table's columns. | m |
| table_m RIGHT JOIN table_n | Returns all rows in the right table and the matched rows in the left table. If there is no match in the left table, NULL is returned for the left table's columns. | n |
| table_m FULL JOIN table_n | Returns all rows in both tables, matching rows where possible. If there is no match in the left or right table, NULL is returned for that side's columns. | m + n - m ∩ n |
| table_m CROSS JOIN table_n | Returns every combination of rows from both tables (a Cartesian product). | m * n |

The row counts assume each join key matches at most one row in the other table; duplicate keys multiply the number of matched rows.

### Special joins for HiveQL

- MAPJOIN: The MapJoin statement loads all the data from the small table into memory and broadcasts it to all mappers. During the map phase, the join is performed by comparing each row of the big table against the in-memory small table(s) using the join conditions. Because no reduce phase is needed, this kind of join usually performs better. In newer versions of Hive, joins are automatically converted to MapJoins at runtime when possible (controlled by the `hive.auto.convert.join` setting). You can also specify the broadcast table manually with the join hint `/*+ MAPJOIN(table_name) */`. MapJoin does not support the following:
  - Using MapJoin after UNION ALL, LATERAL VIEW, GROUP BY/JOIN/SORT BY/CLUSTER BY/DISTRIBUTE BY
  - Using MapJoin before UNION, JOIN, and another MapJoin

```hive
SELECT /*+ MAPJOIN(employee) */ emp.name, emph.sin_number
FROM employee emp
CROSS JOIN employee_hr emph
WHERE emp.name <> emph.name;
```

- LEFT SEMI JOIN: This statement is also a type of MapJoin. It is equivalent to an IN/EXISTS subquery, which Hive supports since v0.13.0. However, it is not recommended because it is not part of standard SQL.

```hive
SELECT a.name
FROM employee a
LEFT SEMI JOIN employee_id b
ON a.name = b.name;
```
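
For comparison, the LEFT SEMI JOIN above can be written with the IN or EXISTS subqueries it corresponds to. This is a sketch assuming Hive v0.13.0 or later and the same `employee` and `employee_id` tables; like the semi join, each query should return each matching left-table row at most once.

```hive
-- IN subquery form (assumes Hive v0.13.0+)
SELECT a.name
FROM employee a
WHERE a.name IN (SELECT b.name FROM employee_id b);

-- Correlated EXISTS form (assumes Hive v0.13.0+)
SELECT a.name
FROM employee a
WHERE EXISTS (SELECT 1 FROM employee_id b WHERE b.name = a.name);
```

In a LEFT SEMI JOIN, the right-hand table (`b`) may be referenced only in the ON clause, not in the SELECT or WHERE clauses, which is why all three forms select only `a.name`.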