Spark SQL supports several types of joins, each suited to different use cases. Below is a detailed explanation of each join type, including syntax examples and comparisons.
Types of Joins in Spark SQL
- Inner Join
- Left (Outer) Join
- Right (Outer) Join
- Full (Outer) Join
- Left Semi Join
- Left Anti Join
- Cross Join
1. Inner Join
An inner join returns only the rows that have matching values in both tables.
Syntax:
SELECT a.*, b.*
FROM tableA a
INNER JOIN tableB b
ON a.id = b.id;
Example:
SELECT employees.emp_id, employees.emp_name, departments.dept_name
FROM employees
INNER JOIN departments
ON employees.dept_id = departments.dept_id;
2. Left (Outer) Join
A left join returns all rows from the left table and the matched rows from the right table. If no match is found, NULLs are returned for columns from the right table.
Syntax:
SELECT a.*, b.*
FROM tableA a
LEFT JOIN tableB b
ON a.id = b.id;
Example:
SELECT employees.emp_id, employees.emp_name, departments.dept_name
FROM employees
LEFT JOIN departments
ON employees.dept_id = departments.dept_id;
3. Right (Outer) Join
A right join returns all rows from the right table and the matched rows from the left table. If no match is found, NULLs are returned for columns from the left table.
Syntax:
SELECT a.*, b.*
FROM tableA a
RIGHT JOIN tableB b
ON a.id = b.id;
Example:
SELECT employees.emp_id, employees.emp_name, departments.dept_name
FROM employees
RIGHT JOIN departments
ON employees.dept_id = departments.dept_id;
4. Full (Outer) Join
A full outer join returns all rows when there is a match in either left or right table. Rows without a match in one of the tables will have NULLs in the columns of the non-matching table.
Syntax:
SELECT a.*, b.*
FROM tableA a
FULL OUTER JOIN tableB b
ON a.id = b.id;
Example:
SELECT employees.emp_id, employees.emp_name, departments.dept_name
FROM employees
FULL OUTER JOIN departments
ON employees.dept_id = departments.dept_id;
5. Left Semi Join
A left semi join returns only the rows from the left table for which there is a match in the right table. It is equivalent to using an IN
clause.
Syntax:
SELECT a.*
FROM tableA a
LEFT SEMI JOIN tableB b
ON a.id = b.id;
Example:
SELECT employees.emp_id, employees.emp_name
FROM employees
LEFT SEMI JOIN departments
ON employees.dept_id = departments.dept_id;
6. Left Anti Join
A left anti join returns only the rows from the left table for which there is no match in the right table. It is equivalent to using a NOT IN
clause.
Syntax:
SELECT a.*
FROM tableA a
LEFT ANTI JOIN tableB b
ON a.id = b.id;
Example:
SELECT employees.emp_id, employees.emp_name
FROM employees
LEFT ANTI JOIN departments
ON employees.dept_id = departments.dept_id;
7. Cross Join
A cross join returns the Cartesian product of the two tables, meaning every row from the left table is joined with every row from the right table.
Syntax:
SELECT a.*, b.*
FROM tableA a
CROSS JOIN tableB b;
Example:
SELECT employees.emp_id, employees.emp_name, departments.dept_name
FROM employees
CROSS JOIN departments;
Comparison
Join Type | Returns Rows from Left Table | Returns Rows from Right Table | Returns Matched Rows | Returns Unmatched Rows with NULLs |
---|---|---|---|---|
Inner Join | Yes | Yes | Yes | No |
Left Join | Yes | No | Yes | Yes (for left table) |
Right Join | No | Yes | Yes | Yes (for right table) |
Full Outer Join | Yes | Yes | Yes | Yes (for both tables) |
Left Semi Join | Yes | No | Yes | No |
Left Anti Join | Yes | No | No | Yes (only for non-matching rows) |
Cross Join | Yes | Yes | N/A | N/A |
Examples From Official Doc
-- Use employee and department tables to demonstrate different type of joins.
SELECT * FROM employee;
+---+-----+------+
| id| name|deptno|
+---+-----+------+
|105|Chloe| 5|
|103| Paul| 3|
|101| John| 1|
|102| Lisa| 2|
|104| Evan| 4|
|106| Amy| 6|
+---+-----+------+
SELECT * FROM department;
+------+-----------+
|deptno| deptname|
+------+-----------+
| 3|Engineering|
| 2| Sales|
| 1| Marketing|
+------+-----------+
-- Use employee and department tables to demonstrate inner join.
SELECT id, name, employee.deptno, deptname
FROM employee INNER JOIN department ON employee.deptno = department.deptno;
+---+-----+------+-----------|
| id| name|deptno| deptname|
+---+-----+------+-----------|
|103| Paul| 3|Engineering|
|101| John| 1| Marketing|
|102| Lisa| 2| Sales|
+---+-----+------+-----------|
-- Use employee and department tables to demonstrate left join.
SELECT id, name, employee.deptno, deptname
FROM employee LEFT JOIN department ON employee.deptno = department.deptno;
+---+-----+------+-----------|
| id| name|deptno| deptname|
+---+-----+------+-----------|
|105|Chloe| 5| NULL|
|103| Paul| 3|Engineering|
|101| John| 1| Marketing|
|102| Lisa| 2| Sales|
|104| Evan| 4| NULL|
|106| Amy| 6| NULL|
+---+-----+------+-----------|
-- Use employee and department tables to demonstrate right join.
SELECT id, name, employee.deptno, deptname
FROM employee RIGHT JOIN department ON employee.deptno = department.deptno;
+---+-----+------+-----------|
| id| name|deptno| deptname|
+---+-----+------+-----------|
|103| Paul| 3|Engineering|
|101| John| 1| Marketing|
|102| Lisa| 2| Sales|
+---+-----+------+-----------|
-- Use employee and department tables to demonstrate full join.
SELECT id, name, employee.deptno, deptname
FROM employee FULL JOIN department ON employee.deptno = department.deptno;
+---+-----+------+-----------|
| id| name|deptno| deptname|
+---+-----+------+-----------|
|101| John| 1| Marketing|
|106| Amy| 6| NULL|
|103| Paul| 3|Engineering|
|105|Chloe| 5| NULL|
|104| Evan| 4| NULL|
|102| Lisa| 2| Sales|
+---+-----+------+-----------|
-- Use employee and department tables to demonstrate cross join.
SELECT id, name, employee.deptno, deptname FROM employee CROSS JOIN department;
+---+-----+------+-----------|
| id| name|deptno| deptname|
+---+-----+------+-----------|
|105|Chloe| 5|Engineering|
|105|Chloe| 5| Marketing|
|105|Chloe| 5| Sales|
|103| Paul| 3|Engineering|
|103| Paul| 3| Marketing|
|103| Paul| 3| Sales|
|101| John| 1|Engineering|
|101| John| 1| Marketing|
|101| John| 1| Sales|
|102| Lisa| 2|Engineering|
|102| Lisa| 2| Marketing|
|102| Lisa| 2| Sales|
|104| Evan| 4|Engineering|
|104| Evan| 4| Marketing|
|104| Evan| 4| Sales|
|106| Amy| 4|Engineering|
|106| Amy| 4| Marketing|
|106| Amy| 4| Sales|
+---+-----+------+-----------|
-- Use employee and department tables to demonstrate semi join.
SELECT * FROM employee SEMI JOIN department ON employee.deptno = department.deptno;
+---+-----+------+
| id| name|deptno|
+---+-----+------+
|103| Paul| 3|
|101| John| 1|
|102| Lisa| 2|
+---+-----+------+
-- Use employee and department tables to demonstrate anti join.
SELECT * FROM employee ANTI JOIN department ON employee.deptno = department.deptno;
+---+-----+------+
| id| name|deptno|
+---+-----+------+
|105|Chloe| 5|
|104| Evan| 4|
|106| Amy| 6|
+---+-----+------+
Leave a Reply