Spark SQL supports several types of joins, each suited to different use cases. Below is a detailed explanation of each join type, including syntax examples and comparisons.

Types of Joins in Spark SQL

  1. Inner Join
  2. Left (Outer) Join
  3. Right (Outer) Join
  4. Full (Outer) Join
  5. Left Semi Join
  6. Left Anti Join
  7. Cross Join

1. Inner Join

An inner join returns only the rows that have matching values in both tables.

Syntax:

SELECT a.*, b.*
FROM tableA a
INNER JOIN tableB b
ON a.id = b.id;

Example:

SELECT employees.emp_id, employees.emp_name, departments.dept_name
FROM employees
INNER JOIN departments
ON employees.dept_id = departments.dept_id;

2. Left (Outer) Join

A left join returns all rows from the left table and the matched rows from the right table. If no match is found, NULLs are returned for columns from the right table.

Syntax:

SELECT a.*, b.*
FROM tableA a
LEFT JOIN tableB b
ON a.id = b.id;

Example:

SELECT employees.emp_id, employees.emp_name, departments.dept_name
FROM employees
LEFT JOIN departments
ON employees.dept_id = departments.dept_id;

3. Right (Outer) Join

A right join returns all rows from the right table and the matched rows from the left table. If no match is found, NULLs are returned for columns from the left table.

Syntax:

SELECT a.*, b.*
FROM tableA a
RIGHT JOIN tableB b
ON a.id = b.id;

Example:

SELECT employees.emp_id, employees.emp_name, departments.dept_name
FROM employees
RIGHT JOIN departments
ON employees.dept_id = departments.dept_id;

4. Full (Outer) Join

A full outer join returns all rows when there is a match in either left or right table. Rows without a match in one of the tables will have NULLs in the columns of the non-matching table.

Syntax:

SELECT a.*, b.*
FROM tableA a
FULL OUTER JOIN tableB b
ON a.id = b.id;

Example:

SELECT employees.emp_id, employees.emp_name, departments.dept_name
FROM employees
FULL OUTER JOIN departments
ON employees.dept_id = departments.dept_id;

5. Left Semi Join

A left semi join returns only the rows from the left table for which there is a match in the right table. It is equivalent to using an IN clause.

Syntax:

SELECT a.*
FROM tableA a
LEFT SEMI JOIN tableB b
ON a.id = b.id;

Example:

SELECT employees.emp_id, employees.emp_name
FROM employees
LEFT SEMI JOIN departments
ON employees.dept_id = departments.dept_id;

6. Left Anti Join

A left anti join returns only the rows from the left table for which there is no match in the right table. It is equivalent to using a NOT IN clause.

Syntax:

SELECT a.*
FROM tableA a
LEFT ANTI JOIN tableB b
ON a.id = b.id;

Example:

SELECT employees.emp_id, employees.emp_name
FROM employees
LEFT ANTI JOIN departments
ON employees.dept_id = departments.dept_id;

7. Cross Join

A cross join returns the Cartesian product of the two tables, meaning every row from the left table is joined with every row from the right table.

Syntax:

SELECT a.*, b.*
FROM tableA a
CROSS JOIN tableB b;

Example:

SELECT employees.emp_id, employees.emp_name, departments.dept_name
FROM employees
CROSS JOIN departments;

Comparison

Join TypeReturns Rows from Left TableReturns Rows from Right TableReturns Matched RowsReturns Unmatched Rows with NULLs
Inner JoinYesYesYesNo
Left JoinYesNoYesYes (for left table)
Right JoinNoYesYesYes (for right table)
Full Outer JoinYesYesYesYes (for both tables)
Left Semi JoinYesNoYesNo
Left Anti JoinYesNoNoYes (only for non-matching rows)
Cross JoinYesYesN/AN/A

Examples From Official Doc

-- Use employee and department tables to demonstrate different type of joins.
SELECT * FROM employee;
+---+-----+------+
| id| name|deptno|
+---+-----+------+
|105|Chloe|     5|
|103| Paul|     3|
|101| John|     1|
|102| Lisa|     2|
|104| Evan|     4|
|106|  Amy|     6|
+---+-----+------+

SELECT * FROM department;
+------+-----------+
|deptno|   deptname|
+------+-----------+
|     3|Engineering|
|     2|      Sales|
|     1|  Marketing|
+------+-----------+

-- Use employee and department tables to demonstrate inner join.
SELECT id, name, employee.deptno, deptname
    FROM employee INNER JOIN department ON employee.deptno = department.deptno;
+---+-----+------+-----------|
| id| name|deptno|   deptname|
+---+-----+------+-----------|
|103| Paul|     3|Engineering|
|101| John|     1|  Marketing|
|102| Lisa|     2|      Sales|
+---+-----+------+-----------|

-- Use employee and department tables to demonstrate left join.
SELECT id, name, employee.deptno, deptname
    FROM employee LEFT JOIN department ON employee.deptno = department.deptno;
+---+-----+------+-----------|
| id| name|deptno|   deptname|
+---+-----+------+-----------|
|105|Chloe|     5|       NULL|
|103| Paul|     3|Engineering|
|101| John|     1|  Marketing|
|102| Lisa|     2|      Sales|
|104| Evan|     4|       NULL|
|106|  Amy|     6|       NULL|
+---+-----+------+-----------|

-- Use employee and department tables to demonstrate right join.
SELECT id, name, employee.deptno, deptname
    FROM employee RIGHT JOIN department ON employee.deptno = department.deptno;
+---+-----+------+-----------|
| id| name|deptno|   deptname|
+---+-----+------+-----------|
|103| Paul|     3|Engineering|
|101| John|     1|  Marketing|
|102| Lisa|     2|      Sales|
+---+-----+------+-----------|

-- Use employee and department tables to demonstrate full join.
SELECT id, name, employee.deptno, deptname
    FROM employee FULL JOIN department ON employee.deptno = department.deptno;
+---+-----+------+-----------|
| id| name|deptno|   deptname|
+---+-----+------+-----------|
|101| John|     1|  Marketing|
|106|  Amy|     6|       NULL|
|103| Paul|     3|Engineering|
|105|Chloe|     5|       NULL|
|104| Evan|     4|       NULL|
|102| Lisa|     2|      Sales|
+---+-----+------+-----------|

-- Use employee and department tables to demonstrate cross join.
SELECT id, name, employee.deptno, deptname FROM employee CROSS JOIN department;
+---+-----+------+-----------|
| id| name|deptno|   deptname|
+---+-----+------+-----------|
|105|Chloe|     5|Engineering|
|105|Chloe|     5|  Marketing|
|105|Chloe|     5|      Sales|
|103| Paul|     3|Engineering|
|103| Paul|     3|  Marketing|
|103| Paul|     3|      Sales|
|101| John|     1|Engineering|
|101| John|     1|  Marketing|
|101| John|     1|      Sales|
|102| Lisa|     2|Engineering|
|102| Lisa|     2|  Marketing|
|102| Lisa|     2|      Sales|
|104| Evan|     4|Engineering|
|104| Evan|     4|  Marketing|
|104| Evan|     4|      Sales|
|106|  Amy|     4|Engineering|
|106|  Amy|     4|  Marketing|
|106|  Amy|     4|      Sales|
+---+-----+------+-----------|

-- Use employee and department tables to demonstrate semi join.
SELECT * FROM employee SEMI JOIN department ON employee.deptno = department.deptno;
+---+-----+------+
| id| name|deptno|
+---+-----+------+
|103| Paul|     3|
|101| John|     1|
|102| Lisa|     2|
+---+-----+------+

-- Use employee and department tables to demonstrate anti join.
SELECT * FROM employee ANTI JOIN department ON employee.deptno = department.deptno;
+---+-----+------+
| id| name|deptno|
+---+-----+------+
|105|Chloe|     5|
|104| Evan|     4|
|106|  Amy|     6|
+---+-----+------+

Discover more from AI HintsToday

Subscribe to get the latest posts sent to your email.

Discover more from AI HintsToday

Subscribe now to keep reading and get access to the full archive.

Continue reading