Examples

1. Rank Employees by Salary in Each Department

SELECT EmpID, 
       Emp_name, 
       Department, 
       Salary, 
       RANK() OVER (PARTITION BY Department ORDER BY Salary DESC) AS rank
FROM Employee;

Explanation:

  • PARTITION BY Department: Separate ranking by department.
  • ORDER BY Salary DESC: Ranks employees by salary in descending order.

2. Calculate Cumulative Sum of Sales

SELECT CustomerID, 
OrderDate,
ProductID,
SUM(Sales) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS cumulative_sales
FROM Orders;

Explanation:

  • PARTITION BY CustomerID: Cumulative sales calculated per customer.
  • ORDER BY OrderDate: Sales are accumulated in chronological order.

3. Find Running Average Salary

SELECT EmpID, 
       Emp_name, 
       Department, 
       AVG(Salary) OVER (PARTITION BY Department ORDER BY Salary ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS running_avg
FROM Employee;

Explanation:

  • ROWS BETWEEN 2 PRECEDING AND CURRENT ROW: Calculates the average salary of the current row and the two preceding rows.

4. Lead and Lag Example

SELECT EmpID, 
       Emp_name, 
       Department, 
       Salary, 
       LAG(Salary, 1, 0) OVER (PARTITION BY Department ORDER BY Salary) AS previous_salary,
       LEAD(Salary, 1, 0) OVER (PARTITION BY Department ORDER BY Salary) AS next_salary
FROM Employee;

Explanation:

  • LAG(Salary, 1, 0): Fetches the previous salary within the partition. Default is 0 if there is no previous value.
  • LEAD(Salary, 1, 0): Fetches the next salary within the partition. Default is 0 if there is no next value.

5. Find Employees Above Average Salary in Their Department

SELECT EmpID, 
       Emp_name, 
       Department, 
       Salary, 
       AVG(Salary) OVER (PARTITION BY Department) AS avg_salary
FROM Employee
WHERE Salary > AVG(Salary) OVER (PARTITION BY Department);

Explanation:

  • AVG(Salary) OVER (PARTITION BY Department): Calculates average salary for each department.
  • The WHERE clause filters employees whose salary is above the department average.

Notes

  1. Spark SQL does not support DISTINCT in window functions.
    • For example, COUNT(DISTINCT column) OVER (...) is not allowed in Spark SQL.
  2. Optimized for Distributed Computing:
    • Unlike traditional SQL, Spark SQL handles large-scale datasets by distributing computations across the cluster.
  3. Frame Specification:
    • Use ROWS for row-level computations.
    • Use RANGE for value-based computations.

Discover more from HintsToday

Subscribe to get the latest posts sent to your email.

Pages ( 2 of 3 ): « Previous1 2 3Next »

Discover more from HintsToday

Subscribe now to keep reading and get access to the full archive.

Continue reading