Window Functions in Oracle PL/SQL and Hive, Explained and Compared with Examples

Window functions, also known as analytic functions, perform calculations across a set of table rows that are somehow related to the current row. This is different from regular aggregate functions, which aggregate results for the entire set of rows. Both Oracle PL/SQL and Apache Hive support window functions, but there are some differences in their implementation and usage.
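To make the distinction concrete, here is a quick sketch against the `employees` table used throughout this article: the `GROUP BY` aggregate collapses rows into one result per department, while the window form keeps every row and attaches the aggregate to it.

```sql
-- Regular aggregate: collapses rows, one result per department
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id;

-- Window function: every employee row is kept, each carrying its department average
SELECT employee_id, department_id, salary,
       AVG(salary) OVER (PARTITION BY department_id) AS dept_avg_salary
FROM employees;
```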

Window Functions in Oracle PL/SQL

Oracle provides a comprehensive set of window functions that can be used to perform complex calculations.

Syntax:

```sql
function_name(expression) OVER (
    [PARTITION BY partition_expression]
    [ORDER BY order_expression]
    [ROWS | RANGE BETWEEN start_expression AND end_expression]
)
```

Common Oracle Window Functions:

| Function | Description | Example Usage |
|---|---|---|
| ROW_NUMBER | Assigns a unique number to each row within the partition. | `SELECT employee_id, ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num FROM employees;` |
| RANK | Assigns a rank to each row within the partition of a result set. | `SELECT employee_id, RANK() OVER (ORDER BY salary DESC) AS rank FROM employees;` |
| DENSE_RANK | Similar to RANK but without gaps in ranking. | `SELECT employee_id, DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank FROM employees;` |
| NTILE | Divides rows into a specified number of approximately equal groups. | `SELECT employee_id, NTILE(4) OVER (ORDER BY salary DESC) AS quartile FROM employees;` |
| LAG | Provides access to a row at a specified physical offset before that row. | `SELECT employee_id, LAG(salary, 1) OVER (ORDER BY hire_date) AS prev_salary FROM employees;` |
| LEAD | Provides access to a row at a specified physical offset after that row. | `SELECT employee_id, LEAD(salary, 1) OVER (ORDER BY hire_date) AS next_salary FROM employees;` |
| FIRST_VALUE | Returns the first value in an ordered set of values. | `SELECT employee_id, FIRST_VALUE(salary) OVER (ORDER BY hire_date) AS first_salary FROM employees;` |
| LAST_VALUE | Returns the last value in an ordered set of values. | `SELECT employee_id, LAST_VALUE(salary) OVER (ORDER BY hire_date RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_salary FROM employees;` |
| SUM | Calculates the sum of a set of values. | `SELECT department_id, SUM(salary) OVER (PARTITION BY department_id) AS dept_total_salary FROM employees;` |
| AVG | Calculates the average of a set of values. | `SELECT department_id, AVG(salary) OVER (PARTITION BY department_id) AS dept_avg_salary FROM employees;` |

Window Functions in Apache Hive

Hive also supports window functions, although its implementation may have slight differences compared to Oracle.

Syntax:

```sql
function_name(expression) OVER (
    [PARTITION BY partition_expression]
    [ORDER BY order_expression]
    [ROWS | RANGE BETWEEN start_expression AND end_expression]
)
```
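The optional frame clause is the part most often overlooked. As an illustrative sketch against the `employees` table, a moving average over the current row and the two preceding rows, ordered by hire date (this runs as-is in Hive, and the same syntax works in Oracle):

```sql
-- 3-row moving average of salary, ordered by hire date
SELECT employee_id, hire_date, salary,
       AVG(salary) OVER (
           ORDER BY hire_date
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS moving_avg_salary
FROM employees;
```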

Common Hive Window Functions:

| Function | Description | Example Usage |
|---|---|---|
| ROW_NUMBER | Assigns a unique number to each row within the partition. | `SELECT employee_id, ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num FROM employees;` |
| RANK | Assigns a rank to each row within the partition of a result set. | `SELECT employee_id, RANK() OVER (ORDER BY salary DESC) AS rank FROM employees;` |
| DENSE_RANK | Similar to RANK but without gaps in ranking. | `SELECT employee_id, DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank FROM employees;` |
| NTILE | Divides rows into a specified number of approximately equal groups. | `SELECT employee_id, NTILE(4) OVER (ORDER BY salary DESC) AS quartile FROM employees;` |
| LAG | Provides access to a row at a specified physical offset before that row. | `SELECT employee_id, LAG(salary, 1) OVER (ORDER BY hire_date) AS prev_salary FROM employees;` |
| LEAD | Provides access to a row at a specified physical offset after that row. | `SELECT employee_id, LEAD(salary, 1) OVER (ORDER BY hire_date) AS next_salary FROM employees;` |
| FIRST_VALUE | Returns the first value in an ordered set of values. | `SELECT employee_id, FIRST_VALUE(salary) OVER (ORDER BY hire_date) AS first_salary FROM employees;` |
| LAST_VALUE | Returns the last value in an ordered set of values. | `SELECT employee_id, LAST_VALUE(salary) OVER (ORDER BY hire_date RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_salary FROM employees;` |
| SUM | Calculates the sum of a set of values. | `SELECT department_id, SUM(salary) OVER (PARTITION BY department_id) AS dept_total_salary FROM employees;` |
| AVG | Calculates the average of a set of values. | `SELECT department_id, AVG(salary) OVER (PARTITION BY department_id) AS dept_avg_salary FROM employees;` |
Comparison of Window Functions in Oracle PL/SQL and Hive

| Feature | Oracle PL/SQL | Hive |
|---|---|---|
| Syntax for Window Functions | `function_name(expression) OVER ([PARTITION BY] [ORDER BY] [window_clause])` | `function_name(expression) OVER ([PARTITION BY] [ORDER BY] [ROWS \| RANGE BETWEEN ...])` |
| ROW_NUMBER | `ROW_NUMBER() OVER (ORDER BY column)` | `ROW_NUMBER() OVER (ORDER BY column)` |
| RANK | `RANK() OVER (ORDER BY column)` | `RANK() OVER (ORDER BY column)` |
| DENSE_RANK | `DENSE_RANK() OVER (ORDER BY column)` | `DENSE_RANK() OVER (ORDER BY column)` |
| NTILE | `NTILE(n) OVER (ORDER BY column)` | `NTILE(n) OVER (ORDER BY column)` |
| LAG | `LAG(column, offset, default) OVER (ORDER BY column)` | `LAG(column, offset, default) OVER (ORDER BY column)` |
| LEAD | `LEAD(column, offset, default) OVER (ORDER BY column)` | `LEAD(column, offset, default) OVER (ORDER BY column)` |
| FIRST_VALUE | `FIRST_VALUE(column) OVER (ORDER BY column)` | `FIRST_VALUE(column) OVER (ORDER BY column)` |
| LAST_VALUE | `LAST_VALUE(column) OVER (ORDER BY column)` | `LAST_VALUE(column) OVER (ORDER BY column)` |
| SUM | `SUM(column) OVER (PARTITION BY partition_column ORDER BY column)` | `SUM(column) OVER (PARTITION BY partition_column ORDER BY column)` |
| AVG | `AVG(column) OVER (PARTITION BY partition_column ORDER BY column)` | `AVG(column) OVER (PARTITION BY partition_column ORDER BY column)` |
| Window Clause | Supports ROWS BETWEEN and RANGE BETWEEN | Supports ROWS BETWEEN and RANGE BETWEEN |
| Recursion in CTEs | Supported (since Oracle 11g Release 2) | Not supported |

Example Comparison

Oracle PL/SQL Example:

```sql
SELECT employee_id,
       department_id,
       salary,
       SUM(salary) OVER (PARTITION BY department_id ORDER BY salary DESC) AS running_total
FROM employees;
```

Hive Example:

```sql
SELECT employee_id,
       department_id,
       salary,
       SUM(salary) OVER (PARTITION BY department_id ORDER BY salary DESC) AS running_total
FROM employees;
```
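One subtlety applies to both engines: when `ORDER BY` is present and no frame clause is given, the default frame is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, so employees tied on salary receive the same running total. To force a strict row-by-row accumulation, spell out a `ROWS` frame:

```sql
SELECT employee_id,
       department_id,
       salary,
       SUM(salary) OVER (
           PARTITION BY department_id
           ORDER BY salary DESC
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_total
FROM employees;
```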

Both Oracle PL/SQL and Hive support window functions with similar syntax and capabilities. However, Oracle generally offers more comprehensive and robust support for advanced SQL features due to its long history and broad usage in enterprise environments. Hive, while powerful, is tailored more towards big data processing in Hadoop ecosystems, which may affect performance and feature sets depending on the version and underlying infrastructure.

Complex Use Cases for Window Functions

Window functions go beyond basic ranking and aggregations, enabling powerful data manipulations within a result set. Here are some complex use cases that showcase their capabilities:

1. Identifying Customer Churn:

  • Scenario: You want to predict customer churn by analyzing purchase behavior.
  • Process:
    • Use ROW_NUMBER() to assign a sequential number to each customer purchase record, ordered by purchase date (most recent first).
    • Calculate the difference between the current purchase date and the previous purchase date using LAG() with an offset of 1. This gives the time gap between purchases.
    • Identify customers with a significant increase in time gap between purchases compared to their historical buying pattern. This could indicate potential churn.
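The LAG() step above can be sketched as follows, assuming a hypothetical purchases table with customer_id and purchase_date columns (Oracle date arithmetic shown; in Hive, DATEDIFF would replace the subtraction):

```sql
SELECT customer_id,
       purchase_date,
       -- Days since this customer's previous purchase; NULL for the first purchase
       purchase_date - LAG(purchase_date, 1) OVER (
           PARTITION BY customer_id
           ORDER BY purchase_date
       ) AS days_since_prev_purchase
FROM purchases;
```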

2. Flagging Stock Price Volatility:

  • Scenario: You want to identify periods of unusual stock price volatility.
  • Process:
    • Calculate the daily rolling standard deviation of the stock price using STDDEV() over a window of the past X days (e.g., 30 days).
    • Compare the daily standard deviation with a historical average or a threshold. Flag days where the standard deviation is significantly higher, indicating potential volatility.
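A possible shape for this calculation, assuming a hypothetical prices table (ticker, trade_date, close_price); the 30-row frame approximates 30 trading days:

```sql
SELECT ticker,
       trade_date,
       close_price,
       -- Standard deviation over the current row and the 29 preceding trading days
       STDDEV(close_price) OVER (
           PARTITION BY ticker
           ORDER BY trade_date
           ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
       ) AS rolling_stddev_30d
FROM prices;
```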

3. Analyzing User Engagement in Web Applications:

  • Scenario: You want to understand user engagement patterns in a web application.
  • Process:
    • Use DENSE_RANK() to assign a rank to each user session based on the total time spent on the site within a specific timeframe (e.g., day or week). This identifies the most engaged users.
    • Calculate the cumulative number of page views per user session using SUM() over a window defined by the session ID. This helps analyze browsing depth.
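Both steps, sketched against an assumed page_views table (session_id, user_id, view_time, time_spent):

```sql
-- Rank sessions by total time on site (most engaged first)
SELECT session_id, user_id,
       DENSE_RANK() OVER (ORDER BY total_time DESC) AS engagement_rank
FROM (
    SELECT session_id, user_id, SUM(time_spent) AS total_time
    FROM page_views
    GROUP BY session_id, user_id
) session_totals;

-- Cumulative page views within each session, in viewing order
SELECT session_id, view_time,
       COUNT(*) OVER (
           PARTITION BY session_id
           ORDER BY view_time
       ) AS cumulative_page_views
FROM page_views;
```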

4. Comparing Sales Performance Across Different Locations:

  • Scenario: You want to compare sales performance between stores in different regions while accounting for overall trends.
  • Process:
    • Calculate the percentage change in daily sales for each store by fetching the previous day's sales with LAG() and dividing the difference by that prior value. This removes the influence of overall seasonal trends.
    • Use RANK() to compare the percentage change for each store within a specific region, identifying top and bottom performers relative to their regional peers.
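The percentage change is derived from LAG(), since neither engine has a dedicated built-in for it. A sketch, assuming a hypothetical daily_sales table (store_id, region, sales_date, sales_amount):

```sql
SELECT store_id,
       region,
       sales_date,
       100.0 * (sales_amount - prev_sales) / prev_sales AS pct_change,
       -- Rank stores within their region for each day by day-over-day change
       RANK() OVER (
           PARTITION BY region, sales_date
           ORDER BY (sales_amount - prev_sales) / prev_sales DESC
       ) AS regional_rank
FROM (
    SELECT store_id, region, sales_date, sales_amount,
           LAG(sales_amount, 1) OVER (
               PARTITION BY store_id
               ORDER BY sales_date
           ) AS prev_sales
    FROM daily_sales
) s
WHERE prev_sales IS NOT NULL AND prev_sales <> 0;
```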

5. Identifying Data Anomalies (Outliers):

  • Scenario: You want to detect outliers in a dataset that might indicate errors or suspicious activity.
  • Process:
    • Calculate the interquartile range (IQR) for a specific column using custom logic or dedicated functions (if available). The IQR represents the middle 50% of the data distribution.
    • Identify rows where the value falls outside the range defined by Q1 - 1.5 * IQR and Q3 + 1.5 * IQR. These could be potential outliers.
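In Oracle, the quartiles can come from the analytic form of PERCENTILE_CONT; Hive has no windowed percentile, so there you would typically compute Q1 and Q3 with percentile_approx in a separate aggregation and join. An Oracle-style sketch, assuming a hypothetical measurements table with a numeric reading_value column:

```sql
SELECT reading_value,
       CASE
           WHEN reading_value < q1 - 1.5 * (q3 - q1)
             OR reading_value > q3 + 1.5 * (q3 - q1)
           THEN 'outlier'
           ELSE 'ok'
       END AS outlier_flag
FROM (
    SELECT reading_value,
           -- Analytic PERCENTILE_CONT: quartiles over the whole result set
           PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY reading_value) OVER () AS q1,
           PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY reading_value) OVER () AS q3
    FROM measurements
) t;
```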

Additional Considerations:

  • When dealing with complex calculations, ensure proper handling of null values within window functions to avoid unexpected results.
  • Frame clauses (ROWS BETWEEN ... and RANGE BETWEEN ...) define the exact set of rows, relative to the current row, over which each calculation runs.
  • Explore advanced window functions like LEAD() and LAG() with custom offsets for more intricate data manipulations.
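As an example of the first and last points together, LAG() accepts a custom offset and a default value that replaces the NULL otherwise produced at the start of a partition (using the employees table from earlier):

```sql
-- Offset of 2 with default 0: no NULLs for the first two rows in hire order
SELECT employee_id,
       salary,
       LAG(salary, 2, 0) OVER (ORDER BY hire_date) AS salary_two_hires_ago
FROM employees;
```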

Write a query using a window function to calculate a running total of sales for each salesperson in your dataset, and explain how each row is processed by the window function.


Let’s consider a dataset named sales with the following columns: salesperson, sale_date, and amount.

To calculate a running total of sales for each salesperson, we can use the SUM window function along with the PARTITION BY clause to group by each salesperson, and ORDER BY to ensure the sales are accumulated in chronological order.

SQL Query

```sql
SELECT
    salesperson,
    sale_date,
    amount,
    SUM(amount) OVER (PARTITION BY salesperson ORDER BY sale_date) AS running_total
FROM
    sales
ORDER BY
    salesperson,
    sale_date;
```

Explanation of the Window Function

  1. Window Function Basics:
    • A window function performs a calculation across a set of table rows that are somehow related to the current row. This is different from a regular aggregate function, which performs a calculation across a set of rows but returns a single value.
  2. SUM(amount) OVER (PARTITION BY salesperson ORDER BY sale_date):
    • SUM(amount): This is the window function that calculates the sum of the amount column.
    • OVER: The OVER clause defines the window of rows over which the window function operates.
    • PARTITION BY salesperson: This clause divides the result set into partitions to which the window function is applied. Each partition contains all rows for a particular salesperson.
    • ORDER BY sale_date: This clause orders the rows within each partition. The running total is calculated in this order.
  3. Processing Each Row:
    • The window function processes each row within the defined partitions. For each row, it calculates the sum of amount for all rows from the start of the partition up to the current row (inclusive), based on the ORDER BY clause.
    • For example, if the salesperson “Alice” has three sales on different dates, the window function calculates the running total as follows:
      • For the first sale, the running total is simply the amount of the first sale.
      • For the second sale, the running total is the sum of the first and second sales.
      • For the third sale, the running total is the sum of the first, second, and third sales.

Example Dataset

Let’s say we have the following sales table:

| salesperson | sale_date | amount |
|---|---|---|
| Alice | 2023-01-01 | 100 |
| Alice | 2023-01-02 | 200 |
| Bob | 2023-01-01 | 150 |
| Alice | 2023-01-03 | 50 |
| Bob | 2023-01-02 | 100 |

Result Set

After running the query, we would get the following result:

| salesperson | sale_date | amount | running_total |
|---|---|---|---|
| Alice | 2023-01-01 | 100 | 100 |
| Alice | 2023-01-02 | 200 | 300 |
| Alice | 2023-01-03 | 50 | 350 |
| Bob | 2023-01-01 | 150 | 150 |
| Bob | 2023-01-02 | 100 | 250 |

Detailed Step-by-Step Processing

  1. Partitioning: The data is divided into two partitions: one for “Alice” and one for “Bob”.
  2. Ordering: Within each partition, the rows are ordered by sale_date.
  3. Calculating Running Total:
    • For “Alice”:
      • 1st row (2023-01-01): running_total = 100
      • 2nd row (2023-01-02): running_total = 100 + 200 = 300
      • 3rd row (2023-01-03): running_total = 300 + 50 = 350
    • For “Bob”:
      • 1st row (2023-01-01): running_total = 150
      • 2nd row (2023-01-02): running_total = 150 + 100 = 250

Using this method, the window function efficiently calculates a running total for each salesperson, providing a cumulative sum of sales up to each point in time.


