Given a table ‘stock_prices’ with columns (date, stock_symbol, closing_price). What’s the cumulative change in stock price compared to the starting price of the year?
To calculate the cumulative change in stock price compared to the starting price of the year for each stock symbol, you can use window functions in both MySQL and Spark SQL. Below are the queries for each engine:
MySQL
In MySQL (8.0 or later, the first version with window functions), you can use the FIRST_VALUE() window function to get the starting price of the year and then calculate the cumulative change:
SELECT
    date,
    stock_symbol,
    closing_price,
    closing_price - FIRST_VALUE(closing_price) OVER (
        PARTITION BY stock_symbol, YEAR(date)
        ORDER BY date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS cumulative_change
FROM
    stock_prices
ORDER BY
    stock_symbol, date;
Explanation:
- SELECT Clause: date, stock_symbol, and closing_price are returned as-is, while closing_price - FIRST_VALUE(closing_price) OVER (...) computes the cumulative change by subtracting the year's starting price from the current closing price.
- Window Function: FIRST_VALUE(closing_price) OVER (PARTITION BY stock_symbol, YEAR(date) ORDER BY date ...) returns the first closing price of the year for each stock symbol. Partitioning by both stock_symbol and YEAR(date) resets the baseline at the start of every year. Strictly speaking, the explicit ROWS BETWEEN frame is redundant for FIRST_VALUE(), since the default frame already begins at the first row of the partition, but spelling it out makes the intent clear.
- ORDER BY Clause: orders the result set by stock_symbol and date. (A concrete example with sample rows follows below.)
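To make the behavior concrete, here is a minimal, self-contained sketch you can run on MySQL 8.0+; the table definition matches the question, but the symbols, dates, and prices are invented for illustration:

CREATE TABLE stock_prices (
    date DATE,
    stock_symbol VARCHAR(10),
    closing_price DECIMAL(10, 2)
);

-- Hypothetical sample rows: two symbols, one year
INSERT INTO stock_prices (date, stock_symbol, closing_price) VALUES
    ('2024-01-02', 'AAPL', 100.00),
    ('2024-02-01', 'AAPL', 110.00),
    ('2024-03-01', 'AAPL', 95.00),
    ('2024-01-02', 'MSFT', 200.00),
    ('2024-02-01', 'MSFT', 220.00);

Against these rows, the query above returns cumulative_change values of 0.00, 10.00, and -5.00 for AAPL and 0.00 and 20.00 for MSFT: each closing price minus the first closing price recorded for that symbol in that year.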
Spark SQL
In Spark SQL, the approach is similar: use the FIRST_VALUE() window function to get the starting price of the year and then calculate the cumulative change:
SELECT
    date,
    stock_symbol,
    closing_price,
    closing_price - FIRST_VALUE(closing_price) OVER (
        PARTITION BY stock_symbol, YEAR(date)
        ORDER BY date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS cumulative_change
FROM
    stock_prices
ORDER BY
    stock_symbol, date;
Explanation:
The Spark SQL query is identical to the MySQL query: YEAR(), FIRST_VALUE(), and the explicit frame specification are all supported in both engines, so it calculates the cumulative change against the year's starting price in exactly the same way.
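A common variation is to express the change as a percentage of the year's starting price rather than as an absolute difference. The sketch below should work in both MySQL 8.0+ and Spark SQL; the explicit frame clause is dropped here because FIRST_VALUE() already reads from the first row of the default frame:

SELECT
    date,
    stock_symbol,
    closing_price,
    -- Percentage change relative to the year's first closing price
    ROUND(100 * (closing_price / FIRST_VALUE(closing_price) OVER (
        PARTITION BY stock_symbol, YEAR(date)
        ORDER BY date
    ) - 1), 2) AS cumulative_change_pct
FROM
    stock_prices
ORDER BY
    stock_symbol, date;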
You have two tables: ‘products’ with columns (product_id, product_name, category_id, price) and ‘categories’ with columns (category_id, category_name). What is the price difference between each product and the next most expensive product in that category?
To find the price difference between each product and the next most expensive product within the same category, you can use window functions in both MySQL and Spark SQL. Here are the queries for each engine:
MySQL
In MySQL, you can use the LEAD() window function to get the price of the next most expensive product within each category:
SELECT
    p.product_id,
    p.product_name,
    p.category_id,
    p.price,
    c.category_name,
    LEAD(p.price) OVER (
        PARTITION BY p.category_id
        ORDER BY p.price DESC
    ) AS next_lower_price,
    p.price - LEAD(p.price) OVER (
        PARTITION BY p.category_id
        ORDER BY p.price DESC
    ) AS price_difference
FROM
    products p
JOIN
    categories c ON p.category_id = c.category_id
ORDER BY
    p.category_id, p.price DESC;
Explanation:
- SELECT Clause: returns the product and category columns plus two computed values. LEAD(p.price) OVER (...) retrieves the price of the next product down the descending price ranking, i.e. the next most expensive product after the current one (aliased as next_lower_price, since that price is lower than the current row's). p.price - LEAD(p.price) OVER (...) computes the difference between the current product's price and that next price.
- JOIN Clause: joins the products and categories tables on category_id so that category_name can be included in the output.
- Window Function: PARTITION BY p.category_id ORDER BY p.price DESC partitions the rows by category_id and orders each partition by price in descending order, so LEAD() reads from the row immediately after the current one in that ranking. For the cheapest product in each category there is no following row, so LEAD() returns NULL and price_difference is NULL as well. Note that ORDER BY p.price DESC alone does not define an order between equally priced products; see the sketch after this list.
- ORDER BY Clause: orders the result set by category_id and price in descending order.
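On the tie-handling caveat: because equally priced products have no defined relative order under ORDER BY p.price DESC alone, LEAD() may pair tied rows unpredictably across runs. A deterministic tiebreaker fixes that; this is a sketch of the idea, assuming product_id is the table's unique key:

SELECT
    p.product_id,
    p.product_name,
    p.price,
    p.price - LEAD(p.price) OVER (
        PARTITION BY p.category_id
        -- product_id breaks ties between equally priced products
        ORDER BY p.price DESC, p.product_id
    ) AS price_difference
FROM
    products p
ORDER BY
    p.category_id, p.price DESC;

With this ordering, two products that share a price are paired deterministically, and their price_difference comes out as 0.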
Spark SQL
In Spark SQL, the approach is similar: use the LEAD() window function to get the price of the next most expensive product within each category:
SELECT
    p.product_id,
    p.product_name,
    p.category_id,
    p.price,
    c.category_name,
    LEAD(p.price) OVER (
        PARTITION BY p.category_id
        ORDER BY p.price DESC
    ) AS next_lower_price,
    p.price - LEAD(p.price) OVER (
        PARTITION BY p.category_id
        ORDER BY p.price DESC
    ) AS price_difference
FROM
    products p
JOIN
    categories c ON p.category_id = c.category_id
ORDER BY
    p.category_id, p.price DESC;
Explanation:
The Spark SQL query is identical to the MySQL query: it computes the price difference between each product and the next most expensive product in the same category using the same LEAD() window function and partitioning. (Unlike FIRST_VALUE(), LEAD() takes no frame clause; it always reads a fixed offset from the current row.)
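Finally, because LEAD() returns NULL for the cheapest product in each category, price_difference is NULL on those rows. If you would rather report 0 there, COALESCE() behaves identically in MySQL and Spark SQL; treat this as an optional refinement rather than part of the canonical answer:

SELECT
    p.product_id,
    p.product_name,
    p.category_id,
    p.price,
    -- COALESCE turns the NULL on each category's cheapest product into 0
    COALESCE(p.price - LEAD(p.price) OVER (
        PARTITION BY p.category_id
        ORDER BY p.price DESC
    ), 0) AS price_difference
FROM
    products p
ORDER BY
    p.category_id, p.price DESC;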