Challenging Interview Questions in MySQL and Spark SQL

Given a table ‘stock_prices’ with columns (date, stock_symbol, closing_price), what is the cumulative change in stock price compared to the starting price of the year?

To calculate the cumulative change in stock price compared to the starting price of the year for each stock symbol, you can use window functions in both MySQL and Spark SQL. Below is the query for each system:

MySQL

In MySQL (8.0 or later, where window functions are available), you can use the FIRST_VALUE() window function to get the starting price of the year and then calculate the cumulative change:

SELECT
    date,
    stock_symbol,
    closing_price,
    closing_price - FIRST_VALUE(closing_price) OVER (PARTITION BY stock_symbol, YEAR(date) ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS cumulative_change
FROM
    stock_prices
ORDER BY
    stock_symbol, date;

Explanation:

  1. SELECT Clause:
    • date: Selects the date.
    • stock_symbol: Selects the stock symbol.
    • closing_price: Selects the closing price.
    • closing_price - FIRST_VALUE(closing_price) OVER (...): Calculates the cumulative change by subtracting the starting price of the year from the current closing price.
  2. Window Function:
    • FIRST_VALUE(closing_price) OVER (PARTITION BY stock_symbol, YEAR(date) ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING): Returns the first closing price of the year for each stock symbol; the explicit ROWS frame spans the entire partition, so every row in a given stock-year sees the same starting price.
  3. ORDER BY Clause:
    • Orders the result set by stock_symbol and date.
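
If the interviewer follows up with older environments: MySQL 5.7 and earlier have no window functions, but the same result can be derived with joins against the first trading date of each year. A minimal sketch, assuming at most one closing price per stock symbol per date:

SELECT
    sp.date,
    sp.stock_symbol,
    sp.closing_price,
    sp.closing_price - start_sp.closing_price AS cumulative_change
FROM
    stock_prices sp
JOIN (
    -- First trading date of each year per stock symbol
    SELECT stock_symbol, YEAR(date) AS price_year, MIN(date) AS first_date
    FROM stock_prices
    GROUP BY stock_symbol, YEAR(date)
) fd ON fd.stock_symbol = sp.stock_symbol
    AND fd.price_year = YEAR(sp.date)
JOIN stock_prices start_sp
    -- Look up the closing price on that first trading date
    ON start_sp.stock_symbol = fd.stock_symbol
    AND start_sp.date = fd.first_date
ORDER BY
    sp.stock_symbol, sp.date;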

Spark SQL

In Spark SQL, the approach is the same: use the FIRST_VALUE() window function to get the starting price of the year and then calculate the cumulative change:

SELECT
    date,
    stock_symbol,
    closing_price,
    closing_price - FIRST_VALUE(closing_price) OVER (PARTITION BY stock_symbol, YEAR(date) ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS cumulative_change
FROM
    stock_prices
ORDER BY
    stock_symbol, date;

Explanation:

The Spark SQL query is identical to the MySQL query: the window-function syntax here is standard SQL, so the same statement calculates the cumulative change in stock price compared to the starting price of the year using the same window function and frame specification.
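
A common follow-up is to express the change as a percentage of the year's starting price instead of an absolute amount. Factoring the window out with a named WINDOW clause keeps the query readable; this is a sketch that both MySQL 8.0 and Spark SQL should accept, assuming closing prices are never zero:

SELECT
    date,
    stock_symbol,
    closing_price,
    ROUND(
        (closing_price - FIRST_VALUE(closing_price) OVER w) * 100
        / FIRST_VALUE(closing_price) OVER w,
        2
    ) AS cumulative_pct_change  -- change since the year's first close, in percent
FROM
    stock_prices
WINDOW w AS (
    PARTITION BY stock_symbol, YEAR(date)
    ORDER BY date
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
ORDER BY
    stock_symbol, date;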

You have two tables: ‘products’ with columns (product_id, product_name, category_id, price) and ‘categories’ with columns (category_id, category_name). What is the price difference between each product and the next most expensive product in that category?

To find the price difference between each product and the next most expensive product within the same category, you can use window functions in both MySQL and Spark SQL. Here is the query for each system:

MySQL

In MySQL, you can use the LEAD() window function to get the price of the next most expensive product within each category:

SELECT
    p.product_id,
    p.product_name,
    p.category_id,
    p.price,
    c.category_name,
    LEAD(p.price) OVER (PARTITION BY p.category_id ORDER BY p.price DESC) AS next_most_expensive_price,
    p.price - LEAD(p.price) OVER (PARTITION BY p.category_id ORDER BY p.price DESC) AS price_difference
FROM
    products p
JOIN
    categories c ON p.category_id = c.category_id
ORDER BY
    p.category_id, p.price DESC;

Explanation:

  1. SELECT Clause:
    • Selects the necessary columns from the products and categories tables.
    • LEAD(p.price) OVER (...) retrieves the price of the next row down the descending price order, i.e. the next most expensive product within the same category.
    • Calculates the price difference between the current product and that product; for the cheapest product in each category there is no following row, so LEAD() returns NULL (see the sketch after this list).
  2. JOIN Clause:
    • Joins the products and categories tables on category_id.
  3. Window Function:
    • PARTITION BY p.category_id ORDER BY p.price DESC partitions the data by category_id and orders it by price in descending order.
  4. ORDER BY Clause:
    • Orders the result set by category_id and price in descending order.
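
One detail worth mentioning in an interview: the cheapest product in each category has no following row, so both LEAD() columns come back NULL for it. If a difference of 0 is preferred there, LEAD's three-argument form supplies a default value; a sketch of that variant:

SELECT
    p.product_id,
    p.product_name,
    p.category_id,
    p.price,
    -- Default to the product's own price, so the difference becomes 0
    p.price - LEAD(p.price, 1, p.price) OVER (
        PARTITION BY p.category_id ORDER BY p.price DESC
    ) AS price_difference
FROM
    products p
ORDER BY
    p.category_id, p.price DESC;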

Spark SQL

In Spark SQL, the approach is the same: use the LEAD() window function to get the price of the next most expensive product within each category:

SELECT
    p.product_id,
    p.product_name,
    p.category_id,
    p.price,
    c.category_name,
    LEAD(p.price) OVER (PARTITION BY p.category_id ORDER BY p.price DESC) AS next_most_expensive_price,
    p.price - LEAD(p.price) OVER (PARTITION BY p.category_id ORDER BY p.price DESC) AS price_difference
FROM
    products p
JOIN
    categories c ON p.category_id = c.category_id
ORDER BY
    p.category_id, p.price DESC;

Explanation:

The Spark SQL query is identical to the MySQL query. It calculates the price difference between each product and the next most expensive product within the same category using the same window function; note that LEAD() does not take a frame clause, so only the partitioning and ordering determine its result.
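
If the question is instead read as the gap up to the product with the next higher price, reversing the sort order gives that interpretation. A sketch that runs unchanged in MySQL 8.0 and Spark SQL:

SELECT
    p.product_id,
    p.product_name,
    p.category_id,
    p.price,
    -- NULL for the most expensive product in each category
    LEAD(p.price) OVER (PARTITION BY p.category_id ORDER BY p.price ASC) AS next_higher_price,
    LEAD(p.price) OVER (PARTITION BY p.category_id ORDER BY p.price ASC) - p.price AS price_gap_up
FROM
    products p
ORDER BY
    p.category_id, p.price ASC;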
