Challenging Interview Questions in MySQL, Spark SQl

Rolling Sum and Nth salary based questions in MySQL or Spark SQL

Here are examples of how to perform rolling sums and find the nth salary using both MySQL and Spark SQL.

Rolling Sum

MySQL

To calculate a rolling sum in MySQL, you can use the SUM() function with a window frame specification.

-- Create a sample table
CREATE TABLE salaries (
    id INT PRIMARY KEY,
    employee_name VARCHAR(50),
    salary INT,
    transaction_date DATE
);

-- Insert sample data
INSERT INTO salaries (id, employee_name, salary, transaction_date) VALUES
(1, 'Alice', 5000, '2023-01-01'),
(2, 'Bob', 6000, '2023-01-02'),
(3, 'Charlie', 7000, '2023-01-03'),
(4, 'Alice', 8000, '2023-01-04'),
(5, 'Bob', 9000, '2023-01-05');

-- Rolling sum query
SELECT
    id,
    employee_name,
    salary,
    transaction_date,
    SUM(salary) OVER (ORDER BY transaction_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_sum
FROM
    salaries
ORDER BY
    transaction_date;

Spark SQL

To calculate a rolling sum in Spark SQL, use the SUM() function with a window specification.

-- Create a sample table
CREATE TABLE salaries (
    id INT,
    employee_name STRING,
    salary INT,
    transaction_date DATE
);

-- Insert sample data
INSERT INTO salaries VALUES
(1, 'Alice', 5000, '2023-01-01'),
(2, 'Bob', 6000, '2023-01-02'),
(3, 'Charlie', 7000, '2023-01-03'),
(4, 'Alice', 8000, '2023-01-04'),
(5, 'Bob', 9000, '2023-01-05');

-- Rolling sum query
SELECT
    id,
    employee_name,
    salary,
    transaction_date,
    SUM(salary) OVER (ORDER BY transaction_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_sum
FROM
    salaries
ORDER BY
    transaction_date;

Nth Salary

MySQL

To find the nth highest salary in MySQL, you can use the DENSE_RANK() function.

-- Nth highest salary query (e.g., 3rd highest salary)
SET @N = 3;

SELECT DISTINCT
    salary
FROM
    salaries
ORDER BY
    salary DESC
LIMIT 1 OFFSET @N - 1;

Alternatively, using DENSE_RANK():

WITH RankedSalaries AS (
SELECT
salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS rank
FROM
salaries
)
SELECT
salary
FROM
RankedSalaries
WHERE
rank = @N;

Spark SQL

To find the nth highest salary in Spark SQL, you can use the DENSE_RANK() function.

-- Nth highest salary query (e.g., 3rd highest salary)
SET @N = 3;

SELECT DISTINCT
    salary
FROM
    salaries
ORDER BY
    salary DESC
LIMIT @N;

Alternatively, using DENSE_RANK():

WITH RankedSalaries AS (
    SELECT
        salary,
        DENSE_RANK() OVER (ORDER BY salary DESC) AS rank
    FROM
        salaries
)
SELECT
    salary
FROM
    RankedSalaries
WHERE
    rank = @N;
  • Rolling Sum: Both MySQL and Spark SQL use the SUM() function with a window specification to calculate a rolling sum.
  • Nth Salary: Both MySQL and Spark SQL can use the DENSE_RANK() function to find the nth highest salary. The LIMIT and OFFSET combination can also be used in MySQL for simpler cases.

These queries illustrate the power of SQL window functions in both MySQL and Spark SQL for advanced data processing tasks.

Mysql Query to find employees who earn more than their managers

To find employees who earn more than their managers, we need to assume that we have a hierarchical table structure where each employee has a reference to their manager’s ID. Below is an example schema and the corresponding MySQL query:

Schema Example

Assume we have the following table employees:

CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(50),
salary DECIMAL(10, 2),
manager_id INT,
FOREIGN KEY (manager_id) REFERENCES employees(id)
);

Sample Data

INSERT INTO employees (id, name, salary, manager_id) VALUES
(1, 'Alice', 90000, NULL),      -- Alice is the CEO, no manager
(2, 'Bob', 70000, 1),           -- Bob reports to Alice
(3, 'Charlie', 80000, 1),       -- Charlie reports to Alice
(4, 'David', 60000, 2),         -- David reports to Bob
(5, 'Eva', 75000, 2),           -- Eva reports to Bob
(6, 'Frank', 85000, 3),         -- Frank reports to Charlie
(7, 'Grace', 50000, 3);         -- Grace reports to Charlie

Query to Find Employees Who Earn More Than Their Managers

SELECT e.id, e.name, e.salary, e.manager_id, m.name AS manager_name, m.salary AS manager_salary
FROM employees e
JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;

Explanation

  • FROM employees e: This selects all employees as e.
  • JOIN employees m ON e.manager_id = m.id: This joins the employees table with itself to get each employee’s manager. m is an alias for the manager.
  • WHERE e.salary > m.salary: This filters the results to include only those employees whose salary is greater than their manager’s salary.

Result

Running the query on the sample data will produce the employees who earn more than their managers. The expected result for the provided sample data is:

idnamesalarymanager_idmanager_namemanager_salary
3Charlie800001Alice90000
5Eva750002Bob70000
6Frank850003Charlie80000

Finding Employees with the Highest Salary in Each Department

MySQL:

SELECT department_id, employee_id, salary
FROM (
SELECT department_id, employee_id, salary,
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) as rank
FROM employees
) ranked_employees
WHERE rank = 1;

Spark SQL:

SELECT department_id, employee_id, salary
FROM (
SELECT department_id, employee_id, salary,
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) as rank
FROM employees
) ranked_employees
WHERE rank = 1;

Finding the Second Highest Salary in the Company

MySQL:

SELECT MAX(salary) AS second_highest_salary
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

Spark SQL:

SELECT MAX(salary) AS second_highest_salary
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

Finding Employees with More Than Average Salary in Their Department

MySQL:

SELECT e.employee_id, e.name, e.salary, e.department_id
FROM employees e
JOIN (
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
) d ON e.department_id = d.department_id
WHERE e.salary > d.avg_salary;

Spark SQL:

SELECT e.employee_id, e.name, e.salary, e.department_id
FROM employees e
JOIN (
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
) d ON e.department_id = d.department_id
WHERE e.salary > d.avg_salary;

Finding the Cumulative Salary by Department Ordered by Employee ID

MySQL:

SELECT employee_id, department_id, salary,
SUM(salary) OVER (PARTITION BY department_id ORDER BY employee_id) AS cumulative_salary
FROM employees;

Spark SQL:

SELECT employee_id, department_id, salary,
SUM(salary) OVER (PARTITION BY department_id ORDER BY employee_id) AS cumulative_salary
FROM employees;

Finding the Manager with the Maximum Number of Direct Reports

MySQL:

SELECT manager_id, COUNT(*) AS direct_reports
FROM employees
WHERE manager_id IS NOT NULL
GROUP BY manager_id
ORDER BY direct_reports DESC
LIMIT 1;

Spark SQL:

SELECT manager_id, COUNT(*) AS direct_reports
FROM employees
WHERE manager_id IS NOT NULL
GROUP BY manager_id
ORDER BY direct_reports DESC
LIMIT 1;

Identifying Gaps in Employee IDs

MySQL:

SELECT id + 1 AS start_id, MIN(id) - 1 AS end_id
FROM employees
WHERE (id + 1) NOT IN (SELECT id FROM employees)
GROUP BY id
ORDER BY start_id;

Spark SQL:

SELECT id + 1 AS start_id, MIN(id) - 1 AS end_id
FROM employees
WHERE (id + 1) NOT IN (SELECT id FROM employees)
GROUP BY id
ORDER BY start_id;

Finding Employees Who Have Worked on All Projects

MySQL:

SELECT e.employee_id, e.name
FROM employees e
JOIN project_assignments pa ON e.employee_id = pa.employee_id
GROUP BY e.employee_id, e.name
HAVING COUNT(DISTINCT pa.project_id) = (SELECT COUNT(DISTINCT project_id) FROM projects);

Spark SQL:

SELECT e.employee_id, e.name
FROM employees e
JOIN project_assignments pa ON e.employee_id = pa.employee_id
GROUP BY e.employee_id, e.name
HAVING COUNT(DISTINCT pa.project_id) = (SELECT COUNT(DISTINCT project_id) FROM projects);

Finding the Average Salary by Job Title and Department

MySQL:

SELECT job_title, department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY job_title, department_id;

Spark SQL:

SELECT job_title, department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY job_title, department_id;

These queries should provide a comprehensive challenge and help deepen your understanding of both MySQL and Spark SQL.

Source for Below Questions

Pages ( 4 of 6 ): « Previous123 4 56Next »