Challenging Interview Questions in MySQL, Spark SQl

Difference of scores for each subjects from Predecessor year for each subject ,difference of scores for each subject from successor year for each subject , difference of total scores for each student from predecessor and successor from previous year for a student in SQL, Spark sql?

let’s break this down into different parts and provide solutions in both standard SQL and Spark SQL. We’ll assume we have a table Scores with columns StudentID, Year, Subject, and Score.

1. Difference of Scores for Each Subject from Predecessor

For each student, calculate the difference in scores for each subject from the previous year (predecessor).

WITH RankedScores AS (
    SELECT 
        StudentID,
        Subject,
        Year,
        Score,
        LAG(Score) OVER (PARTITION BY StudentID, Subject ORDER BY Year) AS PrevScore
    FROM Scores
)
SELECT
    StudentID,
    Subject,
    Year,
    Score,
    Score - PrevScore AS ScoreDiffFromPredecessor
FROM RankedScores
WHERE PrevScore IS NOT NULL;

Spark SQL

SELECT 
    StudentID,
    Subject,
    Year,
    Score,
    Score - LAG(Score) OVER (PARTITION BY StudentID, Subject ORDER BY Year) AS ScoreDiffFromPredecessor
FROM Scores
WHERE LAG(Score) OVER (PARTITION BY StudentID, Subject ORDER BY Year) IS NOT NULL;

2. Difference of Scores for Each Subject from Successor

For each student, calculate the difference in scores for each subject from the next year (successor).

SQL

WITH RankedScores AS (
    SELECT 
        StudentID,
        Subject,
        Year,
        Score,
        LEAD(Score) OVER (PARTITION BY StudentID, Subject ORDER BY Year) AS NextScore
    FROM Scores
)
SELECT
    StudentID,
    Subject,
    Year,
    Score,
    Score - NextScore AS ScoreDiffFromSuccessor
FROM RankedScores
WHERE NextScore IS NOT NULL;

Spark SQL

SELECT 
    StudentID,
    Subject,
    Year,
    Score,
    Score - LEAD(Score) OVER (PARTITION BY StudentID, Subject ORDER BY Year) AS ScoreDiffFromSuccessor
FROM Scores
WHERE LEAD(Score) OVER (PARTITION BY StudentID, Subject ORDER BY Year) IS NOT NULL;

3. Difference of Total Scores for Each Student from Predecessor and Successor

For each student, calculate the difference in total scores from the previous and next year.

WITH TotalScores AS (
    SELECT
        StudentID,
        Year,
        SUM(Score) AS TotalScore
    FROM Scores
    GROUP BY StudentID, Year
),
RankedTotalScores AS (
    SELECT
        StudentID,
        Year,
        TotalScore,
        LAG(TotalScore) OVER (PARTITION BY StudentID ORDER BY Year) AS PrevTotalScore,
        LEAD(TotalScore) OVER (PARTITION BY StudentID ORDER BY Year) AS NextTotalScore
    FROM TotalScores
)
SELECT
    StudentID,
    Year,
    TotalScore,
    TotalScore - PrevTotalScore AS TotalScoreDiffFromPredecessor,
    TotalScore - NextTotalScore AS TotalScoreDiffFromSuccessor
FROM RankedTotalScores
WHERE PrevTotalScore IS NOT NULL OR NextTotalScore IS NOT NULL;

Spark SQL

WITH TotalScores AS (
    SELECT
        StudentID,
        Year,
        SUM(Score) AS TotalScore
    FROM Scores
    GROUP BY StudentID, Year
)
SELECT
    StudentID,
    Year,
    TotalScore,
    TotalScore - LAG(TotalScore) OVER (PARTITION BY StudentID ORDER BY Year) AS TotalScoreDiffFromPredecessor,
    TotalScore - LEAD(TotalScore) OVER (PARTITION BY StudentID ORDER BY Year) AS TotalScoreDiffFromSuccessor
FROM TotalScores
WHERE LAG(TotalScore) OVER (PARTITION BY StudentID ORDER BY Year) IS NOT NULL
   OR LEAD(TotalScore) OVER (PARTITION BY StudentID ORDER BY Year) IS NOT NULL;

4. Difference of Scores from Previous Year for a Student

For each student, calculate the difference in scores from the previous year.

SQL

WITH RankedScores AS (
    SELECT 
        StudentID,
        Year,
        SUM(Score) AS TotalScore,
        LAG(SUM(Score)) OVER (PARTITION BY StudentID ORDER BY Year) AS PrevTotalScore
    FROM Scores
    GROUP BY StudentID, Year
)
SELECT
    StudentID,
    Year,
    TotalScore,
    TotalScore - PrevTotalScore AS ScoreDiffFromPreviousYear
FROM RankedScores
WHERE PrevTotalScore IS NOT NULL;

Spark SQL

SELECT 
    StudentID,
    Year,
    SUM(Score) AS TotalScore,
    SUM(Score) - LAG(SUM(Score)) OVER (PARTITION BY StudentID ORDER BY Year) AS ScoreDiffFromPreviousYear
FROM Scores
GROUP BY StudentID, Year
HAVING LAG(SUM(Score)) OVER (PARTITION BY StudentID ORDER BY Year) IS NOT NULL;

Explanation

  • LAG(): Retrieves the value of a column from a preceding row within the same partition.
  • LEAD(): Retrieves the value of a column from a following row within the same partition.
  • PARTITION BY: Divides the result set into partitions to which the window function is applied.
  • ORDER BY: Defines the logical order of the rows within each partition.
Pages ( 2 of 6 ): « Previous1 2 34 ... 6Next »