Challenging Interview Questions in MySQL, Spark SQl

Difference of scores for each subjects from Predecessor only for each subject in same class, difference of scores for each subject from successor only for each subject in same class , difference of total scores for a student from his predecessor and successor in a same class, also difference of scores from previous year for a student in SQL, Spark sql?

Let’s break down the requirements and then implement them in SQL and Spark SQL. We’ll assume we have a table with the following structure:

CREATE TABLE StudentScores (
    StudentID INT,
    Year INT,
    Class INT,
    Subject VARCHAR(50),
    Score INT
);

Requirements

  1. Difference of scores for each subject from predecessor for each subject.
  2. Difference of scores for each subject from successor for each subject.
  3. Difference of total scores for a student from their predecessor and successor in the same class.
  4. Difference of scores from the previous year for a student.

SQL Implementation

Let’s assume we are working with a relational database supporting standard SQL. We’ll use window functions for some of these requirements.

1. Difference of scores for each subject from predecessor for each subject

SELECT 
    StudentID,
    Year,
    Class,
    Subject,
    Score,
    Score - LAG(Score) OVER (PARTITION BY Class, Subject ORDER BY Year) AS DiffFromPredecessor
FROM 
    StudentScores;

2. Difference of scores for each subject from successor for each subject

SELECT 
    StudentID,
    Year,
    Class,
    Subject,
    Score,
    LEAD(Score) OVER (PARTITION BY Class, Subject ORDER BY Year) - Score AS DiffFromSuccessor
FROM 
    StudentScores;

3. Difference of total scores for a student from their predecessor and successor in the same class

First, let’s compute the total scores per student per year:

WITH TotalScores AS (
    SELECT 
        StudentID,
        Year,
        Class,
        SUM(Score) AS TotalScore
    FROM 
        StudentScores
    GROUP BY 
        StudentID, Year, Class
)
SELECT 
    StudentID,
    Year,
    Class,
    TotalScore,
    TotalScore - LAG(TotalScore) OVER (PARTITION BY Class ORDER BY Year) AS DiffFromPredecessor,
    LEAD(TotalScore) OVER (PARTITION BY Class ORDER BY Year) - TotalScore AS DiffFromSuccessor
FROM 
    TotalScores;

4. Difference of scores from the previous year for a student

SELECT 
    StudentID,
    Year,
    Class,
    Subject,
    Score,
    Score - LAG(Score) OVER (PARTITION BY StudentID, Subject ORDER BY Year) AS DiffFromPreviousYear
FROM 
    StudentScores;

Spark SQL Implementation

The implementation in Spark SQL would be similar. Here’s how you can do it:

Setup

First, let’s create a DataFrame df with the appropriate schema.

from pyspark.sql import SparkSession
from pyspark.sql.window import Window
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("StudentScores").getOrCreate()

data = [
    (1, 2023, 10, 'Math', 85),
    (1, 2022, 10, 'Math', 80),
    (2, 2023, 10, 'Math', 75),
    (2, 2022, 10, 'Math', 70),
    (1, 2023, 10, 'Science', 90),
    (1, 2022, 10, 'Science', 85),
    (2, 2023, 10, 'Science', 88),
    (2, 2022, 10, 'Science', 84)
]

columns = ['StudentID', 'Year', 'Class', 'Subject', 'Score']

df = spark.createDataFrame(data, columns)
df.createOrReplaceTempView("StudentScores")

1. Difference of scores for each subject from predecessor for each subject

window_spec = Window.partitionBy('Class', 'Subject').orderBy('Year')

df.withColumn('DiffFromPredecessor', F.col('Score') - F.lag('Score').over(window_spec)).show()

2. Difference of scores for each subject from successor for each subject

df.withColumn('DiffFromSuccessor', F.lead('Score').over(window_spec) - F.col('Score')).show()

3. Difference of total scores for a student from their predecessor and successor in the same class

First, compute the total scores per student per year:

total_scores = df.groupBy('StudentID', 'Year', 'Class').agg(F.sum('Score').alias('TotalScore'))
total_scores.createOrReplaceTempView("TotalScores")

window_spec_total = Window.partitionBy('Class').orderBy('Year')

total_scores.withColumn('DiffFromPredecessor', F.col('TotalScore') - F.lag('TotalScore').over(window_spec_total)) 
            .withColumn('DiffFromSuccessor', F.lead('TotalScore').over(window_spec_total) - F.col('TotalScore')) 
            .show()

4. Difference of scores from the previous year for a student

window_spec_student = Window.partitionBy('StudentID', 'Subject').orderBy('Year')

df.withColumn('DiffFromPreviousYear', F.col('Score') - F.lag('Score').over(window_spec_student)).show()

These implementations use window functions to calculate the required differences in both standard SQL and Spark SQL. Adjust column names and table names as per your actual schema.

Pages ( 3 of 6 ): « Previous12 3 456Next »