• Data Definition Language (DDL) – to define and modify the structure of a database.
  • Data Manipulation Language (DML) – to access, manipulate, and modify data in a database.
  • Data Control Language (DCL) – to control user access to the data in the database and give or revoke privileges to a specific user or a group of users.
  • Transaction Control Language (TCL) – to control transactions in a database.
  • Data Query Language (DQL) – to perform queries on the data in a database to retrieve the necessary information from it.
  • DDL: CREATEALTER TABLEDROPTRUNCATE, and ADD COLUMN
  • DML: UPDATEDELETE, and INSERT
  • DCL: GRANT and REVOKE
  • TCL: COMMITSET TRANSACTIONROLLBACK, and SAVEPOINT
  • DQL: – SELECT
                +---------------------+
                |   SQL Commands      |
                +----------+----------+
                           |
                           v
                +----------+----------+
                |     DDL (Data      |
                | Definition Language)|
                +----------+----------+
                           |
                 +---------+---------+
                 |                   |
            +----+----+        +-----+-----+
            | CREATE  |        |  ALTER   |
            | TABLE   |        |  TABLE   |
            +----+----+        +-----+-----+
                 |                   |
                 v                   v
        +--------+---------+   +-----+-------+
        |                  |   |             |
 +------|   DROP TABLE     |   |  MODIFY     |
 |      |                  |   |  TABLE      |
 |      +------------------+   +-------------+
 |                                        |
 |                                        v
 |                               +--------+--------+
 |                               |                  |
 +-------------------------------|  RENAME TABLE    |
                                 |                  |
                                 +------------------+

                +----------+----------+
                |    DML (Data       |
                | Manipulation Language)|
                +----------+----------+
                           |
                 +---------+---------+
                 |                   |
            +----+----+        +-----+-----+
            | INSERT  |        |  UPDATE   |
            | INTO    |        |  TABLE    |
            +----+----+        +-----+-----+
                 |                   |
                 v                   v
        +--------+---------+   +-----+-------+
        |                  |   |             |
 +------|   DELETE FROM    |   |  MERGE INTO  |
 |      |                  |   |    TABLE    |
 |      +------------------+   +-------------+
 |                                        |
 |                                        v
 |                               +--------+--------+
 |                               |                  |
 +-------------------------------|  TRUNCATE TABLE  |
                                 |                  |
                                 +------------------+

                +----------+----------+
                |    DCL (Data       |
                | Control Language)   |
                +----------+----------+
                           |
                 +---------+---------+
                 |                   |
            +----+----+        +-----+-----+
            | GRANT  |        |  REVOKE   |
            |        |        |  PRIVILEGES|
            +----+----+        +-----+-----+
                                       |
                                       v
                              +--------+--------+
                              |                  |
                              |  SET TRANSACTION|
                              |                  |
                              +------------------+

Here’s a breakdown of the main SQL command categories and their purposes, including examples of commands commonly used within each:


1. Data Definition Language (DDL)

DDL commands define and modify the structure of database objects like tables, schemas, and indexes. They generally affect the schema or database structure rather than the data itself.

CommandPurposeExample
CREATECreates a new database object, such as a table, view, or index.CREATE TABLE students (id INT, name VARCHAR(50), age INT);
ALTER TABLEModifies an existing database object by adding or modifying columns.ALTER TABLE students ADD COLUMN address VARCHAR(100);
DROPDeletes an existing database object permanently.DROP TABLE students;
TRUNCATERemoves all data from a table but keeps the table structure intact.TRUNCATE TABLE students;
ADD COLUMNAdds a new column to an existing table.ALTER TABLE students ADD COLUMN grade VARCHAR(2);

2. Data Manipulation Language (DML)

DML commands allow for manipulation of data within existing database objects, making it possible to insert, update, or delete records.

CommandPurposeExample
INSERTAdds new records to a table.INSERT INTO students (id, name, age) VALUES (1, 'John Doe', 20);
UPDATEModifies existing records in a table.UPDATE students SET age = 21 WHERE id = 1;
DELETERemoves existing records from a table.DELETE FROM students WHERE age < 18;

3. Data Control Language (DCL)

DCL commands manage access to database objects, controlling permissions and access for different users.

CommandPurposeExample
GRANTProvides specific privileges to a user or role.GRANT SELECT, INSERT ON students TO user123;
REVOKERemoves previously granted privileges from a user or role.REVOKE INSERT ON students FROM user123;

4. Transaction Control Language (TCL)

TCL commands are used to manage database transactions, allowing for control over transactions’ commit, rollback, and savepoints.

CommandPurposeExample
COMMITSaves all changes made in the current transaction.COMMIT;
ROLLBACKUndoes all changes made in the current transaction.ROLLBACK;
SAVEPOINTSets a point within a transaction that can be rolled back to without affecting the entire transaction.SAVEPOINT save1;
SET TRANSACTIONSets characteristics for the current transaction.SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

5. Data Query Language (DQL)

DQL commands are used for querying the database and retrieving data.

CommandPurposeExample
SELECTRetrieves specific data from one or more tables.SELECT name, age FROM students WHERE age > 18;

Summary Table of SQL Command Categories

CategoryCommandsMain Purpose
DDLCREATE, ALTER, DROP, TRUNCATE, ADD COLUMNDefine and modify database structure
DMLINSERT, UPDATE, DELETEAccess, manipulate, and modify data
DCLGRANT, REVOKEControl user access and permissions
TCLCOMMIT, ROLLBACK, SAVEPOINT, SET TRANSACTIONManage transactions in the database
DQLSELECTRetrieve data from the database

Examples of Combined Usage

In a typical database operation, these commands might work together as follows:

  1. DDL: Create a new table for storing student data.
  2. DML: Insert data about new students into the table.
  3. DQL: Retrieve student information for analysis.
  4. TCL: Use COMMIT to save changes or ROLLBACK to undo mistakes.
  5. DCL: Grant access to specific users for querying or modifying student records.

These SQL commands are essential for efficient database management, enabling the organization and security of data, alongside flexibility in accessing and updating data as needed.


In Spark SQL, commands are organized similarly to traditional SQL, but there are some specifics due to Spark’s distributed nature and focus on big data processing. Here’s a breakdown of Spark SQL commands into DDL, DML, DCL, TCL, and DQL categories, with examples and use cases relevant to Spark’s ecosystem:


1. Data Definition Language (DDL)

DDL in Spark SQL defines or alters the structure of tables, databases, and views.

CommandPurposeExample
CREATE TABLECreates a new table in a database with specified schema.CREATE TABLE students (id INT, name STRING, age INT);
ALTER TABLEModifies an existing table by adding or renaming columns.ALTER TABLE students ADD COLUMNS (grade STRING);
DROP TABLEDeletes an existing table from the database permanently.DROP TABLE students;
TRUNCATE TABLERemoves all rows from a table but retains the table schema.TRUNCATE TABLE students;
CREATE VIEWCreates a virtual table based on a query.CREATE VIEW student_ages AS SELECT id, age FROM students;

2. Data Manipulation Language (DML)

DML in Spark SQL is used to manage data within tables, allowing for data insertion, deletion, or updates.

CommandPurposeExample
INSERT INTOAdds new data into an existing table.INSERT INTO students VALUES (1, 'John Doe', 20);
INSERT OVERWRITEReplaces existing data in a table.INSERT OVERWRITE TABLE students SELECT * FROM new_students;
UPDATEUpdates records in a table (available only for Delta tables in Spark 3.0+).UPDATE students SET age = 21 WHERE id = 1;
DELETEDeletes specific records from a table (Delta tables only).DELETE FROM students WHERE age < 18;

3. Data Control Language (DCL)

In Spark SQL, DCL commands, like GRANT and REVOKE, are primarily used in environments with user authentication and access control, such as Spark on Hive or Spark in cloud settings with specific authorization.

CommandPurposeExample
GRANTGrants access privileges to a user.GRANT SELECT ON students TO user123;
REVOKERemoves access privileges from a user.REVOKE SELECT ON students FROM user123;

4. Transaction Control Language (TCL)

TCL commands are primarily supported in Delta Lake for transaction management, offering transaction control for Spark data processing.

CommandPurposeExample
COMMITSaves all changes made within a transaction.(Auto-committed in Spark)
ROLLBACKReverts all changes made in the current transaction (Delta tables).ROLLBACK;
SAVEPOINTSets a savepoint within a transaction for rollback (Delta tables).SAVEPOINT save1;

5. Data Query Language (DQL)

DQL in Spark SQL includes querying data from tables and performing complex data transformations.

CommandPurposeExample
SELECTRetrieves specific columns and rows from a table.SELECT name, age FROM students WHERE age > 18;
GROUP BYAggregates data based on column grouping.SELECT age, COUNT(*) FROM students GROUP BY age;
ORDER BYSorts results based on specified columns.SELECT * FROM students ORDER BY age DESC;
LIMITRestricts the number of rows returned.SELECT * FROM students LIMIT 10;

Additional Operations in Spark SQL

CategoryCommandPurposeExample
Join TypesINNER, LEFT, RIGHT, FULL OUTER JOINJoins tables based on column relationships.SELECT * FROM students s JOIN courses c ON s.id = c.student_id;
Union TypesUNION, UNION ALLCombines results of two queries.SELECT * FROM students UNION ALL SELECT * FROM alumni;
Set OperationsEXCEPT, INTERSECTPerforms set operations on query results.SELECT * FROM students EXCEPT SELECT * FROM alumni;
DescribeDESCRIBE TABLEShows metadata of a table.DESCRIBE students;
Show OptionsSHOW TABLES, SHOW COLUMNSLists tables or columns in a database.SHOW TABLES;
SchemaCAST, CHANGE COLUMNAlters data types or schemas of a table.ALTER TABLE students CHANGE COLUMN age STRING;

Optimization Techniques in Spark SQL

OptimizationCommand or TechniqueExample
CachingCACHE TABLECACHE TABLE students;
Broadcast JoinBROADCAST hintSELECT /*+ BROADCAST(small_table) */ * FROM large_table JOIN small_table ON condition;
PartitioningRepartition or Coalescestudents.repartition(10);
File FormatUse optimized formats like Parquet, ORCCREATE TABLE students USING parquet OPTIONS (...);

User-Defined Functions (UDFs) in Spark SQL

UDFs allow for custom functions within SQL queries for more complex transformations.

UDF TypeExamplePurpose
Python UDFspark.udf.register("my_func", lambda x: x * 2)Registers a Python function to double a value.
SQL CallSELECT my_func(age) FROM students;Calls UDF in Spark SQL.

Common Data Types in Spark SQL

Data TypeDescriptionExample
STRINGSequence of characters.VARCHAR(100)
INT, BIGINTInteger values.INT or BIGINT
DECIMALFixed-point number.DECIMAL(10, 2)
ARRAY, MAPCollection data types.ARRAY<STRING>, MAP<STRING, INT>

This comprehensive Spark SQL cheatsheet covers essential SQL categories, commands, and advanced techniques, making it a quick-reference guide for various Spark SQL operations.


Discover more from AI HitsToday

Subscribe to get the latest posts sent to your email.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

About the HintsToday

AI HintsToday is One Stop Adda to learn All about AI, Data, ML, Stat Learning, SAS, SQL, Python, Pyspark. AHT is Future!

Explore the Posts

Latest Comments

Latest posts

Discover more from AI HitsToday

Subscribe now to keep reading and get access to the full archive.

Continue reading