- Data Definition Language (DDL) – to define and modify the structure of a database.
- Data Manipulation Language (DML) – to access, manipulate, and modify data in a database.
- Data Control Language (DCL) – to control user access to the data in the database and give or revoke privileges to a specific user or a group of users.
- Transaction Control Language (TCL) – to control transactions in a database.
- Data Query Language (DQL) – to perform queries on the data in a database to retrieve the necessary information from it.
- DDL:
CREATE
,ALTER
TABLE
,DROP
,TRUNCATE
, andADD COLUMN
- DML:
UPDATE
,DELETE
, andINSERT
- DCL:
GRANT
andREVOKE
- TCL:
COMMIT
,SET TRANSACTION
,ROLLBACK
, andSAVEPOINT
- DQL: –
SELECT
+---------------------+
| SQL Commands |
+----------+----------+
|
v
+----------+----------+
| DDL (Data |
| Definition Language)|
+----------+----------+
|
+---------+---------+
| |
+----+----+ +-----+-----+
| CREATE | | ALTER |
| TABLE | | TABLE |
+----+----+ +-----+-----+
| |
v v
+--------+---------+ +-----+-------+
| | | |
+------| DROP TABLE | | MODIFY |
| | | | TABLE |
| +------------------+ +-------------+
| |
| v
| +--------+--------+
| | |
+-------------------------------| RENAME TABLE |
| |
+------------------+
+----------+----------+
| DML (Data |
| Manipulation Language)|
+----------+----------+
|
+---------+---------+
| |
+----+----+ +-----+-----+
| INSERT | | UPDATE |
| INTO | | TABLE |
+----+----+ +-----+-----+
| |
v v
+--------+---------+ +-----+-------+
| | | |
+------| DELETE FROM | | MERGE INTO |
| | | | TABLE |
| +------------------+ +-------------+
| |
| v
| +--------+--------+
| | |
+-------------------------------| TRUNCATE TABLE |
| |
+------------------+
+----------+----------+
| DCL (Data |
| Control Language) |
+----------+----------+
|
+---------+---------+
| |
+----+----+ +-----+-----+
| GRANT | | REVOKE |
| | | PRIVILEGES|
+----+----+ +-----+-----+
|
v
+--------+--------+
| |
| SET TRANSACTION|
| |
+------------------+
Here’s a breakdown of the main SQL command categories and their purposes, including examples of commands commonly used within each:
1. Data Definition Language (DDL)
DDL commands define and modify the structure of database objects like tables, schemas, and indexes. They generally affect the schema or database structure rather than the data itself.
Command | Purpose | Example |
---|---|---|
CREATE | Creates a new database object, such as a table, view, or index. | CREATE TABLE students (id INT, name VARCHAR(50), age INT); |
ALTER TABLE | Modifies an existing database object by adding or modifying columns. | ALTER TABLE students ADD COLUMN address VARCHAR(100); |
DROP | Deletes an existing database object permanently. | DROP TABLE students; |
TRUNCATE | Removes all data from a table but keeps the table structure intact. | TRUNCATE TABLE students; |
ADD COLUMN | Adds a new column to an existing table. | ALTER TABLE students ADD COLUMN grade VARCHAR(2); |
2. Data Manipulation Language (DML)
DML commands allow for manipulation of data within existing database objects, making it possible to insert, update, or delete records.
Command | Purpose | Example |
---|---|---|
INSERT | Adds new records to a table. | INSERT INTO students (id, name, age) VALUES (1, 'John Doe', 20); |
UPDATE | Modifies existing records in a table. | UPDATE students SET age = 21 WHERE id = 1; |
DELETE | Removes existing records from a table. | DELETE FROM students WHERE age < 18; |
3. Data Control Language (DCL)
DCL commands manage access to database objects, controlling permissions and access for different users.
Command | Purpose | Example |
---|---|---|
GRANT | Provides specific privileges to a user or role. | GRANT SELECT, INSERT ON students TO user123; |
REVOKE | Removes previously granted privileges from a user or role. | REVOKE INSERT ON students FROM user123; |
4. Transaction Control Language (TCL)
TCL commands are used to manage database transactions, allowing for control over transactions’ commit, rollback, and savepoints.
Command | Purpose | Example |
---|---|---|
COMMIT | Saves all changes made in the current transaction. | COMMIT; |
ROLLBACK | Undoes all changes made in the current transaction. | ROLLBACK; |
SAVEPOINT | Sets a point within a transaction that can be rolled back to without affecting the entire transaction. | SAVEPOINT save1; |
SET TRANSACTION | Sets characteristics for the current transaction. | SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; |
5. Data Query Language (DQL)
DQL commands are used for querying the database and retrieving data.
Command | Purpose | Example |
---|---|---|
SELECT | Retrieves specific data from one or more tables. | SELECT name, age FROM students WHERE age > 18; |
Summary Table of SQL Command Categories
Category | Commands | Main Purpose |
---|---|---|
DDL | CREATE, ALTER, DROP, TRUNCATE, ADD COLUMN | Define and modify database structure |
DML | INSERT, UPDATE, DELETE | Access, manipulate, and modify data |
DCL | GRANT, REVOKE | Control user access and permissions |
TCL | COMMIT, ROLLBACK, SAVEPOINT, SET TRANSACTION | Manage transactions in the database |
DQL | SELECT | Retrieve data from the database |
Examples of Combined Usage
In a typical database operation, these commands might work together as follows:
- DDL: Create a new table for storing student data.
- DML: Insert data about new students into the table.
- DQL: Retrieve student information for analysis.
- TCL: Use
COMMIT
to save changes orROLLBACK
to undo mistakes. - DCL: Grant access to specific users for querying or modifying student records.
These SQL commands are essential for efficient database management, enabling the organization and security of data, alongside flexibility in accessing and updating data as needed.
In Spark SQL, commands are organized similarly to traditional SQL, but there are some specifics due to Spark’s distributed nature and focus on big data processing. Here’s a breakdown of Spark SQL commands into DDL, DML, DCL, TCL, and DQL categories, with examples and use cases relevant to Spark’s ecosystem:
1. Data Definition Language (DDL)
DDL in Spark SQL defines or alters the structure of tables, databases, and views.
Command | Purpose | Example |
---|---|---|
CREATE TABLE | Creates a new table in a database with specified schema. | CREATE TABLE students (id INT, name STRING, age INT); |
ALTER TABLE | Modifies an existing table by adding or renaming columns. | ALTER TABLE students ADD COLUMNS (grade STRING); |
DROP TABLE | Deletes an existing table from the database permanently. | DROP TABLE students; |
TRUNCATE TABLE | Removes all rows from a table but retains the table schema. | TRUNCATE TABLE students; |
CREATE VIEW | Creates a virtual table based on a query. | CREATE VIEW student_ages AS SELECT id, age FROM students; |
2. Data Manipulation Language (DML)
DML in Spark SQL is used to manage data within tables, allowing for data insertion, deletion, or updates.
Command | Purpose | Example |
---|---|---|
INSERT INTO | Adds new data into an existing table. | INSERT INTO students VALUES (1, 'John Doe', 20); |
INSERT OVERWRITE | Replaces existing data in a table. | INSERT OVERWRITE TABLE students SELECT * FROM new_students; |
UPDATE | Updates records in a table (available only for Delta tables in Spark 3.0+). | UPDATE students SET age = 21 WHERE id = 1; |
DELETE | Deletes specific records from a table (Delta tables only). | DELETE FROM students WHERE age < 18; |
3. Data Control Language (DCL)
In Spark SQL, DCL commands, like GRANT and REVOKE, are primarily used in environments with user authentication and access control, such as Spark on Hive or Spark in cloud settings with specific authorization.
Command | Purpose | Example |
---|---|---|
GRANT | Grants access privileges to a user. | GRANT SELECT ON students TO user123; |
REVOKE | Removes access privileges from a user. | REVOKE SELECT ON students FROM user123; |
4. Transaction Control Language (TCL)
TCL commands are primarily supported in Delta Lake for transaction management, offering transaction control for Spark data processing.
Command | Purpose | Example |
---|---|---|
COMMIT | Saves all changes made within a transaction. | (Auto-committed in Spark) |
ROLLBACK | Reverts all changes made in the current transaction (Delta tables). | ROLLBACK; |
SAVEPOINT | Sets a savepoint within a transaction for rollback (Delta tables). | SAVEPOINT save1; |
5. Data Query Language (DQL)
DQL in Spark SQL includes querying data from tables and performing complex data transformations.
Command | Purpose | Example |
---|---|---|
SELECT | Retrieves specific columns and rows from a table. | SELECT name, age FROM students WHERE age > 18; |
GROUP BY | Aggregates data based on column grouping. | SELECT age, COUNT(*) FROM students GROUP BY age; |
ORDER BY | Sorts results based on specified columns. | SELECT * FROM students ORDER BY age DESC; |
LIMIT | Restricts the number of rows returned. | SELECT * FROM students LIMIT 10; |
Additional Operations in Spark SQL
Category | Command | Purpose | Example |
---|---|---|---|
Join Types | INNER, LEFT, RIGHT, FULL OUTER JOIN | Joins tables based on column relationships. | SELECT * FROM students s JOIN courses c ON s.id = c.student_id; |
Union Types | UNION, UNION ALL | Combines results of two queries. | SELECT * FROM students UNION ALL SELECT * FROM alumni; |
Set Operations | EXCEPT, INTERSECT | Performs set operations on query results. | SELECT * FROM students EXCEPT SELECT * FROM alumni; |
Describe | DESCRIBE TABLE | Shows metadata of a table. | DESCRIBE students; |
Show Options | SHOW TABLES, SHOW COLUMNS | Lists tables or columns in a database. | SHOW TABLES; |
Schema | CAST, CHANGE COLUMN | Alters data types or schemas of a table. | ALTER TABLE students CHANGE COLUMN age STRING; |
Optimization Techniques in Spark SQL
Optimization | Command or Technique | Example |
---|---|---|
Caching | CACHE TABLE | CACHE TABLE students; |
Broadcast Join | BROADCAST hint | SELECT /*+ BROADCAST(small_table) */ * FROM large_table JOIN small_table ON condition; |
Partitioning | Repartition or Coalesce | students.repartition(10); |
File Format | Use optimized formats like Parquet, ORC | CREATE TABLE students USING parquet OPTIONS (...); |
User-Defined Functions (UDFs) in Spark SQL
UDFs allow for custom functions within SQL queries for more complex transformations.
UDF Type | Example | Purpose |
---|---|---|
Python UDF | spark.udf.register("my_func", lambda x: x * 2) | Registers a Python function to double a value. |
SQL Call | SELECT my_func(age) FROM students; | Calls UDF in Spark SQL. |
Common Data Types in Spark SQL
Data Type | Description | Example |
---|---|---|
STRING | Sequence of characters. | VARCHAR(100) |
INT, BIGINT | Integer values. | INT or BIGINT |
DECIMAL | Fixed-point number. | DECIMAL(10, 2) |
ARRAY, MAP | Collection data types. | ARRAY<STRING> , MAP<STRING, INT> |
This comprehensive Spark SQL cheatsheet covers essential SQL categories, commands, and advanced techniques, making it a quick-reference guide for various Spark SQL operations.
Leave a Reply