Hereβs a comprehensive, logically structured, and interactive guide to SQL Indexing, consolidating and enhancing all the content you’ve shared, complete with real examples, platform-specific insights, and advanced use cases:
π§ Mastering Indexing in SQL: A Complete Guide
π What is Indexing in SQL?
Indexing is a performance optimization technique that allows fast retrieval of rows from a table, similar to a book index that helps locate topics quickly.
π― Core Idea
Instead of scanning every row (full table scan), the database engine can jump directly to matching rows using a sorted data structure like a B-tree or hash table.
βοΈ How Does Indexing Work?
Index Creation: You specify one or more columns to index.
Index Structure: Data is stored in a B-tree, bitmap, hash, or other structures depending on DB type.
Query Execution: On query, the DB engine checks the query plan and uses the index if available and efficient.
Row Lookup: Index holds pointers (row IDs or offsets) to locate the actual data.
π Types of Indexes (with Syntax & Use Cases)
Type
Description
Syntax
Example Use Case
Single-column Index
Index on one column
CREATE INDEX idx_col ON table(col);
Fast lookup on a specific column
Composite Index
Index on multiple columns
CREATE INDEX idx_multi ON table(col1, col2);
Filtering by both columns
Unique Index
Ensures values are unique
CREATE UNIQUE INDEX idx_unique ON table(col);
Enforce no duplicate emails
Clustered Index
Reorders actual rows
CREATE CLUSTERED INDEX idx_cl ON table(col);
Only one allowed per table
Non-clustered Index
Separate structure; references rows
CREATE NONCLUSTERED INDEX idx ON table(col);
Most common index type
Function-Based Index
Index on function output
CREATE INDEX idx_lower ON users(LOWER(email));
Used in search-insensitive queries
Partial Index (PostgreSQL)
Index on filtered rows
CREATE INDEX idx_active ON users(last_login) WHERE is_active = true;
Speed up selective queries
Full-Text Index
For full-text search
CREATE FULLTEXT INDEX idx_text ON docs(content);
Search documents by keyword
Spatial Index
For geospatial data
CREATE SPATIAL INDEX idx_geo ON locations(geom);
Optimized for GIS queries
π When to Use Indexes (Use Cases)
β Best Use Cases for Indexes
Query Pattern
Why Index Helps
Example
WHERE clause
Directly locates matching rows
SELECT * FROM orders WHERE order_id = 12345;
JOIN conditions
Fast row match between tables
JOIN orders ON customer_id
ORDER BY / GROUP BY
Efficient sorting/grouping
ORDER BY created_at
Foreign key columns
Used frequently in joins
customer_id in orders
Search in expressions
If functional index used
LOWER(email) = 'abc@test.com'
Columns with high cardinality
Distinct values benefit more
email, SSN, transaction_id
β οΈ Cases Where Indexes May Not Help
Small tables: full scan is often faster.
Columns with low cardinality: e.g., gender = ‘M/F’.
Queries without WHERE/JOIN/ORDER BY.
Filtering by non-leading column of composite index.
π¨ Creating Indexes β Syntax Examples
-- Basic Index
CREATE INDEX idx_email ON employees(email);
-- Composite Index
CREATE INDEX idx_email_dept ON employees(email, department);
-- Unique Index
CREATE UNIQUE INDEX idx_unique_email ON users(email);
-- Function-based Index (PostgreSQL)
CREATE INDEX idx_lower_email ON users(LOWER(email));
-- Partial Index (PostgreSQL)
CREATE INDEX idx_active_users ON users(last_login) WHERE is_active = true;
-- Full-text Index (MySQL, PostgreSQL)
CREATE FULLTEXT INDEX idx_text_content ON documents(content);
-- Drop Index
DROP INDEX idx_email; -- PostgreSQL
DROP INDEX idx_email ON employees; -- MySQL
π¬ Platform-Specific Hands-On Demos
π PostgreSQL Indexing Demo
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100),
city VARCHAR(50),
created_at DATE
);
-- Insert sample data
INSERT INTO customers (name, email, city, created_at)
SELECT
'User' || i,
'user' || i || '@example.com',
CASE WHEN i % 2 = 0 THEN 'Delhi' ELSE 'Mumbai' END,
CURRENT_DATE - (i || ' days')::interval
FROM generate_series(1, 100000) AS i;
-- Create index
CREATE INDEX idx_city ON customers(city);
-- Analyze performance
EXPLAIN ANALYZE SELECT * FROM customers WHERE city = 'Delhi';
π¬ MySQL Indexing Demo
CREATE TABLE orders (
id INT AUTO_INCREMENT PRIMARY KEY,
customer_id INT,
amount DECIMAL(10,2),
order_date DATE
);
-- Insert test data using procedure
DELIMITER $$
CREATE PROCEDURE insert_orders()
BEGIN
DECLARE i INT DEFAULT 1;
WHILE i <= 100000 DO
INSERT INTO orders (customer_id, amount, order_date)
VALUES (
FLOOR(RAND()*1000),
RAND()*1000,
DATE_SUB(CURDATE(), INTERVAL FLOOR(RAND()*365) DAY)
);
SET i = i + 1;
END WHILE;
END$$
DELIMITER ;
CALL insert_orders();
-- Index and performance check
CREATE INDEX idx_order_date ON orders(order_date);
EXPLAIN SELECT * FROM orders WHERE order_date = '2024-01-01';
π₯ PySpark + Delta Lake Z-Ordering
# Create delta table
from pyspark.sql.functions import expr
df = spark.range(0, 1000000).withColumn("order_id", expr("id % 1000")) \
.withColumn("country", expr("CASE WHEN id % 2 = 0 THEN 'India' ELSE 'US' END")) \
.withColumn("amount", expr("rand() * 1000"))
df.write.format("delta").mode("overwrite").save("/mnt/delta/orders")
spark.sql("CREATE TABLE delta_orders USING DELTA LOCATION '/mnt/delta/orders'")
# Optimize with Z-Ordering
spark.sql("OPTIMIZE delta_orders ZORDER BY (country)")
# Query performance
spark.sql("SELECT * FROM delta_orders WHERE country = 'India'").explain(True)
π How to Check if Index is Used
PostgreSQL: EXPLAIN ANALYZE SELECT ... β Look for Index Scan or Bitmap Index Scan
MySQL: EXPLAIN SELECT ... β Look for Using index or ref
Spark: Use .explain(True) or Query Execution Plan
π Disadvantages of Indexing
Concern
Explanation
Slower Writes
INSERT/UPDATE/DELETE are slower due to index maintenance
Disk Space
Indexes consume additional space
Over-indexing
Too many indexes confuse the optimizer and bloat storage
Fragmentation
May reduce performance over time
π§ Advanced Tips & Best Practices
Start with frequently queried columns (WHERE/JOIN).
Use composite indexes carefully β Always filter using left-most column.
Don’t index everything β Focus on high-cardinality, high-impact queries.
Regularly analyze and drop unused indexes β Monitor via query plans.
Monitor index usage β Most RDBMSs provide pg_stat_user_indexes, SHOW INDEX, or query performance tables.
Leave a Reply