Here’s a comprehensive, logically structured, and interactive guide to SQL Indexing, consolidating and enhancing all the content you’ve shared, complete with real examples, platform-specific insights, and advanced use cases:


🧠 Mastering Indexing in SQL: A Complete Guide

πŸ” What is Indexing in SQL?

Indexing is a performance optimization technique that allows fast retrieval of rows from a table, similar to a book index that helps locate topics quickly.

🎯 Core Idea

Instead of scanning every row (full table scan), the database engine can jump directly to matching rows using a sorted data structure like a B-tree or hash table.


βš™οΈ How Does Indexing Work?

  1. Index Creation: You specify one or more columns to index.
  2. Index Structure: Data is stored in a B-tree, bitmap, hash, or other structures depending on DB type.
  3. Query Execution: On query, the DB engine checks the query plan and uses the index if available and efficient.
  4. Row Lookup: Index holds pointers (row IDs or offsets) to locate the actual data.

πŸ“‚ Types of Indexes (with Syntax & Use Cases)

TypeDescriptionSyntaxExample Use Case
Single-column IndexIndex on one columnCREATE INDEX idx_col ON table(col);Fast lookup on a specific column
Composite IndexIndex on multiple columnsCREATE INDEX idx_multi ON table(col1, col2);Filtering by both columns
Unique IndexEnsures values are uniqueCREATE UNIQUE INDEX idx_unique ON table(col);Enforce no duplicate emails
Clustered IndexReorders actual rowsCREATE CLUSTERED INDEX idx_cl ON table(col);Only one allowed per table
Non-clustered IndexSeparate structure; references rowsCREATE NONCLUSTERED INDEX idx ON table(col);Most common index type
Function-Based IndexIndex on function outputCREATE INDEX idx_lower ON users(LOWER(email));Used in search-insensitive queries
Partial Index (PostgreSQL)Index on filtered rowsCREATE INDEX idx_active ON users(last_login) WHERE is_active = true;Speed up selective queries
Full-Text IndexFor full-text searchCREATE FULLTEXT INDEX idx_text ON docs(content);Search documents by keyword
Spatial IndexFor geospatial dataCREATE SPATIAL INDEX idx_geo ON locations(geom);Optimized for GIS queries

πŸš€ When to Use Indexes (Use Cases)

βœ… Best Use Cases for Indexes

Query PatternWhy Index HelpsExample
WHERE clauseDirectly locates matching rowsSELECT * FROM orders WHERE order_id = 12345;
JOIN conditionsFast row match between tablesJOIN orders ON customer_id
ORDER BY / GROUP BYEfficient sorting/groupingORDER BY created_at
Foreign key columnsUsed frequently in joinscustomer_id in orders
Search in expressionsIf functional index usedLOWER(email) = 'abc@test.com'
Columns with high cardinalityDistinct values benefit moreemail, SSN, transaction_id

⚠️ Cases Where Indexes May Not Help

  • Small tables: full scan is often faster.
  • Columns with low cardinality: e.g., gender = ‘M/F’.
  • Queries without WHERE/JOIN/ORDER BY.
  • Filtering by non-leading column of composite index.

πŸ”¨ Creating Indexes β€” Syntax Examples

-- Basic Index
CREATE INDEX idx_email ON employees(email);

-- Composite Index
CREATE INDEX idx_email_dept ON employees(email, department);

-- Unique Index
CREATE UNIQUE INDEX idx_unique_email ON users(email);

-- Function-based Index (PostgreSQL)
CREATE INDEX idx_lower_email ON users(LOWER(email));

-- Partial Index (PostgreSQL)
CREATE INDEX idx_active_users ON users(last_login) WHERE is_active = true;

-- Full-text Index (MySQL, PostgreSQL)
CREATE FULLTEXT INDEX idx_text_content ON documents(content);

-- Drop Index
DROP INDEX idx_email; -- PostgreSQL
DROP INDEX idx_email ON employees; -- MySQL

πŸ”¬ Platform-Specific Hands-On Demos

🐘 PostgreSQL Indexing Demo

CREATE TABLE customers (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100),
  email VARCHAR(100),
  city VARCHAR(50),
  created_at DATE
);

-- Insert sample data
INSERT INTO customers (name, email, city, created_at)
SELECT 
  'User' || i, 
  'user' || i || '@example.com', 
  CASE WHEN i % 2 = 0 THEN 'Delhi' ELSE 'Mumbai' END,
  CURRENT_DATE - (i || ' days')::interval
FROM generate_series(1, 100000) AS i;

-- Create index
CREATE INDEX idx_city ON customers(city);

-- Analyze performance
EXPLAIN ANALYZE SELECT * FROM customers WHERE city = 'Delhi';

🐬 MySQL Indexing Demo

CREATE TABLE orders (
  id INT AUTO_INCREMENT PRIMARY KEY,
  customer_id INT,
  amount DECIMAL(10,2),
  order_date DATE
);

-- Insert test data using procedure
DELIMITER $$
CREATE PROCEDURE insert_orders()
BEGIN
  DECLARE i INT DEFAULT 1;
  WHILE i <= 100000 DO
    INSERT INTO orders (customer_id, amount, order_date)
    VALUES (
      FLOOR(RAND()*1000),
      RAND()*1000,
      DATE_SUB(CURDATE(), INTERVAL FLOOR(RAND()*365) DAY)
    );
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;

CALL insert_orders();

-- Index and performance check
CREATE INDEX idx_order_date ON orders(order_date);
EXPLAIN SELECT * FROM orders WHERE order_date = '2024-01-01';

πŸ”₯ PySpark + Delta Lake Z-Ordering

# Create delta table
from pyspark.sql.functions import expr
df = spark.range(0, 1000000).withColumn("order_id", expr("id % 1000")) \
                             .withColumn("country", expr("CASE WHEN id % 2 = 0 THEN 'India' ELSE 'US' END")) \
                             .withColumn("amount", expr("rand() * 1000"))

df.write.format("delta").mode("overwrite").save("/mnt/delta/orders")
spark.sql("CREATE TABLE delta_orders USING DELTA LOCATION '/mnt/delta/orders'")

# Optimize with Z-Ordering
spark.sql("OPTIMIZE delta_orders ZORDER BY (country)")

# Query performance
spark.sql("SELECT * FROM delta_orders WHERE country = 'India'").explain(True)

πŸ“ˆ How to Check if Index is Used

  • PostgreSQL: EXPLAIN ANALYZE SELECT ... β†’ Look for Index Scan or Bitmap Index Scan
  • MySQL: EXPLAIN SELECT ... β†’ Look for Using index or ref
  • Spark: Use .explain(True) or Query Execution Plan

πŸ“‰ Disadvantages of Indexing

ConcernExplanation
Slower WritesINSERT/UPDATE/DELETE are slower due to index maintenance
Disk SpaceIndexes consume additional space
Over-indexingToo many indexes confuse the optimizer and bloat storage
FragmentationMay reduce performance over time

🧠 Advanced Tips & Best Practices

  • Start with frequently queried columns (WHERE/JOIN).
  • Use composite indexes carefully – Always filter using left-most column.
  • Don’t index everything – Focus on high-cardinality, high-impact queries.
  • Regularly analyze and drop unused indexes – Monitor via query plans.
  • Monitor index usage – Most RDBMSs provide pg_stat_user_indexes, SHOW INDEX, or query performance tables.

🧊 Final Summary

FeaturePurposeExample
πŸ”Ž Fast lookupsSingle-column indexCREATE INDEX ON employees(email)
🧩 Multiple filtersComposite indexCREATE INDEX ON orders(customer_id, order_date)
πŸ” UniquenessUnique indexCREATE UNIQUE INDEX ON users(email)
🧠 Function queriesFunction-based indexCREATE INDEX ON users(LOWER(email))
🎯 ConditionalPartial indexCREATE INDEX ON users(last_login) WHERE is_active = true
πŸ”„ Maintain speedZ-Order (Delta Lake)OPTIMIZE delta_orders ZORDER BY (country)


Discover more from HintsToday

Subscribe to get the latest posts sent to your email.

Posted in

Leave a Reply

Discover more from HintsToday

Subscribe now to keep reading and get access to the full archive.

Continue reading