Database Internals: Indexing Strategies for Performance

database sql performance backend system-design

February 5, 2026---

Comments

How Data is Stored: The Heap

By default (in most relational databases like PostgreSQL/MySQL), data is stored in a heap structure. Rows are appended to the file in no particular order. Finding a specific row requires a Full Table Scan—reading every page from disk until the record is found (O(N)). This is disastrous for performance on large tables.

The B-Tree Index

The most common index type is the Balanced Tree (B-Tree). It generalizes the binary search tree but allows nodes to have more than two children.

How it works:

The B-Tree keeps keys in sorted order.

It consists of a Root Node, Branch Nodes, and Leaf Nodes.

Leaf Nodes contain the actual pointer to the heap location (Resource ID / TID) of the data.

Because the tree is balanced, all leaf nodes are at the same depth.

Looking up a record in a table with millions of rows might only require 3 or 4 disk jumps (O(log N)).

Clustered vs. Non-Clustered Indexes

Clustered Index: The physical order of rows on the disk is the same as the index order. A table can typically have only one clustered index (usually the Primary Key). In MySQL (InnoDB), the leaf nodes are the actual data pages. This makes primary key lookups extremely fast.

Non-Clustered (Secondary) Index: A separate structure from the data rows. The leaf node contains the key value and a pointer (or Primary Key) to the actual data row. Using a secondary index often involves two lookups: finding the pointer in the index, then fetching the data.

Composite Indexes and the Leftmost Prefix Rule

You can create an index on multiple columns: CREATE INDEX idx_name_age ON users(last_name, age).

The order matters immensely! The database sorts first by last_name, and then by age. The query matters:

WHERE last_name = 'Smith' -> Uses Index

WHERE last_name = 'Smith' AND age > 20 -> Uses Index

WHERE age > 20 -> Index NOT Used (or inefficient scan)

This is because the tree is not sorted by age globally, only locally within a specific last name.

Optimizing Implementation

Cardinality: Index columns with high cardinality (many unique values). Indexing a 'Gender' column (M/F) is usually useless because the DB engine will prefer a sequential scan over jumping back and forth for 50% of the table.

Covering Indexes: If an index contains all the columns required by a query (e.g., SELECT age FROM users WHERE last_name='Smith'), the DB doesn't need to look up the heap at all. It answers purely from the index (Index Only Scan). This is a massive performance win.

Write Penalty: Every INSERT, UPDATE, or DELETE requires updating every index on the table. Over-indexing makes writes slow.

Database Internals: Indexing Strategies for Performance

Comments

Database Internals: Indexing Strategies for Performance

How Data is Stored: The Heap

The B-Tree Index

Clustered vs. Non-Clustered Indexes

Composite Indexes and the Leftmost Prefix Rule

Other Index Types

Optimizing Implementation

Conclusion