What are the options for storing hierarchical data in a relational database?
Good Overviews
Generally speaking, you're making a decision between fast read times (for example, nested set) or fast write times (adjacency list). Usually, you end up with a combination of the options below that best fit your needs. The following provides some in-depth reading:
- One more Nested Intervals vs. Adjacency List comparison: the best comparison of Adjacency List, Materialized Path, Nested Set and Nested Interval I've found.
- Models for hierarchical data: slides with good explanations of tradeoffs and example usage
- Representing hierarchies in MySQL: very good overview of Nested Set in particular
- Hierarchical data in RDBMSs: most comprehensive and well-organized set of links I've seen, but not much in the way of explanation
Options
Ones I am aware of and general features:
- Columns: ID, ParentID
- Easy to implement.
- Cheap node moves, inserts, and deletes.
- Expensive to find the level, ancestry & descendants, path
- Avoid N+1 via Common Table Expressions in databases that support them
- Columns: Left, Right
- Cheap ancestry, descendants
- Very expensive
O(n/2)
moves, inserts, deletes due to volatile encoding
- Bridge Table (a.k.a. Closure Table /w triggers)
- Uses separate join table with: ancestor, descendant, depth (optional)
- Cheap ancestry and descendants
- Writes costs
O(log n)
(size of subtree) for insert, updates, deletes - Normalized encoding: good for RDBMS statistics & query planner in joins
- Requires multiple rows per node
- Lineage Column (a.k.a. Materialized Path, Path Enumeration)
- Column: lineage (e.g. /parent/child/grandchild/etc...)
- Cheap descendants via prefix query (e.g.
LEFT(lineage, #) = '/enumerated/path'
) - Writes costs
O(log n)
(size of subtree) for insert, updates, deletes - Non-relational: relies on Array datatype or serialized string format
- Like nested set, but with real/float/decimal so that the encoding isn't volatile (inexpensive move/insert/delete)
- Has real/float/decimal representation/precision issues
- Matrix encoding variant adds ancestor encoding (materialized path) for "free", but with added trickiness of linear algebra.
- A modified Adjacency List that adds a Level and Rank (e.g. ordering) column to each record.
- Cheap to iterate/paginate over
- Expensive move and delete
- Good Use: threaded discussion - forums / blog comments
- Columns: one for each lineage level, refers to all the parents up to the root, levels down from the item's level are set to NULL
- Cheap ancestors, descendants, level
- Cheap insert, delete, move of the leaves
- Expensive insert, delete, move of the internal nodes
- Hard limit to how deep the hierarchy can be
Database Specific Notes
MySQL
Oracle
- Use CONNECT BY to traverse Adjacency Lists
PostgreSQL
- ltree datatype for Materialized Path
SQL Server
- General summary
- 2008 offers HierarchyId data type appears to help with Lineage Column approach and expand the depth that can be represented.