What are the options for storing hierarchical data in a relational database?
Asked 07 September, 2021
Viewed 2.1K times
  • 63
Votes

Good Overviews

Generally speaking, you're making a decision between fast read times (for example, nested set) or fast write times (adjacency list). Usually, you end up with a combination of the options below that best fit your needs. The following provides some in-depth reading:

Options

Ones I am aware of and general features:

  1. Adjacency List:
  • Columns: ID, ParentID
  • Easy to implement.
  • Cheap node moves, inserts, and deletes.
  • Expensive to find the level, ancestry & descendants, path
  • Avoid N+1 via Common Table Expressions in databases that support them
  1. Nested Set (a.k.a Modified Preorder Tree Traversal)
  • Columns: Left, Right
  • Cheap ancestry, descendants
  • Very expensive O(n/2) moves, inserts, deletes due to volatile encoding
  1. Bridge Table (a.k.a. Closure Table /w triggers)
  • Uses separate join table with: ancestor, descendant, depth (optional)
  • Cheap ancestry and descendants
  • Writes costs O(log n) (size of subtree) for insert, updates, deletes
  • Normalized encoding: good for RDBMS statistics & query planner in joins
  • Requires multiple rows per node
  1. Lineage Column (a.k.a. Materialized Path, Path Enumeration)
  • Column: lineage (e.g. /parent/child/grandchild/etc...)
  • Cheap descendants via prefix query (e.g. LEFT(lineage, #) = '/enumerated/path')
  • Writes costs O(log n) (size of subtree) for insert, updates, deletes
  • Non-relational: relies on Array datatype or serialized string format
  1. Nested Intervals
  • Like nested set, but with real/float/decimal so that the encoding isn't volatile (inexpensive move/insert/delete)
  • Has real/float/decimal representation/precision issues
  • Matrix encoding variant adds ancestor encoding (materialized path) for "free", but with added trickiness of linear algebra.
  1. Flat Table
  • A modified Adjacency List that adds a Level and Rank (e.g. ordering) column to each record.
  • Cheap to iterate/paginate over
  • Expensive move and delete
  • Good Use: threaded discussion - forums / blog comments
  1. Multiple lineage columns
  • Columns: one for each lineage level, refers to all the parents up to the root, levels down from the item's level are set to NULL
  • Cheap ancestors, descendants, level
  • Cheap insert, delete, move of the leaves
  • Expensive insert, delete, move of the internal nodes
  • Hard limit to how deep the hierarchy can be

Database Specific Notes

MySQL

Oracle

PostgreSQL

SQL Server

  • General summary
  • 2008 offers HierarchyId data type appears to help with Lineage Column approach and expand the depth that can be represented.

7 Answer