Efficient tree structures modeling in T-SQL

In almost every enterprise scale system there are some tree structures to be held in the database. It could be a system menu structure, an enterprise organisation structure or a user hierarchy. Lets assume that we need to model insurance agents hierarchy for insurance policies processing system.

Thinking about agents as children and parents and creating such structures seems to be natural:

Agents table

Column Type
ID Int, not null
ParentID Int, null
Role/Range Some example attribute

Structure Visualization

If we use this model we need to use recurrency to get some part of such tree i.e. to get agent with ID “2” and all its descendants. Problem is – recurrency in DBs is generally a bad thing because of its performance.

But it is not said that the database structure must be 100% normalized – sometimes redundancy is our only chance to gain some performance.  Lets add two additional, redundant columns and keep them up to date by recalculating their values whenever our three changes (new objects are added, existing ones are removed or replaced):

Improved Agents table

Column Type
ID Int, not null
ParentID Int, null
Role/Range Some example attribute
Depth Depth of element in the tree
Path Path of element in  the tree – i.e. slash delimited identifiers of an agent ancestors

Structure Vizualisation

The advantage of this structure is that we are able to select a part of the tree structure in single (and simple) T-SQL query.

To select agent with ID “2” we can execute the following query:

select * from Agents
where Path like '%/2/%' 
   or ID = 2

To select all agent’s with ID “1”  children (not descendants) we can execute the following:

select * from Agents
where Path like '%/1/%' 
    and Depth = 1

Simple and efficient. In order not to be groundless I will mention that Umbraco CMS successfully uses this approach.

And it works great with Linq2SQL 😉

context.GetTable<Agent>()
    .Where(x => x.Depth = 1 && x.Path.Contains("/2/"))
    .ToList();
Advertisements

6 Responses to Efficient tree structures modeling in T-SQL

  1. Third approach was described in this article(pl):
    http://misiekmisiek.pl/Home/Details/4

  2. Anh says:

    Good approach. But how do you get to ID:5 with all ancestors in 1 line result with a delimiter (semi-colon)? ID1;ID2;ID5

    • karczas says:

      To get all ancestors of an element you need to split its “path”. If you use Linq 2 SQL your code might look like this:

      var element = context.GetTable().First(x => x.Id == 5);
      int[] ancestorIds = element.Path.Split(‘/’).Where(x => !String.IsNullOrEmpty(x)).Select(x => Convert.ToInt32(x)).ToArray(); //you can simplify this code by creating extension method

      var ancestors = context.GetTable().Where(x => ancestorIds.Contains(x.Id));

      Was this helpful?

      • Anh says:

        Is there a way to do get the data in tsql? Maybe CTE?

      • karczas says:

        Parsing “path” of an element is achievable in t-sql code but I’d strongly recommend to create sql query on client side – by client I mean application that is trying to get these elements from your DB – by using ORM/Linq/or just creating string SQL. Your query would look like this:

        select * from Agents where Id in (1, 2, 5)

        After “in” clause you need to specify Ids of ancestors – you can get them by parsing elements “path” but I have no sample code which shows how to parse it in t-sql but I’m sure it’s possible.

        The tree modeling approach I’ve blogged about enables you to get tree structures by simple selects which are (in most of the cases) lightning fast. IMO you should consider using CTE when your elements have only parent id – then recursive CTE is one of your options. I’m sure that there are lot of examples of using recursive CTEs on the Web.

      • Anh says:

        I’ve tried this but it doesn’t show the full path/ancestor.

        SELECT ISNULL(C.[Name] + ‘/’, ‘a’) + ISNULL(B.[Name] + ‘/’, ‘b’) + A.[Name], A.[Name]
        FROM [Agents] A LEFT OUTER JOIN [dbo].[Agents] B
        ON A.[ParentID] = B.[ID]
        LEFT OUTER JOIN [dbo].[Agents] C
        ON B.[ParentID] = C.[ID]
        WHERE A.IsCapitol = ‘YES’

        Id: ParentId: IsCapitol: Name:
        1 NULL Earth
        2 1 North America
        3 2 USA
        4 3 Texas
        5 4 Houston
        6 4 Dallas
        7 3 YES District Columbia
        8 4 YES Austin

        I’m looking for this:
        Earth/North America/USA/Texas/Austin
        Earth/North America/USA/District Columbia

        But it gave me this:
        North America/USA/District Columbia
        USA/Texas/Austin

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: