en-UShe-IL
You are here:   Blog
Register   |  Login

Blog Archive:

Maximize
* Can be used in order to search for older blogs Entries

Search in blogs


Blog Categories:

Maximize
* Can be used in order to search for blogs Entries by Categories

Blog Tags:

Maximize
* Can be used in order to search for blogs by keywords

TNWikiSummit


Awared MVP 


 


Microsoft® Community Contributor 


Microsoft® Community Contributor


 Read first, before you you use the blog! Maximize
דצמ11

Written by: ronen ariely
11/12/2013 03:03 RssIcon

Hierarchical data is organized into a tree-like structure. The structure allows representing information using parent/child relationships: each parent can have many children, but each child has no more than one parent (one-to-many relationship). Our records have a unique column (named "ID" in our case). Parents and children are tied together by "pointer" column. In our case we will call our pointer column "ParentID".

* Hierarchical databases are widely used in Information Management Systems. 

History

In late 2000 I was developing my first Content Management System (CMS), and I came across the need of using several hierarchical data sets. At first I used recursive functions in order to organize the data as hierarchical tree. This worked fine on small amount of data and low number of users, but very shortly I understood that I need to find a faster solution which will not require me to execute this complex functions each time that I bring the data from the database. My conclusion was that I need to found a better database structure, which allowing me to get the data in direct way (without any recursive).

In early 2001 I developed my first forum interface using hierarchical data structure, at that time I had already, my first basic structure idea. I published my ideas for the first time, as part of a tutorial on how to develop hierarchical forum GUI (in the client GUI side we get hierarchical structure of the messages) using "none hierarchical data structure". This was published at http://k4school.com website in Hebrew.

This post is based on my original post, with some additions from my second version which was publish once SQL Server had a new feature "Recursive Common Table Expression (CTE)". We are using recursive CTE only for one-time-job to build the content of the column, which we will use for the sorting...

My basic idea for hierarchical data structure was adding a new column (for example column named "OrderBy"), which will use my unique format (as I will explain in this article). This column should let me get the data from the database directly using simple SELECT query. The data should returned sorted in the order of hierarchical data structure, without using any loop, but simple SELECT query like:

select *
from MyTable
order by [OrderBy]

Once I had my hierarchical structure logic, I needed to convert my existing Hierarchical tables into the hierarchical structure "OrderBy" column. Originally for this one-time-action I develop a recursive function.

Around the year 2006, I changed my one-time-action queries a bit, to use a recursive common table expression (CTE). This article is basically a translation of my earlier publications. It presents one of my hierarchical structure logic, which I use for the last 13 years.

* Note! I will try to get the time to publish other "hierarchical structure's logics" in my future publications. There are several nice hierarchical database structures which fitting different needs. For example RANGEs logic can fit better for some cases.

* Note! In SQL 2008 Microsoft add a new column type named HierarchyID, that helps solve some of the problems in modeling and querying hierarchical information. This column actually use a very similar logic as my hierarchical column. I will add a simple example on how to use this type for getting the same result in appendix B.

Our case study

For the sake of this post we are going to use a very simple Parent-Child data with only 3 columns: ID (as our unique record identifier), Name (Just as representing a free data), and ParentID (for the pointer to the father record).

Our requirement is to sort the data, which is based on the Child-Parent hierarchy, according to the tree structure. Each child must be under its parent tree, and all children with the same father, sorted by their ID order (on Appendix A we will show how to sort by any other column).

------------------------------------------------ DDL
CREATE TABLE HierarchicalTable (
    ID int unique,
    ParentID int,
    Name nvarchar(10)
)
GO
 
------------------------------------------------ DML
-- TRUNCATE TABLE HierarchicalTable
insert into HierarchicalTable(ID,Name,ParentID)
select 1,'Alex',0
UNION ALL select 2,'John',1
UNION ALL select 3,'Mathew',2
UNION ALL select 4,'Philip',1
UNION ALL select 5,'Shone',0
UNION ALL select 6,'Shine',2
UNION ALL select 7,'Tom',2
UNION ALL select 8,'George',1
UNION ALL select 9,'Jim',5
UNION ALL select 10,'Jim',1
UNION ALL select 11,'Jim',1
UNION ALL select 12,'Jim',1
UNION ALL select 13,'Jim',1
UNION ALL select 14,'Jim',1
UNION ALL select 15,'Jim',1
UNION ALL select 16,'Jim',1
UNION ALL select 17,'Jim',1
UNION ALL select 18,'Jim',8
UNION ALL select 19,'Jim',8
UNION ALL select 20,'Jim',8
UNION ALL select 21,'Jim',8
UNION ALL select 22,'Jim',8
UNION ALL select 23,'Jim',8
UNION ALL select 24,'Jim',0
UNION ALL select 25,'Jim',0
UNION ALL select 26,'Jim',0
UNION ALL select 27,'Jim',0
UNION ALL select 28,'Jim',0
GO
 
SELECT * FROM HierarchicalTable
GO

Our data:

ID          ParentID    Name
---------- ----------- ----------

1            0            Alex
2            1            John
3            2            Mathew
4            1            Philip
5            0            Shone
6            2            Shine
7            2            Tom
8            1            George
9            5            Jim
10          1            Jim
11          1            Jim
12          1            Jim
13          1            Jim
14          1            Jim
15          1            Jim
16          1            Jim
17          1            Jim
18          8            Jim
19          8            Jim
20          8            Jim
21          8            Jim
22          8            Jim
23          8            Jim
24          0            Jim
25          0            Jim
26          0            Jim
27          0            Jim
28          0            Jim

Our Requested Results Set

ID          Name       ParentID    Tree-Like Structure
----------- ----------  ----------- ------------------------
1            Alex         0            Alex
2            John        1           _____ John
3            Mathew   2           __________ Mathew
6            Shine       2           __________ Shine
7            Tom         2           __________ Tom
4            Philip       1           _____ Philip
8            George    1           _____ George
18          Jim           8           __________ Jim
19          Jim           8           __________ Jim
20          Jim           8           __________ Jim
21          Jim           8           __________ Jim
22          Jim           8           __________ Jim
23          Jim           8           __________ Jim
10          Jim           1           _____ Jim
11          Jim           1           _____ Jim
12          Jim           1           _____ Jim
13          Jim           1           _____ Jim
14          Jim           1           _____ Jim
15          Jim           1           _____ Jim
16          Jim           1           _____ Jim
17          Jim           1           _____ Jim
5            Shone      0            Shone
9            Jim           5           _____ Jim
24          Jim           0            Jim
25          Jim           0            Jim
26          Jim           0            Jim
27          Jim           0            Jim
28          Jim           0            Jim

     

Explanation Step By Step from the basic Idea to the Final Queries

In solving the problem we are facing two major problems: (1) Sorting the data by the Hierarchical level, and (2) Sorting each Level by the ID column.

Step 1: Sorting the data by the Hierarchical level - Each Child will be under his father.

In order to get this we are going to use the following logic rule

Any string X will come before any string Y that start with string X.
in other words, If (Y LIKE 'X%') then X smaller then Y.

In this query we are going to use hierarchical CTE in order to read the data and build the content to a new the column which I will add to the original table (as I mentioned we can name this new column "OrderBy" or [Path] as I am using here). During the recursive task, the column [Path] gets his father string + the current ID (with the separator '@')

-- 1. Sorting the data by the Hierarchical level
;with MyCTE1 as(
    select
        ID,Name,ParentID,N'@' + CONVERT(nvarchar(MAX),ID) + N'@' as [Path]
    from HierarchicalTable
    where ParentID = 0
    union all
    select
        t.ID,t.Name,t.ParentID,[Path] + CONVERT(nvarchar(MAX),t.ID) + N'@'
    from MyCTE1
    join HierarchicalTable t on MyCTE1.ID = t.ParentID
)
select *
from MyCTE1
order by [Path]

We can see that each child is under his father, but checking out each level we might notice that the data is not sorting correctly. For example: ID 5 is smaller than ID 28 and yet for most of us it returns underneath it.

28          Jim           0           @28@
5            Shone      0           @5@

The reason is that our numbers are sorting as string, and string is sorting left to right character after character (a string is basically an array of characters, char[]). The first character of '28' is 2 and the first character of '5' is 5 there for 28 is smaller than 5 in "string sorting".

Step 2: Sorting the rows in each Level by the ID column - Children on the same level get sorted by ID

There are several ways to format the data in order to get the correct order.

The first option and the most common one is adding several leading zero characters, in order to push the smaller numbers in front of the bigger numbers. 

For example let's build a string length 5 characters, by adding some zero characters for each number. The number 5 returns the string '00005' and the number 28 returns the string '00028'. 

Now when we will sort the value as string, then it will be sorted correctly, since the first 3 characters are the same, and the forth character is 2 for the number 28 and 0 for the number 5 (2>0).

Using this logic, we can fix the query of step 1, and get the below query, which will work great for us:

-- 2. Sorting each Level by the ID column
-- 2.1 Promote the smaller records by adding leading zeros & using constant length string
;with MyCTE1 as(
    select
        ID,Name,ParentID,N'@' + RIGHT('00000' + CONVERT(nvarchar(MAX),ID), 5) + N'@' as [Path]
    from HierarchicalTable
    where ParentID = 0
    union all
    select
        t.ID,t.Name,t.ParentID,[Path] + RIGHT('00000' + CONVERT(nvarchar(MAX),t.ID), 5) + N'@'
    from MyCTE1
    join HierarchicalTable t on MyCTE1.ID = t.ParentID
)
select *
from MyCTE1
order by [Path]

Note! There are big disadvantages using this solution! We are limiting ourselves to a maximum of 99999 records in the table. Even if we change the length of the string to 10 or 100 we are still have a very big limitation. Moreover, our result set use very long string, which is a waste of memory, disk, IO and so on! We really need to continue reading :-)

We can move into second option now and think on a better format for our string.

In the solution above I converted the numbers to a string with length 5. Using 5 characters the maximum number that I can present will be 99999, but we want to have a lot more messages in the forum. If we are using INT data type for example, then the maximum value in SQL Server is 2147483647, which mean that I will need to convert all the numbers to a string with length 10.

In general a string with length X can hold maximum number [9*1 + 9*10 + … + 9*10*(X-1) ]. This is because the maximum number on base 10 is 9… ops… this is the answer… our next improvement will be to CONVERT our numbers to base 16 instead of base 10. Using higher base we can represent bigger number using the same length. For example using string with 5 characters we will be able to store the value FFFFF, which is 1048575 in base ten! This means that we can work with 1048575 rows instead of only 99999 :-)

In addition since we use constant length for each number, we do not need to separator which we used before (the character @)

-- 2.2 Promote the smaller records by CONVERTING TO BINARY & using constant length string
;with MyCTE1 as(
    select
        ID,Name,ParentID,CONVERT(varbinary(max),ID) as [Path]
    from HierarchicalTable
    where ParentID = 0
    union all
    select
        t.ID,t.Name,t.ParentID,[Path] + CONVERT(binary(5),t.ID)
    from MyCTE1
    join HierarchicalTable t on MyCTE1.ID = t.ParentID
)
select *
from MyCTE1
order by [Path]

* Note! For several different reasons related in the use and analysis of our results I prefer to use an explicit separator between the values ​​of different levels. once we will move to dynamic chain length, it will become necessary for data analysis (eg Quick Find all data on a certain level, etc...). There for I will convert the data info VARCHAR data type using the build in function fn_varbintohexstr. This next query is not an improvement, but a preparation work for next dynamic solutions. Do not use it if you are OK with using this solution, since storing a binary data type is much better than storing the VARCHAR data type, for indexing purposes and more!

-- 2.2 Promote the smaller records by CONVERTING TO BINARY & using constant length string
;with MyCTE1 as(
    select
        ID,Name,ParentID,N'@' + RIGHT(sys.fn_varbintohexstr(CONVERT(binary(5),ID)),10) + N'@' as [Path]
    from HierarchicalTable
    where ParentID = 0
    union all
    select
        t.ID,t.Name,t.ParentID,[Path] + RIGHT(sys.fn_varbintohexstr(CONVERT(binary(5),t.ID)),10) + N'@'
    from MyCTE1
    join HierarchicalTable t on MyCTE1.ID = t.ParentID
)
select *
from MyCTE1
order by [Path]

Our third option will base on the preview solutions, and will improve the value we store by moving to a dynamic length solution. 

This time we are going to find the maximum length of each level separately. I will use this value as the constant length of our data. On the one hand we solved the problem of maximum number of records in the table (as we use dynamic length) & and our final string will be much much shorter if we have low number of rows (since we don't need longer length then the maximum length of the data value), but on the other hand this query cost more.

* Note! all the queries in this post only executed one time in order to find the value to use for the sorting! Once we find the value we insert it to a new column and from that time any new rows that is inserted to the table gets the new value using triggers or computed column.

* Note! I will demonstrate this idea using numbers on base 10, but you can use this idea of dynamic length on the preview solution using binary data and get a better solution (the idea is the same, but it is easier to explain on base 10)

-- 2.3 Promote the smaller records by adding leading zero & using dynamic length
--     (i will show the idea using decimal numbers but we can use the BINARY solution here too)
;with MyCTE1 as(
    select
        ID, Name, ParentID,1 as [Level]
    from HierarchicalTable
    where ParentID = 0
    union all
    select
        t.ID,t.Name,t.ParentID,[Level]+1
    from MyCTE1
    join HierarchicalTable t on MyCTE1.ID = t.ParentID
)
,MyCTE2 as(
    select ID,Name,ParentID, [Level], MAX(len(ID)) over (partition by [Level]) ML
    from MyCTE1
)
,MyCTE3 as(
    select
        ID,Name,ParentID,[Level], RIGHT(REPLICATE('0',ML)+CONVERT(NVARCHAR(MAX), ID),ML) as [OrderBy]
    from MyCTE2
    where ParentID = 0
    union all
    select
        t.ID,t.Name,t.ParentID,t.[Level], [OrderBy] + N'@' + RIGHT(REPLICATE('0',ML)+CONVERT(nvarchar(MAX),t.ID),ML)
    from MyCTE3
    join MyCTE2 t on MyCTE3.ID = t.ParentID
)
select * from MyCTE3 order by OrderBy

All preview solutions based on the same logic:  We are Promoting the smaller numbers over the bigger numbers by adding leading zeros, and using constant length string (constant per level or in general).

At time, when I got those solutions, I looked for a better solution and I got a nice idea: instead of promoting the smaller numbers (Operation that requires cost query or/and produce long string), I will demote the larger numbers.

Final Queries to build our new column content

Our concept based on the fact that the character "@" is smaller than numbers characters, and there for we are using it as a separator between levels. We also based on the fact that the character "0" is smaller than any other number, and there for we can use it to push smaller numbers before the bigger numbers.

My forth solution based on the fact that the character "a" is bigger than numbers, and there for we can use it to push bigger numbers after the smaller numbers.

-- 2.4 Postpone the bigger records by adding leading letters.
;with MyCTE1 as(
    select
        ID,Name,ParentID,N'@' + REPLICATE('a',LEN(CONVERT(nvarchar(10),ID)) - 1) +CONVERT(nvarchar(10),ID) + N'@' as [Path]
    from HierarchicalTable
    where ParentID = 0
    union all
    select
        t.ID,t.Name,t.ParentID,[Path] + REPLICATE('a',LEN(CONVERT(nvarchar(10),t.ID)) - 1) + CONVERT(nvarchar(10),t.ID) + N'@'
    from MyCTE1
    join HierarchicalTable t on MyCTE1.ID = t.ParentID
)
select *
from MyCTE1
order by [Path]

   

And now we can show the result with indent of space or any other character to make it more readable:

-- My main idea is to use this query one time in order to add a column which will hold the datya
-- in this format.
-- for this short tutrial, I will add a [LEVEL] column only for displaying the result in a Tree-Strubcture
;with MyCTE1 as(
    select
        ID,Name,ParentID, 0 as [Level],N'@' + REPLICATE('a',LEN(CONVERT(nvarchar(10),ID)) - 1) +CONVERT(nvarchar(10),ID) + N'@' as [Path]
    from HierarchicalTable
    where ParentID = 0
    union all
    select
        t.ID,t.Name,t.ParentID, [Level] + 1,[Path] + REPLICATE('a',LEN(CONVERT(nvarchar(10),t.ID)) - 1) + CONVERT(nvarchar(10),t.ID) + N'@'
    from MyCTE1
    join HierarchicalTable t on MyCTE1.ID = t.ParentID
)
select
    ID,Name,ParentID,[Path],
    REPLICATE('_____', [Level]) + ' ' + [Name] as [Tree-Like Structure]
from MyCTE1
order by [Path]

Our Result SET

  

Conclusion & Comments

This article show the way of sorting the data On-The-Fly, using a recursive query. Those solutions are very useful for a One-Time-Query, but if you are dealing with hierarchical data then the only solutions that will fit your need is to design your database structure to fit Hierarchical Data Structure. One of the solution is to use the structure from this tutorial for a special column which will be used for the sorting "order by" as mention above. You can use the queries in this tutorial in order to insert the first values to the new column. Once youe database is ready with the new column and the values you can use trigger or computed column. You should build index on this column and in your application use simple SELECT query with ORDER BY the new column.

Note! The dynamic solutions in this tutorial (like the third option) can't help us to build a new column as the data will change during time! I recommend to use the forth solution.

I hope this article was useful and fun to read :-)

  --- The End ---

Appendix A: Sorting by Different Column

In some situation we need to sort the hierarchical data using different column. For example in a forum interface we probably want sort the data by the time that the thread was last update. In that case we can add a timestamp column and we will want to sort the data by this column and not by the ID column. Another example will be when we have users name column and we shell want to sort by the user name column.

The logic is the same, and the only different is that instead of using ID we will use the function ROW_NUMBER to produce a column for the sorting.

-- Apependix A: order by different column
;with cte as (
    select 
        ID,
        ParentID,
        name,
        N'@' + REPLICATE('a',LEN(row_number() over(partition by parentid order by name))) + CONVERT(nvarchar(MAX), row_number() over(partition by parentid order by name)) + N'@' as [Path]
    from    HierarchicalTable
    where   ParentID = 0
    union all
    select 
        t.ID,
        t.ParentID,
        t.name,
        --row_number() over(partition by t.parentid order by t.name)
        [Path] + REPLICATE('a',LEN(row_number() over(partition by t.parentid order by t.name))) + CONVERT(nvarchar(MAX), row_number() over(partition by t.parentid order by t.name)) + N'@'
    from cte
    join HierarchicalTable t on cte.ID = t.ParentID
)
select *, REPLICATE('____', LEN([Path]) - LEN(REPLACE([Path],'@','')) - 2) + ' ' + [name]
from cte
order by [Path]

  

Appendix B: Using HierarchyID data type [SQL 2008+]

All the solutions till this point can be execute using any database. as ii mantion above I have use this logic 13 year ago. The only unique element of SQL Server that we used was the CTE and that an be replace by subquery or recursive function. In SQL Server 2008 Microsoft add a new column type, named HierarchyID, that helps solve some of the problems in modeling and querying hierarchical information. This column actually use a very similar logic as my hierarchical column.

The next solution can be execute only on SQL Server version 2008+, as we are going to use a build in column type HierarchyID.

-- Appendix B: Using HierarchyID data type [SQL 2008+]
;with MyCTE1 as(
    select
        ID,Name,ParentID,N'/' + CONVERT(nvarchar(MAX),ID) + N'/' as [Path]
    from HierarchicalTable
    where ParentID = 0
    union all
    select
        t.ID,t.Name,t.ParentID,[Path] + CONVERT(nvarchar(MAX),t.ID) + N'/'
    from MyCTE1
    join HierarchicalTable t on MyCTE1.ID = t.ParentID
)
select ID,Name,ParentID,[Path],convert(hierarchyid ,[Path])
From MyCTE1
order by convert(hierarchyid ,[Path])

  

Appendix C: maximum recursion has been exhausted

* By default SQL Server limit the recursive level to 100. If we will try to work with recursive CTE as in our solutions for more then 100 levels then we will get this error massage:

Msg 530, Level 16, State 1, Line X
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.

The solution is simply Specify the maxrecursion option at the end of the query:

….
from XXX
option (maxrecursion 0)

  

Resources

This article is just a translation of Ronen Ariely's tutorials, posted about 13 years ago on the site http://k4school.com (old site which is already close for long time).

Tree structure
http://en.wikipedia.org/wiki/Tree_structure

WITH common_table_expression (Transact-SQL)
http://msdn.microsoft.com/en-us/library/ms175972.aspx