Hierarchical data is organized into a tree-like structure. The structure allows representing information using parent/child relationships: each parent can have many children, but each child has no more than one parent (one-to-many relationship). Our records have a unique column (named "ID" in our case). Parents and children are tied together by "pointer" column. In our case we will call our pointer column "ParentID".
In late 2000 I was developing my first Content Management System (CMS), and I came across the need of using several hierarchical data sets. At first I used recursive functions in order to organize the data as hierarchical tree. This worked fine on small amount of data and low number of users, but very shortly I understood that I need to find a faster solution which will not require me to execute this complex functions each time that I bring the data from the database. My conclusion was that I need to found a better database structure, which allowing me to get the data in direct way (without any recursive).
In early 2001 I developed my first forum interface using hierarchical data structure, at that time I had already, my first basic structure idea. I published my ideas for the first time, as part of a tutorial on how to develop hierarchical forum GUI (in the client GUI side we get hierarchical structure of the messages) using "none hierarchical data structure". This was published at http://k4school.com website in Hebrew.
This post is based on my original post, with some additions from my second version which was publish once SQL Server had a new feature "Recursive Common Table Expression (CTE)". We are using recursive CTE only for one-time-job to build the content of the column, which we will use for the sorting...
Once I had my hierarchical structure logic, I needed to convert my existing Hierarchical tables into the hierarchical structure "OrderBy" column. Originally for this one-time-action I develop a recursive function.
For the sake of this post we are going to use a very simple Parent-Child data with only 3 columns: ID (as our unique record identifier), Name (Just as representing a free data), and ParentID (for the pointer to the father record).
Our requirement is to sort the data, which is based on the Child-Parent hierarchy, according to the tree structure. Each child must be under its parent tree, and all children with the same father, sorted by their ID order (on Appendix A we will show how to sort by any other column).
------------------------------------------------ DDL
CREATE
TABLE
HierarchicalTable (
ID
int
unique
,
ParentID
int
,
Name
nvarchar(10)
)
GO
------------------------------------------------ DML
-- TRUNCATE TABLE HierarchicalTable
insert
into
HierarchicalTable(ID,
Name
,ParentID)
select
1,
'Alex'
,0
UNION
ALL
select
2,
'John'
,1
UNION
ALL
select
3,
'Mathew'
,2
UNION
ALL
select
4,
'Philip'
,1
UNION
ALL
select
5,
'Shone'
,0
UNION
ALL
select
6,
'Shine'
,2
UNION
ALL
select
7,
'Tom'
,2
UNION
ALL
select
8,
'George'
,1
UNION
ALL
select
9,
'Jim'
,5
UNION
ALL
select
10,
'Jim'
,1
UNION
ALL
select
11,
'Jim'
,1
UNION
ALL
select
12,
'Jim'
,1
UNION
ALL
select
13,
'Jim'
,1
UNION
ALL
select
14,
'Jim'
,1
UNION
ALL
select
15,
'Jim'
,1
UNION
ALL
select
16,
'Jim'
,1
UNION
ALL
select
17,
'Jim'
,1
UNION
ALL
select
18,
'Jim'
,8
UNION
ALL
select
19,
'Jim'
,8
UNION
ALL
select
20,
'Jim'
,8
UNION
ALL
select
21,
'Jim'
,8
UNION
ALL
select
22,
'Jim'
,8
UNION
ALL
select
23,
'Jim'
,8
UNION
ALL
select
24,
'Jim'
,0
UNION
ALL
select
25,
'Jim'
,0
UNION
ALL
select
26,
'Jim'
,0
UNION
ALL
select
27,
'Jim'
,0
UNION
ALL
select
28,
'Jim'
,0
GO
SELECT
*
FROM
HierarchicalTable
GO
Explanation Step By Step from the basic Idea to the Final Queries
In solving the problem we are facing two major problems: (1) Sorting the data by the Hierarchical level, and (2) Sorting each Level by the ID column.
Step 1: Sorting the data by the Hierarchical level - Each Child will be under his father.
In order to get this we are going to use the following logic rule:
Any string X will come before any string Y that start with string X.
in other words, If (Y LIKE 'X%') then X smaller then Y.
In this query we are going to use hierarchical CTE in order to read the data and build the content to a new the column which I will add to the original table (as I mentioned we can name this new column "OrderBy" or [Path] as I am using here). During the recursive task, the column [Path] gets his father string + the current ID (with the separator '@')
-- 1. Sorting the data by the Hierarchical level
;
with
MyCTE1
as
(
select
ID,
Name
,ParentID,N
'@'
+
CONVERT
(nvarchar(
MAX
),ID) + N
'@'
as
[Path]
from
HierarchicalTable
where
ParentID = 0
union
all
select
t.ID,t.
Name
,t.ParentID,[Path] +
CONVERT
(nvarchar(
MAX
),t.ID) + N
'@'
from
MyCTE1
join
HierarchicalTable t
on
MyCTE1.ID = t.ParentID
)
select
*
from
MyCTE1
order
by
[Path]
We can see that each child is under his father, but checking out each level we might notice that the data is not sorting correctly. For example: ID 5 is smaller than ID 28 and yet for most of us it returns underneath it.
28 Jim 0 @28@
5 Shone 0 @5@
The reason is that our numbers are sorting as string, and string is sorting left to right character after character (a string is basically an array of characters, char[]). The first character of '28' is 2 and the first character of '5' is 5 there for 28 is smaller than 5 in "string sorting".
Step 2: Sorting the rows in each Level by the ID column - Children on the same level get sorted by ID
There are several ways to format the data in order to get the correct order.
The first option and the most common one is adding several leading zero characters, in order to push the smaller numbers in front of the bigger numbers.
For example let's build a string length 5 characters, by adding some zero characters for each number. The number 5 returns the string '00005' and the number 28 returns the string '00028'.
Now when we will sort the value as string, then it will be sorted correctly, since the first 3 characters are the same, and the forth character is 2 for the number 28 and 0 for the number 5 (2>0).
Using this logic, we can fix the query of step 1, and get the below query, which will work great for us:
-- 2. Sorting each Level by the ID column
-- 2.1 Promote the smaller records by adding leading zeros & using constant length string
;
with
MyCTE1
as
(
select
ID,
Name
,ParentID,N
'@'
+
RIGHT
(
'00000'
+
CONVERT
(nvarchar(
MAX
),ID), 5) + N
'@'
as
[Path]
from
HierarchicalTable
where
ParentID = 0
union
all
select
t.ID,t.
Name
,t.ParentID,[Path] +
RIGHT
(
'00000'
+
CONVERT
(nvarchar(
MAX
),t.ID), 5) + N
'@'
from
MyCTE1
join
HierarchicalTable t
on
MyCTE1.ID = t.ParentID
)
select
*
from
MyCTE1
order
by
[Path]
Note! There are big disadvantages using this solution! We are limiting ourselves to a maximum of 99999 records in the table. Even if we change the length of the string to 10 or 100 we are still have a very big limitation. Moreover, our result set use very long string, which is a waste of memory, disk, IO and so on! We really need to continue reading :-)
We can move into second option now and think on a better format for our string.
In the solution above I converted the numbers to a string with length 5. Using 5 characters the maximum number that I can present will be 99999, but we want to have a lot more messages in the forum. If we are using INT data type for example, then the maximum value in SQL Server is 2147483647, which mean that I will need to convert all the numbers to a string with length 10.
In general a string with length X can hold maximum number [9*1 + 9*10 + … + 9*10*(X-1) ]. This is because the maximum number on base 10 is 9… ops… this is the answer… our next improvement will be to CONVERT our numbers to base 16 instead of base 10. Using higher base we can represent bigger number using the same length. For example using string with 5 characters we will be able to store the value FFFFF, which is 1048575 in base ten! This means that we can work with 1048575 rows instead of only 99999 :-)
In addition since we use constant length for each number, we do not need to separator which we used before (the character @)
-- 2.2 Promote the smaller records by CONVERTING TO BINARY & using constant length string
;
with
MyCTE1
as
(
select
ID,
Name
,ParentID,
CONVERT
(varbinary(
max
),ID)
as
[Path]
from
HierarchicalTable
where
ParentID = 0
union
all
select
t.ID,t.
Name
,t.ParentID,[Path] +
CONVERT
(
binary
(5),t.ID)
from
MyCTE1
join
HierarchicalTable t
on
MyCTE1.ID = t.ParentID
)
select
*
from
MyCTE1
order
by
[Path]
* Note! For several different reasons related in the use and analysis of our results I prefer to use an explicit separator between the values of different levels. once we will move to dynamic chain length, it will become necessary for data analysis (eg Quick Find all data on a certain level, etc...). There for I will convert the data info VARCHAR data type using the build in function fn_varbintohexstr. This next query is not an improvement, but a preparation work for next dynamic solutions. Do not use it if you are OK with using this solution, since storing a binary data type is much better than storing the VARCHAR data type, for indexing purposes and more!
-- 2.2 Promote the smaller records by CONVERTING TO BINARY & using constant length string
;
with
MyCTE1
as
(
select
ID,
Name
,ParentID,N
'@'
+
RIGHT
(sys.fn_varbintohexstr(
CONVERT
(
binary
(5),ID)),10) + N
'@'
as
[Path]
from
HierarchicalTable
where
ParentID = 0
union
all
select
t.ID,t.
Name
,t.ParentID,[Path] +
RIGHT
(sys.fn_varbintohexstr(
CONVERT
(
binary
(5),t.ID)),10) + N
'@'
from
MyCTE1
join
HierarchicalTable t
on
MyCTE1.ID = t.ParentID
)
select
*
from
MyCTE1
order
by
[Path]
Our third option will base on the preview solutions, and will improve the value we store by moving to a dynamic length solution.
This time we are going to find the maximum length of each level separately. I will use this value as the constant length of our data. On the one hand we solved the problem of maximum number of records in the table (as we use dynamic length) & and our final string will be much much shorter if we have low number of rows (since we don't need longer length then the maximum length of the data value), but on the other hand this query cost more.
* Note! all the queries in this post only executed one time in order to find the value to use for the sorting! Once we find the value we insert it to a new column and from that time any new rows that is inserted to the table gets the new value using triggers or computed column.
* Note! I will demonstrate this idea using numbers on base 10, but you can use this idea of dynamic length on the preview solution using binary data and get a better solution (the idea is the same, but it is easier to explain on base 10)
-- 2.3 Promote the smaller records by adding leading zero & using dynamic length
-- (i will show the idea using decimal numbers but we can use the BINARY solution here too)
;
with
MyCTE1
as
(
select
ID,
Name
, ParentID,1
as
[
Level
]
from
HierarchicalTable
where
ParentID = 0
union
all
select
t.ID,t.
Name
,t.ParentID,[
Level
]+1
from
MyCTE1
join
HierarchicalTable t
on
MyCTE1.ID = t.ParentID
)
,MyCTE2
as
(
select
ID,
Name
,ParentID, [
Level
],
MAX
(len(ID)) over (partition
by
[
Level
]) ML
from
MyCTE1
)
,MyCTE3
as
(
select
ID,
Name
,ParentID,[
Level
],
RIGHT
(REPLICATE(
'0'
,ML)+
CONVERT
(NVARCHAR(
MAX
), ID),ML)
as
[OrderBy]
from
MyCTE2
where
ParentID = 0
union
all
select
t.ID,t.
Name
,t.ParentID,t.[
Level
], [OrderBy] + N
'@'
+
RIGHT
(REPLICATE(
'0'
,ML)+
CONVERT
(nvarchar(
MAX
),t.ID),ML)
from
MyCTE3
join
MyCTE2 t
on
MyCTE3.ID = t.ParentID
)
select
*
from
MyCTE3
order
by
OrderBy
All preview solutions based on the same logic: We are Promoting the smaller numbers over the bigger numbers by adding leading zeros, and using constant length string (constant per level or in general).
At time, when I got those solutions, I looked for a better solution and I got a nice idea: instead of promoting the smaller numbers (Operation that requires cost query or/and produce long string), I will demote the larger numbers.
Final Queries to build our new column content
Our concept based on the fact that the character "@" is smaller than numbers characters, and there for we are using it as a separator between levels. We also based on the fact that the character "0" is smaller than any other number, and there for we can use it to push smaller numbers before the bigger numbers.
My forth solution based on the fact that the character "a" is bigger than numbers, and there for we can use it to push bigger numbers after the smaller numbers.
-- 2.4 Postpone the bigger records by adding leading letters.
;
with
MyCTE1
as
(
select
ID,
Name
,ParentID,N
'@'
+ REPLICATE(
'a'
,LEN(
CONVERT
(nvarchar(10),ID)) - 1) +
CONVERT
(nvarchar(10),ID) + N
'@'
as
[Path]
from
HierarchicalTable
where
ParentID = 0
union
all
select
t.ID,t.
Name
,t.ParentID,[Path] + REPLICATE(
'a'
,LEN(
CONVERT
(nvarchar(10),t.ID)) - 1) +
CONVERT
(nvarchar(10),t.ID) + N
'@'
from
MyCTE1
join
HierarchicalTable t
on
MyCTE1.ID = t.ParentID
)
select
*
from
MyCTE1
order
by
[Path]
And now we can show the result with indent of space or any other character to make it more readable:
-- My main idea is to use this query one time in order to add a column which will hold the datya
-- in this format.
-- for this short tutrial, I will add a [LEVEL] column only for displaying the result in a Tree-Strubcture
;
with
MyCTE1
as
(
select
ID,
Name
,ParentID, 0
as
[
Level
],N
'@'
+ REPLICATE(
'a'
,LEN(
CONVERT
(nvarchar(10),ID)) - 1) +
CONVERT
(nvarchar(10),ID) + N
'@'
as
[Path]
from
HierarchicalTable
where
ParentID = 0
union
all
select
t.ID,t.
Name
,t.ParentID, [
Level
] + 1,[Path] + REPLICATE(
'a'
,LEN(
CONVERT
(nvarchar(10),t.ID)) - 1) +
CONVERT
(nvarchar(10),t.ID) + N
'@'
from
MyCTE1
join
HierarchicalTable t
on
MyCTE1.ID = t.ParentID
)
select
ID,
Name
,ParentID,[Path],
REPLICATE(
'_____'
, [
Level
]) +
' '
+ [
Name
]
as
[Tree-
Like
Structure]
from
MyCTE1
order
by
[Path]
Our Result SET
Conclusion & Comments
This article show the way of sorting the data On-The-Fly, using a recursive query. Those solutions are very useful for a One-Time-Query, but if you are dealing with hierarchical data then the only solutions that will fit your need is to design your database structure to fit Hierarchical Data Structure. One of the solution is to use the structure from this tutorial for a special column which will be used for the sorting "order by" as mention above. You can use the queries in this tutorial in order to insert the first values to the new column. Once youe database is ready with the new column and the values you can use trigger or computed column. You should build index on this column and in your application use simple SELECT query with ORDER BY the new column.
Note! The dynamic solutions in this tutorial (like the third option) can't help us to build a new column as the data will change during time! I recommend to use the forth solution.
I hope this article was useful and fun to read :-)
--- The End ---
Appendix A: Sorting by Different Column
In some situation we need to sort the hierarchical data using different column. For example in a forum interface we probably want sort the data by the time that the thread was last update. In that case we can add a timestamp column and we will want to sort the data by this column and not by the ID column. Another example will be when we have users name column and we shell want to sort by the user name column.
The logic is the same, and the only different is that instead of using ID we will use the function ROW_NUMBER to produce a column for the sorting.
-- Apependix A: order by different column
;
with
cte
as
(
select
ID,
ParentID,
name
,
N
'@'
+ REPLICATE(
'a'
,LEN(row_number() over(partition
by
parentid
order
by
name
))) +
CONVERT
(nvarchar(
MAX
), row_number() over(partition
by
parentid
order
by
name
)) + N
'@'
as
[Path]
from
HierarchicalTable
where
ParentID = 0
union
all
select
t.ID,
t.ParentID,
t.
name
,
--row_number() over(partition by t.parentid order by t.name)
[Path] + REPLICATE(
'a'
,LEN(row_number() over(partition
by
t.parentid
order
by
t.
name
))) +
CONVERT
(nvarchar(
MAX
), row_number() over(partition
by
t.parentid
order
by
t.
name
)) + N
'@'
from
cte
join
HierarchicalTable t
on
cte.ID = t.ParentID
)
select
*, REPLICATE(
'____'
, LEN([Path]) - LEN(
REPLACE
([Path],
'@'
,
''
)) - 2) +
' '
+ [
name
]
from
cte
order
by
[Path]
Appendix B: Using HierarchyID data type [SQL 2008+]
All the solutions till this point can be execute using any database. as ii mantion above I have use this logic 13 year ago. The only unique element of SQL Server that we used was the CTE and that an be replace by subquery or recursive function. In SQL Server 2008 Microsoft add a new column type, named HierarchyID, that helps solve some of the problems in modeling and querying hierarchical information. This column actually use a very similar logic as my hierarchical column.
The next solution can be execute only on SQL Server version 2008+, as we are going to use a build in column type HierarchyID.
-- Appendix B: Using HierarchyID data type [SQL 2008+]
;
with
MyCTE1
as
(
select
ID,
Name
,ParentID,N
'/'
+
CONVERT
(nvarchar(
MAX
),ID) + N
'/'
as
[Path]
from
HierarchicalTable
where
ParentID = 0
union
all
select
t.ID,t.
Name
,t.ParentID,[Path] +
CONVERT
(nvarchar(
MAX
),t.ID) + N
'/'
from
MyCTE1
join
HierarchicalTable t
on
MyCTE1.ID = t.ParentID
)
select
ID,
Name
,ParentID,[Path],
convert
(hierarchyid ,[Path])
From
MyCTE1
order
by
convert
(hierarchyid ,[Path])
Appendix C: maximum recursion has been exhausted
* By default SQL Server limit the recursive level to 100. If we will try to work with recursive CTE as in our solutions for more then 100 levels then we will get this error massage:
Msg 530, Level 16, State 1, Line X
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
The solution is simply Specify the maxrecursion option at the end of the query:
….
from
XXX
option
(maxrecursion 0)
Resources
This article is just a translation of Ronen Ariely's tutorials, posted about 13 years ago on the site http://k4school.com (old site which is already close for long time).
Tree structure
http://en.wikipedia.org/wiki/Tree_structure
WITH common_table_expression (Transact-SQL)
http://msdn.microsoft.com/en-us/library/ms175972.aspx