en-UShe-IL
You are here:   Blog
Register   |  Login

Blog Archive:

Maximize
* Can be used in order to search for older blogs Entries

Search in blogs


Blog Categories:

Maximize
* Can be used in order to search for blogs Entries by Categories

Blog Tags:

Maximize
* Can be used in order to search for blogs by keywords

TNWikiSummit


Awared MVP 


 


Microsoft® Community Contributor 


Microsoft® Community Contributor


 Read first, before you you use the blog! Maximize
פבר13

Written by: ronen ariely
13/02/2019 00:55 RssIcon

Nothing like a good challenge to start a discussion

STRING_SPLIT is a built-in table-valued function that splits a string into rows based on a separator character.  It was added in SQL Server 2016 after many years that people asked for it.

For example we can split the string 'My Name Is Ronen Ariely' in order to get each word in separate row

DECLARE @Txt NVARCHAR(MAX) = N'My name is Ronen Ariely'
SELECT * FROM string_split(@Txt,' ')

value
My
name
is
Ronen
Ariely

Unfortunately, there is one huge limitation using this function. Like any table The order in which rows are returned in a result set are not guaranteed unless an ORDER BY clause is specified, and since this function returns only one column with the separated strings, we have no direct way to sort the data according to the original order of the string (the original sentence in my sample).

Note! there is an open feedback request to add an extra column to the result SET of the function,which will include the order of the separated parts in the original string. You are more than welcome to go and add your vote in the system and maybe influence the future versions of SQL Server.

The Challenge!

I was challenged by someone in the forums to use the function STRING_SPLIT and guarantee the order of the result SET according to the original text.

Note! This is only a theoretical discussion and not a solution for production. If you need to split a string into rows and return the rows in the order of the original text, then you should probably use SQLCLR for the task.

The solution: scenario 1

First solution which I thought about was simply to use the function CHARINDEX in order to find the location of each sub-string in the original string.

declare @s nvarchar(max)= N'My Name Is Ronen Ariely'
SELECT value , CHARINDEX (value,@s,0) Myorder
FROM string_split(@S,' ')
order by CHARINDEX (value,@s,0)
GO

The above solution works great as long as we do not have duplicate sub-strings 

valueMyorder
My1
name4
is9
Ronen12
Ariely18

Unfortunately, this solution will not work if we have duplicate sub-queries, since the function CHARINDEX finds the first location of the sub-string in the original string.

declare @s nvarchar(max)= N'My name Is Ronen Ariely. Remember my name'
SELECT value , CHARINDEX (value,@s,0) Myorder
FROM string_split(@S,' ')
order by CHARINDEX (value,@s,0)
GO

valueMyorder
My1
my1
name4
name4
Is9
Ronen12
Ariely.18
Remember26

Notice that the word "name" exists twice and since CHARINDEX find the first location of the word, both sub-strings got the same order value. Moreover, since I am using a collate which is not case sensitive, the word"My" and the word "my"considered as the same for the function CHARINDEX and also get the same value for the order.

The solution: scenario 2

The solution of this scenario where we have the same word twice is very simple, based on the fact that we know where the word appear first and the function CHARINDEX allow us to search the location from specific location. Therefore, in order to find the location of the second word we simply start the search from after the location of the first word.

For this task I will use the function ROW_NUMBER in order to mark the duplicate words.

declare @s nvarchar(max)= N'My Name Is Ronen Ariely. Remember my name'
;With MyCTE as(
    SELECT value , CHARINDEX (value,@s,0) Myorder
    FROM string_split(@S,' ')
)
SELECT value , Myorder, RN = ROW_NUMBER() OVER (PARTITION BY Myorder ORDER BY Myorder) - 1
    , CHARINDEX (value,@s,Myorder + ROW_NUMBER() OVER (PARTITION BY Myorder ORDER BY Myorder) - 1) FinalOrder
FROM MyCTE
order by FinalOrder
GO

valueMyorderRNFinalOrder
My101
Name404
Is909
Ronen12012
Ariely.18018
Remember26026
my1135
name38038

The solution: scenario 3

Once you understand the logic in my above solution, we can take it to the next level and solve a flexible scenario where words can appear multiple times and not just twice as the above solution.

To solve this case I will simply use a recursive CTE

Note! I an speaking about a simple solution in the sense of "getting the result" which we want, but once more I want to emphasize that this is not a solution for production and the performance of this solution is appalling

declare @s nvarchar(max)= N'My Name Is Ronen Ariely. Remember my name and my name will stay'
;With MyCTE as(
    SELECT value , CHARINDEX (' ' + value + ' ',' ' + @s + ' ',0) c, RN = ROW_NUMBER() OVER (PARTITION BY CHARINDEX (' ' + value + ' ',' ' + @s + ' ',0) ORDER BY CHARINDEX (' ' + value + ' ',' ' + @s + ' ',0)) - 1
    FROM string_split(@S,' ')
),
MyRecor as (
    SELECT MyCTE.value, MyCTE.c, MyCTE.RN, 0 AS MyLevel  , CHARINDEX (' ' + value + ' ',' ' + @s + ' ',c) ci
    FROM MyCTE  
    WHERE RN = 0
    UNION ALL 
    SELECT e.value, e.c, e.RN, MyLevel + 1  , CHARINDEX (' ' + d.value + ' ',' ' + @s + ' ',d.ci + d.c) ci
    FROM MyCTE AS
        INNER JOIN MyRecor AS
        ON e.value = d.value
    where d.RN = MyLevel and e.RN = MyLevel +1
)
SELECT value, ci
from MyRecor
order by ci
GO

valueci
My1
name4
Is9
Ronen12
Ariely.18
Remember26
my35
name38
and43
my47
Name50
will55
stay60

The challenge was done - Conclusions

We can use the function string_split in order to return the sub-string in the original order.It is not recommended but it can be done for the sake of the discussion

I hope this was interesting and I'll see you next time