en-UShe-IL
You are here:   Blog
Register   |  Login

Blog Archive:

Maximize
* Can be used in order to search for older blogs Entries

Search in blogs


Blog Categories:

Maximize
* Can be used in order to search for blogs Entries by Categories

Blog Tags:

Maximize
* Can be used in order to search for blogs by keywords

TNWikiSummit


Awared MVP

 


Microsoft® Community Contributor 


Microsoft® Community Contributor


 Read first, before you you use the blog! Maximize

Recent Entries

Minimize
אוק25

Written by: ronen ariely
25/10/2020 08:53 RssIcon

Introduction

Before the release of SQL Server 2016, one of the most required solution was for a function which split string. There are many solutions you can find online. Most of these are based on loops that provide poor performance, using XML, or using SQLCLR which in my test provides the best option.

After many years of requests for such a function, Microsoft has finally provided us with a new function called STRING_SPLIT in SQL Server 2016, which should solve our need to split a string. Unfortunately this solution is a half solution at best!

In most cases we need a solution which guarantee the order of the result set. The STRING_SPLIT does not provide this. Thus from 2016, the requests to provide a function for splitting string, has been replaced by shouts to improve the existing function by adding column which present the order of the text.

Meanwhile, most users continue to work with the old solutions that provide poor performance, except for the SQLCLR user solution which provides excellent performance but is complex to implement for most users.

I am wondering why in 2019 so many people are fixed to use the "official function" of splitting string STRING_SPLIT and when it does not fit, thenimmediately jump to old solutions which provide poor performance. I just came across another question in the Microsoft QnA forum that is related to this topic, and a google search did not yield the simple solution which is right under our eyes. 

So.. what that "perfect solution"?!?

I do not like to use the term perfect since each solution might fit best for different scenario, but for most cases all we need to do is using a different approach using another function which was presented in SQL Server 2016 named OPENJSON.

 

Using STRING_SPLIT

This solution only fit for cases we do not care about the order of the result set!

-- Using string_split
declare @String NVARCHAR(1000)
SET @String = 'This@Is@My@String@To@Split'
SELECT * from STRING_SPLIT (@String, '@')
GO -- The problem is that the order of the result is not guaranteed

More technics to split the string without order guarantees can be found here: https://sqlperformance.com/2016/03/sql-server-2016/string-split

But we want to have the order guaranteed! So lets check the solutions

 

Using XML

Based on solution by Mikael Eriksson at stackoverflow

Option one

declare @String NVARCHAR(1000)
SET @String = 'This@Is@My@String@To@Split'
DECLARE @x XML
SET @X = cast('' + replace(@String, '@', '') + '' AS XML)
SELECT @X
-- T.c is the node and it has order in the node() function
select T.c.value('.', 'varchar(max)') as q, row_number() over(order by T.c) as rn
from @X.nodes('i') T(c)

The order of the nodes returned by the method nodes() is guaranteed but we must have a column in the result set which include the order value or the result of the total query will not be guarantee since without ORDER BY the order is not guaranteed.

 

Option two

For this solution we will use numbers table

;WITH cteN(Number) AS(
  SELECT ROW_NUMBER() OVER (ORDER BY (select null)) - 1
  FROM sys.all_columns AS s1
  CROSS JOIN sys.all_columns AS s2
)
SELECT [Number] INTO dbo.Numbers
FROM cteN WHERE [Number] <= 1000000
GO
CREATE UNIQUE CLUSTERED INDEX CIX_Number ON dbo.Numbers([Number])
WITH (
  FILLFACTOR = 100,      -- in the event server default has been changed
  DATA_COMPRESSION = ROW -- if Enterprise & table large enough to matter
);
GO

and the solution is

declare @String NVARCHAR(1000)
SET @String = 'This@Is@My@String@To@Split'
DECLARE @x XML
SET @X = cast('' + replace(@String, '@', '') + '' AS XML)
select T.c.value('.','varchar(max)') as q, N.Number
from Numbers as N
cross apply @X.nodes('i[sql:column("N.Number")]') as T(c)
where N.Number between 1 and @X.value('count(i)', 'int')


 

Using OPENJSON to the rescue

-- Using OPENJSON We can split with order guaranteed
declare @String NVARCHAR(1000)
SET @String = 'This@Is@My@String@To@Split'
SELECT [key],[value] from OPENJSON ('["' + REPLACE(@String,'@','","') + '"]')
GO -- and we have the "key" which porovide a guaranteed order

Let's take this approach to the next level and show some more complex solutions...

 

Splitting Strings in Multiple Columns and keeping the same order

I need a better title for this... Any how let me explain what is our goal

 

Scenario

We have a table with multiple columns that each stores a string. We need to split these strings into rows of substrings, based on a specified separator character. But we must keep the same order of substrings o fit other columns order of substrings.

DROP TABLE IF EXISTS T1
 GO
 CREATE table T1 (sno int,sname varchar(200),sname1 varchar(200))
 GO
     
 Insert into T1(sno,sname,sname1)
  Values
  (1,'1A','1a'), (1,'2A|2B','2a|2b'),
  (1,'3A|3B|3C','3a|3b|3c'), (1,'4A|4B|4C|4D','4a|4b|4c|4d')
 GO
     
 SELECT * FROM T1
 GO

For example, the second row in the above table should produce 2 rows in the result set since in column sname and column sname1 we have a string with a single separator character "|". This can achieved using STRING_SPLIT if the order was not important but in opur case we want in the new rows to have the value "2A" together with the value "2a" and the values "2B" mnust come in the same row as the value "2b" since these substring locates in the same order in the original string

 

The Solution

We can use OPENJSON to split the each column and since OPENJSON provide a key which present the order of the substring, we can use simple filter to chose have the right substring from each source string in the same order.

;With MyCTE0 as (
    select sno, sname,sname1
        ,s = '["' + Replace(sname, '|', '","') + '"]'
        ,s1 = '["' + Replace(sname1, '|', '","') + '"]'
    from T1
),
MyCTE1 as (
    select sno, s, s1
        , k1 = t.[key], v1 = t.[value]
    from MyCTE0
    CROSS APPLY OPENJSON (s, N'$') t
)
SELECT sno, v1 , v2
FROM MyCTE1
CROSS APPLY (SELECT t1.[key] k2 , t1.[value] v2 FROM OPENJSON (s1, N'$') t1 where t1.[key] = MyCTE1.k1) t
GO

 

So, as you can see the solution is not complex and the performance are OK, but who wants to compromise on OK?!? I want the best 🤣

Well... to be true, when you work with SQL Server then you always must remember that for best solution, One has to choose the right solution for a particular situation and there is no one solution that will always be the best. The answer for most most question is: It is It depend

Using the solution above might fit for most cases but when dealing with large amount of data with many rows and when the text we need to split might  might include a lot of sub parts, then we need a bit different approach...

Using Temporarily table

The trigger for writing this post was a question asked at Microsoft QnA forum. It is highly recommended to check the thread. There are many interesting comment and several solutions which were presented there. A test which was done on the different solution shows that my query run almost 20 times faster then other solution which was based on looping.

DROP TABLE IF EXISTS #t1
 DROP TABLE IF EXISTS #t2
 DROP TABLE IF EXISTS #temp4
 go
     
     
 DECLARE @d datetime2 = sysdatetime()
 CREATE TABLE #t1 (id  int NOT NULL,
                   pos int NOT NULL,
                   ShipName nvarchar(60) NOT NULL,
                   PRIMARY KEY (id, pos))
 CREATE TABLE #t2 (id  int NOT NULL,
                   pos int NOT NULL,
                   ShipAddress nvarchar(60) NOT NULL,
                   PRIMARY KEY (id, pos))
     
 INSERT #t1(id, pos, ShipName)
  SELECT t.id,[key] listpos ,[value] nstr
  from PipeData t
  CROSS APPLY OPENJSON ('["' + REPLACE(t.ShipName,'|','","') + '"]') AS s
     
 INSERT #t2(id, pos, ShipAddress)
  SELECT t.id,[key] listpos ,[value] nstr
  from PipeData t
  CROSS APPLY OPENJSON ('["' + REPLACE(t.ShipAddress,'|','","') + '"]') AS s
     
 SELECT t1.id, t1.ShipName, t2.ShipAddress as str
 INTO   #temp4
 FROM   #t1 t1
 JOIN   #t2 t2 ON t1.id = t2.id
              AND t1.pos = t2.pos
     
 SELECT datediff(ms, @d, sysdatetime())
 go

 


Read More