אוק25
Written by:
ronen ariely
25/10/2020 08:53 
Introduction
Before the release of SQL Server 2016, one of the most required solution was for a function which split string. There are many solutions you can find online. Most of these are based on loops that provide poor performance, using XML, or using SQLCLR which in my test provides the best option.
After many years of requests for such a function, Microsoft has finally provided us with a new function called STRING_SPLIT in SQL Server 2016, which should solve our need to split a string. Unfortunately this solution is a half solution at best!
In most cases we need a solution which guarantee the order of the result set. The STRING_SPLIT does not provide this. Thus from 2016, the requests to provide a function for splitting string, has been replaced by shouts to improve the existing function by adding column which present the order of the text.
Meanwhile, most users continue to work with the old solutions that provide poor performance, except for the SQLCLR user solution which provides excellent performance but is complex to implement for most users.
I am wondering why in 2019 so many people are fixed to use the "official function" of splitting string STRING_SPLIT and when it does not fit, thenimmediately jump to old solutions which provide poor performance. I just came across another question in the Microsoft QnA forum that is related to this topic, and a google search did not yield the simple solution which is right under our eyes.
So.. what that "perfect solution"?!?
I do not like to use the term perfect since each solution might fit best for different scenario, but for most cases all we need to do is using a different approach using another function which was presented in SQL Server 2016 named OPENJSON.
Using STRING_SPLIT
This solution only fit for cases we do not care about the order of the result set!
-- Using string_split
declare
@String NVARCHAR(1000)
SET
@String =
'This@Is@My@String@To@Split'
SELECT
*
from
STRING_SPLIT (@String,
'@'
)
GO
-- The problem is that the order of the result is not guaranteed
More technics to split the string without order guarantees can be found here: https://sqlperformance.com/2016/03/sql-server-2016/string-split
But we want to have the order guaranteed! So lets check the solutions
Using XML
Option one
declare
@String NVARCHAR(1000)
SET
@String =
'This@Is@My@String@To@Split'
DECLARE
@x XML
SET
@X =
cast
(
''
+
replace
(@String,
'@'
,
'
'
) +
'
'
AS
XML)
SELECT
@X
-- T.c is the node and it has order in the node() function
select
T.c.value(
'.'
,
'varchar(max)'
)
as
q, row_number() over(
order
by
T.c)
as
rn
from
@X.nodes(
'i'
) T(c)
The order of the nodes returned by the method nodes() is guaranteed but we must have a column in the result set which include the order value or the result of the total query will not be guarantee since without ORDER BY the order is not guaranteed.
Option two
For this solution we will use numbers table
;
WITH
cteN(Number)
AS
(
SELECT
ROW_NUMBER() OVER (
ORDER
BY
(
select
null
)) - 1
FROM
sys.all_columns
AS
s1
CROSS
JOIN
sys.all_columns
AS
s2
)
SELECT
[Number]
INTO
dbo.Numbers
FROM
cteN
WHERE
[Number] <= 1000000
GO
CREATE
UNIQUE
CLUSTERED
INDEX
CIX_Number
ON
dbo.Numbers([Number])
WITH
(
FILLFACTOR = 100,
-- in the event server default has been changed
DATA_COMPRESSION = ROW
-- if Enterprise & table large enough to matter
);
GO
and the solution is
declare
@String NVARCHAR(1000)
SET
@String =
'This@Is@My@String@To@Split'
DECLARE
@x XML
SET
@X =
cast
(
''
+
replace
(@String,
'@'
,
'
'
) +
'
'
AS
XML)
select
T.c.value(
'.'
,
'varchar(max)'
)
as
q, N.Number
from
Numbers
as
N
cross
apply @X.nodes(
'i[sql:column("N.Number")]'
)
as
T(c)
where
N.Number
between
1
and
@X.value(
'count(i)'
,
'int'
)
Using OPENJSON to the rescue
-- Using OPENJSON We can split with order guaranteed
declare
@String NVARCHAR(1000)
SET
@String =
'This@Is@My@String@To@Split'
SELECT
[
key
],[value]
from
OPENJSON (
'["'
+
REPLACE
(@String,
'@'
,
'","'
) +
'"]'
)
GO
-- and we have the "key" which porovide a guaranteed order
Let's take this approach to the next level and show some more complex solutions...
Splitting Strings in Multiple Columns and keeping the same order
I need a better title for this... Any how let me explain what is our goal
Scenario
We have a table with multiple columns that each stores a string. We need to split these strings into rows of substrings, based on a specified separator character. But we must keep the same order of substrings o fit other columns order of substrings.
DROP
TABLE
IF EXISTS T1
GO
CREATE
table
T1 (sno
int
,sname
varchar
(200),sname1
varchar
(200))
GO
Insert
into
T1(sno,sname,sname1)
Values
(1,
'1A'
,
'1a'
), (1,
'2A|2B'
,
'2a|2b'
),
(1,
'3A|3B|3C'
,
'3a|3b|3c'
), (1,
'4A|4B|4C|4D'
,
'4a|4b|4c|4d'
)
GO
SELECT
*
FROM
T1
GO

For example, the second row in the above table should produce 2 rows in the result set since in column sname and column sname1 we have a string with a single separator character "|". This can achieved using STRING_SPLIT if the order was not important but in opur case we want in the new rows to have the value "2A" together with the value "2a" and the values "2B" mnust come in the same row as the value "2b" since these substring locates in the same order in the original string
The Solution
We can use OPENJSON to split the each column and since OPENJSON provide a key which present the order of the substring, we can use simple filter to chose have the right substring from each source string in the same order.
;
With
MyCTE0
as
(
select
sno, sname,sname1
,s =
'["'
+
Replace
(sname,
'|'
,
'","'
) +
'"]'
,s1 =
'["'
+
Replace
(sname1,
'|'
,
'","'
) +
'"]'
from
T1
),
MyCTE1
as
(
select
sno, s, s1
, k1 = t.[
key
], v1 = t.[value]
from
MyCTE0
CROSS
APPLY OPENJSON (s, N
'$'
) t
)
SELECT
sno, v1 , v2
FROM
MyCTE1
CROSS
APPLY (
SELECT
t1.[
key
] k2 , t1.[value] v2
FROM
OPENJSON (s1, N
'$'
) t1
where
t1.[
key
] = MyCTE1.k1) t
GO

So, as you can see the solution is not complex and the performance are OK, but who wants to compromise on OK?!? I want the best 🤣
Well... to be true, when you work with SQL Server then you always must remember that for best solution, One has to choose the right solution for a particular situation and there is no one solution that will always be the best. The answer for most most question is: It is It depend
Using the solution above might fit for most cases but when dealing with large amount of data with many rows and when the text we need to split might might include a lot of sub parts, then we need a bit different approach...
Using Temporarily table
The trigger for writing this post was a question asked at Microsoft QnA forum. It is highly recommended to check the thread. There are many interesting comment and several solutions which were presented there. A test which was done on the different solution shows that my query run almost 20 times faster then other solution which was based on looping.
DROP
TABLE
IF EXISTS #t1
DROP
TABLE
IF EXISTS #t2
DROP
TABLE
IF EXISTS #temp4
go
DECLARE
@d datetime2 = sysdatetime()
CREATE
TABLE
#t1 (id
int
NOT
NULL
,
pos
int
NOT
NULL
,
ShipName nvarchar(60)
NOT
NULL
,
PRIMARY
KEY
(id, pos))
CREATE
TABLE
#t2 (id
int
NOT
NULL
,
pos
int
NOT
NULL
,
ShipAddress nvarchar(60)
NOT
NULL
,
PRIMARY
KEY
(id, pos))
INSERT
#t1(id, pos, ShipName)
SELECT
t.id,[
key
] listpos ,[value] nstr
from
PipeData t
CROSS
APPLY OPENJSON (
'["'
+
REPLACE
(t.ShipName,
'|'
,
'","'
) +
'"]'
)
AS
s
INSERT
#t2(id, pos, ShipAddress)
SELECT
t.id,[
key
] listpos ,[value] nstr
from
PipeData t
CROSS
APPLY OPENJSON (
'["'
+
REPLACE
(t.ShipAddress,
'|'
,
'","'
) +
'"]'
)
AS
s
SELECT
t1.id, t1.ShipName, t2.ShipAddress
as
str
INTO
#temp4
FROM
#t1 t1
JOIN
#t2 t2
ON
t1.id = t2.id
AND
t1.pos = t2.pos
SELECT
datediff(ms, @d, sysdatetime())
go
Read More