en-UShe-IL
You are here:   Blog
Register   |  Login

Blog Archive:

Maximize
* Can be used in order to search for older blogs Entries

Search in blogs


Blog Categories:

Maximize
* Can be used in order to search for blogs Entries by Categories

Blog Tags:

Maximize
* Can be used in order to search for blogs by keywords

TNWikiSummit


Awared MVP 


 


Microsoft® Community Contributor 


Microsoft® Community Contributor


 Read first, before you you use the blog! Maximize
מרץ30

Written by: ronen ariely
30/03/2019 23:59 RssIcon


Introduction

The image Above represent the challenge we face. We have a SET of separated slots/ranges and we want to represent the merged ranges which cover all the separated ranges, using a SET of minimum number of ranges. In other words, we want to to Group continuous and overlaps Ranges together. If we have SET of slots/ranges, and two or more of these ranges has shared point/s (a point which exists in both ranges), then these ranges can be merge together into a single range. 

For example if we are dealing with ranges of integers and we have the ranges 3-6 and 5-8, then we can marge these into the single range 3-8, or the ranges 2-4,4-6 can be merged into 2-6.

The most Intuitive solution to developers is base on loops throw the source SET of ranges, and working on the records one-by-one. 

1. We can address the source SET of data as a collection like two dimensional Array:
int[,] SourceArray = new int[,] { { 1, 2 }, { 3, 4 }, { 5, 6 }, { 7, 8 } };
2. Create a new "Merged SET"
int[,] MergedArray = new int[,]
3. Loop throw the source SET of ranges in the source data one-by-one, and calculate the merged SET for each new range that we add.

Basically like the image above shows

This approach can be implemented in tabular databases using loops (for example we can use "while" loop, or cursor)

Unfortunately, tabular database are designed to work with SET of data and not row-by-row. The servers "tries" behind the scenes to use the best algorithms in order to provide best performance for SET of data, and using this approach will probably lead to very poor performance.

In my previous post I covered the solution using the "Gaps and Islands" approach, which is the most common solution using tabular databases like SQL Server.

I HIGHLY recommend to read the previous post BEFORE you continue to read!

There is another trick I use in some cases which I explained in the past several times in the forums (I might elaborate on this approach more in future post). The basic idea is to find a way to represent our Source Ranges as ranges of BITs (zero and one) and use simple built-in Bitwise operations in order to find the Merged Ranges. This task is almost straightforward using the bitwise operator inclusive OR.

In this post I will cover a solution which is based on SQL Server built-in GEOMETRY data type.

Now that I mentioned the word "GEOMETRY"... Please check the image above again, but this time think about the problem in a system as two dimensions axes instead of a one-dimensional horizontal axis.

You can notice that what we are dealing with is a simple case of finding the combined area of multiple rectangles.

Working with POLYGONs

DROP TABLE IF EXISTS GeometryTable
GO
CREATE table GeometryTable(
  geom geometry
)
 
truncate table GeometryTable
INSERT INTO GeometryTable VALUES
(geometry::STPolyFromText('POLYGON(( 2 0,  4 0,  4 10, 2 10, 2 0))', 0)),
(geometry::STPolyFromText('POLYGON(( 5 0,  6 0,  6 10, 5 10, 5 0))', 0)),
(geometry::STPolyFromText('POLYGON((10 0, 15 0, 15 10, 10 10, 10 0))', 0)),
(geometry::STPolyFromText('POLYGON((3 0, 6 0, 6 10, 3 10, 3 0))', 0)) -- this one have overlap area
GO
select geom from GeometryTable
GO

Let's use the built-in tool in SSMS in order to watch the data in a graph (You can use Azure Data Studio as well)

Now we do not need to do any complex procedure in order to merge all these "POLYGON"s into "MultiPolygon". SQL Server has a full built-in support for working with GEOMETRY data types. We can use the function UnionAggregate in order to aggregate all the polygons which is not more complex than using SUM function in numbers.

-- combine all ranges to one MultiPolygon
SELECT Geometry::UnionAggregate(geom)
from GeometryTable
GO

yep... this image present the solution which we want to get, but we are not there yet. Behind the scenes SQL Server still address the input as separated POLYGONs. We can use the method ToString() in order to get the GEOMETRY data type in a simple readable textual format.

SELECT Geometry::UnionAggregate(geom).ToString()
from GeometryTable
GO

The above query will return the value:

MULTIPOLYGON (((10 0, 15 0, 15 10, 10 10, 10 0)), ((2 0, 3 0, 4 0, 5 0, 6 0, 6 10, 5 10, 4 10, 3 10, 2 10, 2 0)))

You can noticed that SQL Server succeed to combine the separated POLYGONs into two POLYGOns as expected, but inside each POLYGON we can see all the points from all the source data. SQL Server does not recognize what the geometric shape is (check the second POLYGON in the result). We as human can notice that the result is a SET of rectangles.

The value "(2 0, 3 0, 4 0, 5 0, 6 0, 6 10, 5 10, 4 10, 3 10, 2 10, 2 0))" which is the the left POLYGON in the image above, can be replaced with the value "(2 0, 6 0, 6 10, 2 10, 2 0))".

THIS IS BASICALLY OUR NEXT STEP

For the sake of the solution we need to have a numbers table (or use a virtual numbers table which we create on-the-fly).

CREATE TABLE Numbers (
    N INT CONSTRAINT Number_PK PRIMARY KEY CLUSTERED(N)
);
WITH
    L0   AS(SELECT 1 AS C UNION ALL SELECT 1 AS O), -- 2 rows
    L1   AS(SELECT 1 AS C FROM L0 AS A CROSS JOIN L0 AS B), -- 4 rows
    L2   AS(SELECT 1 AS C FROM L1 AS A CROSS JOIN L1 AS B), -- 16 rows
    L3   AS(SELECT 1 AS C FROM L2 AS A CROSS JOIN L2 AS B), -- 256 rows
    L4   AS(SELECT 1 AS C FROM L3 AS A CROSS JOIN L3 AS B), -- 65,536 rows
    Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS N FROM L4)
INSERT INTO Numbers
    SELECT TOP 10000 N FROM Nums ORDER BY N;
GO

Let's split the MultiPolygon into the separated Polygons

-- brack the MultiPolygoninto individual Polygons
;With MyCTE as(
    SELECT Geometry::UnionAggregate(geom) as Combined
    from GeometryTable
)
select
    --Combined.STNumGeometries(), -- number of elements
    --Combined.STGeometryN(Numbers.n),
    Combined.STGeometryN(Numbers.n).ToString()
FROM MyCTE
INNER JOIN Numbers ON n <= Combined.STNumGeometries()
GO

Result:

POLYGON ((10 0, 15 0, 15 10, 10 10, 10 0))
POLYGON ((2 0, 3 0, 4 0, 5 0, 6 0, 6 10, 5 10, 4 10, 3 10, 2 10, 2 0))

Next we simply need to convert the values into string -> using string's functions we can clean the un-needed part and get the values of the clean ranges.

Grouping together Continuous Integer-slots

Well, this was all nice and pretty when our source data was set of polygons, but how this can help in case of INTEGERs?

The solution is simple. I can convert the range of integers into a rectangle using the logic: The range will become the bottom line in the rectangle, the height will be constant (for example 10) and now we can close the rectangle with the upper line.  

DROP TABLE IF EXISTS Ranges
GO 
create table Ranges(MyStart int, MyEnd INT)
GO
INSERT Ranges (MyStart,MyEnd) values
    (1,3),
    (5,6),
    (8,11),
    (5,9),
    (9,13)
GO

Now we can use the following format to represent the range x-y:

POLYGON(( x 0,  y 0,  y 10, x 10, x 0))

And in query it will look like bellow:

;With MyCTE01 as (
    select
        CONVERT(VARCHAR(10),MyStart) as s,
        CONVERT(VARCHAR(10),MyEnd) as e
        --
    FROM Ranges
)
select MyPol = 'POLYGON(( ' + s + ' 0,  ' +e+ ' 0,  ' +e+ ' 10, ' +s+ ' 10, ' +s+ ' 0))'
from MyCTE01

Result;

MyPol
-------------------------------------------------------
POLYGON(( 1 0,  3 0,  3 10, 1 10, 1 0))
POLYGON(( 5 0,  6 0,  6 10, 5 10, 5 0))
POLYGON(( 8 0,  11 0,  11 10, 8 10, 8 0))
POLYGON(( 5 0,  9 0,  9 10, 5 10, 5 0))
POLYGON(( 9 0,  13 0,  13 10, 9 10, 9 0))

And now we can use the exact same solution as the one I used for polygons. 

First we will convert the string into POLYGON data type, then will aggregate the rows into multipolygon, and clean the unneeded data from the result.

but why using rectangle instead of simple lines? The answer is that we do not need. This was for the sake of the explanation but now that you got the idea we can move to lines instead of polygons.

;With MyCTE01 as (
    select
        CONVERT(VARCHAR(10),MyStart) as s,
        CONVERT(VARCHAR(10),MyEnd) as e
    FROM Ranges
),
MyCTE02 as (
    select MyPol = 'LINESTRING( ' + s + ' 0,  ' +e+ ' 0)'
    from MyCTE01
)
,MyCTE03 as (
    select geometry::STLineFromText(MyPol,0) as geom
    from MyCTE02
)
,MyCTE04 as(
    SELECT Geometry::UnionAggregate(geom) as Combined
    from MyCTE03
)
,MyCTE05 as(
    select Combined.STGeometryN(Numbers.n) F
    FROM MyCTE04
    INNER JOIN Numbers ON n <= Combined.STNumGeometries()
)
select
    F.STBoundary().STGeometryN(2).STX S,
    F.STBoundary().STGeometryN(1).STX E
FROM MyCTE05
GO

VICTORY!

I have done it ;-)

Grouping together Continuous Time-slots

For this demo I will use the the DDL+DML which was provided in the original question at stackoverflow.

DROP TABLE IF EXISTS TEST
GO
CREATE TABLE TEST (ID int, tFrom datetime, tUntil dateTime)
insert into TEST Values (1,'2019-1-1 12:00', '2019-1-1 13:00')
insert into TEST Values (1,'2019-1-1 13:00', '2019-1-1 14:00')
insert into TEST Values (1,'2019-1-1 14:00', '2019-1-1 16:00')
insert into TEST Values (1,'2019-1-1 18:00', '2019-1-1 19:00')
insert into TEST Values (1,'2019-1-1 19:00', '2019-1-1 20:00')
insert into TEST Values (1,'2019-1-1 20:00', '2019-1-1 21:00')
insert into TEST Values (1,'2019-1-1 22:00', '2019-1-1 23:00')
insert into TEST Values (2,'2019-1-1 12:00', '2019-1-1 13:00')
insert into TEST Values (2,'2019-1-1 13:00', '2019-1-1 14:00')
insert into TEST Values (2,'2019-1-1 14:00', '2019-1-1 16:00')
insert into TEST Values (2,'2019-1-1 18:00', '2019-1-1 19:00')
insert into TEST Values (2,'2019-1-1 19:00', '2019-1-1 20:00')
insert into TEST Values (2,'2019-1-1 20:00', '2019-1-1 21:00')
insert into TEST Values (2,'2019-1-1 22:00', '2019-1-1 23:00')
GO
 
SELECT ID, tFrom, tUntil, DATEDIFF_BIG(second, 0, tFrom) s, DATEDIFF_BIG(second, 0, tUntil) e
FROM TEST

 

;With MyCTE01 as (
    select ID,
        CONVERT(VARCHAR(100),DATEDIFF_BIG(second, 0, tFrom)) as s,
        CONVERT(VARCHAR(100),DATEDIFF_BIG(second, 0, tUntil)) as e
    FROM TEST
),
MyCTE02 as (
    select ID, MyPol = 'LINESTRING( ' + s + ' 0,  ' +e+ ' 0)'
    from MyCTE01
)
,MyCTE03 as (
    select ID, geometry::STLineFromText(MyPol,0) as geom
    from MyCTE02
)
,MyCTE04 as(
    SELECT ID, Geometry::UnionAggregate(geom) as Combined
    from MyCTE03
    GROUP BY ID
)
,MyCTE05 as(
    select ID, Combined.STGeometryN(Numbers.n) F
    FROM MyCTE04
    INNER JOIN (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13))Numbers(n)
        ON n >=1 and n <= Combined.STNumGeometries()
)
,MyCTE06 as (
    select ID,
        F.STBoundary().STGeometryN(2).STX S,
        F.STBoundary().STGeometryN(1).STX E
    FROM MyCTE05
)
SELECT ID, DATEADD(SECOND,CONVERT(BIGINT,S)%60,DATEADD(MINUTE, S/60, 0)) S , DATEADD(SECOND,CONVERT(BIGINT,E)%60,DATEADD(MINUTE, E/60, 0)) E
FROM MyCTE06
GO

 

I hope you like my trick and that this post was useful.

Follow me on Facebook and Join me to my next post