  You are here:

## Blog Archive:

* Can be used in order to search for older blogs Entries

## Search in blogs

 KeywordsPhrase

## Blog Categories:

* Can be used in order to search for blogs Entries by Categories

## Blog Tags:

* Can be used in order to search for blogs by keywords

## TNWikiSummit  Read first, before you you use the blog!  ## Grouping continuous Ranges together: Part 2, Using geometry Data type

מרץ30

Written by: ronen ariely
30/03/2019 23:59  Introduction

The image Above represent the challenge we face. We have a SET of separated slots/ranges and we want to represent the merged ranges which cover all the separated ranges, using a SET of minimum number of ranges. In other words, we want to to Group continuous and overlaps Ranges together. If we have SET of slots/ranges, and two or more of these ranges has shared point/s (a point which exists in both ranges), then these ranges can be merge together into a single range.

For example if we are dealing with ranges of integers and we have the ranges 3-6 and 5-8, then we can marge these into the single range 3-8, or the ranges 2-4,4-6 can be merged into 2-6.

The most Intuitive solution to developers is base on loops throw the source SET of ranges, and working on the records one-by-one.

1. We can address the source SET of data as a collection like two dimensional Array:
int[,] SourceArray = new int[,] { { 1, 2 }, { 3, 4 }, { 5, 6 }, { 7, 8 } };
2. Create a new "Merged SET"
int[,] MergedArray = new int[,]
3. Loop throw the source SET of ranges in the source data one-by-one, and calculate the merged SET for each new range that we add.

Basically like the image above shows

This approach can be implemented in tabular databases using loops (for example we can use "while" loop, or cursor)

Unfortunately, tabular database are designed to work with SET of data and not row-by-row. The servers "tries" behind the scenes to use the best algorithms in order to provide best performance for SET of data, and using this approach will probably lead to very poor performance.

In my previous post I covered the solution using the "Gaps and Islands" approach, which is the most common solution using tabular databases like SQL Server.

I HIGHLY recommend to read the previous post BEFORE you continue to read!

There is another trick I use in some cases which I explained in the past several times in the forums (I might elaborate on this approach more in future post). The basic idea is to find a way to represent our Source Ranges as ranges of BITs (zero and one) and use simple built-in Bitwise operations in order to find the Merged Ranges. This task is almost straightforward using the bitwise operator inclusive OR.

In this post I will cover a solution which is based on SQL Server built-in GEOMETRY data type.

Now that I mentioned the word "GEOMETRY"... Please check the image above again, but this time think about the problem in a system as two dimensions axes instead of a one-dimensional horizontal axis.

You can notice that what we are dealing with is a simple case of finding the combined area of multiple rectangles.

# Working with POLYGONs

`DROP` `TABLE` `IF EXISTS GeometryTable`
`GO`
`CREATE` `table` `GeometryTable(`
`  ``geom geometry`
`)`

`truncate` `table` `GeometryTable`
`INSERT` `INTO` `GeometryTable ``VALUES`
`(geometry::STPolyFromText(``'POLYGON(( 2 0,  4 0,  4 10, 2 10, 2 0))'``, 0)),`
`(geometry::STPolyFromText(``'POLYGON(( 5 0,  6 0,  6 10, 5 10, 5 0))'``, 0)),`
`(geometry::STPolyFromText(``'POLYGON((10 0, 15 0, 15 10, 10 10, 10 0))'``, 0)),`
`(geometry::STPolyFromText(``'POLYGON((3 0, 6 0, 6 10, 3 10, 3 0))'``, 0)) ``-- this one have overlap area`
`GO`
`select` `geom ``from` `GeometryTable`
`GO`

Let's use the built-in tool in SSMS in order to watch the data in a graph (You can use Azure Data Studio as well) Now we do not need to do any complex procedure in order to merge all these "POLYGON"s into "MultiPolygon". SQL Server has a full built-in support for working with GEOMETRY data types. We can use the function UnionAggregate in order to aggregate all the polygons which is not more complex than using SUM function in numbers.

`-- combine all ranges to one MultiPolygon`
`SELECT` `Geometry::UnionAggregate(geom)`
`from` `GeometryTable`
`GO` yep... this image present the solution which we want to get, but we are not there yet. Behind the scenes SQL Server still address the input as separated POLYGONs. We can use the method ToString() in order to get the GEOMETRY data type in a simple readable textual format.

`SELECT` `Geometry::UnionAggregate(geom).ToString()`
`from` `GeometryTable`
`GO`

The above query will return the value:

MULTIPOLYGON (((10 0, 15 0, 15 10, 10 10, 10 0)), ((2 0, 3 0, 4 0, 5 0, 6 0, 6 10, 5 10, 4 10, 3 10, 2 10, 2 0)))

You can noticed that SQL Server succeed to combine the separated POLYGONs into two POLYGOns as expected, but inside each POLYGON we can see all the points from all the source data. SQL Server does not recognize what the geometric shape is (check the second POLYGON in the result). We as human can notice that the result is a SET of rectangles.

The value "(2 0, 3 0, 4 0, 5 0, 6 0, 6 10, 5 10, 4 10, 3 10, 2 10, 2 0))" which is the the left POLYGON in the image above, can be replaced with the value "(2 0, 6 0, 6 10, 2 10, 2 0))".

THIS IS BASICALLY OUR NEXT STEP

For the sake of the solution we need to have a numbers table (or use a virtual numbers table which we create on-the-fly).

`CREATE` `TABLE` `Numbers (`
`    ``N ``INT` `CONSTRAINT` `Number_PK ``PRIMARY` `KEY` `CLUSTERED(N)`
`); `
`WITH`
`    ``L0   ``AS``(``SELECT` `1 ``AS` `C ``UNION` `ALL` `SELECT` `1 ``AS` `O), ``-- 2 rows`
`    ``L1   ``AS``(``SELECT` `1 ``AS` `C ``FROM` `L0 ``AS` `A ``CROSS` `JOIN` `L0 ``AS` `B), ``-- 4 rows`
`    ``L2   ``AS``(``SELECT` `1 ``AS` `C ``FROM` `L1 ``AS` `A ``CROSS` `JOIN` `L1 ``AS` `B), ``-- 16 rows`
`    ``L3   ``AS``(``SELECT` `1 ``AS` `C ``FROM` `L2 ``AS` `A ``CROSS` `JOIN` `L2 ``AS` `B), ``-- 256 rows`
`    ``L4   ``AS``(``SELECT` `1 ``AS` `C ``FROM` `L3 ``AS` `A ``CROSS` `JOIN` `L3 ``AS` `B), ``-- 65,536 rows`
`    ``Nums ``AS``(``SELECT` `ROW_NUMBER() OVER(``ORDER` `BY` `(``SELECT` `NULL``)) ``AS` `N ``FROM` `L4)`
`INSERT` `INTO` `Numbers `
`    ``SELECT` `TOP` `10000 N ``FROM` `Nums ``ORDER` `BY` `N;`
`GO`

Let's split the MultiPolygon into the separated Polygons

`-- brack the MultiPolygoninto individual Polygons`
`;``With` `MyCTE ``as``(`
`    ``SELECT` `Geometry::UnionAggregate(geom) ``as` `Combined`
`    ``from` `GeometryTable`
`)`
`select`
`    ``--Combined.STNumGeometries(), -- number of elements`
`    ``--Combined.STGeometryN(Numbers.n),`
`    ``Combined.STGeometryN(Numbers.n).ToString()`
`FROM` `MyCTE`
`INNER` `JOIN` `Numbers ``ON` `n <= Combined.STNumGeometries()`
`GO`

Result:

`POLYGON ((10 0, 15 0, 15 10, 10 10, 10 0))`
`POLYGON ((2 0, 3 0, 4 0, 5 0, 6 0, 6 10, 5 10, 4 10, 3 10, 2 10, 2 0))`

Next we simply need to convert the values into string -> using string's functions we can clean the un-needed part and get the values of the clean ranges.

# Grouping together Continuous Integer-slots

Well, this was all nice and pretty when our source data was set of polygons, but how this can help in case of INTEGERs?

The solution is simple. I can convert the range of integers into a rectangle using the logic: The range will become the bottom line in the rectangle, the height will be constant (for example 10) and now we can close the rectangle with the upper line.

`DROP` `TABLE` `IF EXISTS Ranges`
`GO  `
`create` `table` `Ranges(MyStart ``int``, MyEnd ``INT``)`
`GO`
`INSERT` `Ranges (MyStart,MyEnd) ``values`
`    ``(1,3),`
`    ``(5,6),`
`    ``(8,11),`
`    ``(5,9),`
`    ``(9,13)`
`GO`

Now we can use the following format to represent the range x-y:

POLYGON(( x 0,  y 0,  y 10, x 10, x 0))

And in query it will look like bellow:

`;``With` `MyCTE01 ``as` `(`
`    ``select`
`        ``CONVERT``(``VARCHAR``(10),MyStart) ``as` `s,`
`        ``CONVERT``(``VARCHAR``(10),MyEnd) ``as` `e`
`        ``--`
`    ``FROM` `Ranges`
`)`
`select` `MyPol = ``'POLYGON(( '` `+ s + ``' 0,  '` `+e+ ``' 0,  '` `+e+ ``' 10, '` `+s+ ``' 10, '` `+s+ ``' 0))'`
`from` `MyCTE01`

Result;

`MyPol`
`-------------------------------------------------------`
`POLYGON(( 1 0,  3 0,  3 10, 1 10, 1 0))`
`POLYGON(( 5 0,  6 0,  6 10, 5 10, 5 0))`
`POLYGON(( 8 0,  11 0,  11 10, 8 10, 8 0))`
`POLYGON(( 5 0,  9 0,  9 10, 5 10, 5 0))`
`POLYGON(( 9 0,  13 0,  13 10, 9 10, 9 0))`

And now we can use the exact same solution as the one I used for polygons.

First we will convert the string into POLYGON data type, then will aggregate the rows into multipolygon, and clean the unneeded data from the result.

but why using rectangle instead of simple lines? The answer is that we do not need. This was for the sake of the explanation but now that you got the idea we can move to lines instead of polygons.

`;``With` `MyCTE01 ``as` `(`
`    ``select`
`        ``CONVERT``(``VARCHAR``(10),MyStart) ``as` `s,`
`        ``CONVERT``(``VARCHAR``(10),MyEnd) ``as` `e`
`    ``FROM` `Ranges`
`),`
`MyCTE02 ``as` `(`
`    ``select` `MyPol = ``'LINESTRING( '` `+ s + ``' 0,  '` `+e+ ``' 0)'`
`    ``from` `MyCTE01`
`)`
`,MyCTE03 ``as` `(`
`    ``select` `geometry::STLineFromText(MyPol,0) ``as` `geom`
`    ``from` `MyCTE02`
`)`
`,MyCTE04 ``as``(`
`    ``SELECT` `Geometry::UnionAggregate(geom) ``as` `Combined`
`    ``from` `MyCTE03`
`)`
`,MyCTE05 ``as``(`
`    ``select` `Combined.STGeometryN(Numbers.n) F`
`    ``FROM` `MyCTE04`
`    ``INNER` `JOIN` `Numbers ``ON` `n <= Combined.STNumGeometries()`
`)`
`select`
`    ``F.STBoundary().STGeometryN(2).STX S,`
`    ``F.STBoundary().STGeometryN(1).STX E`
`FROM` `MyCTE05`
`GO`

VICTORY!

I have done it ;-)

# Grouping together Continuous Time-slots

For this demo I will use the the DDL+DML which was provided in the original question at stackoverflow.

`DROP` `TABLE` `IF EXISTS TEST`
`GO`
`CREATE` `TABLE` `TEST (ID ``int``, tFrom datetime, tUntil dateTime)`
`insert` `into` `TEST ``Values` `(1,``'2019-1-1 12:00'``, ``'2019-1-1 13:00'``)`
`insert` `into` `TEST ``Values` `(1,``'2019-1-1 13:00'``, ``'2019-1-1 14:00'``)`
`insert` `into` `TEST ``Values` `(1,``'2019-1-1 14:00'``, ``'2019-1-1 16:00'``)`
`insert` `into` `TEST ``Values` `(1,``'2019-1-1 18:00'``, ``'2019-1-1 19:00'``)`
`insert` `into` `TEST ``Values` `(1,``'2019-1-1 19:00'``, ``'2019-1-1 20:00'``)`
`insert` `into` `TEST ``Values` `(1,``'2019-1-1 20:00'``, ``'2019-1-1 21:00'``)`
`insert` `into` `TEST ``Values` `(1,``'2019-1-1 22:00'``, ``'2019-1-1 23:00'``)`
`insert` `into` `TEST ``Values` `(2,``'2019-1-1 12:00'``, ``'2019-1-1 13:00'``)`
`insert` `into` `TEST ``Values` `(2,``'2019-1-1 13:00'``, ``'2019-1-1 14:00'``)`
`insert` `into` `TEST ``Values` `(2,``'2019-1-1 14:00'``, ``'2019-1-1 16:00'``)`
`insert` `into` `TEST ``Values` `(2,``'2019-1-1 18:00'``, ``'2019-1-1 19:00'``)`
`insert` `into` `TEST ``Values` `(2,``'2019-1-1 19:00'``, ``'2019-1-1 20:00'``)`
`insert` `into` `TEST ``Values` `(2,``'2019-1-1 20:00'``, ``'2019-1-1 21:00'``)`
`insert` `into` `TEST ``Values` `(2,``'2019-1-1 22:00'``, ``'2019-1-1 23:00'``)`
`GO`

`SELECT` `ID, tFrom, tUntil, DATEDIFF_BIG(``second``, 0, tFrom) s, DATEDIFF_BIG(``second``, 0, tUntil) e`
`FROM` `TEST`

`;``With` `MyCTE01 ``as` `(`
`    ``select` `ID,`
`        ``CONVERT``(``VARCHAR``(100),DATEDIFF_BIG(``second``, 0, tFrom)) ``as` `s,`
`        ``CONVERT``(``VARCHAR``(100),DATEDIFF_BIG(``second``, 0, tUntil)) ``as` `e`
`    ``FROM` `TEST`
`),`
`MyCTE02 ``as` `(`
`    ``select` `ID, MyPol = ``'LINESTRING( '` `+ s + ``' 0,  '` `+e+ ``' 0)'`
`    ``from` `MyCTE01`
`)`
`,MyCTE03 ``as` `(`
`    ``select` `ID, geometry::STLineFromText(MyPol,0) ``as` `geom`
`    ``from` `MyCTE02`
`)`
`,MyCTE04 ``as``(`
`    ``SELECT` `ID, Geometry::UnionAggregate(geom) ``as` `Combined`
`    ``from` `MyCTE03`
`    ``GROUP` `BY` `ID`
`)`
`,MyCTE05 ``as``(`
`    ``select` `ID, Combined.STGeometryN(Numbers.n) F`
`    ``FROM` `MyCTE04`
`    ``INNER` `JOIN` `(``VALUES` `(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13))Numbers(n) `
`        ``ON` `n >=1 ``and` `n <= Combined.STNumGeometries()`
`)`
`,MyCTE06 ``as` `(`
`    ``select` `ID,`
`        ``F.STBoundary().STGeometryN(2).STX S,`
`        ``F.STBoundary().STGeometryN(1).STX E`
`    ``FROM` `MyCTE05`
`)`
`SELECT` `ID, DATEADD(``SECOND``,``CONVERT``(``BIGINT``,S)%60,DATEADD(``MINUTE``, S/60, 0)) S , DATEADD(``SECOND``,``CONVERT``(``BIGINT``,E)%60,DATEADD(``MINUTE``, E/60, 0)) E`
`FROM` `MyCTE06`
`GO`

I hope you like my trick and that this post was useful.

Location: Blogs Ronen Ariely Public blog