en-UShe-IL
You are here:   Blog
Register   |  Login

Blog Archive:

Maximize
* Can be used in order to search for older blogs Entries

Search in blogs


Blog Categories:

Maximize
* Can be used in order to search for blogs Entries by Categories

Blog Tags:

Maximize
* Can be used in order to search for blogs by keywords

TNWikiSummit


Awared MVP 


 


Microsoft® Community Contributor 


Microsoft® Community Contributor


Read first, before you you use the blog

Maximize
מרץ10

Written by: ronen ariely
10/03/2018 04:43 RssIcon

FILTERING JUNCTION TABLE BY MULTIPLE VALUS


Background

When we are working with Many-To-Many relationship between two entity tables, we often need to find all the entities (rows in one table) which have relations with specific list of entities from the second type (rows in the second entity table).

For the sake of this discussion, I will use the same tables, which I used for a Many-To-Many post, that I published in the past. I highly recommend to use this opportunity and go over the other post first and confirm that you understand how Many-To-Many relationship implemented in relational database.

* The current post dive into a specific case which was asked in the forums.

Filtering junction table by multiple valus

Our goal is to get all the Users that related to all specific roles. For example, all the users that have both RoleId 2 and RoleId 4.

Note! For the sake of this section we do not need to query the entities tables (USERs and ROLEs), but only the junction table Ari_UsersRoles_Tbl. Tou can get the script to create the table and insert the sample data here. Once we’ll get the user ID which fit our rules, we will be able to use simple JOIN on the result together with the user table, in order to get users names for example – this is simple JOIN operation, and out of the scope of this post.

Approach 1: sub-queries and using "not exists"

Select distinct UserId
From Ari_UsersRoles_Tbl U
Where Not Exists (
    Select  *
    From    (Values (2),(4)) T(RoleId)
    Where   Not Exists (
        Select *
        From Ari_UsersRoles_Tbl UD
        Where   UD.UserID=U.UserID And UD.RoleId=T.RoleId
        )
    );

* This solution was presented in the forum by Geri Reshef. Thanks for the contribution!

If anyone have another approach to solve the issue, you are welcome to send me throw Facebook.

    

Approach 2: Using Bitwise

Bitwise operators perform bit manipulations between two expressions. Bitwise operators perform a procedure that first convert the two input integer values to binary bits, next it performs the operation (AND, OR, or NOT) on each bit, and finally producing a result converted back to integer.

Let’s look at a table with one row for each user, and one column for each optional rule. I will fill the table according to our data. We can give the value 0 to each role that is not related to that user, and we will give the value 1 to rules that related to the user, as can be seen in the table below.

UserName

RoleId 1

RoleId 2

RoleId 3

RoleId 4

RoleId 5

RoleId 6

RoleId 7

RoleId 8

Ronen

1

1

1

1

0

0

0

0

Ariely

1

0

1

0

1

1

0

1

Pituach

1

0

1

0

1

0

1

1

You can notice that we have a series of zeros and ones for each user, which can be represented as a binary value. we can represent the entire information of each user as a binary number.

For example, the user Ronen has the binary value 11110000, the user Ariely get the value 10101101, and the user Pituach get the value 10101011.

We are looking for User that has both RoleId 1 and RoleId 2. In the same way we did above, we can represent the series of rules which we want to find in the table below.

RoleId 1

RoleId 2

RoleId 3

RoleId 4

RoleId 5

RoleId 6

RoleId 7

RoleId 8

0

1

0

1

0

0

0

0

Which gives us the binary value 01010000

* You can notice that the only user which has values 1 in RoleId 2 and 1 in RoleId 4 is Ronen.

Using a simple Bitwise operator AND we can check which user has the exact rules we are looking for.

Let’s see the solution:

-- We declare an integer which represent binary value with 1 in position 2 and 4
-- * The first line is the only point in the script which we will need to change
--   if we will need to fins a different set of rules
declare @RoleId bigint = (power(2,2) + power(2,4) )
;with MyCte as (
    select
        UserId,
        SUM(power(2,RoleId)) s
    from Ari_UsersRoles_Tbl
    group by UserId
)
select UserId--,s, s & @RoleId
from MyCte
where s & @RoleId = @RoleId

Limitation and improvement!

  • This solution fits only if the maximum RoleId value is not above 30, since we are using the function “select power(2, RoleId)”, which returns type INT.
  • We can replace this point in the code with “power(CONVERT(BIGINT,2),CONVERT(BIGINT, RoleId))“, which will return BIGINT and therefore allow us to use RoleId 62 and bellow.
  • If we have values bigger then 62, this approach (based on Bitwise) can still work well with a small tweak, as long as the number of rules we are looking for, is not more then 62. In this case we can simply map the RoleId whuich we are looking for to new numbers, and these whoe we do not need we will map to zero. We can simply to use a CASE statement in order to dynamically represent each role with new value starting from 2^0 = 1, 2^1 = 2, 2^2 = 4… and so on…
declare @RoleId bigint = (power(2,0) + power(2,1) ) -- Don’t forget to map the roles we search in the same way
;with MyCte1 as (
    select distinct
        UserId,
        NewRoleId = CASE
            when RoleId = 2 then 0 -- POWER(2,0)
            when RoleId = 4 then 1
            -- If we need more roles then we simply map these to new numbers 0,1,2,4,8,16,32,64, and so on
           else 0
        END
    from Ari_UsersRoles_Tbl
    -- having this filter if we have indes and updated statistics can improve the performance a lot
    -- there is no reason to work on the entire data when we can filter most in the start
    -- where RoleId in (2,4)
),
MyCte2 as (
    select UserId,SUM(power(2,NewRoleId)) s
    from MyCte1
    group by UserId
)
select UserId
from MyCte2
where s & @RoleId = @RoleId
GO

     

Approach 3: Filter the data, Use Distinc, and group by the UserId

This is probably the simples query. By filtering the rows and using only these who include the RoleId we are searching, and using Distinct at the same time, we insure that each UserId get only one value for each of the RoleId we need. The SUM of these values for each UserId will be equal to the SUM of all the RoleId we need only of all these roles will be related to the user. Therefore, we can use this simple solution:

;with MyCTE as (
    select distinct
        UserId, sum(RoleId) s
    from Ari_UsersRoles_Tbl
    where RoleId in (2,4)
    group by UserId
)
select UserId
from MyCTE
where s = 6
GO


   

Perrformance

So… which solution should we use? Like always the answer is “it depends”. I will go over these three solutions (and these are not the only options we have) and check the performance in several specific case with different indexes.


Without indexs

The first case id using the structure as presented in the previous post, without any indexes.

Query 1: sub-queries

Query 2.1: Using Bitwise

Query 2.2: Using Bitwise with re-mapping the RoleId

Query 3: Direct approach



* The images above were taken from the new database management studio named SQL Server Operations Studio. Unfortunately, it does not include the feature to show relative cost of each query when execute several queries together. therefore, I toke the next images from SQL Server Management Studio.

With NONCLUSTERED INDEX ON Ari_UsersRoles_Tbl (UserId,RoleId)

CREATE NONCLUSTERED INDEX IX_Ari_UsersRoles_Tbl_UserId_RoleId
ON Ari_UsersRoles_Tbl (UserId,RoleId);

This index improves the first query a bit, but IT IMPROVE THE SECOND AND THE LAST QUERIES DRAMATICALLY! The reason is that it fits the order we use the data, and therefore it removes the need of sorting the data. If you will check the execution plan presented above (without indexes) then you can notice that the cost of the sorting was the most expensive operation in the execution, which explain why we can get so much improvement simply by adding the right index.

Check the differences between the first query and the second!
It seems like the first query cost three times more resources then my solution in the second query.

* This index does not help ti the third query. The query dose not executed worse and it does not cost more that without the index, but relatively to other queries it looks much worse now, since the index improved the other queries execution.

With Index NONCLUSTERED INDEX ON Ari_UsersRoles_Tbl (UserId)

DROP INDEX IX_Ari_UsersRoles_Tbl_UserId_RoleId
    ON Ari_UsersRoles_Tbl;
CREATE NONCLUSTERED INDEX IX_Ari_UsersRoles_Tbl_UserId_RoleId
ON Ari_UsersRoles_Tbl (UserId);

This index removes the need of sorting in the first and second queries, but since the index does not include the data for the RoleId, the second query need to get this data out of the index. The execution plan was changed a lot in the second query.

If we will create the same index but include the RoleId, then the result will be resemble to using the index on both column, and the second query will be able to use the index in order to get all the data it needs - the performance will reduce according like in the first index. 

Conclusion

In this post we show several options to get the users that has specific list of roles. It is pretty clear that using Bitwise and represent a list of roles as a single binary value, is a great trick. Using the right index, we can get by far much better performance regarding other solutions which were tested.

     

See also

  • A short post I published, focusing only on implementing Many-To-Many relationship 
  • a nice deep tutorial on relational Database relationships including Many-to-many explanation.

      

Tags: SQL , sql server
Categories: SQL
Location: Blogs Parent Separator Public blog