en-UShe-IL
You are here:   Blog
Register   |  Login

Blog Archive:

Maximize
* Can be used in order to search for older blogs Entries

Search in blogs


Blog Categories:

Maximize
* Can be used in order to search for blogs Entries by Categories

Blog Tags:

Maximize
* Can be used in order to search for blogs by keywords

TNWikiSummit


Awared MVP

 


Microsoft® Community Contributor 


Microsoft® Community Contributor


 Read this before you use the blog! Maximize

Recent Entries

Minimize
מרץ20

Written by: ronen ariely
20/03/2019 22:40 RssIcon

Good day guys,

This post summarize my lecture from last week at PASS DBA Virtual Group.

Usually at the end of my lecture I thanks the audience and all these who came and listened, but here before we start I have the opportunity to thanks Sharon Dooley (the Virtual Group Leader) and obviously to Paresh Motiwala who hosted the meeting. Thanks guys for inviting me to speak.

Introduction

Understanding what we have under the hood is not only done for the sake of learning a theory, but it directly impacts your daily work, and it can help us to improve performance and reduce resources dramatically.

During this session we discussed the structure of tables behind the scenes. We saw the changes behind the scenes related to DDL actions, which we execute on tables. Using this information, I demonstrated several real case scenarios, and the dramatic impact of understanding the internals on choosing our solution.

Is the order of columns important? Is dropping a column is the right solution and what is the impact? Questions like those are very common in the forums and usually does not get the right answers even by experts, but the answers might be a game changer in production environment. 

There is no presentation to share. This session is a live demo!

📝THIS POST DOES NOT REPLACE THE SESSION BUT ONLY COME TO SUMMERTIME IT
in this post I will share the code and some addition points, but it had low value without the recording. To get all the information and to get all the explanations you should check the recording.

Starting the meeting

The meeting held using "gotowebinar" tool, like other PASS Virtual User Group, including PASS Hebrew Group which I lead. Each meeting starts with some administrative announcements and presenting the speaker. 

In short... My name is Ronen Ariely. I am from Israel. I am a senior consultant, and I am primarily engaged in the architecture and designing enterprise level systems in the fields of Data Platforms and developing applications. If you want to read more about me, I simply recommend to use Google or you can find me at Facebook & Linkedin.

We had a harsh start. There was small technical issue at the begging of the meeting (sharing screen & poll). This is the first time that I used poll. Seems like I need to investigate this feature a bit more, since during the meeting I could not see the poll, but other people could see it and use it.


So, let's move to the Technical part of meeting

I opened the meeting with the question: Does Order of Columns in a table Important?

Usually when I ask this question most people say that order is important (during the meeting 2/3 said so), but once I explain that I only interesting on the database's side and not the client's side point-of-view, then I the answers change to NO.

To clarify, I mean is order of columns related to performance, is it related to errors that might rise only when use specific order, or will SELECT query return different DATA = result SET (not important the order of the result SET but the DATA) if the order of columns is different,and so on.

The answers which we can find online from the top experts include: 

  • The column order is just a "cosmetic" thing we humans care about
  • you should NEVER use INSERT without explicitly specifying your columns anyway!
  • column order is not a concept that's relevant to a SQL table
  • Nope. It doesnt matter. You can always retrieve the columns in the order you want by specifying the columns explicitly in SELECT
  • Well, not for SQL Server. Performance and functionality will be exactly the same.

At this point I executed a short script, which show that ORDER IS IMPORTANT (explanation came at the end of the meeting)

And now, let me just present all the code from the meeting, so while you watch the recording, you will be able to test it yourself.


1. watch the recording first!
2. This post does not include a tutorial,
It only include the code from the meeting
.

📝 Note! If you want to skip the administrative opening and the tech issues at the beginning, then you can jump straight to the lecture using this link or move manually to second 569 (about 9:30 minutes from the start).




SQL Server "logical layer" vs "Physical layer"

use InternalsTableStructure01
GO
DROP TABLE IF EXiSTS T
GO
CREATE TABLE T(
    INT01       INT NOT NULL IDENTITY(1,1),
    NVARCHAR02  NVARCHAR(10),
    INT03       CHAR(10) NULL,
    INT04       CHAR(10) NULL
)
INSERT T(NVARCHAR02,INT03,INT04)
SELECT REPLICATE(N'a', 10), 'Ronen','Ariely'
GO

LOGICAL layer

/*==================================================
            [logical layer]
==================================================*/
SELECT * FROM T
GO
----------------------------------------------- sys.columns
select object_id as Table_ID, [name], column_id
from sys.columns
where object_id = OBJECT_ID(N'T')
GO
----------------------------------------------- sp_help
EXEC sp_help @objname = 'T'
GO
---------------- SSMS object explorer - logical layer! */

Physical layer

/*==================================================
            [Physical layer under the scene]
==================================================*/
CREATE OR ALTER FUNCTION dbo.GetPhysicalTableStructure(@TableName sysname)
RETURNS TABLE AS RETURN (
    select
        p.object_id,
        p.index_id,
        p.partition_number,
        pc.leaf_null_bit,
        coalesce(cx.name, c.name) as column_name,
        pc.partition_column_id,
        pc.max_inrow_length,
        pc.max_length,
        pc.key_ordinal,
        pc.leaf_offset,
        pc.is_nullable,
        pc.is_dropped,
        pc.is_uniqueifier,
        pc.is_sparse,
        pc.is_anti_matter
    from sys.system_internals_partitions p
    join sys.system_internals_partition_columns pc on p.partition_id = pc.partition_id
    left join sys.index_columns ic on p.object_id = ic.object_id and ic.index_id = p.index_id and ic.index_column_id = pc.partition_column_id
    left join sys.columns c on p.object_id = c.object_id and ic.column_id = c.column_id
    left join sys.columns cx on p.object_id = cx.object_id and p.index_id in (0,1) and pc.partition_column_id = cx.column_id
    where p.object_id = object_id(@TableName)
)
GO

test the physical structure of the table

SELECT * from GetPhysicalTableStructure('T')
GO

Examine the impact of DDL action on the Table structure

And at this step we start to play with the table and examine the result

📝Note! Once more I remind you to check the recording. This is only the code from the meeting!

 

Add PRIMARY KEY

/*==================================================
            [ADD PRIMARY KEY]
==================================================*/
SELECT * from GetPhysicalTableStructure('T')
    ALTER TABLE T ADD CONSTRAINT pk_T_INT01 primary key (INT01);
SELECT * from GetPhysicalTableStructure('T')
GO

Examine the binary data that is stored on the disk

/*==================================================
            [examine the binary data that is stored on the disk]
==================================================*/
DBCC IND('InternalsTableStructure01', 'dbo.T', 0);
GO
DBCC TRACEON(3604, -1);
GO
DBCC PAGE('InternalsTableStructure01', 1, 344, 1);
GO

Drop the PRIMARY KEY

/*==================================================
    [DROP Primary Key]
==================================================*/
SELECT * from GetPhysicalTableStructure('T')
    ALTER TABLE DROP CONSTRAINT pk_T_INT01
SELECT * from GetPhysicalTableStructure('T')
GO

Add CLUSTERED INDEX

/*==================================================
    [ADD CLUSTERED INDEX]
==================================================*/
SELECT * from GetPhysicalTableStructure('T')
    CREATE CLUSTERED INDEX CX_T_INT04 ON T (INT04)
SELECT * from GetPhysicalTableStructure('T')
GO

Checking the impact of CLUSTERED INDEX on the data that is stored on the disk:

/*==================================================
            [examine the binary data that is stored on the disk]
==================================================*/
DBCC IND('InternalsTableStructure01', 'dbo.T', 0);
GO
DBCC TRACEON(3604, -1);
GO
DBCC PAGE('InternalsTableStructure01', 1, 344, 1);
GO

And how NONCLUSTERED impact the physical structure?

/*==================================================
    [ADD NONCLUSTERED INDEX]
==================================================*/
SELECT * from GetPhysicalTableStructure('T')
    ALTER TABLE T ADD CONSTRAINT  pk_T_INT01 PRIMARY KEY NONCLUSTERED  (INT01)
SELECT * from GetPhysicalTableStructure('T')
GO

 

/*==================================================
            [INDEX Rebuild ONLINE=ON ]
==================================================*/
SELECT * from GetPhysicalTableStructure('T')
    ALTER index pk_T_INT01 on T rebuild with (online=on);
SELECT * from GetPhysicalTableStructure('T')
GO

 

/*==================================================
    [Enabling CHANGE_TRACKING]
==================================================*/
ALTER DATABASE InternalsTableStructure01
SET CHANGE_TRACKING = ON(CHANGE_RETENTION = 5 DAYS, AUTO_CLEANUP = ON);
GO
 
ALTER TABLE T ENABLE CHANGE_TRACKING;
SELECT * from GetPhysicalTableStructure('T')
GO

 




Note:
At this time in the session, I discussed a real life scenario which based on a question at the msdn forums.

You can found the entire discussion and all my relevant code, in following post.

Before you continue the reading, Please Click here to move the discussion about
Best Practice for un-needed column in the table

At the discussion above, I cover the impact of actions like: DROP COLUMN from table, ALTER COLUMN type or properties (NULL/NOT NULL), and so on. 



Once we cover the understanding of basic DDL actions on the table physical structure, we continued with some more tests:

DROP TABLE IF EXISTS T;
CREATE TABLE T(
    FixedDataType1 BIGINT NOT NULL, -- We do NOT need this column anymore
    FixedDataType2 CHAR(20) NULL,
    VarDatatype VARCHAR(20) NULL,
    FixedDataType3 CHAR(20) NULL
)
GO
 
/*==================================================
     [FIXED LENGTH column to VARIABLE-LENGTH]
==================================================*/
-- Let's examine changing from FIXED-LENGTH column to VARIABLE-LENGTH column
SELECT * from GetPhysicalTableStructure('T')
ALTER TABLE T ALTER COLUMN FixedDataType2 VARCHAR(20)
SELECT * from GetPhysicalTableStructure('T')
GO
 
/*==================================================
     [VARIABLE-LENGTH: increase size]
==================================================*/
SELECT * from GetPhysicalTableStructure('T')
    ALTER TABLE T ALTER COLUMN VarDatatype VARCHAR(100)
SELECT * from GetPhysicalTableStructure('T')
GO
 
/*==================================================
     [VARIABLE-LENGTH: Decrease size]
==================================================*/
SELECT * from GetPhysicalTableStructure('T')
ALTER TABLE T
    ALTER COLUMN VarDatatype VARCHAR(20)
SELECT * from GetPhysicalTableStructure('T')
GO
 
/*==================================================
     [VARIABLE-LENGTH: DROP & ADD]
==================================================*/
DROP TABLE IF EXISTS T;
CREATE TABLE T(
    VarDatatype1 VARCHAR(20) NULL,
    VarDatatype2 VARCHAR(20) NULL,
    FixedDataType1 CHAR(20) NULL
)
GO
insert T (VarDatatype1) values ('Ronen')
GO
 
-----DROP+ADD
SELECT * from GetPhysicalTableStructure('T')
ALTER TABLE T DROP COLUMN VarDatatype1------------------------ DROP COLUMN
ALTER TABLE T ADD  VarDatatype1New VARCHAR(20) NULL----------------- ADD COLUMN
SELECT * from GetPhysicalTableStructure('T')
GO
 
DBCC TRACEON (3604);
GO
DBCC IND ('InternalsTableStructure01', 'T', 1);
GO -- 360
DBCC PAGE (InternalsTableStructure01, 1, 360, 3);
GO
 
/*==================================================
     [VARIABLE-LENGTH: ALTER Unicode to ASCII]
==================================================*/
DROP TABLE IF EXISTS T;
CREATE TABLE T(
    VarDatatype1 NVARCHAR(20) NULL,
    VarDatatype2 NVARCHAR(20) NULL,
    FixedDataType1 NCHAR(20) NULL
)
GO
insert T (VarDatatype1) values (N'Ronen')
GO
 
SELECT * from GetPhysicalTableStructure('T')
ALTER TABLE T ALTER COLUMN VarDatatype1 varchar(20) null
SELECT * from GetPhysicalTableStructure('T')
GO
 
DBCC TRACEON (3604);
GO
DBCC IND ('InternalsTableStructure01', 'T', 1);
GO
DBCC PAGE (InternalsTableStructure01, 1, 360, 3);
GO

 

/*==================================================
     [Partitioned Tables]
==================================================*/
create partition function PF (INT) as range for values ('10');
go
create partition scheme PS as partition PF all to ([PRIMARY]);
go
------------------------------------
DROP TABLE IF EXISTS T
GO
create table T (
    INT01 INT not null,
    INT02 int not null
) on PS(INT01);
go
 
SELECT * from GetPhysicalTableStructure('T')
GO

Questions and answers are MOST Important, and here some of the questions which were asked and answered:

Note! I still need to summarize the meeting and explain the first demo regarding the Order Of Columns. This review of the meeting is order by the order of the meeting... You can skip the "Question & Answers" and the "Feedback from the audience" if you want to continue with the code.

Question & Answers

Q: Is there no difference in query performance or memory allocation when initially designing tables with varchar(20) vs varchar(n)? In other words, is there a reason not to use varchar(max) when creating the table?

A: During this meeting we discuss the physical structure of the table, which mean the data in the disk. Memory allocation is not directly relevant to this meeting since SQL Server does not load to the memory all the hidden physical columns structure (metadata) which he stored on the disk, but only the data. Regarding reasons not to use varchar(max) the short answer is that we need to examine case to case. This is totally outside the scope of the meeting today, but since I do not want to leave any question un-answered, here is a nice post which explain the topic well.

Q: So the solution to reclaim storage after all these 'invisible' changes is to rebuild table?

A: yes

Q: Will a recording of this talk be available for later watching?

A: Yes, it can found here: https://dba.pass.org/?EventID=12967

Q: When you perform a "SELECT * FROM T1 INTO T2" statement, will the column allocations be reorganized?

A: First of all I must emphasize that you should always use the columns names explicitly unless you have a really really good reason not to do so! This can lead to huge issues. Regarding the question, when we use SELECT query from table, then the server returns only the Logical layer of the table. All that we examine in this session is hidden from the common users or from the logical layer. The query SELECT INTO will build a new table which is clean without all the hidden columns (there is no reorganized but if I understand the meaning then it is the same). This is a great question by the way. I might use it as riddle to the audience in my future lectures.

Q: If you change a clustered primary key field, will the table be reorganized when the primary key is rebuilt?

A: First off-topic point: A table does not include fields but columns. The confusion of using "field" instead of "column" usually comes from developers, since in the GUI interface of applications we have "forms". The "fields" are entities in forms. In most cases each field in the GUI fits to a column in the table, which lead people to use the phrase "field in table" by mistake.

Back to the question, the answer is that change in CLUSTERED INDEX on fixed size column type (not related to PRIMARY KEY) will probably change the physical order of the columns (like the demo of adding CLUSTERED INDEX) and the CLUSTERED INDEX column will move to the first location. The table will not be rebuilt. This is not the same as using rebuild in order to clean the physical structure of the table.

Q: How were the leaf_offsets negative?

A: I explained this during the lecture. In short: all FIXED-LENGTH columns of the same row are stored in the disk in the same place one after the other, and all the VARIABLE-LENGTH columns are stored in a different location in the row.

The FIXED-LENGTH column have positive value for leaf_offsets. This value is the position of the column in bytes, from the start of the row.
The VARIABLE-LENGTH has negative value for the leaf_offsets, which present their order.

Therefore, the answer is that the negative leaf_offsets are VARIABLE-LENGTH columns.

Q: I apply help and support to learn more about this basic online skill such as creating editing and formatting of my SQL Internals document.

A: I am not sure that I understand the question. If you want to get more information and learn about the internals then you should come to lectures like this one, read blogs and so on..it is not something you can learn from the official documentation.

Q: Can you please provide scripts you used during presentation?

A: this post include all the script or links to the script I used. I do not hide anything, and you can always contact me on Facebook :-)

Feedback from the audience

These are all the feedback which I got (I asked the host to send me all). Positive or negative any feedback which posed is good and can help me improve future lectures. For this I HIGHLY thanks anyone who submitted his feedback or will provide his feedback in the future. THANKS!

  • Facinating...  did not know this happened under the hood.  Very informative.
  • I was looking forward to this presentation, but I am going to drop off. Audio just isn't working for me.
  • Difficult to follow
  • Special thanks for the beautiful priorities,beautiful communication.
  • Great job! It was a little hard for me to follow but that may be due to my SQL skills.
  • Audio was a bit of a struggle picking up background echo, a headset microphone might be beneficial.  Really enjoyed his direct examples of showing how something would work 1 way, and break in another way when the majority of people would see it as the exact same table build.
  • I don't understand his English.
  • could not make out what the speaker was saying -- the connection was very distorted
  • A little more summary and less demo, as during the demo we got to see the effect but not much discussion on best practice.
  • The demos were great - very clear.  Enjoyed the session a lot.
  • Would be good to test presentation setup before start time
  • Need more respect for your audience and support staff.  Good information and knowledge however.
  • Took a bit long to get started due to issues with screen sharing and such. Beginning was a little slow, but the end was good. Would have liked to see more of the scripts being run to understand a little better what was being presented. Overall, the subject matter was fascinating. T
Now it is a good time to move to the first demo which show that Order of columns is highly important. And for this please move to the following post:

click the link above and come back after

3 Top Conclusions!

  1. Understanding what we have under the hood is not only done for the sake of learning a theory! It directly impacts your daily work. It can help us to improve performance dramatically.
  2. As we saw Order of columns can significantly affect performance! It can even impact errors.
  3. Dropping a column does not reclaim the size directly. It can be very important to rebuild the table when we impact the physical structure.
Thanks to all that came to the lecture👍
I hope it was useful, and you learn something new.
I'll see you in the next meeting or you can find me online😀

signature Ronen Ariely 
 [Personal Site] [Blog] [Facebook] [Linkedin]

 

Resources and more to read

CLUSTERED INDEX