en-UShe-IL
You are here:   Blog
Register   |  Login

Blog Archive:

Maximize
* Can be used in order to search for older blogs Entries

Search in blogs


Blog Categories:

Maximize
* Can be used in order to search for blogs Entries by Categories

Blog Tags:

Maximize
* Can be used in order to search for blogs by keywords

TNWikiSummit


Awared MVP 


 


Microsoft® Community Contributor 


Microsoft® Community Contributor


 Read first, before you you use the blog! Maximize
אוג22

Written by: ronen ariely
22/08/2018 12:10 RssIcon

Good day guys,

This post presents some important points related to discussion about a real-life case, which was asked in the MSDN's forums. For the sake of the post, I will present a summarize of the solutions, which were mentioned in the original thread, and a short discussion on each.

In order to examine the solutions I will use several UN-DOCUMENTED elements including: sys.system_internals_partitions, sys.system_internals_partition_columns, DBCC IND, and DBCC PAGE

This post is advance topic, level 500 - Internals!

Update 2019-03-019: After presenting a lecture in PASS user group, i add some more points and code, and re-organize the post a bit.

Scenario

We have a table [T] with Integer Column [FixedDataType1], which is no longer needed.

use InternalsTableStructure01
GO

DROP TABLE IF EXISTS T;
CREATE TABLE T(
    FixedDataType1 BIGINT NOT NULL, -- We do NOT need this column anymore
    FixedDataType2 CHAR(20) NULL,
    VarDatatype VARCHAR(20) NULL,
    FixedDataType3 CHAR(20) NULL
)
GO
INSERT T(FixedDataType1, FixedDataType2, VarDatatype, FixedDataType3)
VALUES (9999, 'Ronen', N'Ariely', 'Lecture')
GO

The question: What should we do in order to reclaim the size and/or improve performance?

(1) DROP column

(2) UPDATE values to zero

(3) Change the column to null

(4) ALTER column type

For the sake of the discussion I will CREATE new Inline Table-Valued Function, which return some information about the physical structure of the columns in the table.

/*==================================================
            [Physical layer under the scene]
==================================================*/
CREATE OR ALTER FUNCTION dbo.GetPhysicalTableStructure(@TableName sysname)
RETURNS TABLE AS RETURN (
    select
        p.object_id,
        p.index_id,
        p.partition_number,
        pc.leaf_null_bit,
        coalesce(cx.name, c.name) as column_name,
        pc.partition_column_id,
        pc.max_inrow_length,
        pc.max_length,
        pc.key_ordinal,
        pc.leaf_offset,
        pc.is_nullable,
        pc.is_dropped,
        pc.is_uniqueifier,
        pc.is_sparse,
        pc.is_anti_matter
    from sys.system_internals_partitions p
    join sys.system_internals_partition_columns pc on p.partition_id = pc.partition_id
    left join sys.index_columns ic on p.object_id = ic.object_id and ic.index_id = p.index_id and ic.index_column_id = pc.partition_column_id
    left join sys.columns c on p.object_id = c.object_id and ic.column_id = c.column_id
    left join sys.columns cx on p.object_id = cx.object_id and p.index_id in (0,1) and pc.partition_column_id = cx.column_id
    where p.object_id = object_id(@TableName)
)
GO

Let's examine the physical structure of the table (in other words, the structure of the columns in the table) using the above function.

SELECT * from GetPhysicalTableStructure('T')
GO

in this post we will focus on the value of the leaf_offset of Fixed length data types. For Fixed length data types, the value present the position in Bytes where the value of the column is stored in the disk,from the beginning of the the row.

For example in the above result, we can see that the column "FixedDataType1" is located 4 bytes after the start of the row (the location of the rows is stored at the end of the page in an array-mapping). The columns length is 8 bytes (BIGINT) which mean that the next Fixed length data types column will be stored at position 4+8 = 12. As expected the second column "FixedDataType2" has the value 12 for leaf_offset, and next Fixed length data types "FixedDataType3" is at position 32.

Solution 1: Dropping the column

Let’s check the first solution, Dropping the column.In first glance it make sense that if we DROP the column, then SQL Server will be ableto usesmallersizein order to store the rows. Is this correct?!?

ALTER TABLE T DROP COLUMN FixedDataType1
SELECT column_name, leaf_offset from GetPhysicalTableStructure('T')
GO

Notice that DROP Column did not drop any Physical column! 

The Column name was simply marked as NULL in the metadata. The value NULL represent a hidden column, which will not be presented in the LOGICAL layer which users work with (These type of hidden columns are only available using undocumented elements).

In fact, the data in the disk was not changed, and we can still read it (using undocumented tools, as I show in this post)

When SQL Server read the data from the disk to the memory, it only stores the logical columns in memory, but since it must store the data in the same structure for all rows in the table, this mean that we did not reclaim the space from the disk.

In the original thread there was a suggestion that maybe the space is reclaimed if entire extents are deleted. 

This is not correct. Even if you delete the entire data in the table in all pages (all extents) using "delete" statement, and even if you did not inserted any data yet to the table, the row's structure remain the same. The position of the data in the rows are stored according to the row's structure and leaf_offset of each column in the row's structure. The space between byte 4 to byte 12 will simply be wasted.

Each row that you add to the table uses the same structure and will use the same positions for the column's data using the same leaf_offset, according to the table's structure. You do not reclaim the empty spaces which created only by removing column.

Solution 2: UPDATE the values to Zero

/*==================================================
            [UPDATE values to zero]
==================================================*/

This solution make no sense, since the type of this column is FIXED SIZE. It cost the same for any value. The number 0 stored on the same size as the number 1000000.

Solution 3: ALTER Column from "NOT NULL" to "NULL"

/*==================================================
            [ALTER Column from "NOT NULL" to "NULL"]
==================================================*/
DROP TABLE IF EXISTS T;
CREATE TABLE T(FixedDataType1 BIGINT NOT NULL,FixedDataType2 CHAR(20) NULL,FixedDataType3 CHAR(20) NULL,VarDatatype VARCHAR(20) NULL
)
GO
SELECT column_name,leaf_offset, is_nullable from GetPhysicalTableStructure('T')
GO
 
ALTER TABLE T ALTER COLUMN FixedDataType1 BIGINT NULL
SELECT column_name,leaf_offset, is_nullable from GetPhysicalTableStructure('T')
GO

The column was simply marked as NULL, but the structure remained the same.

For the sake of the post I want to show the impact of ALTER Column from "NULL" to "NOT NULL" on Fixed Data type.

/*==================================================
            [ALTER Column from "NULL" to "NOT NULL"]
==================================================*/
ALTER TABLE T ALTER COLUMN FixedDataType1 BIGINT NOT NULL
SELECT column_name,leaf_offset, is_nullable from GetPhysicalTableStructure('T')
GO

You can notice that another physical column was added to the table's structure, but the physical positions of the data did not chang.

Solution 4: change the type of the column to a smaller type

/*==================================================
            [ALTER Column to a smaller type]
==================================================*/
ALTER TABLE T ALTER COLUMN FixedDataType1 TINYINT NOT NULL
SELECT column_name,leaf_offset from GetPhysicalTableStructure('T')
GO

Compare this action to the result of dropping a column. You can notice that the action is similar (but not exactly the same) to dropping a column and adding a new column in the same location of the old one. There is no change in the physical positions, and the action was in-place. This result make sense, since the new size of the column is smaller than the original size,SQL Server can use the same place for the smaller size.

Again, just for the sake of the post and in order to cover the topic, let's see what will happen if I ALTER COLUMN back to the bigger size.

/*==================================================
            [ALTER Column to a bigger type]
==================================================*/
ALTER TABLE T ALTER COLUMN FixedDataType1 BIGINT NOT NULL
SELECT column_name,leaf_offset from GetPhysicalTableStructure('T')
GO

This time (ALTER to bigger size) the action is exactly equivalent to DROP COLUMN and ADD new COLUMN. SQL Server does not use the same location even so it has place there for the new size. Instead, the server marks the old column as NULL and adds another column after the last fixed length column in the table.

In our case the last fixed length column was at position 32 and it's length is 10. Therefore the new column was added in position 52, and the table's size increased.

By the way, if the column that we DROP/ALTER is the last physical column in the fixed length columns, then SQL Server can use the same place for the new column. 

ALTER TABLE T DROP COLUMN FixedDataType1
ALTER TABLE T ADD FixedDataType4 BIGINT NOT NULL
SELECT column_name,leaf_offset from GetPhysicalTableStructure('T')
GO

The solution

In order to reclaim the space of dropped/altered columns we need to rebuilt the table.

ALTER TABLE T REBUILD
SELECT * from GetPhysicalTableStructure('T')
GO

 

Conclusions

My personal approach in general is not to lose any data if possible. Disks are pretty cheap these days and in most cases there is no reason to put the size of the table as a critical criteria. For example, usually I prefer to mark rows as deleted in the application level (using a column"IsDeleted") but not really delete the data from the table. Maybe in the future someone will want to know what was in the past?!?

In this post we saw that 

  • When we DROP a column we do not necessarily lose the data
  • Drooping a column is very fast, since it is only metadata changes, and it does not reclaim the size directly in the physical storage. 
  • ALTER the column's type/length does not reclaim the space and in fact it might result in use of more space.
  • In order to reclaim unused space in the table's structure we can rebuild the table. 

I hope that this was helpful.
I highly recommend to check the original thread as well, for more information.

And I will see you in my next post ;-)


Relevant Links