אוג22
Written by:
ronen ariely
22/08/2018 12:10 
Good day guys,
This post presents some important points related to discussion about a real-life case, which was asked in the MSDN's forums. For the sake of the post, I will present a summarize of the solutions, which were mentioned in the original thread, and a short discussion on each.
In order to examine the solutions I will use several UN-DOCUMENTED elements including: sys.system_internals_partitions, sys.system_internals_partition_columns, DBCC IND, and DBCC PAGE
This post is advance topic, level 500 - Internals!
Update 2019-03-019: After presenting a lecture in PASS user group, i add some more points and code, and re-organize the post a bit.
Scenario
We have a table [T] with Integer Column [FixedDataType1], which is no longer needed.
use InternalsTableStructure01
GO
DROP
TABLE
IF EXISTS T;
CREATE
TABLE
T(
FixedDataType1
BIGINT
NOT
NULL
,
-- We do NOT need this column anymore
FixedDataType2
CHAR
(20)
NULL
,
VarDatatype
VARCHAR
(20)
NULL
,
FixedDataType3
CHAR
(20)
NULL
)
GO
INSERT
T(FixedDataType1, FixedDataType2, VarDatatype, FixedDataType3)
VALUES
(9999,
'Ronen'
, N
'Ariely'
,
'Lecture'
)
GO
The question: What should we do in order to reclaim the size and/or improve performance?
(1) DROP column
(2) UPDATE values to zero
(3) Change the column to null
(4) ALTER column type
For the sake of the discussion I will CREATE new Inline Table-Valued Function, which return some information about the physical structure of the columns in the table.
/*==================================================
[Physical layer under the scene]
==================================================*/
CREATE
OR
ALTER
FUNCTION
dbo.GetPhysicalTableStructure(@TableName sysname)
RETURNS
TABLE
AS
RETURN
(
select
p.object_id,
p.index_id,
p.partition_number,
pc.leaf_null_bit,
coalesce
(cx.
name
, c.
name
)
as
column_name,
pc.partition_column_id,
pc.max_inrow_length,
pc.max_length,
pc.key_ordinal,
pc.leaf_offset,
pc.is_nullable,
pc.is_dropped,
pc.is_uniqueifier,
pc.is_sparse,
pc.is_anti_matter
from
sys.system_internals_partitions p
join
sys.system_internals_partition_columns pc
on
p.partition_id = pc.partition_id
left
join
sys.index_columns ic
on
p.object_id = ic.object_id
and
ic.index_id = p.index_id
and
ic.index_column_id = pc.partition_column_id
left
join
sys.columns c
on
p.object_id = c.object_id
and
ic.column_id = c.column_id
left
join
sys.columns cx
on
p.object_id = cx.object_id
and
p.index_id
in
(0,1)
and
pc.partition_column_id = cx.column_id
where
p.object_id = object_id(@TableName)
)
GO
Let's examine the physical structure of the table (in other words, the structure of the columns in the table) using the above function.
SELECT
*
from
GetPhysicalTableStructure(
'T'
)
GO

in this post we will focus on the value of the leaf_offset of Fixed length data types. For Fixed length data types, the value present the position in Bytes where the value of the column is stored in the disk,from the beginning of the the row.
For example in the above result, we can see that the column "FixedDataType1" is located 4 bytes after the start of the row (the location of the rows is stored at the end of the page in an array-mapping). The columns length is 8 bytes (BIGINT) which mean that the next Fixed length data types column will be stored at position 4+8 = 12. As expected the second column "FixedDataType2" has the value 12 for leaf_offset, and next Fixed length data types "FixedDataType3" is at position 32.
Solution 1: Dropping the column
Let’s check the first solution, Dropping the column.In first glance it make sense that if we DROP the column, then SQL Server will be ableto usesmallersizein order to store the rows. Is this correct?!?
ALTER
TABLE
T
DROP
COLUMN
FixedDataType1
SELECT
column_name, leaf_offset
from
GetPhysicalTableStructure(
'T'
)
GO

Notice that DROP Column did not drop any Physical column!
The Column name was simply marked as NULL in the metadata. The value NULL represent a hidden column, which will not be presented in the LOGICAL layer which users work with (These type of hidden columns are only available using undocumented elements).
In fact, the data in the disk was not changed, and we can still read it (using undocumented tools, as I show in this post)
When SQL Server read the data from the disk to the memory, it only stores the logical columns in memory, but since it must store the data in the same structure for all rows in the table, this mean that we did not reclaim the space from the disk.
In the original thread there was a suggestion that maybe the space is reclaimed if entire extents are deleted.
This is not correct. Even if you delete the entire data in the table in all pages (all extents) using "delete" statement, and even if you did not inserted any data yet to the table, the row's structure remain the same. The position of the data in the rows are stored according to the row's structure and leaf_offset of each column in the row's structure. The space between byte 4 to byte 12 will simply be wasted.
Each row that you add to the table uses the same structure and will use the same positions for the column's data using the same leaf_offset, according to the table's structure. You do not reclaim the empty spaces which created only by removing column.
Solution 2: UPDATE the values to Zero
/*==================================================
[
UPDATE
values
to
zero]
==================================================*/
This solution make no sense, since the type of this column is FIXED SIZE. It cost the same for any value. The number 0 stored on the same size as the number 1000000.
Solution 3: ALTER Column from "NOT NULL" to "NULL"
/*==================================================
[
ALTER
Column
from
"NOT NULL"
to
"NULL"
]
==================================================*/
DROP
TABLE
IF EXISTS T;
CREATE
TABLE
T(FixedDataType1
BIGINT
NOT
NULL
,FixedDataType2
CHAR
(20)
NULL
,FixedDataType3
CHAR
(20)
NULL
,VarDatatype
VARCHAR
(20)
NULL
)
GO
SELECT
column_name,leaf_offset, is_nullable
from
GetPhysicalTableStructure(
'T'
)
GO
ALTER
TABLE
T
ALTER
COLUMN
FixedDataType1
BIGINT
NULL
SELECT
column_name,leaf_offset, is_nullable
from
GetPhysicalTableStructure(
'T'
)
GO

The column was simply marked as NULL, but the structure remained the same.
For the sake of the post I want to show the impact of ALTER Column from "NULL" to "NOT NULL" on Fixed Data type.
/*==================================================
[
ALTER
Column
from
"NULL"
to
"NOT NULL"
]
==================================================*/
ALTER
TABLE
T
ALTER
COLUMN
FixedDataType1
BIGINT
NOT
NULL
SELECT
column_name,leaf_offset, is_nullable
from
GetPhysicalTableStructure(
'T'
)
GO

You can notice that another physical column was added to the table's structure, but the physical positions of the data did not chang.
Solution 4: change the type of the column to a smaller type
/*==================================================
[
ALTER
Column
to
a smaller type]
==================================================*/
ALTER
TABLE
T
ALTER
COLUMN
FixedDataType1 TINYINT
NOT
NULL
SELECT
column_name,leaf_offset
from
GetPhysicalTableStructure(
'T'
)
GO

Compare this action to the result of dropping a column. You can notice that the action is similar (but not exactly the same) to dropping a column and adding a new column in the same location of the old one. There is no change in the physical positions, and the action was in-place. This result make sense, since the new size of the column is smaller than the original size,SQL Server can use the same place for the smaller size.
Again, just for the sake of the post and in order to cover the topic, let's see what will happen if I ALTER COLUMN back to the bigger size.
/*==================================================
[
ALTER
Column
to
a bigger type]
==================================================*/
ALTER
TABLE
T
ALTER
COLUMN
FixedDataType1
BIGINT
NOT
NULL
SELECT
column_name,leaf_offset
from
GetPhysicalTableStructure(
'T'
)
GO

This time (ALTER to bigger size) the action is exactly equivalent to DROP COLUMN and ADD new COLUMN. SQL Server does not use the same location even so it has place there for the new size. Instead, the server marks the old column as NULL and adds another column after the last fixed length column in the table.
In our case the last fixed length column was at position 32 and it's length is 10. Therefore the new column was added in position 52, and the table's size increased.
By the way, if the column that we DROP/ALTER is the last physical column in the fixed length columns, then SQL Server can use the same place for the new column.
ALTER TABLE T DROP COLUMN FixedDataType1
ALTER TABLE T ADD FixedDataType4 BIGINT NOT NULL
SELECT column_name,leaf_offset from GetPhysicalTableStructure(
'T'
)
GO

The solution
In order to reclaim the space of dropped/altered columns we need to rebuilt the table.
ALTER
TABLE
T REBUILD
SELECT
*
from
GetPhysicalTableStructure(
'T'
)
GO

Conclusions
My personal approach in general is not to lose any data if possible. Disks are pretty cheap these days and in most cases there is no reason to put the size of the table as a critical criteria. For example, usually I prefer to mark rows as deleted in the application level (using a column"IsDeleted") but not really delete the data from the table. Maybe in the future someone will want to know what was in the past?!?
In this post we saw that
- When we DROP a column we do not necessarily lose the data
- Drooping a column is very fast, since it is only metadata changes, and it does not reclaim the size directly in the physical storage.
- ALTER the column's type/length does not reclaim the space and in fact it might result in use of more space.
- In order to reclaim unused space in the table's structure we can rebuild the table.
I hope that this was helpful.
I highly recommend to check the original thread as well, for more information.
And I will see you in my next post ;-)
Relevant Links