Heaps, forwarding pointers, ALTER TABLE REBUILD and non-clustered indexes

Let’s start with some background on forwarding pointers:

Forwarding pointers in heaps can be a mess to get rid of. A forwarding pointer happens when you modify a row so that the row doesn’t fit on the same page anymore. SQL Server moves the row to a new page and leaves a forwarding pointer where the old row used to be. This means that non-clustered indexes are not affected by the moving of the row – it can still point to the old location. So the modification is less costly than if all non-clustered indexes needed to reflect the row move. But having forwarding pointers can be bad for performance when you read data. A perhaps less known fact is that a scan over a table needs to follow forwarding pointers – essentially “jumping back and forth” a lot if you have many forwarding pointers. That can be really bad for performance (if you have table scans, of course). So, how do we get rid of forwarding pointers? Well, we can shrink the database file, but that is a little like curing a headache by jumping into a big meat grinder. We can export all data and re-import it – not very practical.

Greg Linwood reminded me that in SQL Server 2008, we can do ALTER TABLE … REBUILD. Now, I knew about this option, and every time I tell about it in class I’ve been thinking silently for myself “I need to test whether this is a way to get rid of fwd pointers”. (You generally talk about ALTER TABLE … REBUILD when you talk about enabling compression on a heap.) So, doing a REBUILD of a table using ALTER TABLE sounds promising. Will it get rid of forwarding pointers? Will it also rebuild all non-clustered indexes?

Quick answer for those who don’t care reading the TSQL script:
ALTER TABLE … REBUILD will remove forwarding pointers, but for some strange reason it will also rebuild all non-clustered indexes on that table.

See the TSQL code below. It was adapted from a challenge by Linchi Shea to produce a data loading script resulting in worst performance (where I immediately thought of forwarding pointers). See for instance http://sqlblog.com/blogs/linchi_shea/archive/2009/06/07/t-sql-exercise-simplest-data-loading-script-to-produce-worst-query-performance-june-7th-update.aspx.

USE tempdb
GO

IF EXISTS (SELECT FROM sysobjects WHERE name 'test')
DROP TABLE test
GO
CREATE TABLE test (
INT NOT NULL
,x2 INT NOT NULL
,CHAR(10) NOT NULL DEFAULT ('')
,CHAR(10) NOT NULL DEFAULT('')
)

DECLARE @rows INT = 666666@toKeep INT = 400000@diff INT

INSERT test (xx2)
SELECT TOP(@rows)
ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS r
,ROW_NUMBER() OVER (ORDER BY (SELECT 1)) % 10 AS s
FROM master..spt_values a CROSS JOIN master..spt_values b
GO
ALTER TABLE test ALTER COLUMN CHAR(892)
ALTER TABLE test ALTER COLUMN CHAR(100)
GO

DECLARE @rows INT = 666666@toKeep INT = 400000@diff INT
DELETE TOP(@rows @toKeep)
FROM test WHERE x2 IN(2468)
GO

CREATE INDEX x1 ON test(x)
CREATE INDEX x2 ON test(x2)

SELECT FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('test'), NULL, NULL, 'DETAILED')
--First  run (no nc indexes ): 387157 fwd records (out of 400000 rows), 96104 pages
--Second run (two nc indexes): 387157 fwd records (out of 400000 rows), 96105 pages

CHECKPOINT
DBCC DROPCLEANBUFFERS
DECLARE @t time SYSDATETIME()
ALTER TABLE test REBUILD
SELECT DATEDIFF(ms@tCAST(SYSDATETIME() AS time))
--First run, no non-clustered indexes, three subsequent executions (ms): 19351, 20683, 20275
--Second run, with two non-clustered indexes, three subsequent executions (ms): 31803, 35065, 37511

SELECT FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('test'), NULL, NULL, 'DETAILED')
--First  run, heap = "index" 0 (no nc indexes ): 0 fwd records, 50002 pages
--Second run, heap = "index" 0 (two nc indexes): 0 fwd records, 50003 pages
--But: avg_page_space_used_in_percent changes for the nc indexes after the rebuild.
--That I take as nc indexes are also rebuilt.

Leave a Reply

Your email address will not be published.