Index fragmentation, part 2

Does fragmentation hurt?

Last time, I said “probably not”. I’m glad I added that “probably” in there. This time, I will say possibly. See http://sqlblog.karaszi.com/does-index-fragmentation-matter/

In my last blog post, I wanted to focus on the sequential vs random I/O aspect and how that part should be more or less irrelevant with modern hardware. So I did a test that did a full scan (following the linked list of an index) and see if I could notice any performance difference on my SSD. I couldn’t.

That isn’t the end of the story, it turns out. Another aspect is how the data is brought into memory. You might know that SQL server can do “read ahead”, meaning it does larger reads per I/O instead if single-page I/O. I.e., fewer but larger I/O operations. This sounds fine, but what happens when we have fragmentation?

I have had this in the back of my mind for ages now. Since nobody else seemed to care about or discuss this topic, I had this feeling that I was barking up the wrong tree. But, since I now have a test-script I can just tweak it a bit so it does less than a full scan to see if fragmentation seems to make a difference for read-ahead. Spoiler alert: It sure can do! (But possibly not for read-ahead per se.)

My thinking was that read-ahead will read stuff into memory, which it turns out it doesn’t need. Something like bring an extent into memory, but it turns it only need 1 of those 8 pages, and then it jumps somewhere else. I never had a real confirmation whether this applies, or if read-ahead is smarter than that.

So, basically, I parameterized my query so I can play with the selectivity for each SELECT. I have two tables, or perhaps I should say two B-Tree indexes, to play with:

NameRows (millions)Size GB
wide index3.78.4
narrow index201

So, what selectivity to choose? I did three tests:

  • 1000 rows per query.
  • 2 rows per query. For this I suspected no difference. After all, each query will just navigate the tree and find those 2 rows most probably on the same page.
  • 10000 rows per query.

This should be enough to see a pattern, I hope. We already know that for a full scan/seek we couldn’t see a difference. I’m measuring the following attributes:

  • Duration.
  • Physical reads.
  • Size of index in memory. I.e., how much memory was consumed by the query load.

I did not find any difference for a warm index, i.e., when the data is already in cache. So, I didn’t clutter this post with such measures here. Feel free to run my test and play with it if you want to see for yourself. This is why I did not measure logical reads. Also, CPU usage was so low it wasn’t measurable.

Again, the rebuild was done with a fillfactor so that the indexes are the same size both when fragmented and when not fragmented. This is probably not how you do things, but it is so obvious that fitting more rows on a page will make reads quicker, so we don’t need to investigate that aspect.

1000 rows per SELECT:

index_widthfragmentedduration_msphysical_readsmb_cache
narrow_indexn95180.64
narrow_indexy109013118.01
wide_indexn515619114.94
wide_indexy565620416.27

What stands out above is how much of the narrow table was brought into memory from the workload. First few times I ran the test, I thought that something was wrong. I also ran the test and looked if I had physical I/O for the “warm” execution – no. The thinking was that the for the no frag executions, data was brought out of cache for whatever reason, but no physical I/O for the warm execution. I also did a run where I removed the wide_index from the test, thinking that perhaps having this in there will skew the test data – but no.

I also investigated the BP using sys.dm_os_buffer_descriptors with the thinking that just a few MB of used cache cannot contain the data I’m looking for. I fit 162 rows per data page when the index isn’t fragmented. I had 72 such pages in cache. This gives me about 11000 rows. In my workload, I did 10 iterations where each read 1000 rows. I.e., 10000 rows in total. In other words, the data that I worked with did indeed fit in those few MB of memory. I also used sys.dm_os_buffer_descriptors after an execution with only the fragmented narrow table and now there are 17,000 pages in cache instead of 72 pages.

My conclusion is that fragmentation can cause bringing lots and lots if unnecessary data into cache. If you have other thoughts or theories, let me know.

As for the other numbers: Duration increases a bit with fragmentation, and so does physical reads. But not that much.

2 rows per SELECT:

index_widthfragmentedduration_msphysical_readsmb_cache
narrow_indexn67320.16
narrow_indexy84013124.41
wide_indexn94120.19
wide_indexy1421171.58

Above doesn’t make sense at all. Why would a two (2) row SELECT bring so much data into cache when the index happens to be fragmented? I.e., why is read-ahead kicking in for a query with such high selectivity? So I decided to dig deeper. I extracted the query and ran it for itself, just a single execution, without any fancy dynamic SQL going on. I compared execution plans between the fragmented index and the non-fragmented index, but both are identical, index seek. Estimated numbers of rows were also pretty much on the target (1.2 rows).

A single query, reading a single row bring 20 data pages into cache when the table was fragmented (32 pages in total, including index pages). When defragmenting that index, the very same query brought 6 pages into cache. That was still more than expected, but less then when we had fragmentation. There is obviously more at play here… The funny thing is that STATISTICS IO doesn’t report any read-ahead and trace flag 652 (disable read ahead) doesn’t do any difference for the numbers above in this paragraph. So maybe it isn’t read ahead, after all? But there is something fishy going on.

10,000 rows per SELECT:

index_widthfragmentedduration_msphysical_readsmb_cache
narrow_indexn2851655.14
narrow_indexy225158120.04
wide_indexn313711883147.12
wide_indexy346971908149.26

Same pattern for the narrow index. Not much for me to add, except that here we got less I/O for the narrow index when the table was fragmented. I ran it a few times and got a consistent result. I can’t explain this, so I just leave it for any of you to pick your guesses.

Disabling read-ahead

What if I disable read-ahead? I used trace flag 652 to disable read-ahead and ran the the workload with 1000 rows selectivity:

index_widthfragmentedduration_msphysical_readsmb_cache
narrow_indexn1117151.19
narrow_indexy172322120.65
wide_indexn1433320415.95
wide_indexy1614321316.83

Same pattern for memory usage. But now we see a very different number for physical reads. It seems like Extended Event, the sp_statement_completed event and physical_reads value, suffers from the same defect as performance monitor. It doesn’t include the reads done my the read-ahead thread. On the other hand, we can also see that read-ahead can be our friend. Look at the difference in duration.

Bottom line

It turns out that fragmentation can have a pretty significant impact. How much? As usual, it depends on your workload. If you scan entire indexes (see my earlier blog post), then the memory aspect doesn’t seem to differ, which also makes sense. Unless you are so memory constrained that a page is removed from cache, while the query is still running and that page turns out to be needed again.

But when we are more selective than doing a full scan, fragmentation can make a difference for memory usage, evidently. My test is only one workload. And I only noticed a relevant difference for the narrow index. YMMV, of course.

The code

SET NOCOUNT ON


-----------------------------------------------------------------------------------
--Create table to keep track of buffer pool usage
USE StackOverflow

DROP TABLE IF EXISTS bp_usage

CREATE TABLE bp_usage (
 index_width varchar(30) NOT NULL
,fragmented varchar(20) NOT NULL
,mb_cache decimal(9,2) NOT NULL)


-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Disable IAM order scan, so we know that SQL Server will follow the linked list
--See https://sqlperformance.com/2015/01/t-sql-queries/allocation-order-scans

EXEC sp_configure 'cursor threshold', 1000000
RECONFIGURE


-----------------------------------------------------------------------------------
--Drop and create event session to keep track of execution time
IF EXISTS (SELECT * FROM sys.server_event_sessions WHERE name = 'frag_test')
DROP EVENT SESSION frag_test ON SERVER

--Delete XE file, using xp_cmdshell (bad, I know)
EXEC sp_configure 'xp_cmdshell', 1 RECONFIGURE WITH OVERRIDE
EXEC xp_cmdshell 'DEL R:\frag_test*.xel', no_output 
EXEC sp_configure 'xp_cmdshell', 0 RECONFIGURE WITH OVERRIDE

CREATE EVENT SESSION frag_test ON SERVER 
ADD EVENT sqlserver.sp_statement_completed()
ADD TARGET package0.event_file(SET filename=N'R:\frag_test')
WITH (MAX_DISPATCH_LATENCY=2 SECONDS)
GO

ALTER EVENT SESSION frag_test ON SERVER STATE = START


-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Setup section

--Grow the data and log files for StackOverflow database.
IF EXISTS(SELECT size * 8/(1024*1024), * FROM sys.database_files WHERE name = N'StackOverflow2010' AND size * 8/(1024*1024) < 20)
	ALTER DATABASE [StackOverflow] MODIFY FILE ( NAME = N'StackOverflow2010', SIZE = 20GB )
IF EXISTS(SELECT size * 8/(1024*1024), * FROM sys.database_files WHERE name = N'StackOverflow2010_log' AND size * 8/(1024*1024) < 10)
	ALTER DATABASE [StackOverflow] MODIFY FILE ( NAME = N'StackOverflow2010_log', SIZE = 10GB )
GO


--Create the table for the narrow index
RAISERROR('Setup section. About to create table with narrow index...', 10, 1) WITH NOWAIT

DROP TABLE IF EXISTS narrow_index

SELECT TOP(1000*1000*20) ROW_NUMBER() OVER( ORDER BY (SELECT NULL)) AS c1, CAST('Hello' AS char(8)) AS c2
INTO narrow_index
FROM sys.columns AS a, sys.columns AS b, sys.columns AS c

CREATE CLUSTERED INDEX x ON narrow_index(c1)

--Modify Posts table, so we condense cl ix and make it non-fragmented. So we (soon) can fragment it.
RAISERROR('Setup section. About to modify Posts table, so we condense cl ix and make it non-fragmented...', 10, 1) WITH NOWAIT
UPDATE Posts SET LastEditorDisplayName = ''
ALTER INDEX PK_Posts__Id ON Posts REBUILD WITH (FILLFACTOR = 100)
GO


-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Investigate the data if you want

/*
--wide index
SELECT TOP(100) * FROM Posts
EXEC sp_indexinfo Posts				--Found on my web-site
EXEC sp_help 'Posts'

--narrow index
SELECT TOP(100) * FROM narrow_index
EXEC sp_indexinfo narrow_index		--Found on my web-site
EXEC sp_help 'narrow_index'
*/



-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Create the proc that executes our SQL
RAISERROR('About to create the proc that executes our SQL...', 10, 1) WITH NOWAIT
GO
CREATE OR ALTER PROC run_the_sql
 @fragmented varchar(20)
,@rows_to_read int
,@range_iterations int
AS
DECLARE 
 @sql varchar(1000)
,@sql_condition varchar(1000)
,@bp_wide_mb decimal(9,2)
,@bp_narrow_mb decimal(9,2)
,@range_start int
,@range_iterations_counter int


--Empty cache
CHECKPOINT
DBCC DROPCLEANBUFFERS

--Cold cache
SET @range_iterations_counter = 1
SET @range_start = 1000
WHILE @range_iterations_counter <= @range_iterations
BEGIN
	SET @sql_condition = ' BETWEEN ' + CAST(@range_start AS varchar(10)) + ' AND ' + CAST(@range_start + @rows_to_read AS varchar(10))
	SET @sql = 'DECLARE @a int SET @a = (SELECT COUNT_BIG(OwnerUserId) AS [cl_ix_scan ' + @fragmented + '] FROM Posts WHERE Id ' + @sql_condition + ')'
	EXEC (@sql)
	SET @sql = 'DECLARE @a int SET @a = (SELECT COUNT_BIG(c1) AS [nc_ix_scan ' + @fragmented + '] FROM narrow_index WHERE c1 ' + @sql_condition + ')'
	EXEC (@sql)
	SET @range_start = @range_start + 100000
	SET @range_iterations_counter += 1
END

/*
--Warm cache (same as above, don't clear cache first)
SET @range_iterations_counter = 1
SET @range_start = 1000
WHILE @range_iterations_counter <= @range_iterations
BEGIN
	SET @sql_condition = ' BETWEEN ' + CAST(@range_start AS varchar(10)) + ' AND ' + CAST(@range_start + @rows_to_read AS varchar(10))
	SET @sql = 'DECLARE @a int SET @a = (SELECT COUNT_BIG(OwnerUserId) AS [cl_ix_scan ' + @fragmented + '] FROM Posts WHERE Id ' + @sql_condition + ')'
	EXEC (@sql)
	SET @sql = 'DECLARE @a int SET @a = (SELECT COUNT_BIG(c1) AS [nc_ix_scan ' + @fragmented + '] FROM narrow_index WHERE c1 ' + @sql_condition + ')'
	EXEC (@sql)
	SET @range_start = @range_start + 100000
	SET @range_iterations_counter += 1
END
*/


--Keep track of BP usage
SET @bp_wide_mb = 
(
			SELECT 
			CAST((COUNT(*) * 8.00) / 1024 AS DECIMAL(9,2)) AS MB 
			FROM  sys.allocation_units AS a  
			  JOIN sys.dm_os_buffer_descriptors AS b 
				ON a.allocation_unit_id = b.allocation_unit_id
			  JOIN sys.partitions AS p
				ON a.container_id = p.hobt_id
			WHERE p.object_id = OBJECT_ID('Posts')
			  AND b.database_id = DB_ID()
)

SET @bp_narrow_mb = 
(
			SELECT 
			CAST((COUNT(*) * 8.00) / 1024 AS DECIMAL(9,2)) AS MB 
			FROM  sys.allocation_units AS a  
			  JOIN sys.dm_os_buffer_descriptors AS b 
				ON a.allocation_unit_id = b.allocation_unit_id
			  JOIN sys.partitions AS p
				ON a.container_id = p.hobt_id
			WHERE p.object_id = OBJECT_ID('narrow_index')
			  AND b.database_id = DB_ID()
)

INSERT INTO bp_usage (index_width, fragmented, mb_cache)
VALUES
 ('wide_index', CASE WHEN @fragmented = 'high_frag_level' THEN 'y' ELSE 'n' END, @bp_wide_mb)
,('narrow_index', CASE WHEN @fragmented = 'high_frag_level' THEN 'y' ELSE 'n' END, @bp_narrow_mb)


--Note size of index and frag level, should be comparative between executions
SELECT OBJECT_NAME(object_id), index_type_desc, CAST(avg_fragmentation_in_percent AS decimal(5,1)) AS frag, page_count/1000 AS page_count_1000s
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') 
WHERE index_level = 0 AND alloc_unit_type_desc = 'IN_ROW_DATA' AND OBJECT_NAME(object_id) IN('Posts', 'narrow_index')
ORDER BY index_id
GO


-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--1: Cause fragmentation in both indexes

--Fragment wide ix
RAISERROR('About to cause fragmentation in wide index...', 10, 1) WITH NOWAIT
UPDATE Posts SET LastEditorDisplayName = REPLICATE('x', 39)

--Fragment narrow ix
RAISERROR('About to cause fragmentation in narrow index...', 10, 1) WITH NOWAIT
UPDATE narrow_index SET c1 = c1 + 1 WHERE c1 % 100 = 0

--Run the queries
RAISERROR('About to run queries with high frag level...', 10, 1) WITH NOWAIT
EXEC run_the_sql @fragmented = 'high_frag_level', @rows_to_read = 1000, @range_iterations = 10


-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--2: no frag in either index, fillfactor set to make same size as when fragmented

RAISERROR('About to eliminate fragmentation in wide index...', 10, 1) WITH NOWAIT
ALTER INDEX PK_Posts__Id ON Posts REBUILD WITH (FILLFACTOR = 60)

RAISERROR('About to eliminate fragmentation in narrow index...', 10, 1) WITH NOWAIT
ALTER INDEX x ON narrow_index REBUILD WITH (FILLFACTOR = 50)

--Run the queries
RAISERROR('About to run queries with low frag level...', 10, 1) WITH NOWAIT
EXEC run_the_sql @fragmented = 'low_frag_level', @rows_to_read = 1000, @range_iterations = 10


-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Reset
EXEC sp_configure 'cursor threshold', -1
RECONFIGURE

--Stop trace
ALTER EVENT SESSION frag_test ON SERVER STATE = STOP

-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Work the trace data

--Extract into a temp table
DROP TABLE IF EXISTS myXeData
DROP TABLE IF EXISTS myXeData2
DROP TABLE IF EXISTS myXeData3

SELECT CAST(event_data AS XML) AS StatementData
INTO myXeData
FROM sys.fn_xe_file_target_read_file('R:\frag_test*.xel', NULL, NULL, NULL);

--SELECT * FROM #myXeData;

--Use XQuery to transform XML to a table
WITH t AS(
SELECT 
 StatementData.value('(event/data[@name="duration"]/value)[1]','bigint') AS duration_microsec
,StatementData.value('(event/data[@name="cpu_time"]/value)[1]','bigint') AS cpu_microsec
,StatementData.value('(event/data[@name="physical_reads"]/value)[1]','bigint') AS physical_reads
,StatementData.value('(event/data[@name="logical_reads"]/value)[1]','bigint') AS logical_reads
,StatementData.value('(event/data[@name="statement"]/value)[1]','nvarchar(500)') AS statement_
FROM myXeData AS evts
WHERE StatementData.value('(event/data[@name="statement"]/value)[1]','nvarchar(500)') LIKE '%frag_level%'
),
t2 AS (
SELECT 
 CASE WHEN t.physical_reads = 0 THEN 'warm' ELSE 'cold' END AS cold_or_warm
,CASE WHEN t.statement_ LIKE '%cl_ix_scan_%' THEN 'wide_index' ELSE 'narrow_index' END AS index_width
,CASE WHEN t.statement_ LIKE '%low_frag_level%' THEN 'n' ELSE 'y' END AS fragmented
,duration_microsec
,cpu_microsec
,physical_reads
,logical_reads
FROM t)
SELECT *
INTO myXeData2
FROM t2;




-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Run below manually and investigate the output

--Raw data from the trace
SELECT * FROM myXeData2 ORDER BY cold_or_warm, index_width, fragmented
--Verify pretty consistent values in each quartet. 
--If not, then something special occurred (checkppoint, or something external to SQL) - delete that row.

--Get avg values into a new temp table
SELECT
-- t2.cold_or_warm
 t2.index_width
,t2.fragmented
,AVG(t2.duration_microsec) AS duration_microsec
--,AVG(t2.cpu_microsec) AS cpu_microsec
,AVG(t2.physical_reads) AS physical_reads
--,AVG(t2.logical_reads) AS logical_reads
INTO myXeData3
FROM myXeData2 aS t2
GROUP BY t2.index_width, t2.fragmented
ORDER BY index_width, fragmented

--Check it out
SELECT * FROM myXeData3 ORDER BY index_width, fragmented

--Memory usage
SELECT * FROM bp_usage

--Bottom line, join the two
SELECT p.index_width, p.fragmented, p.duration_microsec, p.physical_reads, b.mb_cache
FROM myXeData3 AS p INNER JOIN bp_usage AS b ON p.index_width = b.index_width AND p.fragmented = b.fragmented
ORDER BY index_width, fragmented


-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Cleanup
/*
DROP TABLE IF EXISTS narrow_index
DROP TABLE IF EXISTS myXeData
DROP TABLE IF EXISTS myXeData2
DROP TABLE IF EXISTS bp_usage
*/

Does index fragmentation matter?

Short answer: probably not.

Hold on! Make sure you also read my follow-up post on this topic, which has some interesting findings: http://sqlblog.karaszi.com/index-fragmentation-part-2/

Long answer: read on.

You know the story. Every week or so, we defragment the indexes. Many of us uses Ola Hallengren’s great script for this, some uses Maintenance Plans, and there are of course other alternatives as well. But are we just wasting time and effort? Quite probably we are. I’m going to start with some basics, and then do some reasoning, and finally give you some numbers of a very simple test that I ran. The T-SQL code is available. If you give it a try, please let us know your finding for your environment by adding a comment. I will do some generalizations and simplifications, to avid this post being 10 times longer.

 

What is index fragmentation?

By index here I mean b-tree indexes. You know, the regular types of indexes. I’m not referring to “odd” indexes such as columnstore, hash, geospatial, fulltext or xml indexes. For a b-tree index (from now on, I will just say “index”), the leaf level is stored according the order of the index key. I.e., the leaf is sorted. As you modify data, you can get page splits. Imagine inserting a row “in the middle” and there’s no room on the page. SQL server will split the page, by inserting a page from somewhere into the linked list, move 50% to that new page, and then add the row to where it should be.

  • If you now follow the linked list, you jump back and forth. Aka external fragmentation.
  • The page split left us with two (nearly) half-full pages. Aka internal fragmentation.

 

What do we do about it?

Duh, we defragment the indexes, of course! 🙂 The two commonly used methods are:

  • Rebuild the indexes. ALTER INDEX … REBUILD. This will create a new index and then remove the old index. This requires free space in the database file(s) about the same size as the index.
  • Reorganize the index. ALTER INDEX … REORGANIZE. Basically walk the linked list and for every time you have to go backward in the file when going to the next page, you swap these two pages. It does a bit more than that (re-introduce the fillfactor), but essentially swapping places of the pages not going forwards physically in the file.

While doing defrag, we typically leave some free space on each page (aka fillfactor). Rebuild allow us to specify the fillfactor, while reorg will re-apply whatever value you specified when creating the index.

 

Why might fragmentation hurt us?

Several aspects here. Let us discuss them one by one:

Lots of empty space

Yes, a page split leaves us with two half-full pages. At that point in time. But as more rows are added, they will fit on that free space.

In the end, you will probably average at about 75% fullness (75% being in between half-full and full). That is for indexes where the inserts (and possibly updates) are inserted “all over the place”.

But we also have increasing indexes. Think columns such as “OrderDate”. Or identity/sequence columns, which are pretty often primary keys, which in turn pretty often are also the clustered index. For such an index, inserts will always add rows to the end of the linked list, and SQL Server will just add pages to the end – not “splitting” any page. (Yeah, adding a page to the end of the linked list is technically called a page split in SQL Server lingo, but that show more the problems that SQL server has with terminology than anything else.) The point here are that increasing indexes will not have free space in them because of inserts going to the end.

Common values I’ve seen for fillfactor are between 70 and 80%.

  • For increasing indexes, that just make the indexes bigger. And remember that it isn’t unlikely that the actual data is an increasing index (the clustered index). So, instead of having a 100 GB table, you now have a 120 GB table.
  • For other indexes, you probably get about the same size “naturally” (without doing defrag) as when doing a defrag (depending on your fillfactor, of course).

 

Pages are out of order

Ask your SAN people how much difference there is between sequential I/O and random I/O in your SAN. The answer to this question will tell you whether this point is relevant. For a single spinning disk, there is a big difference. With sequential I/O you don’t have the disk head movement. But as you introduce more disks in various RAID configs, add in a SAN in the mix and now have a bunch of disks and also other concurrent activity apart from your SQL Server in that SAN, the picture isn’t so clear anymore. Now add SSD or some similar technology to the mix. In the end, there might not be that much of a difference between random and sequential I/O on a modern storage solution.

Splitting a page takes time

Yup, sure does. If you can defragment frequently enough with a chosen fillfactor for each index so that they never split, then you avoid this. But how many of us do that? I.e., hand craft the chosen fillfactor for each index by observing the amount of fragmentation for each index since last time we did defrag, and repeat this week after week until we tuned the “perfect” fillfactor for each index. Some of us might do that, but as a consultant, I can tell you that none of my clients have opted for me to do that type of work. In the end, I’m not that certain that we reduce page splits by doing our regular defrag, at least not to the extent that we hope for.

Splitting writes to the transaction log.

Yes, it does. But again, how much difference is there in reality between defrag and not defrag regarding page splits occurring during business hours.

 

Why might doing defrag hurt us?

Of course there is another side of the coin! Defrag isn’t free. When people asking me whether to defrag, a short question back from me is “How much does it hurt?” If the answer is “None at all”, then sure go ahead! If the answer is “A lot!”, then we want to think twice.

It reduces concurrency because of locking

  • Rebuild has an ONLINE option available if you are on Enterprise Edition (EE), which makes the rebuild essentially online. But using online causes the rebuilt to take longer time compared to offline.
  • If not on EE, then the whole table will be locked.
    • Shared lock if the rebuild is on a nonclustered index – this allow concurrent reads but not modifications.
    • Exclusive locks if the index is a clustered index – disallowing both read and modifications.
  • Reorg will only put a brief lock of the two pages “where it currently is”. This makes reorg much more online-ly than rebuild on non-EE.

 

It puts load your system

Say you have a 200 GB database, and rebuild everything. Now you shuffle 200 GB data. This isn’t free. Even if you use smarter scripts to only defrag what is actually fragmented in the first place, defrag will probably shuffle quite a lot of data. Your VM/SAN/SQL/whatever probably have other things to do at night-time. We all fight over these precious night/weekend hours. And for systems that are expected to be online pretty much all the time, this is even more pressing.

You will need a larger LDF file

If you rebuild everything (blindly) then the ldf file need to be the same size as the (sum of) data file(s). A smart script perhaps only rebuild 30% of te data, cutting this down to some 30% of the data size. (This doesn’t apply to simple recovery model.)

The following log backup will be large

Again, if you rebuild 100 GB worth of indexes, then the subsequent log backup will be 100 GB in size. Etc. You get the picture by now. (Again this doesn’t apply to simple recovery model.)

 

Is there a bottom line?

I don’t know. Maybe there is. But as you already know, we need to throw in a bunch of “it depends” in the mix. But I believe that there are a bunch of defrag going on for a bunch of SQL servers just because “that is the way we always did it”. Our environments aren’t the same as they were 20 years ago.

 

Can we quantify this?

Now we are getting to the really hard part. Our environments aren’t the same. The HW, SAN, data, VMs, load pattern. Etc.

Still, I was curious to see if I could measure any difference between a fragmented and a non-fragmented index, on an SSD. I used my Ultrabook, 2 core proc with 4 logical cores, with an SSD disk. (I did not disable parallelism.) This is not how your production SQL Servers are setup, I hope. But I still wanted to test the SSD aspect, and I think that many of you are moving towards flash and tiered SANS, if you didn’t do that transformation already. Anyhow, at the end of this post, you’ll find the SQL I ran. In short:

  • I used the Stack Overflow database, available to download. The smallest one (roughly 10 GB of data). Here I used the Posts table, which has a clustered index on an identity column. This is my wide index. The table (i.e., the index) has 37 million rows and is about 8.5 GB in size (when I run my tests).
  • I created my own table for the narrow index. I wanted to test both. This has 20 million rows, and is only about 1 GB in size.
  • I have a stored procedure that scans the whole index following the linked list (for both above, one query per each). No data is returned to the client.
  • I run the queries twice in the proc, so I can compare cold (physical I/O) and warm (no physical I/O).
  • I did this first for when the indexes are fragmented, ran the proc 4 times.
  • I then did a defrag, and ran the proc 4 times again.
  • I selected the fillfactor so we end up with same index size as in the first run. This can be debated, but what I wanted to look at is the random vs sequential aspect.
  • I run the proc 4 times. I was prepared to delete if any one execution had weird values because of external things going on. That was not necessary in the end.
  • I averaged duration_ms, cpu_ms, physical_reads and logical_reads.
  • I captured the metrics using an Extended Events trace, which I post-processed using T-SQL and X/Query.

If you want to use the script at the end, you will need to go through it and adapt to your environment. It will perhaps take 30 minutes to understand what it does and adapt file name for trace etc. And then some 30 minutes for the execution.

My findings

For my environment, I found no relevant difference between a fragmented index and a non fragmented index when doing a scan over the whole index, from first page in the linked list to the last page. I’m not surprised considering that I’m on SSD, abut I still wanted to see this with my own eyes, in a sort of controlled environment.

Update: Magnus Ahlkvist did a test at a SQL Saturday meeting on a spinning disk. He did find a significant difference for the wider table. However, no noticeable difference for the narrow index (something one would want to dive deeper into why…).

Here are the numbers (not nicely formatted):

cold_or_warm index_width fragmented duration_ms cpu_ms physical_reads logical_reads
cold narrow_index n 1848 2543 123787 124471
cold narrow_index y 1877 2550 124307 125571
cold wide index n 15067 2758 1019433 1020312
cold wide index y 15217 2722 1018283 1019295
warm narrow_index n 1149 4104 0 124942
warm narrow_index y 1105 4101 0 126517
warm wide index n 475 1867 0 1028017
warm wide index y 490 1945 0 1031488

References

Paul White: https://sqlperformance.com/2015/01/t-sql-queries/allocation-order-scans

The Code:

-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Setup trace section

SET NOCOUNT ON

ALTER EVENT SESSION frag_test ON SERVER STATE = STOP
WAITFOR DELAY '00:00:02'

--Delete XE file, using xp_cmdshell (bad, I know)
EXEC sp_configure 'xp_cmdshell', 1 RECONFIGURE WITH OVERRIDE
EXEC xp_cmdshell 'DEL R:\frag_test*.xel', no_output 
EXEC sp_configure 'xp_cmdshell', 0 RECONFIGURE WITH OVERRIDE


IF EXISTS (SELECT * FROM sys.server_event_sessions WHERE name = 'frag_test')
DROP EVENT SESSION frag_test ON SERVER


CREATE EVENT SESSION frag_test ON SERVER 
ADD EVENT sqlserver.sp_statement_completed()
ADD TARGET package0.event_file(SET filename=N'R:\frag_test')
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=2 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)
GO

ALTER EVENT SESSION frag_test ON SERVER STATE = START



-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Disable IAM order scan, so we know that SQL Server will follow the linked list
--See https://sqlperformance.com/2015/01/t-sql-queries/allocation-order-scans

EXEC sp_configure 'cursor threshold', 1000000
RECONFIGURE


-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Setup section

--You might want to grow the data and log files for StackOverflow database pre-running this test. Or run it twice.

USE StackOverflow

DROP TABLE IF EXISTS narrow_index

--Create the table for the narrow index
SELECT TOP(1000*1000*20) ROW_NUMBER() OVER( ORDER BY (SELECT NULL)) AS c1, CAST('Hello' AS char(8)) AS c2
INTO narrow_index
FROM sys.columns AS a, sys.columns AS b, sys.columns AS c

CREATE CLUSTERED INDEX x ON narrow_index(c1)

--Modify Posts table, so we condense cl ix and make it non-fragmented. So we (soon) can fragment it.
UPDATE Posts SET LastEditorDisplayName = ''
ALTER INDEX PK_Posts__Id ON Posts REBUILD WITH (FILLFACTOR = 100)
GO


-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Investigate the data if you want

/*
--wide index
SELECT TOP(100) * FROM Posts
EXEC sp_indexinfo Posts				--Found on my web-site
EXEC sp_help 'Posts'

--narrow index
SELECT TOP(100) * FROM narrow_index
EXEC sp_indexinfo narrow_index		--Found on my web-site
EXEC sp_help 'narrow_index'
*/



-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Create the proc that executes our SQL

CREATE OR ALTER PROC run_the_sql
 @fragmented varchar(20)
AS
DECLARE 
 @sql varchar(1000)

--Empty cache
CHECKPOINT
DBCC DROPCLEANBUFFERS


--Cold cache
SET @sql = 'DECLARE @a int SET @a = (SELECT COUNT_BIG(OwnerUserId) AS [cl_ix_scan ' + @fragmented + '] FROM Posts WHERE Id between 1000 AND 20000)'
EXEC (@sql)
SET @sql = 'DECLARE @a int SET @a = (SELECT COUNT_BIG(c1) AS [nc_ix_scan ' + @fragmented + '] FROM narrow_index WHERE c1 BETWEEN 50000 AND 60000)'
EXEC (@sql)

----Warm cache
SET @sql = 'DECLARE @a int SET @a = (SELECT COUNT_BIG(OwnerUserId) AS [cl_ix_scan ' + @fragmented + '] FROM Posts WHERE Id between 1000 AND 20000)'
EXEC (@sql)
SET @sql = 'DECLARE @a int SET @a = (SELECT COUNT_BIG(c1) AS [nc_ix_scan ' + @fragmented + '] FROM narrow_index WHERE c1 BETWEEN 50000 AND 60000)'
EXEC (@sql)

--Note size of index and frag level, should be comparative between executions
SELECT OBJECT_NAME(object_id), index_type_desc, CAST(avg_fragmentation_in_percent AS decimal(5,1)) AS frag, page_count/1000 AS page_count_1000s
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') 
WHERE index_level = 0 AND alloc_unit_type_desc = 'IN_ROW_DATA' AND OBJECT_NAME(object_id) IN('Posts', 'narrow_index')
ORDER BY index_id
GO



-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--1: cause fragmentation in both indexes

--Fragment cl ix
UPDATE Posts SET LastEditorDisplayName = REPLICATE('x', 39)

--Fragment nx ix
UPDATE narrow_index SET c1 = c1 + 1 WHERE c1 % 100 = 0

--Run the queries
EXEC run_the_sql @fragmented = 'high_frag_level'
EXEC run_the_sql @fragmented = 'high_frag_level'
EXEC run_the_sql @fragmented = 'high_frag_level'
EXEC run_the_sql @fragmented = 'high_frag_level'




-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--2: no frag in either index, fillfactor set to make same size as when fragmented

ALTER INDEX PK_Posts__Id ON Posts REBUILD WITH (FILLFACTOR = 60)
ALTER INDEX x ON narrow_index REBUILD WITH (FILLFACTOR = 50)

--Run the queries
EXEC run_the_sql @fragmented = 'low_frag_level'
EXEC run_the_sql @fragmented = 'low_frag_level'
EXEC run_the_sql @fragmented = 'low_frag_level'
EXEC run_the_sql @fragmented = 'low_frag_level'



-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Reset
EXEC sp_configure 'cursor threshold', -1
RECONFIGURE

--Stop trace
ALTER EVENT SESSION frag_test ON SERVER STATE = STOP

-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Work the trace data

--Extract into a temp table
DROP TABLE IF EXISTS myXeData
DROP TABLE IF EXISTS myXeData2

SELECT CAST(event_Data AS XML) AS StatementData
INTO myXeData
FROM sys.fn_xe_file_target_read_file('R:\frag_test*.xel', NULL, NULL, NULL);

--SELECT * FROM #myXeData;

--Use XQuery to transform XML to a table
WITH t AS(
SELECT 
 StatementData.value('(event/data[@name="duration"]/value)[1]','bigint') / 1000 AS duration_ms
,StatementData.value('(event/data[@name="cpu_time"]/value)[1]','bigint') /1000 AS cpu_ms
,StatementData.value('(event/data[@name="physical_reads"]/value)[1]','bigint') AS physical_reads
,StatementData.value('(event/data[@name="logical_reads"]/value)[1]','bigint') AS logical_reads
,StatementData.value('(event/data[@name="statement"]/value)[1]','nvarchar(500)') AS statement_
FROM myXeData AS evts
WHERE StatementData.value('(event/data[@name="statement"]/value)[1]','nvarchar(500)') LIKE '%frag_level%'
),
t2 AS (
SELECT 
 CASE WHEN t.physical_reads = 0 THEN 'warm' ELSE 'cold' END AS cold_or_warm
,CASE WHEN t.statement_ LIKE '%cl_ix_scan_%' THEN 'wide index' ELSE 'narrow_index' END AS index_width
,CASE WHEN t.statement_ LIKE '%low_frag_level%' THEN 'n' ELSE 'y' END AS fragmented
,duration_ms
,cpu_ms
,physical_reads
,logical_reads
FROM t)
SELECT *
INTO myXeData2
FROM t2;

--Raw data from the trace
SELECT * FROM myXeData2 ORDER BY index_width, cold_or_warm, fragmented
--Verify pretty consistent values in each quartet. 
--If not, then something special occurred (checkppoint, or something external to SQL) - delete that row.

--Get avg values and compare them
SELECT
 t2.cold_or_warm
,t2.index_width
,t2.fragmented
,AVG(t2.duration_ms) AS duration_ms
,AVG(t2.cpu_ms) AS cpu_ms
,AVG(t2.physical_reads) AS physical_reads
,AVG(t2.logical_reads) AS logical_reads
FROM myXeData2 aS t2
GROUP BY t2.cold_or_warm, t2.index_width, t2.fragmented
ORDER BY cold_or_warm, index_width, fragmented



-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Cleanup
/*
DROP TABLE IF EXISTS narrow_index
DROP TABLE IF EXISTS myXeData
DROP TABLE IF EXISTS myXeData2
*/

Managing tempdb

This post is all about the tempdb database. It is not about capacity planning, like size or number of data files – there is plenty of info about that out there. Instead, this is about managing it, in the sense of changing initial file size, adding a file, removing a file etc. SSMS isn’t obvious for some of these things for the tempdb database, and for some things SSMS is flat out wrong.

I tested all operations on SQL server 2017 and SSMS 17.8.1.

 

Background

Tempdb is re-created at start-up. Or, rather re-initiated (in lack of better words). SQL Server do not delete the files and create new ones, that would take ages for large files if you don’t have Instant File Initialization (which you never have for log files). So how does SQL Server know what files to create, the logical and physical names, size etc? The answer is sys.master_files.

 

sys.master_files

This is the “template” information for the tempdb files. I will refer to sys.master_files as the template below (not to be confused with model which is template for other things, but not the file aspects of tempdb). Here you see how the tempdb files will be created/initialized at next start-up. Don’t confuse this with tempdb.sys.database_files, which show the current state for the tempdb files. Below two queries will show you the current state for the tempdb files and also the template state:

--Current
SELECT
'tempdb' AS db_name_
,file_id
,name
,physical_name
,size * 8/1024 AS size_MB
,type_desc
,CASE WHEN is_percent_growth = 1 THEN CAST(growth AS varchar(3)) + ' %' ELSE CAST(growth * 8/1024 AS varchar(10)) + ' MB' END AS growth
,max_size * 8/1024 AS max_size_MB
FROM tempdb.sys.database_files
ORDER BY type, file_id

--Template
SELECT
DB_NAME(database_id) AS db_name_
,file_id
,name
,physical_name
,size * 8/1024 AS size_MB
,type_desc
,CASE WHEN is_percent_growth = 1 THEN CAST(growth AS varchar(3)) + ' %' ELSE CAST(growth * 8/1024 AS varchar(10)) + ' MB' END AS growth
,max_size * 8/1024 AS max_size_MB
FROM master.sys.master_files
WHERE DB_NAME(database_id)  = 'tempdb'
ORDER BY db_name_, type, file_id

 

Increasing current and template file size to larger than what it currently is

This is pretty straight-forward, both using T-SQL directly and also using SSMS. Here’s what it looks like in SSMS:

(The word “Initial” for the header “Initial Size (MB)” is pure nonsense. What you see is the current file size, picked up from sys.database_files. I am happy to see that the word “Initial” is removed in SSMS 18.0.)

To increase the file size you just type the desired, larger, size for the tempdb file. The T-SQL command to do this is also very straight forward, example:

ALTER DATABASE tempdb 
 MODIFY FILE ( NAME = N'tempdev', SIZE = 5GB )

ALTER DATABASE tempdb 
 MODIFY FILE ( NAME = N'temp2', SIZE = 5GB )

Note that SSMS will generate the file size in KB fpr the T-SQL command, but the T-SQL command accepts KB, MB, GB and even TB.

This operation increases the current file size as well as the template size.

 

Decreasing current file size

To reduce the current size, you can attempt a shrink operation, for instance using such as below T-SQL command:

USE tempdb
DBCC SHRINKFILE(tempdev, 100)

The second parameter is the desired size in MB. This will change the current size as well as the template size. In my experience, you should be prepared for a limited success to do shrink file a tempdb file reducing the current size. In many cases, you will have to re-start SQL Server so it will be created with the template size. Or, hunt down whatever is using it and take it from there.

 

Changing template file size to smaller than what it currently is

One way is to change the size in the GUI to a smaller size and it will try to make the current size smaller as well as change the template size. If you don’t want to perform the shrink operation, then specify a larger size than the current size, script the command to a T-SQL query window and in there just specify whatever size you want. It will generate a command such as:

ALTER DATABASE tempdb 
 MODIFY FILE ( NAME = N'tempdev', SIZE = 100MB )

If the specified size is smaller than the current file size, then it will only change the template, not the current size.

 

Adding file

This is dead-easy. In SSMS, just press the “Add” button and specify the desired name, size etc. This will generate T-SQL such as:

ALTER DATABASE tempdb 
 ADD FILE ( NAME = N'tempdb3', FILENAME = N'R:\SqlData\a\tempdb_mssql_3.ndf' , SIZE = 5GB , FILEGROWTH = 200MB )

This will add it immediately and also to the template (both sys.database_files and sys.master_files).

 

Remove file

This is simple enough if the file is empty. In SSMS you select the file and press the “Remove” button.  It will generate T-SQL such as:

ALTER DATABASE tempdb 
 REMOVE FILE tempdb3

But if the file is current used you will get an error message and nothing was changed: not the current state, nor the template.

You can always attempt to shrink first using the EMPTYFILE option:

USE tempdb
DBCC SHRINKFILE(tempdb3, EMPTYFILE)

If you’re lucky, then the file was emptied and you can now use the ALTER command with the REMOVE FILE option. If not, then you are back to where you started.

You can try to re-start SQL Server and see if that allow you to remove the file. If not, then perhaps using the “failsafe” startup switch: /f (see Erin Stellato’s blog about this: https://www.sqlskills.com/blogs/erin/remove-files-from-tempdb/ ). Not be connected from SSMS when executing the commands might also help.

 

Changing other file attributes

If you want to change things like autogrow or maxsize, then just use SSMS or T-SQL directly. Here’s an example T-SQL command to change both autogrow and maxsize:

ALTER DATABASE tempdb 
 MODIFY FILE ( NAME = N'tempdev', MAXSIZE = 500GB , FILEGROWTH = 500MB )

 

Moving a file

This operation requires a re-start. SSMS doesn’t allow you to do this, so use T-SQL directly, such as:

ALTER DATABASE tempdb 
 MODIFY FILE (NAME = tempdev, FILENAME = 'R:\SqlData\b\tempdb.mdf')

Warning: Henrik reminded me in the comments section to add a warning here. “Make sure that R:\SqlData\b is present and that SQL Server has sufficient rights to create files here!” He is of course correct. In worst case you end up with SQL Server refusing to start if it cannot create tempdb where you have specified. In such case you can try the -f startup switch, perhaps a topic for another blog post.

Now, this is a sneaky one. It does change both sys.database_files and sys.master_files. So, looking only at the catalog views, you will be fooled that the file was moved immediately. But that didn’t happen, check the disk. Fortunately, the command comes with a message such as:

The file "tempdev" has been modified in the system catalog.
The new path will be used the next time the database is started.

That makes it pretty obvious. Note that the old file will not be removed when you restart your SQL Server. Go delete that file, after verifying that the new file was indeed created (if the “old” one is still the current one, then you won’t be able to delete it as long as SQL Server is running).

 

Summary

Managing tempdb isn’t that hard. As soon as you understand that we have the current state (tempdb.sys.master_files) and the template (sys.master_files), it is easier to understand. And, yes, SSMS is a bit quirky for these operations. The reason, methinks, is that it uses the same GUI as for other databases, which doesn’t have the aspect of a “template”.

Did I miss any operation? Let me know and I’ll might add it.

 

Should it be [ola] or [olahallengren]?

I’m (of course 😉 ) talking about the schema that Ola Hallengren plan to move his objects to. If you don’t get what I’m talking about, then visit Ola’s database maintenance solution site here.

Ola is soliciting feedback for the schema naming at Github now, go and express your opinion at: https://github.com/olahallengren/sql-server-maintenance-solution/issues/91.

We are now recommended to install cumulative updates

Microsoft just released a couple of CUs for SQL Server 2012. What is exiting is how Microsoft phrases their recommendations whether we should install them or not. Below is from the KB of one of those recently released CUs:

  • Microsoft recommends ongoing, proactive installation of CUs as they become available:
  • SQL Server CUs are certified to the same levels as service packs and should be installed at the same level of confidence.
  • Historical data shows that a significant number of support cases involve an issue that has already been addressed in a released CU.
  • CUs may contain added value over and above hotfixes. This includes supportability, manageability, and reliability updates.

Now, that is a pretty significant change from what they used to say. In addition, requiring the CU is much easier. You just go to MS Download, select whether you want 32 or 64 bit and then download the bits immediately.

Check it out yourself, go to for instance https://www.microsoft.com/en-us/download/details.aspx?id=50731.

Or check out how the KB for a new SQL Server CU: https://support.microsoft.com/en-us/kb/3120313 (see the “Notes for the update” section).

Managing the errorlog file

I frequently see recommendations to regularly run sp_cycle_errorlog, so that the errorlog doesn’t become huge. My main concern with that is that the errorlog contains valuable information.

When I do a health check on a SQL Server machine, I want a few months worth of errorlog information available. I typically use my own scripts for this, available here. Almost every time I go through the errorlog, I find valuable information. Some things you address, like find whatever it is that is attempting to login every minute. Other things you might not have control over, but the information is valuable to have.

So, if you run sp_cycle_errorlog every day or week, you end up with only a week worth, or a few weeks worth of errorlog file information.

Suggestion 1: Increase the number of errorlog files.

You probably want more than 6 history errorlog files. For instance, a client of mine told me that he was about to patch a server a few days before I was to visit that client. That patch procedure resulted in enough re-start of SQL Server so we ended up with only 4 days worth of errorlog files. Yes, this client had the default of 6 historic errorlog files. I typically increase this to 15. You can do this by right-clicking the “SQL Server Logs” folder under “Management” in SSMS. If you want to use T-SQL, you can use xp_instance_regwrite, as in:

EXEC xp_instance_regwrite
N’HKEY_LOCAL_MACHINE’
,N’Software\Microsoft\MSSQLServer\MSSQLServer’
,N’NumErrorLogs’, REG_DWORD, 15;

Suggestion 2: Set a limit for the size of the errorlog file.

But what about the size? Say that we have crash dumps, for instance. Or other things that start to happen very frequently. The good news is that as of SQL Server 2012, we can set a max size for the errorlog file. There is no GUI for this, so we have to manipulate the registry directly. Again, we can use xp_instance_regwrite. Below will limit the size to 30 MB:

EXEC xp_instance_regwrite
N’HKEY_LOCAL_MACHINE’
,N’Software\Microsoft\MSSQLServer\MSSQLServer’
,N’ErrorLogSizeInKb’, REG_DWORD, 30720;

With 15 files, you can patch of your SQL Server machine without aging out all old errorlog files. And with a max size of 30 MB, you keep each file manageable in size. And you keep the total size of errorlog files for that instance to 450 MB. Not enough to fill your disks. But enough to have historical information for when you are about to perform a health check on your SQL Server instance.

References: this by Jan Kåre and this by Paul Randal.

Are we Borg?

Is it time to stop remeber things? For real, this time?

Today I needed to find two pieces of SQL Server related information. One was straight foward, I posted it on #sqlhelp on twitter and a minute or so later I had the answer. The other was a bit more involved and I asked around in our MVP mail list – a couple of hours later I had bunch of suggestions and tips. These are only examples, it could just as well have been some web-forum, newsgroup, or some other forum. We’ve all had cases like this, but it made me think and reflect.

Travel back in time, say 20 years or so. Computers were hard to use. Well, not computers per se, but rather the software made them hard to use. (Without software computers are easy to use – they don’t do anything!) We were lucky if the software came with a printed manual. Whatever the quality of that manual.

Then came electronic help with the software. That was a big step forward. And it even had index and search!

With the Internet, things really started to pick up. Sure, there were BBS before that, but not broadly used. With Internet, we were given e-mail. Mail lists. Newsgroups. Altavista, Google and Bing (sounds like a supergroup). Web-forums. Facebook. Twitter. Etc. And not only that, we carry the Internet in our pockets, wherever we are.

So, this is what hit me today. I used to keep tips and tricks in a register. Nuggets I picked up, here and there. The past few years, I do less and less of this. There are so many great minds out there, who also are willing to lend a helping hand. So, almost when I realize I need some bit of information, that information is available. Sounds pretty much like the collective mind of the Borg to me. Perhaps not quite there yet, but give it a year or five.

But what worries me is: Where does that leave us, as human beings. Where is the difference between learning and remembering. Deducing versus asking and being given the information. I’m sure you can relate to this, at some level. For instance when we were forbidden to use pocket calculatos at school, because they were too powerful  Or the other way around.

To put it bluntly: If I don’t have to remember anything about SQL Server (the info is out there, right?), I won’t be a very profficient SQL Server person. I’m sure we can formulate lots of ways to differentiate between learning/creating/deducing and remembering. And, don’t get me wrong. I think the development is really interesting – and the community out there is way cool.

But, I just find the development a bit interesting… and worrying … and thrilling.

Why we never want to trick SQL Server into rebuilding a log file

“Never” is a strong word, so let me say that we really really want to avoid it, if at all humanly possible. In short, we will have a (potentially) broken database, both at the physical level and at the logical level. Read on.

Just to be clear, when I refer to a “log file” here, I’m talking about a transaction log file, an .ldf file. Ever so often we see in forums how log files are “lost”, “disappears” or so. Now, SQL Server relies on the ldf file at startup to bring the database to a consistent state. This is known as “recovery”, or “startup recovery”. This is pretty well known, I for instance teach this at every intro level admin course. So, what if

  • The ldf file isn’t there?
  • Or isn’t accessible to the db engine?
  • Or is broken somehow?
  • Or is from a different point in time from the data file? (You’d be surprised to all the things one see over the years.)

Well, SQL Server will do the only reasonable, refuse us into the database and produce an error message (in eventlog etc).

What we see from time to time, is trying to “trick” SQL Server into re-creating an ldf file. So, why is this so bad? I will try to explain why. Let me first say that SQL Server doesn’t do these things to be mean to us, or to prove a point. If SQL Server know that the ldf file is not necessary for recovery (the database was “cleanly shutdown”), then it can and will re-create a log file for us at startup. The problem is that it isn’t these cases we see in forum. The cases we see in the forums is when this didn’t happen. SQL Server relied on the ldf file in order to bring the database to a consistent state.

Enough beating around the bush, here is an example of why we don’t want to trick SQL Server to forcefully re-create a log file:

Say you have a transaction in which you add a row to the order header table (oh) and five rows to the order details table (od). Physically, each command is reflected in the tlog, the page is modified (but still cached) and at commit, the log records are written to the ldf file. There are lots written about these algorithms, but the concepts are pretty simple. For more details, read this.

Say that we forcefully stop SQL Server, delete the ldf file, start SQL Server, see that the database isn’t accessible and somehow “trick” SQL Server into creating an ldf file for us. What is the problem? Why is this so bad? The problem is that you can have no confidence in the state of your data, both at the physical level and at the logical level. Let me start explaining what I mean by the logical level and problems at this level:

The logical level
By the logical level, I consider the user data. The rows in the tables, quite simply. We inserted one row in the oh table and five rows in the od table. These can be inserted into a total of two pages or 6 pages (in reality more, since each table is likely to have indexes etc, I’ll touch on this when we get to the logical level). Say that three of order details rows have been written to disk, but not the order header row, and not the other two order details rows. This is just an example; you can pick and choose any combo you want. It can get just as bad as you can imagine! You might think that the pages are in the end written in the same sequence as we modified them. No, that is now how the cookie crumbles (read the article I posted link to). You can probably imagine other scenarios, closer to your data. For instance, we will not do some certain modification to a table unless some other prior modification was also performed. In essence, rebuilding a log file leave us with no confidence in the data. Are you prepared to go through your database and manually verify all the data in there? Majority of you will say “no”, and it might not even be doable (data volumes, lack of information to verify etc). So, logical inconsistencies are bad. Really bad. We don’t want this. Ok?

The physical level
This log is not only used for “user-data”. It i also used for system data. Tables has indexes, where each row in a table is reflected by a row in each index. We have allocation pages stating what storage is used. We have IAM pages and PFS pages. We have linked lists. And whatnot. You probably realize that these structures also require that some modification is performed in full or not at all. (an extent allocation will be reflected in both the GAM or SGAM page and also in th extent data itself, etc). What do you think is used to protect this? Yes, you got it, the ldf file. Trick SQL Server into re-creating an ldf file and you have all sorts of physical inconsistencies. Actually, physical inconsistencies are a bit better than logical since we do have a way to check for these. I’m of course talking about the mighty DBCC CHECKDB command, a command with lots of smarts (right Paul?) to check that the db is consistent at the physical level. And what if it isn’t? CHECKDB spits out errors. Sure it has repair options, but those generally mean (unless you are the type of person who wins are lotteries) that the repair will rip out whatever makes the db inconsistent resulting in data loss (which also has no knowledge of what you might consider logical consistent data).

So, what to do?
I hope the answer is obvious. Don’t get into this situation in the first place. Don’t go deleting ldf files for instance. Do take backups frequently enough so you don’t end up in a situation like “I need to rebuild the log or I’m toast.”. If something strange happens, don’t start mucking about with the system unless you know what you are doing. Hire a specialist, call MS support or so. If I had a brand new Ferrari, I wouldn’t disassemble the engine in case I hear weird noise from the it. Heck, I barely pop the hood of my Beamer!

And no, I won’t get into what commands can be used to re-build the ldf file. I expect all readers of this to not get into a situation where it is needed. 🙂

(I understand one can encounter a machine with no backups and something happens to the ldf file, and such scnearios. Of course I do. I feel really really bad every time I read about such a situation, because there is no graceful way to handle it. That is why I have such a harsh tone above. I don’t want this to happen to anybody. One would hope that this is obvious, but nothing ever is on the Net. So I’ve learned. So, please leave such unconstructive comments out of the discussions!)

Watch out for old stuff

No, I’m not referring to me, I’m referring to options, config values and such which should really have been removed from the product a long time ago. I was reading a recent blog from Jonathan Kehayias and realize how much old stuff are still visible one way or the other in the product. There are of course commands which has more modern replacements (manage logins and users, attaching databases, changing SID for user etc), but keeing the old command for a few version is common sense. I’m thinking more of stuff where the functionality it gone ages, or even decades, ago, so why keep the knob for it?

Referring to Jonathan’s article, the LoginMode property has in SMO and policies four possible values.

  • “Unknown” maps to registry value 3 or higher which is clearly rubbish (somebody manually changed this to something weird).
  • “Mixed” maps to value 2 in the registry, Known as “SQL Server and Windows”, or “Mixed mode”.
  • “Integrated” maps to value 1 in the registry. Known as “Windows only”.
  • “Normal” maps to values 0 in the registry. But what is this, you think? There are only two values, right?

So, we have something strange exposed in SMO and policies, and it is even referred to as “normal”. This is one example if heritage which should have been cleaned up a long long time ago. We have to go all the way back to SQL Server 6.5 in order to explain this.

  • “Normal” (more commonly known as “Standard” back in the days) is more like what “Mixed” is today.
  • “Mixed” meant something slightly different back then compared to what it means today. We need to recall that back then, we didn’t have Windows authentication like how it is done today. We could map Windows account to SQL Server account using a tool, and this mapping could be used to do what we call today a “Windows authentication” or “trusted connection”. This setting meant that SQL Server could attempt a trusted connection based on the login name you specify, even if you in the connection string didn’t ask for a trusted connection. Confusing, huh? No wonder they got rid of the distinction bewteen “Normal” and “Mixed”.

As of 7.0, the only registry values used are 1 (Integrated) and 2 (Mixed). So why do SMO still expose 0 (Normal)?

We have plenty of other such examples. Look in the registry and you can see mapping for _, # and $. These were used in the mapping tool, so you could map a Windows login, which apparently has a backslash in the name and SQL Server would translate the backslash to some other character (_ by default), allowed as a SQL Server login name. Why do these still exist in the registry?
Or “Default login”? There no such thing as a default login in SQL Server anymore (there is something like a guest *user* but that is a different things from a login.
Or how about setting the master database to full recovery model? That is absurd, you still can’t do log backup and it still behaves like in simple recovery.

OK, enough of the rant. Back to meaningful work.

Do you remember Database Gateways?

I replied to a question in a forum today about intercepting and changing a SQL query, before it reaches SQL Server. I started typing a reply, and realized that… wow… “Does this take me on a trip down memory lane, or what!”. So, just for fun, here is what I wrote (slightly modified):

There used to be products on the market, called “Database Gateways”, something that looks like one type of DBMS but actually accepts the SQL queries and submits them to some other type of DBMS. Basically an “in between” piece of software. For instance I used to work with such a product, which looked like a SQL Server but submitted the queries to IBM’s mainframe DB2.

That market pretty much disappeared around when ODBC was born. Yes, we are talking ancient here, around 1992.

There was a leftover from this time period in SQL Server, called “Open Data Services”, which was basically an API allowing you to write something that looks like a SQL Server. There was even a sample app which allowed you to intercept the SQL queries and then re-submit them to a SQL Server.

This API was in turn transformed over time to what became known as “extended stored procedures”, which in turn was deprecated with SQL Server 2005 and the SQL CLR support.