Short answer: probably not.
[Edit] I suggest you also check out the final post in this series: http://sqlblog.karaszi.com/fragmentation-the-final-installment/
[Edit 2] Because of the flood of spam, I’ve closed comments on this. Please comment on the final installment post instead.
Long answer: read on.
You know the story. Every week or so, we defragment the indexes. Many of us use Ola Hallengren’s great script for this, some use Maintenance Plans, and there are of course other alternatives as well. But are we just wasting time and effort? Quite probably we are. I’m going to start with some basics, then do some reasoning, and finally give you some numbers from a very simple test that I ran. The T-SQL code is available. If you give it a try, please let us know your findings for your environment by adding a comment. I will make some generalizations and simplifications, to avoid this post being 10 times longer.
What is index fragmentation?
By index here I mean b-tree indexes. You know, the regular types of indexes. I’m not referring to “odd” indexes such as columnstore, hash, geospatial, fulltext or xml indexes. For a b-tree index (from now on, I will just say “index”), the leaf level is stored according to the order of the index key. I.e., the leaf is sorted. As you modify data, you can get page splits. Imagine inserting a row “in the middle” when there’s no room on the page. SQL Server will split the page: it links in a new page from somewhere, moves 50% of the rows to that new page, and then adds the row where it belongs. This leaves us with two problems:
- If you now follow the linked list, you jump back and forth. Aka external fragmentation.
- The page split left us with two (nearly) half-full pages. Aka internal fragmentation.
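If you want to see both kinds for your own indexes, here is a minimal sketch using sys.dm_db_index_physical_stats (“dbo.YourTable” is a placeholder; SAMPLED or DETAILED mode is needed to get the page fullness column):
--Sketch: inspect both kinds of fragmentation for one table's indexes
SELECT OBJECT_NAME(s.object_id) AS table_name
,i.name AS index_name
,s.avg_fragmentation_in_percent --external fragmentation: linked list order vs file order
,s.avg_page_space_used_in_percent --internal fragmentation: how full the pages are
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.YourTable'), NULL, NULL, 'SAMPLED') AS s
INNER JOIN sys.indexes AS i ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.index_level = 0 --leaf level only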
What do we do about it?
Duh, we defragment the indexes, of course! 🙂 The two commonly used methods are:
- Rebuild the indexes. ALTER INDEX … REBUILD. This will create a new index and then remove the old index. This requires free space in the database file(s) about the same size as the index.
- Reorganize the index. ALTER INDEX … REORGANIZE. Basically, walk the linked list, and every time you would have to jump backward in the file to get to the next page, swap those two pages. It does a bit more than that (it re-applies the fillfactor), but in essence it swaps pages so that the linked list moves forward through the file.
While doing defrag, we typically leave some free space on each page (aka fillfactor). Rebuild allows us to specify the fillfactor, while reorg will re-apply whatever value you specified when creating the index.
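For reference, the two look like this in T-SQL (index and table names are placeholders, and the fillfactor is just an example value):
ALTER INDEX ix_example ON dbo.YourTable REBUILD WITH (FILLFACTOR = 80) --builds a new index, explicit fillfactor
ALTER INDEX ix_example ON dbo.YourTable REORGANIZE --shuffles leaf pages in place, re-applies the existing fillfactor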
Why might fragmentation hurt us?
Several aspects here. Let us discuss them one by one:
Lots of empty space
Yes, a page split leaves us with two half-full pages, at that point in time. But as more rows are added, they will fit in that free space.
In the end, you will probably average at about 75% fullness (75% being midway between half-full and full). That holds for indexes where the inserts (and possibly updates) land “all over the place”.
But we also have increasing indexes. Think columns such as “OrderDate”. Or identity/sequence columns, which are pretty often primary keys, which in turn pretty often are also the clustered index. For such an index, inserts will always add rows to the end of the linked list, and SQL Server will just add pages to the end – not “splitting” any page. (Yeah, adding a page to the end of the linked list is technically called a page split in SQL Server lingo, but that says more about SQL Server’s problems with terminology than anything else.) The point here is that increasing indexes will not have free space in them caused by inserts, since inserts go to the end.
Common values I’ve seen for fillfactor are between 70 and 80%.
- For increasing indexes, that just makes the indexes bigger. And remember that it isn’t unlikely that the actual data is an increasing index (the clustered index). So, instead of having a 100 GB table, you now have a 120 GB table.
- For other indexes, you probably get about the same size “naturally” (without doing defrag) as when doing a defrag (depending on your fillfactor, of course).
Pages are out of order
Ask your SAN people how much difference there is between sequential I/O and random I/O in your SAN. The answer to this question will tell you whether this point is relevant. For a single spinning disk, there is a big difference: with sequential I/O you don’t have the disk head movement. But as you introduce more disks in various RAID configs, add a SAN to the mix so you now have a bunch of disks plus other concurrent activity apart from your SQL Server in that SAN, the picture isn’t so clear anymore. Now add SSD or some similar technology. In the end, there might not be much of a difference between random and sequential I/O on a modern storage solution.
Splitting a page takes time
Yup, it sure does. If you can defragment frequently enough, with a fillfactor chosen for each index so that the pages never split, then you avoid this. But how many of us do that? I.e., hand-craft the fillfactor for each index by observing the amount of fragmentation for each index since the last defrag, and repeat this week after week until we have tuned the “perfect” fillfactor for each index. Some of us might do that, but as a consultant, I can tell you that none of my clients have opted to pay for that type of work. In the end, I’m not that certain that we reduce page splits by doing our regular defrag, at least not to the extent that we hope for.
Splitting writes to the transaction log
Yes, it does. But again, how much difference is there in reality between defrag and no defrag regarding page splits occurring during business hours?
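If you want a rough feel for the split volume on your system, there is a performance counter for it. A sketch (note that this counter also includes the harmless add-a-page-at-the-end “splits” for increasing indexes, so it overstates the bad ones):
--Cumulative page splits since instance startup
SELECT cntr_value AS page_splits_since_startup
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page Splits/sec'
AND object_name LIKE '%Access Methods%'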
Why might doing defrag hurt us?
Of course there is another side of the coin! Defrag isn’t free. When people ask me whether to defrag, a short question back from me is “How much does it hurt?” If the answer is “Not at all”, then sure, go ahead! If the answer is “A lot!”, then we want to think twice.
It reduces concurrency because of locking
- Rebuild has an ONLINE option available if you are on Enterprise Edition (EE), which makes the rebuild essentially online. But using online causes the rebuild to take longer compared to offline. (See the sketch after this list.)
- If not on EE, then the whole table will be locked.
- Shared lock if the rebuild is on a nonclustered index – this allows concurrent reads but not modifications.
- Exclusive lock if the index is a clustered index – disallowing both reads and modifications.
- Reorg will only take a brief lock on the two pages “where it currently is”. This makes reorg much more online-ly than rebuild on non-EE.
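For reference, the variants look like this (placeholder names again):
ALTER INDEX ix_example ON dbo.YourTable REBUILD WITH (ONLINE = ON) --EE only: essentially online, but takes longer
ALTER INDEX ix_example ON dbo.YourTable REBUILD --offline: table-level lock for the duration
ALTER INDEX ix_example ON dbo.YourTable REORGANIZE --brief locks on a couple of pages at a time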
It puts load on your system
Say you have a 200 GB database and rebuild everything. Now you shuffle 200 GB of data. This isn’t free. Even if you use smarter scripts to only defrag what is actually fragmented in the first place, defrag will probably shuffle quite a lot of data. Your VM/SAN/SQL/whatever probably has other things to do at night-time. We all fight over these precious night/weekend hours. And for systems that are expected to be online pretty much all the time, this is even more pressing.
You will need a larger LDF file
If you rebuild everything (blindly), then the ldf file needs to be about the same size as the (sum of the) data file(s). A smarter script that only rebuilds, say, 30% of the data cuts this down to some 30% of the data size. (This doesn’t apply to the simple recovery model.)
The following log backup will be large
Again, if you rebuild 100 GB worth of indexes, then the subsequent log backup will be 100 GB in size. Etc. You get the picture by now. (Again, this doesn’t apply to the simple recovery model.)
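A quick way to see what your maintenance window actually costs in log backup terms is to look at the backup history in msdb. A sketch (adapt the database name):
--Log backup sizes, most recent first - look for the spike after index maintenance
SELECT backup_start_date
,backup_size / 1024 / 1024 AS backup_size_mb
FROM msdb.dbo.backupset
WHERE database_name = 'StackOverflow' --adapt
AND type = 'L' --log backups only
ORDER BY backup_start_date DESC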
Is there a bottom line?
I don’t know. Maybe there is. But as you already know, we need to throw a bunch of “it depends” into the mix. Still, I believe there is a lot of defragmenting going on, on a lot of SQL Servers, just because “that is the way we always did it”. Our environments aren’t the same as they were 20 years ago.
Can we quantify this?
Now we are getting to the really hard part. Our environments aren’t the same. The HW, SAN, data, VMs, load pattern. Etc.
Still, I was curious to see if I could measure any difference between a fragmented and a non-fragmented index, on an SSD. I used my Ultrabook: a 2-core proc with 4 logical cores and an SSD disk. (I did not disable parallelism.) This is not how your production SQL Servers are set up, I hope. But I still wanted to test the SSD aspect, and I think that many of you are moving towards flash and tiered SANs, if you haven’t done that transformation already. Anyhow, at the end of this post, you’ll find the SQL I ran. In short:
- I used the Stack Overflow database, available to download. The smallest one (roughly 10 GB of data). Here I used the Posts table, which has a clustered index on an identity column. This is my wide index. The table (i.e., the index) has 37 million rows and was about 8.5 GB in size when I ran my tests.
- I created my own table for the narrow index. I wanted to test both. This has 20 million rows, and is only about 1 GB in size.
- I have a stored procedure that scans the whole index by following the linked list (one query per index above). No data is returned to the client.
- I run the queries twice in the proc, so I can compare cold (physical I/O) and warm (no physical I/O).
- I did this first with the indexes fragmented, running the proc 4 times.
- I then did a defrag, and ran the proc 4 times again.
- I selected the fillfactor so that we end up with the same index size as in the first run. This can be debated, but what I wanted to look at is the random vs sequential aspect.
- For each set of 4 executions, I was prepared to delete any execution with weird values caused by something external going on. That was not necessary in the end.
- I averaged duration_ms, cpu_ms, physical_reads and logical_reads.
- I captured the metrics using an Extended Events trace, which I post-processed using T-SQL and XQuery.
If you want to use the script at the end, you will need to go through it and adapt it to your environment. It will perhaps take 30 minutes to understand what it does and adapt the file name for the trace etc., and then some 30 minutes for the execution.
My findings
For my environment, I found no relevant difference between a fragmented index and a non-fragmented index when doing a scan over the whole index, from the first page in the linked list to the last page. I’m not surprised, considering that I’m on SSD, but I still wanted to see this with my own eyes, in a sort of controlled environment.
Update: Magnus Ahlkvist did a test at a SQL Saturday event on a spinning disk. He did find a significant difference for the wider table, but no noticeable difference for the narrow index (something one would want to dig deeper into…).
Here are the numbers:
cold_or_warm | index_width  | fragmented | duration_ms | cpu_ms | physical_reads | logical_reads
cold         | narrow_index | n          |        1848 |   2543 |         123787 |        124471
cold         | narrow_index | y          |        1877 |   2550 |         124307 |        125571
cold         | wide index   | n          |       15067 |   2758 |        1019433 |       1020312
cold         | wide index   | y          |       15217 |   2722 |        1018283 |       1019295
warm         | narrow_index | n          |        1149 |   4104 |              0 |        124942
warm         | narrow_index | y          |        1105 |   4101 |              0 |        126517
warm         | wide index   | n          |         475 |   1867 |              0 |       1028017
warm         | wide index   | y          |         490 |   1945 |              0 |       1031488
References
Paul White: https://sqlperformance.com/2015/01/t-sql-queries/allocation-order-scans
The Code:
-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Setup trace section
SET NOCOUNT ON
ALTER EVENT SESSION frag_test ON SERVER STATE = STOP
WAITFOR DELAY '00:00:02'
--Delete XE file, using xp_cmdshell (bad, I know)
EXEC sp_configure 'xp_cmdshell', 1 RECONFIGURE WITH OVERRIDE
EXEC xp_cmdshell 'DEL R:\frag_test*.xel', no_output
EXEC sp_configure 'xp_cmdshell', 0 RECONFIGURE WITH OVERRIDE
IF EXISTS (SELECT * FROM sys.server_event_sessions WHERE name = 'frag_test')
DROP EVENT SESSION frag_test ON SERVER
CREATE EVENT SESSION frag_test ON SERVER
ADD EVENT sqlserver.sp_statement_completed()
ADD TARGET package0.event_file(SET filename=N'R:\frag_test')
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=2 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)
GO
ALTER EVENT SESSION frag_test ON SERVER STATE = START
-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Disable IAM order scan, so we know that SQL Server will follow the linked list
--See https://sqlperformance.com/2015/01/t-sql-queries/allocation-order-scans
EXEC sp_configure 'cursor threshold', 1000000
RECONFIGURE
-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Setup section
--You might want to grow the data and log files for StackOverflow database pre-running this test. Or run it twice.
USE StackOverflow
DROP TABLE IF EXISTS narrow_index
--Create the table for the narrow index
SELECT TOP(1000*1000*20) ROW_NUMBER() OVER( ORDER BY (SELECT NULL)) AS c1, CAST('Hello' AS char(8)) AS c2
INTO narrow_index
FROM sys.columns AS a, sys.columns AS b, sys.columns AS c
CREATE CLUSTERED INDEX x ON narrow_index(c1)
--Modify Posts table, so we condense cl ix and make it non-fragmented. So we (soon) can fragment it.
UPDATE Posts SET LastEditorDisplayName = ''
ALTER INDEX PK_Posts__Id ON Posts REBUILD WITH (FILLFACTOR = 100)
GO
-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Investigate the data if you want
/*
--wide index
SELECT TOP(100) * FROM Posts
EXEC sp_indexinfo Posts --Found on my web-site
EXEC sp_help 'Posts'
--narrow index
SELECT TOP(100) * FROM narrow_index
EXEC sp_indexinfo narrow_index --Found on my web-site
EXEC sp_help 'narrow_index'
*/
-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Create the proc that executes our SQL
CREATE OR ALTER PROC run_the_sql
@fragmented varchar(20)
AS
DECLARE
@sql varchar(1000)
--Empty cache
CHECKPOINT
DBCC DROPCLEANBUFFERS
--Cold cache
SET @sql = 'DECLARE @a int SET @a = (SELECT COUNT_BIG(OwnerUserId) AS [cl_ix_scan ' + @fragmented + '] FROM Posts WHERE Id between 1000 AND 20000)'
EXEC (@sql)
SET @sql = 'DECLARE @a int SET @a = (SELECT COUNT_BIG(c1) AS [nc_ix_scan ' + @fragmented + '] FROM narrow_index WHERE c1 BETWEEN 50000 AND 60000)'
EXEC (@sql)
----Warm cache
SET @sql = 'DECLARE @a int SET @a = (SELECT COUNT_BIG(OwnerUserId) AS [cl_ix_scan ' + @fragmented + '] FROM Posts WHERE Id between 1000 AND 20000)'
EXEC (@sql)
SET @sql = 'DECLARE @a int SET @a = (SELECT COUNT_BIG(c1) AS [nc_ix_scan ' + @fragmented + '] FROM narrow_index WHERE c1 BETWEEN 50000 AND 60000)'
EXEC (@sql)
--Note size of index and frag level, should be comparative between executions
SELECT OBJECT_NAME(object_id), index_type_desc, CAST(avg_fragmentation_in_percent AS decimal(5,1)) AS frag, page_count/1000 AS page_count_1000s
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED')
WHERE index_level = 0 AND alloc_unit_type_desc = 'IN_ROW_DATA' AND OBJECT_NAME(object_id) IN('Posts', 'narrow_index')
ORDER BY index_id
GO
-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--1: cause fragmentation in both indexes
--Fragment cl ix
UPDATE Posts SET LastEditorDisplayName = REPLICATE('x', 39)
--Fragment nx ix
UPDATE narrow_index SET c1 = c1 + 1 WHERE c1 % 100 = 0
--Run the queries
EXEC run_the_sql @fragmented = 'high_frag_level'
EXEC run_the_sql @fragmented = 'high_frag_level'
EXEC run_the_sql @fragmented = 'high_frag_level'
EXEC run_the_sql @fragmented = 'high_frag_level'
-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--2: no frag in either index, fillfactor set to make same size as when fragmented
ALTER INDEX PK_Posts__Id ON Posts REBUILD WITH (FILLFACTOR = 60)
ALTER INDEX x ON narrow_index REBUILD WITH (FILLFACTOR = 50)
--Run the queries
EXEC run_the_sql @fragmented = 'low_frag_level'
EXEC run_the_sql @fragmented = 'low_frag_level'
EXEC run_the_sql @fragmented = 'low_frag_level'
EXEC run_the_sql @fragmented = 'low_frag_level'
-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Reset
EXEC sp_configure 'cursor threshold', -1
RECONFIGURE
--Stop trace
ALTER EVENT SESSION frag_test ON SERVER STATE = STOP
-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Work the trace data
--Extract into a temp table
DROP TABLE IF EXISTS myXeData
DROP TABLE IF EXISTS myXeData2
SELECT CAST(event_Data AS XML) AS StatementData
INTO myXeData
FROM sys.fn_xe_file_target_read_file('R:\frag_test*.xel', NULL, NULL, NULL);
--SELECT * FROM myXeData;
--Use XQuery to transform XML to a table
WITH t AS(
SELECT
StatementData.value('(event/data[@name="duration"]/value)[1]','bigint') / 1000 AS duration_ms
,StatementData.value('(event/data[@name="cpu_time"]/value)[1]','bigint') /1000 AS cpu_ms
,StatementData.value('(event/data[@name="physical_reads"]/value)[1]','bigint') AS physical_reads
,StatementData.value('(event/data[@name="logical_reads"]/value)[1]','bigint') AS logical_reads
,StatementData.value('(event/data[@name="statement"]/value)[1]','nvarchar(500)') AS statement_
FROM myXeData AS evts
WHERE StatementData.value('(event/data[@name="statement"]/value)[1]','nvarchar(500)') LIKE '%frag_level%'
),
t2 AS (
SELECT
CASE WHEN t.physical_reads = 0 THEN 'warm' ELSE 'cold' END AS cold_or_warm
,CASE WHEN t.statement_ LIKE '%cl_ix_scan_%' THEN 'wide index' ELSE 'narrow_index' END AS index_width
,CASE WHEN t.statement_ LIKE '%low_frag_level%' THEN 'n' ELSE 'y' END AS fragmented
,duration_ms
,cpu_ms
,physical_reads
,logical_reads
FROM t)
SELECT *
INTO myXeData2
FROM t2;
--Raw data from the trace
SELECT * FROM myXeData2 ORDER BY index_width, cold_or_warm, fragmented
--Verify pretty consistent values in each quartet.
--If not, then something special occurred (checkpoint, or something external to SQL) - delete that row.
--Get avg values and compare them
SELECT
t2.cold_or_warm
,t2.index_width
,t2.fragmented
,AVG(t2.duration_ms) AS duration_ms
,AVG(t2.cpu_ms) AS cpu_ms
,AVG(t2.physical_reads) AS physical_reads
,AVG(t2.logical_reads) AS logical_reads
FROM myXeData2 AS t2
GROUP BY t2.cold_or_warm, t2.index_width, t2.fragmented
ORDER BY cold_or_warm, index_width, fragmented
-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Cleanup
/*
DROP TABLE IF EXISTS narrow_index
DROP TABLE IF EXISTS myXeData
DROP TABLE IF EXISTS myXeData2
*/
Funny how modern technology changes things.
However, I would still be potentially concerned about page splits that lead to the fragmentation. Pretty sure they’re still obnoxious even with fancy solid state drives.
Additionally it would be really interesting to see cache utilisation and bytes read. Whilst those two might not matter in simple tests like this, they may well matter in a live env.
Uh OK but what about buffer pool memory usage?
The cold, wide page reads look wrong to me…
I’m seeing at least 80MB of wasted buffer pool, is that right?
I have been a proponent of little to no index defrag for quite some time now.
The only thing I would be concerned about related to fragmentation is too much empty space in the index pages, because it not only consumes extra disk space but RAM as well. But that doesn’t necessarily mean that we need to scan all the indexes every day/week for empty space (the LIMITED option will not provide space_used; you must use SAMPLED or DETAILED). I would rather check for the biggest empty space consumed by database/table/index in the buffer pool (which is a very inexpensive operation, because those pages are already in RAM) and then defragment those indexes individually with a fill-factor closer to 100%.
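Something along these lines is what I have in mind (just a sketch; it only covers in-row and row-overflow data, and the joins may need adapting):
--Empty space per object currently in this database's buffer pool
SELECT OBJECT_NAME(p.object_id) AS object_name
,SUM(bd.free_space_in_bytes) / 1024 / 1024 AS free_space_mb
,COUNT(*) AS cached_pages
FROM sys.dm_os_buffer_descriptors AS bd
INNER JOIN sys.allocation_units AS au ON au.allocation_unit_id = bd.allocation_unit_id
INNER JOIN sys.partitions AS p ON p.hobt_id = au.container_id
WHERE bd.database_id = DB_ID()
AND au.type IN (1, 3) --IN_ROW_DATA and ROW_OVERFLOW_DATA map container_id to hobt_id
GROUP BY p.object_id
ORDER BY free_space_mb DESC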
Good points! And sorry for being a bit late to reply. So far, I wanted to test the seq/rand I/O aspect. There are two other things I want to look at:
1. “Medium” selectivity (whatever that might be). Perhaps 1000 rows? I’m thinking that reading more pages into buffer pool (when fragmented) will increase execution time. And… add to the below point.
2. Bringing more pages into memory, wasting precious memory. I’m thinking that read-ahead will read stuff into memory that you really didn’t need. This should go along with above test – medium selectivity. I just need a good metric to use (suggestions?).
Yes, when the page split happens, you do take a hit. Hence going back to the “How much does it hurt?” question (about running regular defrag jobs, that is). If “Not at all”, then do the splits (so to speak) once a week at night time. If “A lot!”, then perhaps only take those hits when needed, whenever they occur. No simple answers here. 🙂