Does index fragmentation matter?

Short answer: probably not.

Long answer: read on.

You know the story. Every week or so, we defragment the indexes. Many of us use Ola Hallengren’s great script for this, some use Maintenance Plans, and there are of course other alternatives as well. But are we just wasting time and effort? Quite probably we are. I’m going to start with some basics, then do some reasoning, and finally give you some numbers from a very simple test that I ran. The T-SQL code is available. If you give it a try, please let us know your findings for your environment by adding a comment. I will do some generalizations and simplifications, to avoid this post being 10 times longer.

 

What is index fragmentation?

By index here I mean b-tree indexes. You know, the regular types of indexes. I’m not referring to “odd” indexes such as columnstore, hash, geospatial, fulltext or xml indexes. For a b-tree index (from now on, I will just say “index”), the leaf level is stored according to the order of the index key. I.e., the leaf is sorted. As you modify data, you can get page splits. Imagine inserting a row “in the middle” when there’s no room on the page. SQL Server will split the page by inserting a page from somewhere into the linked list, moving 50% of the rows to that new page, and then adding the row to where it should be.

  • If you now follow the linked list, you jump back and forth. Aka external fragmentation.
  • The page split left us with two (nearly) half-full pages. Aka internal fragmentation.

What do we do about it?

Duh, we defragment the indexes, of course! 🙂 The two commonly used methods are:

  • Rebuild the indexes. ALTER INDEX … REBUILD. This will create a new index and then remove the old index. This requires free space in the database file(s) about the same size as the index.
  • Reorganize the index. ALTER INDEX … REORGANIZE. Basically, it walks the linked list, and every time it has to go backwards in the file to get to the next page, it swaps those two pages. It does a bit more than that (it re-applies the fillfactor), but essentially it swaps the places of pages that don’t go forward physically in the file.

While doing defrag, we typically leave some free space on each page (aka fillfactor). Rebuild allows us to specify the fillfactor, while reorg will re-apply whatever value you specified when creating the index.
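
For reference, here is a minimal sketch of the two commands, using a hypothetical index ix_OrderDate on a hypothetical dbo.Orders table; the fillfactor value is just an example.

--Rebuild: creates a new copy of the index; a fillfactor can be specified here
ALTER INDEX ix_OrderDate ON dbo.Orders REBUILD WITH (FILLFACTOR = 80)

--Reorganize: shuffles the existing leaf pages in place and compacts them
--according to the fillfactor that was specified when the index was created
ALTER INDEX ix_OrderDate ON dbo.Orders REORGANIZE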

Why might fragmentation hurt us?

Several aspects here. Let us discuss them one by one:

Lots of empty space

Yes, a page split leaves us with two half-full pages, at that point in time. But as more rows are added, they will fit into that free space.

In the end, you will probably average at about 75% fullness (75% being in between half-full and full). That is for indexes where the inserts (and possibly updates) are inserted “all over the place”.

But we also have increasing indexes. Think columns such as “OrderDate”. Or identity/sequence columns, which are pretty often primary keys, which in turn pretty often are also the clustered index. For such an index, inserts will always add rows to the end of the linked list, and SQL Server will just add pages to the end – not “splitting” any page. (Yeah, adding a page to the end of the linked list is technically called a page split in SQL Server lingo, but that says more about the problems SQL Server has with terminology than anything else.) The point here is that increasing indexes will not have free space in them, because inserts go to the end.

Common values I’ve seen for fillfactor are between 70 and 80%.

  • For increasing indexes, that just makes the indexes bigger. And remember that it isn’t unlikely that the actual data is an increasing index (the clustered index). So, instead of having a 100 GB table, you now have a 120 GB table.
  • For other indexes, you probably get about the same size “naturally” (without doing defrag) as when doing a defrag (depending on your fillfactor, of course). If you want to check the actual page fullness, see the query sketch right after this list.
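
If you are curious about how full your pages actually are, a query along these lines can show it. A minimal sketch; SAMPLED mode keeps it reasonably cheap, but it still reads a fair amount of data, so be careful on a busy production system:

SELECT OBJECT_NAME(s.object_id) AS table_name
      ,s.index_id
      ,CAST(s.avg_page_space_used_in_percent AS decimal(5,1)) AS avg_page_fullness_pct
      ,s.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS s
WHERE s.index_level = 0 AND s.alloc_unit_type_desc = 'IN_ROW_DATA'
ORDER BY table_name, s.index_id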

 

Pages are out of order

Ask your SAN people how much difference there is between sequential I/O and random I/O in your SAN. The answer to this question will tell you whether this point is relevant. For a single spinning disk, there is a big difference. With sequential I/O you don’t have the disk head movement. But as you introduce more disks in various RAID configs, add in a SAN in the mix and now have a bunch of disks and also other concurrent activity apart from your SQL Server in that SAN, the picture isn’t so clear anymore. Now add SSD or some similar technology to the mix. In the end, there might not be that much of a difference between random and sequential I/O on a modern storage solution.

Splitting a page takes time

Yup, sure does. If you can defragment frequently enough, with a chosen fillfactor for each index, so that they never split, then you avoid this. But how many of us do that? I.e., hand craft the chosen fillfactor for each index by observing the amount of fragmentation for each index since the last time we did defrag, and repeat this week after week until we have tuned the “perfect” fillfactor for each index. Some of us might do that, but as a consultant, I can tell you that none of my clients have opted for me to do that type of work. In the end, I’m not that certain that we reduce page splits by doing our regular defrag, at least not to the extent that we hope for.

Splitting writes to the transaction log

Yes, it does. But again, how much difference is there in reality between defragmenting and not defragmenting when it comes to page splits occurring during business hours?
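
If you want an indication of how many page splits you actually get, there is a “Page Splits/sec” counter. A minimal sketch; note that the value is cumulative since SQL Server started (sample it twice to get a rate), and that it also counts the “good” splits at the end of ever-increasing indexes:

SELECT pc.object_name, pc.counter_name, pc.cntr_value  --cumulative since SQL Server started
FROM sys.dm_os_performance_counters AS pc
WHERE pc.counter_name = 'Page Splits/sec'
AND pc.object_name LIKE '%Access Methods%'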

Why might doing defrag hurt us?

Of course there is another side of the coin! Defrag isn’t free. When people ask me whether to defrag, a short question back from me is “How much does it hurt?” If the answer is “None at all”, then sure, go ahead! If the answer is “A lot!”, then we want to think twice.

It reduces concurrency because of locking

  • Rebuild has an ONLINE option available if you are on Enterprise Edition (EE), which makes the rebuild essentially online. But using ONLINE causes the rebuild to take longer compared to offline.
  • If not on EE, then the whole table will be locked.
    • A shared lock if the rebuild is on a nonclustered index – this allows concurrent reads but not modifications.
    • An exclusive lock if the index is a clustered index – disallowing both reads and modifications.
  • Reorg will only take a brief lock on the two pages it is currently working with. This makes reorg much more “online” than rebuild on non-EE.

It puts load on your system

Say you have a 200 GB database, and rebuild everything. Now you shuffle 200 GB of data. This isn’t free. Even if you use smarter scripts to only defrag what is actually fragmented in the first place, defrag will probably shuffle quite a lot of data. Your VM/SAN/SQL/whatever probably has other things to do at night-time. We all fight over these precious night/weekend hours. And for systems that are expected to be online pretty much all the time, this is even more pressing.

You will need a larger LDF file

If you rebuild everything (blindly) then the LDF file needs to be about the same size as the (sum of the) data file(s). A smart script perhaps only rebuilds 30% of the data, cutting this down to some 30% of the data size. (This doesn’t apply to simple recovery model.)

The following log backup will be large

Again, if you rebuild 100 GB worth of indexes, then the subsequent log backup will be 100 GB in size. Etc. You get the picture by now. (Again this doesn’t apply to simple recovery model.)
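
You can see the effect on your own log backups in msdb. A minimal sketch that lists recent log backup sizes; the database name is just a placeholder:

SELECT TOP(20) bs.database_name
      ,bs.backup_finish_date
      ,bs.backup_size / 1024 / 1024 AS backup_size_MB
FROM msdb.dbo.backupset AS bs
WHERE bs.type = 'L'                      --log backups only
AND bs.database_name = 'YourDatabase'    --placeholder
ORDER BY bs.backup_finish_date DESC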

 

Is there a bottom line?

I don’t know. Maybe there is. But as you already know, we need to throw in a bunch of “it depends” in the mix. Still, I believe that there is a lot of defragging going on on a lot of SQL Servers just because “that is the way we always did it”. Our environments aren’t the same as they were 20 years ago.

 

Can we quantify this?

Now we are getting to the really hard part. Our environments aren’t the same. The HW, SAN, data, VMs, load pattern. Etc.

Still, I was curious to see if I could measure any difference between a fragmented and a non-fragmented index, on an SSD. I used my Ultrabook, a 2-core proc with 4 logical cores, with an SSD disk. (I did not disable parallelism.) This is not how your production SQL Servers are set up, I hope. But I still wanted to test the SSD aspect, and I think that many of you are moving towards flash and tiered SANs, if you didn’t do that transformation already. Anyhow, at the end of this post, you’ll find the SQL I ran. In short:

  • I used the Stack Overflow database, available to download. The smallest one (roughly 10 GB of data). Here I used the Posts table, which has a clustered index on an identity column. This is my wide index. The table (i.e., the index) has 37 million rows and is about 8.5 GB in size (when I run my tests).
  • I created my own table for the narrow index. I wanted to test both. This has 20 million rows, and is only about 1 GB in size.
  • I have a stored procedure that scans the whole index following the linked list (for both above, one query per each). No data is returned to the client.
  • I run the queries twice in the proc, so I can compare cold (physical I/O) and warm (no physical I/O).
  • I did this first for when the indexes are fragmented, ran the proc 4 times.
  • I then did a defrag, and ran the proc 4 times again.
  • I selected the fillfactor so we end up with same index size as in the first run. This can be debated, but what I wanted to look at is the random vs sequential aspect.
  • I was prepared to delete any execution that had weird values because of external things going on. That was not necessary in the end.
  • I averaged duration_ms, cpu_ms, physical_reads and logical_reads.
  • I captured the metrics using an Extended Events trace, which I post-processed using T-SQL and XQuery.

If you want to use the script at the end, you will need to go through it and adapt it to your environment. It will perhaps take 30 minutes to understand what it does and to adapt things like the file name for the trace. And then some 30 minutes for the execution.

My findings

For my environment, I found no relevant difference between a fragmented index and a non-fragmented index when doing a scan over the whole index, from the first page in the linked list to the last page. I’m not surprised considering that I’m on SSD, but I still wanted to see this with my own eyes, in a sort of controlled environment. Here are the numbers:

cold_or_warm  index_width   fragmented  duration_ms  cpu_ms  physical_reads  logical_reads
cold          narrow_index  n                  1848    2543          123787         124471
cold          narrow_index  y                  1877    2550          124307         125571
cold          wide index    n                 15067    2758         1019433        1020312
cold          wide index    y                 15217    2722         1018283        1019295
warm          narrow_index  n                  1149    4104               0         124942
warm          narrow_index  y                  1105    4101               0         126517
warm          wide index    n                   475    1867               0        1028017
warm          wide index    y                   490    1945               0        1031488

References

Paul White: https://sqlperformance.com/2015/01/t-sql-queries/allocation-order-scans

The T-SQL

Here goes:

-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Setup trace section

SET NOCOUNT ON

ALTER EVENT SESSION frag_test ON SERVER STATE = STOP
WAITFOR DELAY '00:00:02'

--Delete XE file, using xp_cmdshell (bad, I know)
EXEC sp_configure 'xp_cmdshell', 1 RECONFIGURE WITH OVERRIDE
EXEC xp_cmdshell 'DEL R:\frag_test*.xel', no_output
EXEC sp_configure 'xp_cmdshell', 0 RECONFIGURE WITH OVERRIDE

IF EXISTS (SELECT * FROM sys.server_event_sessions WHERE name = 'frag_test')
DROP EVENT SESSION frag_test ON SERVER

CREATE EVENT SESSION frag_test ON SERVER
ADD EVENT sqlserver.sp_statement_completed()
ADD TARGET package0.event_file(SET filename=N'R:\frag_test')
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=2 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)
GO

ALTER EVENT SESSION frag_test ON SERVER STATE = START

-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Disable IAM order scan, so we know that SQL Server will follow the linked list
--See https://sqlperformance.com/2015/01/t-sql-queries/allocation-order-scans

EXEC sp_configure 'cursor threshold', 1000000
RECONFIGURE

-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Setup section

--You might want to grow the data and log files for the StackOverflow database before running this test. Or run it twice.

USE StackOverflow

DROP TABLE IF EXISTS narrow_index

--Create the table for the narrow index
SELECT TOP(1000*1000*20) ROW_NUMBER() OVER( ORDER BY (SELECT NULL)) AS c1, CAST('Hello' AS char(8)) AS c2
INTO narrow_index
FROM sys.columns AS a, sys.columns AS b, sys.columns AS c

CREATE CLUSTERED INDEX x ON narrow_index(c1)

--Modify Posts table, so we condense cl ix and make it non-fragmented. So we (soon) can fragment it.
UPDATE Posts SET LastEditorDisplayName = ''
ALTER INDEX PK_Posts__Id ON Posts REBUILD WITH (FILLFACTOR = 100)
GO

-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Investigate the data if you want

/*
--wide index
SELECT TOP(100) * FROM Posts
EXEC sp_indexinfo Posts --Found on my web-site
EXEC sp_help 'Posts'

--narrow index
SELECT TOP(100) * FROM narrow_index
EXEC sp_indexinfo narrow_index --Found on my web-site
EXEC sp_help 'narrow_index'
*/

-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Create the proc that executes our SQL

CREATE OR ALTER PROC run_the_sql
@fragmented varchar(20)
AS
DECLARE
@sql varchar(1000)

--Empty cache
CHECKPOINT
DBCC DROPCLEANBUFFERS

--Cold cache
SET @sql = 'DECLARE @a int SET @a = (SELECT COUNT_BIG(OwnerUserId) AS [cl_ix_scan ' + @fragmented + '] FROM Posts)'
EXEC (@sql)
SET @sql = 'DECLARE @a int SET @a = (SELECT COUNT_BIG(c1) AS [nc_ix_scan ' + @fragmented + '] FROM narrow_index)'
EXEC (@sql)

--Warm cache
SET @sql = 'DECLARE @a int SET @a = (SELECT COUNT_BIG(OwnerUserId) AS [cl_ix_scan ' + @fragmented + '] FROM Posts)'
EXEC (@sql)
SET @sql = 'DECLARE @a int SET @a = (SELECT COUNT_BIG(c1) AS [nc_ix_scan ' + @fragmented + '] FROM narrow_index)'
EXEC (@sql)

--Note size of index and frag level, should be comparative between executions
SELECT OBJECT_NAME(object_id), index_type_desc, CAST(avg_fragmentation_in_percent AS decimal(5,1)) AS frag, page_count/1000 AS page_count_1000s
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED')
WHERE index_level = 0 AND alloc_unit_type_desc = 'IN_ROW_DATA' AND OBJECT_NAME(object_id) IN('Posts', 'narrow_index')
ORDER BY index_id
GO

-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--1: cause fragmentation in both indexes

--Fragment cl ix
UPDATE Posts SET LastEditorDisplayName = REPLICATE('x', 39)

--Fragment nc ix
UPDATE narrow_index SET c1 = c1 + 1 WHERE c1 % 100 = 0

--Run the queries
EXEC run_the_sql @fragmented = 'high_frag_level'
EXEC run_the_sql @fragmented = 'high_frag_level'
EXEC run_the_sql @fragmented = 'high_frag_level'
EXEC run_the_sql @fragmented = 'high_frag_level'

-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--2: no frag in either index, fillfactor set to make same size as when fragmented

ALTER INDEX PK_Posts__Id ON Posts REBUILD WITH (FILLFACTOR = 60)
ALTER INDEX x ON narrow_index REBUILD WITH (FILLFACTOR = 50)

--Run the queries
EXEC run_the_sql @fragmented = 'low_frag_level'
EXEC run_the_sql @fragmented = 'low_frag_level'
EXEC run_the_sql @fragmented = 'low_frag_level'
EXEC run_the_sql @fragmented = 'low_frag_level'

-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Reset
EXEC sp_configure 'cursor threshold', -1
RECONFIGURE

--Stop trace
ALTER EVENT SESSION frag_test ON SERVER STATE = STOP

-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Work the trace data

--Extract into a temp table
DROP TABLE IF EXISTS myXeData
DROP TABLE IF EXISTS myXeData2

SELECT CAST(event_Data AS XML) AS StatementData
INTO myXeData
FROM sys.fn_xe_file_target_read_file('R:\frag_test*.xel', NULL, NULL, NULL);

--SELECT * FROM myXeData;

--Use XQuery to transform XML to a table
WITH t AS(
SELECT
StatementData.value('(event/data[@name="duration"]/value)[1]','bigint') / 1000 AS duration_ms
,StatementData.value('(event/data[@name="cpu_time"]/value)[1]','bigint') /1000 AS cpu_ms
,StatementData.value('(event/data[@name="physical_reads"]/value)[1]','bigint') AS physical_reads
,StatementData.value('(event/data[@name="logical_reads"]/value)[1]','bigint') AS logical_reads
,StatementData.value('(event/data[@name="statement"]/value)[1]','nvarchar(500)') AS statement_
FROM myXeData AS evts
WHERE StatementData.value('(event/data[@name="statement"]/value)[1]','nvarchar(500)') LIKE '%frag_level%'
),
t2 AS (
SELECT
CASE WHEN t.physical_reads = 0 THEN 'warm' ELSE 'cold' END AS cold_or_warm
,CASE WHEN t.statement_ LIKE '%cl_ix_scan_%' THEN 'wide index' ELSE 'narrow_index' END AS index_width
,CASE WHEN t.statement_ LIKE '%low_frag_level%' THEN 'n' ELSE 'y' END AS fragmented
,duration_ms
,cpu_ms
,physical_reads
,logical_reads
FROM t)
SELECT *
INTO myXeData2
FROM t2;

--Raw data from the trace
SELECT * FROM myXeData2 ORDER BY index_width, cold_or_warm, fragmented
--Verify pretty consistent values in each quartet.
--If not, then something special occurred (checkpoint, or something external to SQL) - delete that row.

--Get avg values and compare them
SELECT
t2.cold_or_warm
,t2.index_width
,t2.fragmented
,AVG(t2.duration_ms) AS duration_ms
,AVG(t2.cpu_ms) AS cpu_ms
,AVG(t2.physical_reads) AS physical_reads
,AVG(t2.logical_reads) AS logical_reads
FROM myXeData2 AS t2
GROUP BY t2.cold_or_warm, t2.index_width, t2.fragmented
ORDER BY cold_or_warm, index_width, fragmented

-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
--Cleanup
/*
DROP TABLE IF EXISTS narrow_index
DROP TABLE IF EXISTS myXeData
DROP TABLE IF EXISTS myXeData2
*/

Make sure you update the statistics

You might know that statistics can have a high impact on performance. As you add, remove and modify data, the statistics will be more and more outdated.

Sure, there’s the database option that updates statistics automatically, but it is a bit … rough. For a table with 10,000,000 rows, you have to modify 2,000,000 rows prior to 2016 (20%) and 100,000 rows as of 2016 with db compat level 2016 (SQRT(1000*@rows)). Also, when auto-update stats kicks in, it will sample the underlying data, in order for it not to take too long etc.
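
To make the difference concrete, here is a small sketch that computes both thresholds for a 10,000,000-row table (I'm ignoring the extra 500 rows in the old formula to keep it simple):

DECLARE @rows bigint = 10000000

SELECT @rows * 0.20         AS old_threshold  --2,000,000 modified rows (20%)
      ,SQRT(1000.0 * @rows) AS new_threshold  --about 100,000 modified rows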

Many of us use Ola Hallengren’s excellent maintenance solution. I’m sure that some of us have our own favorite modifications we do to the jobs that the installation script creates for us. My point here is that the installation script does not update statistics by default. Here’s how the job step in the “IndexOptimize – USER_DATABASES” job looks on my SQL Server 2017 instance. All by default:

EXECUTE [dbo].[IndexOptimize]
@Databases = 'USER_DATABASES',
@LogToTable = 'Y'

The default value for the @UpdateStatistics parameter is NULL, which means “Do not perform statistics maintenance”. See this page for reference.

So, by using the defaults, you end up defragmenting your indexes (which might not give you that much nowadays, considering that there isn’t that much of a difference between random and sequential I/O on modern disk subsystems). For some of the indexes that you defragment, you will get new statistics as a by-product. Those are the indexes that you rebuild, since rebuild internally creates a new index and then drops the old one. But you likely have many indexes that don’t reach the 30% fragmentation threshold. And consider indexes over a key which is steadily increasing or decreasing. Inserting new rows will not cause fragmentation, but the statistics will become out-of-date.
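
If you want to see which statistics would actually benefit from an update, sys.dm_db_stats_properties shows when each statistic was last updated and how many modifications have happened since. A minimal sketch; adapt the filtering to your needs:

SELECT OBJECT_NAME(s.object_id) AS table_name
      ,s.name AS stats_name
      ,sp.last_updated
      ,sp.rows
      ,sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
AND sp.modification_counter > 0
ORDER BY sp.modification_counter DESC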

What to do? Easy, just add a new job in which you call the IndexOptimize procedure with options to update all statistics where at least one row has been modified. Ola even has such an example on his web site, Example D. Here it is; I just added a parameter to log the commands to the CommandLog table:

EXECUTE dbo.IndexOptimize
@Databases = 'USER_DATABASES',
@FragmentationLow = NULL,
@FragmentationMedium = NULL,
@FragmentationHigh = NULL,
@UpdateStatistics = 'ALL',
@OnlyModifiedStatistics = 'Y',
@LogToTable = 'Y'

Schedule the above as you wish. I prefer to do it early every morning if I can. But your circumstances, like database size, how the data is modified, etc., will influence the frequency.

Another option is to modify the existing “IndexOptimize – USER_DATABASES” job and just add the parameter below. This will both defrag your indexes and also update statistics.

@UpdateStatistics = 'ALL'

What about Maintenance Plans, you might think? Well, we all know that they aren’t that … smart – which is why we use scripts like Ola’s in the first place. Regarding statistics updates, they will update all stats, regardless of whether any rows at all have been modified since last time. It is a complete waste of resources to update stats if nothing has changed. Even sp_updatestats is smarter in this regard.

Note: This is in no way a criticism of the maintenance solution that Ola provides. He has no way of knowing our requirements in the job he creates. If the installation script could read our minds, I’m sure that it would schedule the jobs for us as well. OTOH, if Ola could create a T-SQL installation script that could read our minds, then he probably would do something else in the first place. 🙂

Rebuilding indexes online, wasting time and resources?

Many SQL Server installations do regular index rebuilds or reorgs in order to defragment indexes. One can question how much we really gain by doing this when using modern SANs, SSDs, flash disks, etc. But that is not the topic for this blog post. What I want to raise with this blog post is whether to use ONLINE when rebuilding indexes. Some background first:

  • With ONLINE = OFF (default), you will be blocked from accessing the data while the index rebuild is being performed. Depending on the type of index, you either get a schema modification lock on the table, which prohibits any type of access, or a shared lock, which allows reads only.
  • With ONLINE = ON (requires Enterprise Edition), you get a very brief lock at the beginning and end of the operation. I will refer to the modes as ONLINE and OFFLINE from now on.

There are more details to what locks are acquired depending on various factors. Check out https://docs.microsoft.com/en-us/sql/t-sql/statements/alter-index-transact-sql for more info. The point is that ONLINE = ON allows users to both read and modify data while the operation is ongoing. That is a good thing, right? Well, it comes with a price…

The time it takes to rebuild the index can be substantially longer for ONLINE. Many of us have other things to do with the database during night-time and/or weekends. Yes, these are the typical windows in time where we try to fit in things such as index rebuilds. Say that you do it at night-time and it currently takes 4 hours. Wouldn’t it be nice if you could cut that time down to 1.5 hours? That would leave more time for imports, massaging of data, CHECKDB and other things you want to do. Sure, you can do it ONLINE, but it will slow down access during the rebuild. Also, the more data you modify during the rebuild, the more space you need in tempdb.
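
For reference, here is a minimal sketch of the two variants, on a hypothetical table; the WAIT_AT_LOW_PRIORITY part is optional (SQL Server 2014 and later) and just shows one way to control the blocking at the start of an online rebuild:

--Offline rebuild: faster, but blocks access while it runs
ALTER INDEX ALL ON dbo.SomeBigTable REBUILD WITH (ONLINE = OFF)

--Online rebuild: allows concurrent access, but takes longer and uses more resources
ALTER INDEX ALL ON dbo.SomeBigTable REBUILD
WITH (ONLINE = ON (WAIT_AT_LOW_PRIORITY (MAX_DURATION = 1 MINUTES, ABORT_AFTER_WAIT = SELF)))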

My point is to not do ONLINE unless you actually need it. Some systems don’t have to be “open” during night-time anyhow. Only you can tell what your requirements are, of course. To get some numbers I decided to create a few indexes on a copy of the FactResellerSalesXL_PageCompressed table found in the AdventureworksDW2016 database, and I named that table FactResellerSalesXL. I created a copy without data compression and also without any columnstore index. And then I created a few non-clustered indexes. Here’s the index and space usage for the table, using my sp_indexinfo (with some columns omitted and others re-arranged):

Not a large table by any means, but it gives us something to play with. And it allowed me to do the rebuilds several times in order to rule out figures that were out of the ordinary. I tested this in both FULL and SIMPLE recovery. The numbers are averages over several executions, in minutes:seconds:

  • FULL recovery
    • ONLINE: 3:45
    • OFFLINE: 1:03
  • SIMPLE recovery
    • ONLINE: 1:40
    • OFFLINE: 0:31

 

I frequently use Ola Hallengren’s scripts for database maintenance. (We all do, right? :-)) The default for Ola’s solution (IndexOptimize) is to do the rebuild ONLINE if possible. Or, to be more specific, the defaults are:

  • < 5% fragmentation: do nothing
  • Fragmentation between 5 and 30 %: do REORGANIZE.
  • Fragmentation over 30%: do REBUILD with ONLINE = ON

I.e., when using Ola’s scripts you will rebuild your indexes ONLINE, assuming you are on Enterprise Edition (or similar). For Maintenance Plans (anybody remember those?), the checkbox for ONLINE is not checked by default.

Bottom line: If you have problems with index rebuilds taking a long time and you don’t need other people accessing the data while this is being done, consider doing it OFFLINE instead. Or perhaps don’t defragment in the first place? Or raise the bar to some 60-70% so you only defragment the indexes that are really fragmented.
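
If you use Ola’s IndexOptimize and want to raise the bar, the fragmentation thresholds are parameters. A minimal sketch, based on the parameter names documented on Ola’s site; verify against the version you have installed:

EXECUTE dbo.IndexOptimize
@Databases = 'USER_DATABASES',
@FragmentationLevel1 = 50,   --default is 5
@FragmentationLevel2 = 70,   --default is 30
@LogToTable = 'Y'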

Free book about execution plans

Grant Fritchey has released the third edition of his book about execution plans. And it is freely available thanks to the cooperation with Redgate.

There’s a lot of good stuff in there. Introduction to execution plans, plan caching, reading plans, etc. and then of course perhaps the “meat of the book”: a discussion about the various operators that you can find in an execution plan. Sure, these are documented in Books Online, but far from the level of this book. Say you find a Hash Match in the plan. Is it good? Is it bad? Under what circumstances? And, exactly what does it really do? Things like that…

Get it here.

Managing tempdb

This post is all about the tempdb database. It is not about capacity planning, like size or number of data files – there is plenty of info about that out there. Instead, this is about managing it, in the sense of changing initial file size, adding a file, removing a file etc. SSMS isn’t obvious for some of these things for the tempdb database, and for some things SSMS is flat out wrong.

I tested all operations on SQL Server 2017 and SSMS 17.8.1.

 

Background

Tempdb is re-created at start-up. Or, rather, re-initialized (for lack of a better word). SQL Server does not delete the files and create new ones; that would take ages for large files if you don’t have Instant File Initialization (which you never have for log files). So how does SQL Server know what files to create, the logical and physical names, size etc? The answer is sys.master_files.

 

sys.master_files

This is the “template” information for the tempdb files. I will refer to sys.master_files as the template below (not to be confused with model, which is the template for other things, but not for the file aspects of tempdb). Here you see how the tempdb files will be created/initialized at the next start-up. Don’t confuse this with tempdb.sys.database_files, which shows the current state of the tempdb files. The two queries below will show you the current state of the tempdb files and also the template state:

--Current
SELECT
'tempdb' AS db_name_
,file_id
,name
,physical_name
,size * 8/1024 AS size_MB
,type_desc
,CASE WHEN is_percent_growth = 1 THEN CAST(growth AS varchar(3)) + ' %' ELSE CAST(growth * 8/1024 AS varchar(10)) + ' MB' END AS growth
,max_size * 8/1024 AS max_size_MB
FROM tempdb.sys.database_files
ORDER BY type, file_id

--Template
SELECT
DB_NAME(database_id) AS db_name_
,file_id
,name
,physical_name
,size * 8/1024 AS size_MB
,type_desc
,CASE WHEN is_percent_growth = 1 THEN CAST(growth AS varchar(3)) + ' %' ELSE CAST(growth * 8/1024 AS varchar(10)) + ' MB' END AS growth
,max_size * 8/1024 AS max_size_MB
FROM master.sys.master_files
WHERE DB_NAME(database_id)  = 'tempdb'
ORDER BY db_name_, type, file_id

 

Increasing current and template file size to larger than what it currently is

This is pretty straight-forward, both using T-SQL directly and also using SSMS. Here’s what it looks like in SSMS:

(The word “Initial” for the header “Initial Size (MB)” is pure nonsense. What you see is the current file size, picked up from sys.database_files. I am happy to see that the word “Initial” is removed in SSMS 18.0.)

To increase the file size you just type the desired, larger, size for the tempdb file. The T-SQL command to do this is also very straight forward, example:

ALTER DATABASE tempdb 
 MODIFY FILE ( NAME = N'tempdev', SIZE = 5GB )

ALTER DATABASE tempdb 
 MODIFY FILE ( NAME = N'temp2', SIZE = 5GB )

Note that SSMS will generate the file size in KB for the T-SQL command, but the T-SQL command accepts KB, MB, GB and even TB.

This operation increases the current file size as well as the template size.

 

Decreasing current file size

To reduce the current size, you can attempt a shrink operation, for instance using a T-SQL command such as the one below:

USE tempdb
DBCC SHRINKFILE(tempdev, 100)

The second parameter is the desired size in MB. This will only change the current size, not the template. In my experience, you should be prepared for limited success when shrinking a tempdb file. In many cases, you will have to re-start SQL Server so the file is created with the template size. Or hunt down whatever is using it and take it from there.
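
To hunt down what is using tempdb, the space usage DMVs can help. A minimal sketch that lists sessions by how many tempdb pages they currently have allocated; it won’t catch everything, but it is a good start:

SELECT su.session_id
      ,su.user_objects_alloc_page_count - su.user_objects_dealloc_page_count AS user_object_pages
      ,su.internal_objects_alloc_page_count - su.internal_objects_dealloc_page_count AS internal_object_pages
FROM sys.dm_db_session_space_usage AS su
ORDER BY (su.user_objects_alloc_page_count - su.user_objects_dealloc_page_count)
       + (su.internal_objects_alloc_page_count - su.internal_objects_dealloc_page_count) DESC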

 

Changing template file size to smaller than what it currently is

This is a bit trickier using the GUI. If you type a size which is smaller than current size in the GUI, then it will generate a DBCC SHRINKFILE command, as discussed above and that is not what we wanted to do here. So, what you can do is to specify a larger size than the current size, script the command to a T-SQL query window and in there just specify whatever size you want. It will generate a command such as:

ALTER DATABASE tempdb 
 MODIFY FILE ( NAME = N'tempdev', SIZE = 100MB )

If the specified size is smaller than the current file size, then it will only change the template, not the current size.

 

Adding file

This is dead-easy. In SSMS, just press the “Add” button and specify the desired name, size etc. This will generate T-SQL such as:

ALTER DATABASE tempdb 
 ADD FILE ( NAME = N'tempdb3', FILENAME = N'R:\SqlData\a\tempdb_mssql_3.ndf' , SIZE = 5GB , FILEGROWTH = 200MB )

This will add it immediately and also to the template (both sys.database_files and sys.master_files).

 

Remove file

This is simple enough if the file is empty. In SSMS you select the file and press the “Remove” button.  It will generate T-SQL such as:

ALTER DATABASE tempdb 
 REMOVE FILE tempdb3

But if the file is currently used, you will get an error message and nothing is changed: not the current state, nor the template.

You can always attempt to shrink first using the EMPTYFILE option:

USE tempdb
DBCC SHRINKFILE(tempdb3, EMPTYFILE)

If you’re lucky, then the file was emptied and you can now use the ALTER command with the REMOVE FILE option. If not, then you are back to where you started.

You can try to re-start SQL Server and see if that allows you to remove the file. If not, then perhaps use the “failsafe” startup switch: /f (see Erin Stellato’s blog about this: https://www.sqlskills.com/blogs/erin/remove-files-from-tempdb/ ). Not being connected from SSMS when executing the commands might also help.

 

Changing other file attributes

If you want to change things like autogrow or maxsize, then just use SSMS or T-SQL directly. Here’s an example T-SQL command to change both autogrow and maxsize:

ALTER DATABASE tempdb 
 MODIFY FILE ( NAME = N'tempdev', MAXSIZE = 500GB , FILEGROWTH = 500MB )

 

Moving a file

This operation requires a re-start. SSMS doesn’t allow you to do this, so use T-SQL directly, such as:

ALTER DATABASE tempdb 
 MODIFY FILE (NAME = tempdev, FILENAME = 'R:\SqlData\b\tempdb.mdf')

Warning: Henrik reminded me in the comments section to add a warning here. “Make sure that R:\SqlData\b is present and that SQL Server has sufficient rights to create files here!” He is of course correct. In worst case you end up with SQL Server refusing to start if it cannot create tempdb where you have specified. In such case you can try the -f startup switch, perhaps a topic for another blog post.

Now, this is a sneaky one. It does change both sys.database_files and sys.master_files. So, looking only at the catalog views, you might be fooled into thinking that the file was moved immediately. But that didn’t happen; check the disk. Fortunately, the command comes with a message such as:

The file "tempdev" has been modified in the system catalog.
The new path will be used the next time the database is started.

That makes it pretty obvious. Note that the old file will not be removed when you restart your SQL Server. Go delete that file, after verifying that the new file was indeed created (if the “old” one is still the current one, then you won’t be able to delete it as long as SQL Server is running).

 

Summary

Managing tempdb isn’t that hard. Once you understand that we have the current state (tempdb.sys.database_files) and the template (sys.master_files), it becomes easier. And, yes, SSMS is a bit quirky for these operations. The reason, methinks, is that it uses the same GUI as for other databases, which doesn’t have the aspect of a “template”.

Did I miss any operation? Let me know and I might add it.

 

Bug in SSMS when using Extended Events

I’d like to share one bug that has been irritating me for a long time. It is very easy to handle, once you know what the problem is. I’ve seen it when I want to open an XE file target:

Here’s the relevant text in the message box, to facilitate searching:

The storage failed to initialize using the provided parameters. (Microsoft.SqlServer.XEventStorage)

Cannot view the function ‘fn_MSXe_read_event_stream’, because it does not exist or you do not have permission. (Microsoft SQL Server, Error: 15151)

The problem is that Object Explorer in SSMS tries to use that function from the current database, not the master database (where the function exists, so to speak). So the solution is to open Object Explorer making sure that the current database for Object Explorer is the master database.

This bug has been reported on UserVoice, April 2016. Go vote for it if you want to add your voice: https://feedback.azure.com/forums/908035-sql-server/suggestions/32902330-cannot-view-the-function-fn-msxe-read-event-strea

 

Tips for getting started with Extended Events

I know, I know. New habits are hard to learn. Many of us have been using SQL trace and the Profiler GUI for a very long time. And we know that we are supposed to move over to Extended Events (XE), but we postpone it for some later time. And then we give XE a try, and something doesn’t work as we want. So we go back to more familiar territory.

But XE has really grown on me over the last few years. I’d like to share the things that I initially didn’t like about XE, and how I overcame them. And also some of my other tips to make it easier to be productive with XE. I will deliberately minimize showing T-SQL and queries against the XE dynamic management views here. As you use XE more and more, you will probably use T-SQL to a higher degree. But this blog post is for those of you who want to “get into” XE, and I find that a GUI is great as a starting point. Assuming the GUI is any good, that is. And I think the SSMS GUI is, for the most part.

Don’t use the ring buffer as target

This is possibly the single most important thing to remember. Some would point to the fact that the DMV where you read the data is limited to 4 MB as the reason, and I can appreciate that. But more important is that SSMS won’t parse the data for you. XE is XML based, meaning that when you look at the XE data you get XML. And SSMS does nothing to help you with the ring buffer. It will just show the XML as is!


To the left you have what SSMS show you for the ring buffer and to the right you see what you get when you click the hyperlinked XML: the raw XML.

Use the file as target

So the obvious target would then be to use a file instead:

Note that the wizard to create event session defaults to the ring buffer. So, if you use the wizard, you probably want to change that to a file instead.
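
A minimal sketch of what a session with a file target can look like; the session name, event, and file path are just placeholders:

CREATE EVENT SESSION my_file_trace ON SERVER
ADD EVENT sqlserver.sql_batch_completed
ADD TARGET package0.event_file (SET filename = N'C:\Temp\my_file_trace')
GO

ALTER EVENT SESSION my_file_trace ON SERVER STATE = START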

Use the targets instead of “watch live data”

Sure, sometimes you want to see things rolling by, as they happen. But saving to a target and looking at that target instead of the live data is more powerful. See the section about sorting etc below.

Show fields as columns

By default SSMS only shows the name and timestamp fields. But all you have to do is to right-click on a field in the lower half and select to show it as a column.

Explore sorting, grouping and aggregations

For this you need to look at a trace target, not the live window (except for filtering), but you can display a target even if the trace is currently running. Then just explore and find out that you can easily sort, group and aggregate the data:

Use SSMS to explore events, columns, etc

There are so many things you can capture with XE. But it can also be a bit daunting. In my SQL Server 2017 instance, I have 1511 events. Compare that to the 180 you have for the old trace engine! So, XE is definitely richer, but you probably won’t browse through all 1500 events to find whichever could be interesting for the task at hand. So, create an event session using either the wizard or the “real GUI”. Just type letters and the GUI will filter on those, as you type:

Don’t expect events etc to be documented

For the old trace engine, we could read about all events and columns in Books Online. The “documentation” for the XE objects is instead in the DMVs; for instance, sys.dm_xe_objects has a “description” column. You don’t have to use these DMVs directly, though. Follow my above advice and SSMS will pick up those descriptions for you:

Use the “XEvent Profiler” that was added in SSMS 17.3

MS has probably noticed that there is a hesitation to pick up on XE. In SSMS 17.3, they added the XEvent Profiler. This is just a quick way to start an XE session with a live data window. Nothing fancy, really, but having it in the “root” of SSMS makes it very snappy to kick off that trace quickly. And you can also customize it. See my earlier blog post on this.

Explore other targets

The Wizard only allows you to select from two targets: the dreaded ring buffer and the file target. But there are other interesting targets. Like histogram, to for instance show how many SQL commands were submitted for each database. Or event_counter, which just counts how many of each event occurred, for instance how many attentions, deadlocks and lock escalations we have had. There are other targets as well!

Watch out for that latency

By default you can wait 30 seconds after an event happened until it has been delivered to the target(s). This is fine for a trace that you run into a target and analyze later. But when you want to do something interactively it feels like ages. I’m thinking of the “watch live data” functionality. The Wizard doesn’t let you specify the value for the MAX_DISPATCH_LATENCY parameter (of the CREATE EVENT SESSION command). So, use the “real GUI” or just script from the wizard and add that parameter. Of course, the shorter the latency, the higher the overhead. So, use with care.
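
As a sketch, the latency can also be changed afterwards for an existing session with ALTER EVENT SESSION (the session name is a placeholder; you may need to stop the session first if SQL Server complains):

ALTER EVENT SESSION my_file_trace ON SERVER
WITH (MAX_DISPATCH_LATENCY = 3 SECONDS)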

Note that filters and actions are added per event

You can in most cases think of actions as “extra columns” you want to add. Anyhow, XE is flexible in many ways. For instance, you might want to filter one event on something, and some other event on something else. Or accept the overhead of collecting one action for one event but not the other. But the wizard doesn’t work this way. The filtering and actions you specify will be applied to all events in the trace. You can of course script from the wizard and modify that script. But the “real GUI” is more true to this in the sense that you specify filters and actions for each event.

Don’t be afraid to use T-SQL

You can of course automate and customize enormously when you start using T-SQL to a higher degree. As usual, you can script things in SSMS that can serve as a great starting point. Below is an example of a query that “parses” the XML for a ring buffer trace into columns in the result set:

SELECT 
event_data.value('(@name)[1]', 'varchar(50)') AS event_name
,event_data.value('(@timestamp)[1]', 'datetime2(0)') AS time_stamp
,event_data.value('(data[@name="statement"]/value)[1]','nvarchar(50)') AS statement_
,event_data.value('(data[@name="duration"]/value)[1]','bigint') / 1000 AS duration_ms
,event_data.value('(data[@name="cpu_time"]/value)[1]','bigint') /1000 AS cpu_ms
,event_data.value('(data[@name="logical_reads"]/value)[1]','bigint') AS logical_reads
,event_data.value('(data[@name="physical_reads"]/value)[1]','bigint') AS physical_reads
,event_data.value('(data[@name="row_count"]/value)[1]','bigint') AS row_count
,event_data.value('(action[@name="database_name"]/value)[1]','nvarchar(128)') AS database_name_
FROM #myXeData AS evts
CROSS APPLY StatementData.nodes('//RingBufferTarget/event') AS XEventData(event_data)
--ORDER BY time_stamp
ORDER BY duration_ms DESC

Note that I have saved the trace into a temp table. If you don’t, those XQuery operations will take ages.
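
For completeness, here is a minimal sketch of how such a temp table could be populated from a ring buffer target, matching the column name used in the query above; the session name is a placeholder:

SELECT CAST(t.target_data AS xml) AS StatementData
INTO #myXeData
FROM sys.dm_xe_sessions AS s
INNER JOIN sys.dm_xe_session_targets AS t ON t.event_session_address = s.address
WHERE s.name = 'my_trace_session'    --placeholder
AND t.target_name = 'ring_buffer'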

Etc

I feel I could go on and on. But I’ll stop here. I might add something at a later time, if I come to think of something obvious. Feel free to add your tips in the comments.

Should it be [ola] or [olahallengren]?

I’m (of course 😉 ) talking about the schema that Ola Hallengren plan to move his objects to. If you don’t get what I’m talking about, then visit Ola’s database maintenance solution site here.

Ola is soliciting feedback for the schema naming at Github now, go and express your opinion at: https://github.com/olahallengren/sql-server-maintenance-solution/issues/91.

Agent no longer terminates on error

It is funny how you keep going, assuming things are as they used to be. Sometimes.

I had this thing with SQL Server Agent and T-SQL job step types. It used to be that such a job step would just stop if an error occurred. This isn’t how other tools like SSMS or SQLCMD behave. So to get the behavior that we are used to, we would schedule our SQL using a CmdExec job step and call out to SQLCMD.EXE.

I usually point to Ola Hallengren’s great maintenance scripts as an example of this. I recently noticed that he nowadays uses T-SQL job steps when he calls his stored procedures. That made me curious, and he told me in an email conversation that Agent doesn’t stop on error anymore. Since SQL Server 2012! Hehe, so it has been a while.

I wanted to test this, of course. Not that I doubt Ola, but seeing is believing. So here’s the T-SQL I use in a job step to test this:

RAISERROR('10', 10, 1)
RAISERROR('11', 11, 1)
RAISERROR('12', 12, 1)
RAISERROR('13', 13, 1)
RAISERROR('14', 14, 1)
RAISERROR('15', 15, 1)
RAISERROR('16', 16, 1)
RAISERROR('17', 17, 1)
RAISERROR('18', 18, 1)
PRINT 'Do we get to here?'
SELECT 1/0
PRINT 'What about after division by zero?'

If you run the above from SSMS you probably see the messages out of sequence. You don’t see that in the output from the agent job step. I don’t know why that is. I added WITH NOWAIT for the RAISERROR messages (the old trick to flush the output buffer), but that didn’t change the order in which SSMS displays the messages. SQL Server Agent does show the messages in the expected order, however, so I won’t dig further into that.

Bottom line is that Agent doesn’t stop on (normal) errors anymore. I tested this on SQL Server 2017, but I don’t doubt Ola when he says that he tested this back to 2012.

Explaining Activity Monitor

This post is not about how to use the Activity Monitor (AM) tool in SQL Server Management Studio (SSMS) – there are loads of such posts written already. Also, it is not about dissing AM, you will find such posts as well.

What I want to do is to explain the information in AM, for instance what time span the information covers. I see lots of confusion about this, and incorrect assumptions can be made because of that. This aspect is typically not mentioned in other blog posts on AM that I have seen.

The SQL Server documentation is very vague on the topic. In fact, the most recent documentation article explaining the information in AM that I found is for SQL Server 2008 R2. And that documentation is in many places vague, or even flat out incorrect. For more recent versions of SQL Server, there’s not even an attempt to explain the information. (Please let me know if you find such official articles.)

I know that lots of people use the Activity Monitor tool in SQL Server, especially the less experienced DBA. The more experienced DBA often uses other tools like Adam Machanic‘s sp_whoisactive (see this and this), Brent Ozar’s sp_BlitzFirst, etc.

Say for instance that you had a massive amount of I/O for one of your databases over the last day, up until a minute ago. I.e., the I/O load for the database varies a bit, but on average is very high. You look in AM, which shows this database as silent since you happened to have low I/O during the last minute, and AM shows some other database as being the one with high load.

So, we need to think about the time dimension here. AM does a refresh at certain intervals. By default it is every 10 seconds, but you can change that by right-clicking somewhere in AM and change in that context menu. Keep this in mind. It is important. We will refer to it as the most recent refresh interval, or snapshot.

I’m using SQL Server Management Studio (SSMS) 2016, and SQL Server 2016. It is possible that other version combinations do other things. With the information in this blog post, you will be able to find and determine that for yourself. Please comment if you find important deviations, or perhaps just confirmations (like “SSMS 2012 does the same thing”).

I mainly used tracing to spy on the SQL submitted by AM.

The four top graphs:

 
“% Processor Time” is picked up directly from the OS (using WMI, I believe). Most probably a Performance Monitor counter in the end.


“Database I/O” is the sum of I/O for all database files performed since the last snapshot. This is fine since we intuitively understand that, because we have the trail of prior snapshot values displayed in the graph. The information comes from sys.dm_io_virtual_file_stats, doing a SUM over num_of_bytes_read + num_of_bytes_written, converted to MB.
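
A minimal sketch of the kind of query involved; note that the DMV is cumulative since SQL Server started, while AM shows the delta between two snapshots:

SELECT DB_NAME(vfs.database_id) AS database_name
      ,SUM(vfs.num_of_bytes_read + vfs.num_of_bytes_written) / 1024 / 1024 AS total_io_MB  --cumulative since startup
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
GROUP BY vfs.database_id
ORDER BY total_io_MB DESC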


“Batch Requests/sec”
 is the number of batches we have submitted to our SQL Server since the last snapshot. Again, this is pretty intuitive since we have a trail of snapshot values in the graph. The information is from the performance counter “Batch Requests/sec” picked up from sys.sysperfinfo (bad Microsoft, you should use sys.dm_os_performance_counters 🙂 ).


“Waiting Tasks”
 shows how many tasks are waiting for something (a lock to be released, for instance). This is not as straightforward as the others. The information comes from sys.dm_os_wait_stats UNION ALL with sys.dm_os_waiting_tasks.

The values are compared to those from the prior snapshot. However, a higher weight in that calculation will be given to the prior snapshot values if you have a short refresh interval. Say that you have a 1-second refresh interval. Then only a weight of 9% is from the current interval and 91% is from the prior interval. Since the current interval value will become the prior value for the next snapshot, a “trail” is kept back in time with a diminishing weight the longer back in time you go.

If you refresh every 10 seconds, then current interval weight is 60% and previous interval weight is 40%. It pretty quickly approaches 100% for current snapshot the longer refresh interval you are using. Hats off to Microsoft for so clearly documenting this in the temporary stored procedures that AM is using. It is in the source code, all you need to do is to grab it in a trace and read it. The name of the procedure is #am_generate_waitstats, and it is created when you open the AM window in SSMS.
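
To illustrate the weighting with made-up numbers (a 10-second refresh interval, a current measurement of 100 and a prior smoothed value of 40):

DECLARE @current_value  decimal(10,2) = 100   --made-up measurement for this snapshot
DECLARE @previous_value decimal(10,2) = 40    --made-up smoothed value from the prior snapshot
DECLARE @current_weight decimal(4,2)  = 0.60  --roughly the weight AM uses at a 10-second refresh

SELECT @current_weight * @current_value + (1 - @current_weight) * @previous_value AS displayed_value  --76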

Note that not all wait types are represented here. See the section below named ‘The “Resource Waits” pane’ for more information. The “Waiting Tasks” diagram and the “Resource Waits” pane shares some T-SQL code.

The “Processes” pane
This is pretty straight forward so I won’t spend much time on it here. It shows information about the sessions you have at the moment the snapshot is produced. It uses a query joining DMVs such as sys.dm_exec_sessions, sys.dm_exec_requests, sys.dm_os_tasks, etc. Go grab the query in a trace and paste into a query window if you want to dig into it.

The “Resource Waits” pane
The idea here is to show where SQL Server is waiting, “wait stats”.

It uses the same procedure as the “Waiting Tasks” diagram uses, #am_generate_waitstats, to get the information. See the above section for “Waiting Tasks” to understand the time dimension for this. For simplicity, we can say that it shows only wait stats for the past 30-60 seconds. This is important. Imagine that you had loads of a certain wait stats, but none just for the last minute. This pane can now fool you that you didn’t have any waits of that kind, just because you didn’t for the past minute. Note, though, that the “Cumulative Wait Time” column is the sum of wait in the group since SQL Server was re-started or since we last cleared the wait state (DBCC SQLPERF(“sys.dm_os_wait_stats”,CLEAR)).

In an attempt to be friendly, it will group and summarize wait stats into various groups. That would be fine if there were some documentation about which individual wait type is in each group. Also, some wait types are ignored. One of the ignored wait types is CXPACKET, another is THREADPOOL.

AM creates a table named #am_wait_types when you open the AM window, which it populates with various wait types and the group each wait type is in. This table has a column named “ignore”. The two wait types I mentioned above have 1 in this “ignore” column. There are 35 rows which have 1 in the “ignore” column. To be fair, most are benign, but the two I immediately reacted to are the ones I mentioned above.

But hang on, how many rows do we have in this #am_wait_types table in total?  The answer is 263. Are there more than 263 wait types in 2016? You bet! I did a select from sys.dm_os_wait_stats and I got 875 rows. So, 633 of the wait types in 2016 are not at all considered by AM. That of course begs the question whether I found any interesting wait types that aren’t in #am_wait_types? I didn’t go through them all, but I glanced only quickly and for instance SOS_SCHEDULER_YIELD caught my attention. If you want to go through them, then I highly recommend Paul Randal’s wait types library. If you find anything that stands out, then please post a comment.

SELECT ws.wait_type FROM sys.dm_os_wait_stats AS ws
WHERE ws.wait_type NOT IN(
SELECT wt.wait_type FROM #am_wait_types AS wt WHERE wt.ignore = 0
)
ORDER BY wait_type

The “Data File I/O” pane
This shows I/O activity per database file since the last snapshot. Again, you could for instance have had lots of I/O for a database the last day, but if it was silent the past minute, then this dialog will potentially mislead you.

The “Recent Expensive Queries” pane
This shows the most expensive queries, based on what column you sort on, executed since the last snapshot. If you have, say, a 10 second snapshot interval, you will only see the queries executed during these 10 seconds. AM uses a procedure named #am_get_querystats to collect the information. There are a few things going on inside this procedure, but at the most basic level, it uses sys.dm_exec_query_stats and sys.dm_exec_requests to get queries from cache and currently executing queries. It then does some processing and stores the result in temp tables so we can later sort on different columns depending on what metric we are interested in. I suggest that you spend some time with the source code if you want to dig deeper.

The “Active Expensive Queries” pane
This is very straight forward. It executes a query which uses sys.dm_exec_requests joined to a few other DMVs.

How to dig deeper
I thought about including snippets of AM’s source code, the list of wait stats grouping etc here. But I decided against that. It would litter this post, and the code might differ between releases and builds of SSMS. So, if you are interested in digging deeper, fire up your favorite trace tool (Extended Events, Profiler, Server-side trace, …) and catch the SQL submitted by AM.

When you open the AM window, it executes a few batches that creates procedures and tables that it will later use for each refresh interval. Here are the ones that I found:

  • The table #am_wait_types, which contains the wait types that AM bothers about, along with the grouping and which of those are ignored. This is the one you want to investigate to see which wait types are ignored by AM; either by having 1 in the “ignore” column, or by not being in the table in the first place. Match against sys.dm_os_wait_stats.
  • The procedure #am_generate_waitstats which collects wait stats with some trail back in time, as explained above.
  • The table #am_dbfileio in which file I/O stats is stored.
  • The tables #am_request_count and #am_fingerprint_stats_snapshots, used for query statistics.
  • The procedure #am_get_querystats, which collects and stores the query statistics.

At each refresh interval, you see 4 T-SQL batches submitted for the top 3 graphs that are T-SQL based (ignoring the “dead” graph that I have in SSMS 2016, and also ignoring “% Processor Time” since it is WMI-based).

If you have expanded the “Processes” pane, you also see a batch that collects that information at every refresh interval.

If you have expanded the “Resource waits” pane, you also see a batch that does a SELECT from the #am_resource_mon_snap table at every refresh interval, with grouping and SUM based on resource type.

If you have expanded the “Data File I/O” pane, you also see a batch that collects that information at every refresh interval.

If you have expanded the “Recent Expensive Queries” pane, you also see a batch that executes the #am_get_querystats procedure to collect that information. It is executed at refresh intervals, but not necessarily at every refresh interval. Check the source code for the procedure and you will see that AM will execute this no more frequently than every 15 seconds.

If you have expanded the “Active Expensive Queries” pane, you also see a batch that executes a query to collect that information. It seems to be limited so it doesn’t execute more frequently than every 5 seconds (even with a shorter refresh interval).

The bottom line
As always, with an understanding of the data we see, we have a chance to make information out of it. The Activity Monitor certainly has its quirks, but if you do feel like using a GUI for these types of things, I hope that you are now better equipped to interpret what it is you are seeing. Personally, I find “Resource Waits”, “Data File I/O” and “Recent Expensive Queries” less useful because of the time dimension handling. As for expensive queries, IMO nothing beats the Query Store in SQL Server 2016.