Make sure you update the statistics

You might know that statistics can have a high impact on performance. As you add, remove and modify data, the statistics become more and more out of date.

Sure, there’s the database option that updates statistics automatically, but it is a bit … rough. For a table with 10,000,000 rows, you have to modify 2,000,000 rows prior to 2016 (20%), but only 100,000 rows as of 2016 with database compatibility level 130 or higher (SQRT(1000 * rows)). Also, when auto-update stats kicks in, it samples the underlying data so that the update doesn’t take too long, etc.
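If you want to see how close your statistics are to those thresholds, you can check the modification counters yourself. Below is a minimal sketch, assuming sys.dm_db_stats_properties is available (SQL Server 2008 R2 SP2, 2012 SP1 and later); run it in the database you are interested in:

SELECT
 OBJECT_NAME(s.object_id) AS table_name
,s.name AS stats_name
,sp.last_updated
,sp.rows
,sp.modification_counter
,0.20 * sp.rows AS threshold_pre_2016      -- roughly; the full rule is 500 + 20% of the rows
,SQRT(1000.0 * sp.rows) AS threshold_2016  -- with db compat level 130 or higher
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY sp.modification_counter DESC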

Many of us use Ola Hallengren’s excellent maintenance solution. I’m sure that some of us have our own favorite modifications that we make to the jobs that the installation script creates for us. My point here is that the installation script does not update statistics by default. Here’s how the job step in the “IndexOptimize – USER_DATABASES” job looks on my SQL Server 2017 instance. All by default:

EXECUTE [dbo].[IndexOptimize]
@Databases = 'USER_DATABASES',
@LogToTable = 'Y'

The default value for the @UpdateStatistics parameter is NULL, which means “Do not perform statistics maintenance”. See this page for reference.

So, by using the defaults, you end up defragmenting your indexes (which might not give you that much nowadays, considering that there isn’t that much of a difference between random and sequential I/O on modern disk subsystems). For some of the indexes that you defragment, you will get new statistics as a by-product. Those are the indexes that you rebuild – a rebuild internally creates a new index and then drops the old one. But you likely have many indexes that never reach the 30% fragmentation threshold. And consider indexes over a key which is steadily increasing or decreasing: inserting new rows will not cause fragmentation, but the statistics will still become out of date.

What to do? Easy, just add a new job in which you call the IndexOptimize procedure with options to update all statistics where at least one row has been modified. Ola even has such an example on his website, Example D. Here it is, I just added a parameter to log the commands to the CommandLog table:

EXECUTE dbo.IndexOptimize
@Databases = 'USER_DATABASES',
@FragmentationLow = NULL,
@FragmentationMedium = NULL,
@FragmentationHigh = NULL,
@UpdateStatistics = 'ALL',
@OnlyModifiedStatistics = 'Y',
@LogToTable = 'Y'

Schedule the above as you wish. I prefer to run it early every morning if I can. But your circumstances, like database size, how the data is modified etc., will influence the frequency.

Another option is to modify the existing “IndexOptimize – USER_DATABASES” job and just add the parameter below. This will both defragment your indexes and update statistics.

@UpdateStatistics = 'ALL'
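In other words, with the defaults shown earlier, the job step would end up looking something like this:

EXECUTE [dbo].[IndexOptimize]
@Databases = 'USER_DATABASES',
@UpdateStatistics = 'ALL',
@LogToTable = 'Y'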

What about Maintenance Plans, you might think? Well, we all know that they aren’t that … smart – which is why we use scripts like Ola’s in the first place. Regarding statistics updates, they will update all stats, regardless of whether any rows at all have been modified since last time. It is a complete waste of resources to update stats if nothing has changed. Even sp_updatestats is smarter in this regard.

Note: This is in no way a criticism of the maintenance solution that Ola provides. He has no way of knowing our requirements in the job he creates. If the installation script could read our minds, I’m sure that it would schedule the jobs for us as well. OTOH, if Ola could create a T-SQL installation script that could read our minds, then he probably would do something else in the first place. 🙂

Logging wait stats over time

We all know how valuable wait statistics can be when doing performance analysis. One thing I feel is missing in SQL Server is a trail of various measures, for instance wait stats. I really wish Microsoft would include something in this area that could be used as a baseline. I recently found this, from the Tiger Team. It looks promising, but I haven’t implemented it yet.

Anyhow, I have created a script that logs wait stats over time. I keep one trail of logged values which is denser – for instance once per minute. And also a trail which is coarser, for instance once per day.

The solution has a few tables, and then you create a couple of Agent jobs. The logging and clean-up parts are, methinks, pretty much done. The part that needs more work is the analysis of the information. I.e., expect to find queries and views added. Suggestions are of course very welcome!
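To give you a feel for the idea (this is not the actual solution, just a simplified sketch with a made-up table name), the dense logging part boils down to snapshotting sys.dm_os_wait_stats from an Agent job:

CREATE TABLE dbo.WaitStatsLog
(
 log_time datetime2(0) NOT NULL
,wait_type nvarchar(60) NOT NULL
,waiting_tasks_count bigint NOT NULL
,wait_time_ms bigint NOT NULL
,signal_wait_time_ms bigint NOT NULL
,CONSTRAINT PK_WaitStatsLog PRIMARY KEY (log_time, wait_type)
)
GO

-- Scheduled by an Agent job, for instance once per minute
INSERT INTO dbo.WaitStatsLog (log_time, wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms)
SELECT SYSDATETIME(), wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE waiting_tasks_count > 0

Keep in mind that the values in sys.dm_os_wait_stats are cumulative since the instance started, so the analysis part has to diff consecutive snapshots.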

You find the stuff here.

Adjust autogrow setting for your database files

I very frequently see too small autogrow values for database files. Even the defaults in many cases. The defaults prior to SQL Server 2016 are 1 MB for data files and 10% for log files, adjusted to slightly better values in 2016. One of the things I often do when I start working on a SQL Server is to adjust these values to something more reasonable.

I have been wanting to automate this for a long time, but other things got in between. Now I have finally created my sp_dbfiles_autogrow procedure. (I’m sure I’m not the first one, but I wanted my own.) It creates and populates a table with the values to generate, based on the current size of each database file, and the end result is that it outputs the ALTER DATABASE commands as text (using PRINT). I.e., it doesn’t really change anything, it only generates the commands. You can find the procedure here. All comments and bug findings are of course welcome!
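To give you an idea, the commands it prints look something like below (database name, file names and growth values are of course just examples):

ALTER DATABASE [MyDatabase] MODIFY FILE (NAME = N'MyDatabase_data', FILEGROWTH = 256MB)
ALTER DATABASE [MyDatabase] MODIFY FILE (NAME = N'MyDatabase_log', FILEGROWTH = 128MB)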

 

Log Page Life Expectancy over time

You often see Page Life Expectancy referred to as an interesting performance monitor counter. And it can be! It indicates for how long a page is expected to stay in cache, from the time it was brought into cache.

But just looking at a snapshot value doesn’t say that much. It might be high, but only because you haven’t had a high turnover of your pages for the past couple of hours. Or the other way around: you happen to look just after a very rare monster query. A log, on the other hand, can show you how PLE changes over time.

“The app was slow at two o’clock today.”

You look at the PLE log and you see how PLE dropped at that time, which indicates that something was executed that did lots of I/O. Or you don’t see PLE drop at all, and can assume that it probably was a blocking situation instead.

Some of you already have performance monitoring solutions that include PLE logging over time. For those of you who don’t, I created a very simple solution. It involves a table, a scheduled Agent job and queries to read the data. That’s it.
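Simplified, the logging part is nothing more than reading the counter from sys.dm_os_performance_counters and saving it with a timestamp (the table below is just for illustration; the real thing is in the link below):

CREATE TABLE dbo.PleLog
(
 log_time datetime2(0) NOT NULL PRIMARY KEY
,ple_seconds bigint NOT NULL
)
GO

-- Scheduled by an Agent job, for instance every five minutes
INSERT INTO dbo.PleLog (log_time, ple_seconds)
SELECT SYSDATETIME(), cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'
AND object_name LIKE '%Buffer Manager%'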

You find it at http://karaszi.com/log-page-life-expectancy-over-time.

Improvements and comments are of course appreciated. 🙂

Updated sp_indexinfo

It was time to give sp_indexinfo some love. The procedure is meant to be the “ultimate” index information procedure, providing lots of information about all indexes in a database or all indexes for a certain table. Here is what I did in this update:

  • Changed the second query, the one that retrieves missing index information, so it generates the index name (based on schema name, table name and column names – limited to 128 characters).
  • Re-arranged and shortened column names to make the output more compact, and moved the more commonly used columns to the right.
  • Uncommented some columns that were previously commented out. (At least one, filter, has to be commented out if you want to run this on 2005.)
  • Added support for columnstore indexes.
  • Decoded the type for columnstore indexes to col-store.

You find the procedure here. 

Wait random number of minutes

Why on earth would you want to do that, you ask? Say you have a job that is scheduled to start at the same time over a number of servers. This might be because you have a SQL Server master/target server (MSX/TSX) environment, or because you quite simply script a job and execute that script on several servers. You probably want to spread the load on your SAN and virtual machine host a bit. This is the exact reason I use this procedure. I frequently use MSX servers, and I usually add a job step (executing this procedure) to wait a random number of minutes between 0 and 30.
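The heart of it is nothing more than a randomized WAITFOR DELAY. Something along these lines (a simplified sketch, with a made-up procedure name; the real procedure is in the link below):

CREATE PROCEDURE dbo.WaitRandomMinutes
 @max_minutes int = 30
AS
BEGIN
 -- Pick a random number of seconds, 0 up to (but not including) @max_minutes * 60
 DECLARE @seconds int = ABS(CHECKSUM(NEWID())) % (@max_minutes * 60)
 -- Convert to the hh:mm:ss format that WAITFOR DELAY expects, and wait
 DECLARE @delay varchar(8) = CONVERT(varchar(8), DATEADD(SECOND, @seconds, CAST('00:00:00' AS datetime)), 108)
 WAITFOR DELAY @delay
END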

You find the procedure here.

Analyzing the errorlog

How often do you do this? Go over each message (type) in the errorlog file and determine whether it is something you want to act on. Sure, some (but not all) of you have a monitoring solution in place, but are you 100% confident that it really will notify you of all messages that you might find interesting? That there isn’t even one little message hiding in there that you would find valuable to know about? Or how about messages that you typically don’t care about, but where knowing that they occur with high frequency can be valuable information?

So, this boils down to actually reading the errorlog file. Some of you probably already have scripts and tools that make this easier than just reading every single message from top to bottom. I wanted to share how I do it, and this is why I wrote my Analyze SQL Server logs article. Check it out. And, feedback is always welcome!
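If you want a quick starting point before reading the article, you can pull the errorlog into a table with T-SQL and see which messages occur, and how often. A simple sketch (not the approach used in the article, just to show the idea):

IF OBJECT_ID('tempdb..#errlog') IS NOT NULL DROP TABLE #errlog
CREATE TABLE #errlog (LogDate datetime, ProcessInfo nvarchar(50), Text nvarchar(max))

-- 0 = the current log, 1 = the SQL Server errorlog (as opposed to the Agent log)
INSERT INTO #errlog
EXEC sp_readerrorlog 0, 1

-- Message frequency, most frequent first
SELECT COUNT(*) AS cnt, MAX(LogDate) AS last_occurrence, Text
FROM #errlog
GROUP BY Text
ORDER BY cnt DESC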

List columns where collation doesn’t match database collation

The script below lists all databases/tables/columns where the column collation doesn’t match the database collation. I just wrote it for a migration project and thought I’d share it. I’m sure lots of things can be improved, but it worked just fine for me for a one-time execution on a number of servers.

IF OBJECT_ID('tempdb..#res') IS NOT NULL DROP TABLE #res
GO

DECLARE
@db sysname
,@sql nvarchar(2000)

CREATE TABLE #res(server_name sysname, db_name sysname, db_collation sysname, table_name sysname, column_name sysname, column_collation sysname)

DECLARE c CURSOR FOR
SELECT name FROM sys.databases WHERE NAME NOT IN('master', 'model', 'tempdb', 'msdb') AND state_desc = 'ONLINE'

OPEN c
WHILE 1 = 1
BEGIN
  FETCH NEXT FROM c INTO @db
  IF @@FETCH_STATUS <> 0
    BREAK

  SET @sql =
  'SELECT
    @@SERVERNAME AS server_name
   ,''' + @db + ''' AS db_name
   ,CAST(DATABASEPROPERTYEX(''' + @db + ''', ''Collation'') AS sysname) AS db_collation
   ,OBJECT_NAME(c.object_id, ' + CAST(DB_ID(@db) AS sysname) + ') AS table_name
   ,c.name AS column_name
   ,c.collation_name AS column_collation
  FROM ' + QUOTENAME(@db) + '.sys.columns AS c
    INNER JOIN ' + QUOTENAME(@db) + '.sys.tables AS t ON t.object_id = c.object_id
  WHERE t.type = ''U''
    AND c.collation_name IS NOT NULL
    AND c.collation_name <> CAST(DATABASEPROPERTYEX(''' + @db + ''', ''Collation'') AS sysname)
  '
  --PRINT @sql
  INSERT INTO #res
  EXEC(@sql)
END
CLOSE c
DEALLOCATE c
SELECT * FROM #res