Logging wait stats over time

We all know how valuable wait statistics can be when doing performance analysis. One thing I feel is missing in SQL Server is a trail of various measures, for instance wait stats. I really wish Microsoft would include something in this area that can be used as a baseline. I recently found this, from the Tiger Team. It looks promising, but I haven’t implemented it yet.

Anyhow, I have created a script that logs wait stats over time. I keep one trail of logged values which is denser – for instance once per minute – and also a trail which is coarser, for instance once per day.

The solution has a few tables, and then you create a couple of Agent jobs. The logging and clean-up part is, methinks, pretty much done. The part that needs more work is the analysis of the information. I.e., expect queries and views to be added. Suggestions are of course very welcome!
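
To give a flavor of the dense logging step, here is a minimal sketch. The table and column names below are my own assumptions for illustration, not necessarily what the actual solution uses – and the real thing also handles the coarser trail and the clean-up.

CREATE TABLE dbo.wait_stats_log
(
 log_time datetime2(0) NOT NULL
,wait_type nvarchar(60) NOT NULL
,waiting_tasks_count bigint NOT NULL
,wait_time_ms bigint NOT NULL
,signal_wait_time_ms bigint NOT NULL
,CONSTRAINT pk_wait_stats_log PRIMARY KEY(log_time, wait_type)
)
GO

--Schedule something like this as an Agent job, for instance once per minute
INSERT INTO dbo.wait_stats_log(log_time, wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms)
SELECT SYSDATETIME(), wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE waiting_tasks_count > 0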

You find the stuff here.

Adjust autogrow setting for your database files

I very frequently see too small autogrow values for database files – in many cases even the defaults. The defaults prior to SQL Server 2016 are 1 MB for data files and 10% for log files, adjusted to somewhat better values in 2016. One of the things I often do when I start working on a SQL Server is to adjust the values to something more reasonable.

I have been wanting to automate this for a long time, but other things got in between. Now I have finally created my sp_dbfiles_autogrow procedure. (I’m sure I’m not the first one, but I wanted my own.) It creates and populates a table with the values to generate, based on the current size of the database file, and the end result is that it outputs the ALTER DATABASE commands as text (using PRINT). I.e., it doesn’t actually change anything, it only generates the commands. You can find the procedure here. All comments and bug findings are of course welcome!
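
As a rough illustration of the idea only – the real procedure populates a table and uses PRINT, and the thresholds and growth values below are just examples I made up – something like this generates the commands from sys.master_files:

SELECT
 'ALTER DATABASE ' + QUOTENAME(DB_NAME(database_id))
 + ' MODIFY FILE (NAME = ' + QUOTENAME(name)
 + ', FILEGROWTH = '
 + CASE WHEN size * 8 / 1024 < 1000 THEN '64MB' ELSE '256MB' END  --size is in 8 KB pages
 + ')' AS cmd
FROM sys.master_files
WHERE database_id > 4                               --skip the system databases
 AND (is_percent_growth = 1 OR growth <= 128)       --percent growth, or autogrow of 1 MB or less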


Log Page Life Expectancy over time

You often see Page Life Expectancy referred to as an interesting performance monitor counter. And it can be! It indicates for how long a page is expected to stay in cache, from the time it was brought into cache.

But just looking at a snapshot value doesn’t say that much. It might be high, but only because you haven’t had a high turnover of your pages for the past couple of hours. Or the other way around: you happen to look just after a very rare monster query. Having a log, on the other hand, can show you how PLE changes over time.

“The app was slow at two o’clock today.”

You look at the PLE log and you see how PLE dropped at that time, which indicates that something was executed that did lots of I/O. Or you don’t see PLE drop at all, and can assume that it probably was a blocking situation instead.

Some of you already have performance monitoring solutions that include PLE logging over time. For those of you who don’t, I created a very simple solution. It involves a table, a scheduled Agent job and queries to read the data. That’s it.
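
The gist of the logging step is just reading the counter from sys.dm_os_performance_counters and inserting it into a table, roughly as below. The table name is my own assumption for illustration; the real solution also includes the Agent job and the queries to read the data.

CREATE TABLE dbo.ple_log
(
 log_time datetime2(0) NOT NULL PRIMARY KEY
,ple_seconds bigint NOT NULL
)
GO

--Schedule as an Agent job, for instance every five minutes
INSERT INTO dbo.ple_log(log_time, ple_seconds)
SELECT SYSDATETIME(), cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
 AND counter_name = 'Page life expectancy'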

You find it at http://karaszi.com/log-page-life-expectancy-over-time .

Improvements and comments are of course appreciated. 🙂

Updated sp_indexinfo

It was time to give sp_indexinfo some love. The procedure is meant to be the “ultimate” index information procedure, providing lots of information about all indexes in a database or all indexes for a certain table. Here is what I did in this update:

  • Changed the second query, which retrieves missing index information, so it generates the index name (based on schema name, table name and column names – limited to 128 characters); see the sketch after this list.
  • Re-arranged and shortened column names to make the output more compact, and moved more commonly used columns to the right.
  • Uncommented some columns that were previously commented out. (At least one, filter, has to be commented out if you want to run this on 2005.)
  • Added support for columnstore indexes.
  • Decoded the type for columnstore indexes to col-store.
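
Just to illustrate the first bullet (this is not the actual code in sp_indexinfo), an index name can be constructed from the missing-index DMV roughly like this:

SELECT
 LEFT(
  'ix_' + OBJECT_SCHEMA_NAME(mid.object_id, mid.database_id)
  + '_' + OBJECT_NAME(mid.object_id, mid.database_id)
  + '_' + REPLACE(REPLACE(REPLACE(ISNULL(mid.equality_columns, ''), '[', ''), ']', ''), ', ', '_')
 ,128) AS index_name                               --cap the generated name at 128 characters
FROM sys.dm_db_missing_index_details AS mid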

You find the procedure here. 

Wait random number of minutes

Why on earth would you want to do that, you ask? Say you have a job that is scheduled to start at the same time on a number of servers. This might be because you have a SQL Server master/target server environment (MSX/TSX), or because you quite simply script a job and execute that script on several servers. You probably want to spread the load on your SAN and virtual machine host a bit. This is exactly why I use this procedure. I frequently use MSX servers, and I usually add a job step (executing this procedure) to wait a random number of minutes between 0 and 30.
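
The core of it is nothing more than a random delay, along these lines. This is a simplified sketch – the procedure name and parameter are my assumptions, not necessarily how the real procedure looks.

CREATE PROCEDURE dbo.wait_random_minutes
 @max_minutes int = 30
AS
BEGIN
 DECLARE @wait_seconds int = CAST(RAND() * @max_minutes * 60 AS int)
 DECLARE @delay char(8) = CONVERT(char(8), DATEADD(SECOND, @wait_seconds, CAST('00:00:00' AS time(0))), 108)
 WAITFOR DELAY @delay  --for instance '00:17:42'
END

You would then add something like EXEC dbo.wait_random_minutes @max_minutes = 30 as the first job step.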

You find the procedure here.

Analyzing the errorlog

How often do you do this? Look over each message (type) in the errorlog file and determine whether it is something you want to act on. Sure, some (but not all) of you have a monitoring solution in place, but are you 100% confident that it really will notify you about every message you might find interesting? That there isn’t even one little message hiding in there that you would find valuable to know about? Or how about messages that you typically don’t care about, but where knowing that they occur at a high frequency can be valuable information?

So, this boils down to actually reading the errorlog file. Some of you probably already have scripts and tools that make this easier than just reading every single message from top to bottom. I wanted to share how I do it, and this is why I wrote my Analyze SQL Server logs article. Check it out. And, as always, feedback is welcome!
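
The article goes into more detail (and may do things differently), but one basic trick is to read the log with xp_readerrorlog into a table and then group by message text, something like this:

IF OBJECT_ID('tempdb..#errlog') IS NOT NULL DROP TABLE #errlog
CREATE TABLE #errlog(LogDate datetime, ProcessInfo nvarchar(50), LogText nvarchar(max))

INSERT INTO #errlog(LogDate, ProcessInfo, LogText)
EXEC xp_readerrorlog 0, 1   --0 = current log, 1 = the SQL Server errorlog (2 = Agent log)

SELECT LogText, COUNT(*) AS occurrences, MIN(LogDate) AS first_seen, MAX(LogDate) AS last_seen
FROM #errlog
GROUP BY LogText
ORDER BY occurrences DESC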

List columns where collation doesn’t match database collation

The script below lists all database/table/column combinations where the column collation doesn’t match the database collation. I just wrote it for a migration project and thought I’d share it. I’m sure lots of things can be improved, but it worked just fine for me for a one-time execution on a number of servers.

IF OBJECT_ID('tempdb..#res') IS NOT NULL DROP TABLE #res
GO

DECLARE
 @db sysname
,@sql nvarchar(2000)

CREATE TABLE #res
(
 server_name sysname
,db_name sysname
,db_collation sysname
,table_name sysname
,column_name sysname
,column_collation sysname
)

--Loop over all online user databases
DECLARE c CURSOR FOR
 SELECT name FROM sys.databases
 WHERE name NOT IN('master', 'model', 'tempdb', 'msdb')
   AND state_desc = 'ONLINE'

OPEN c
WHILE 1 = 1
BEGIN
 FETCH NEXT FROM c INTO @db
 IF @@FETCH_STATUS <> 0
  BREAK

 --For each database, list the columns whose collation differs from the database collation
 SET @sql =
 'SELECT
   @@SERVERNAME AS server_name
  ,''' + @db + ''' AS db_name
  ,CAST(DATABASEPROPERTYEX(''' + @db + ''', ''Collation'') AS sysname) AS db_collation
  ,OBJECT_NAME(c.object_id, ' + CAST(DB_ID(@db) AS sysname) + ') AS table_name
  ,c.name AS column_name
  ,c.collation_name AS column_collation
 FROM ' + QUOTENAME(@db) + '.sys.columns AS c
  INNER JOIN ' + QUOTENAME(@db) + '.sys.tables AS t ON t.object_id = c.object_id
 WHERE t.type = ''U''
   AND c.collation_name IS NOT NULL
   AND c.collation_name <> CAST(DATABASEPROPERTYEX(''' + @db + ''', ''Collation'') AS sysname)
 '
 --PRINT @sql
 INSERT INTO #res
 EXEC(@sql)
END
CLOSE c
DEALLOCATE c

SELECT * FROM #res

Make sure you play with data collector on a virtual machine

I’m in a situation where I have configured the new data collector functionality for three instances. And there’s no way to undo the configuration performed by the wizard! It cannot be undone by the wizard itself, and BOL has no information on how to do it. In fact, I suspect that in the end you need to use some of the undocumented data collector procedures to get rid of the configuration (like sp_syscollector_delete_jobs).

I’m not knocking the data collector per se – it seems like a great way to get a baseline going, etc. But my tip is: while you are playing with it in order to understand it, do it on a virtual machine, so you can roll back to a clean state.

Lara has reported this on Connect, btw: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=334180