Looking for strange (in your SQL Server)

Do you know if you have weird stuff going on in your SQL Server? Perhaps you walk through the errorlog every now and then? (Do you, really? You should!).

But there are lots of thing that doesn’t end up in the errorlog. For these kind of things, I created what I refer to as my KMDFM trace. This is from a tune of theirs, “Looking for strange“.

I decided to have this as an article, on my web-site. The reason is that I think if it as something that may evolve over time. Hopefully I’ll get feedback, with suggestions to refine the trace (adding filters, events, etc).

Needless to say, I’m using Extended Events. Instead of capturing each event, which can get pretty noisy, I use the event_counter target. This just count how many has happened.

You find the article here, with the details: https://karaszi.com/looking-for-strange

Here are some info from that article. In particular, I included below what events I added, and why:

General events

Attention. This fires when a client times out. The client might do a re-try, so then end-user isn’t even aware that this is happening, except that the performance sucks. The time-out period is specified at the client side, in the database API. “Indicates that a cancel operation, client-interrupt request, or broken client connection has occurred. Be aware that cancel operations can also occur as the result of implementing data access driver time-outs.

Auto_stats. If you have lots of auto update statistics going on, then you might want to do something about that. The user submitting a query when auto-update occurs will notice a halt while the update happens (unless set to async, but then the query is ran with inferior statistics). Perhaps you want to run manual update more frequently? Or remove some table/index/stats from the auto-update handling? I had to work quite heavily with filtering for this to ger rid of noise. “Occurs when index and column statistics are automatically updated. This event can be generated multiple times per statistics collection when the update is asynchronous.

Database_file_size_change. The change can be for several reasons. A grow or shrink can be automatic or manual. This should capture all. You should be aware of manual size changes, and they shouldn’t be frequent. But frequent autogrow you might want to handle by manually growing the file or changing the increment. And frequent autosrink is so bad and I doubt I have to explain any further on this. Hopefully you don’t have autosrink turned on for any of your databases. If this happens a lot,. consider adding a filter to only catch shrink operations and/or add a file target. “Occurs when any of the data or log files for a database changes size. Use this event to monitor changes to the size of database files.

Database_started. You probably want to know if databases are created, attached frequently. Not to mention f you have autoclose turned on for any databases and that causes a database to be started. “Database started.“‘

Lock_timeout_greater_than_0. If a programmer specified to get a time-out while waiting for a log, then this can fire. Why not the evant simply named “lock_timeout”? For two reasons: One is that I noticed it fires more than once for each occurrence. And the other reason is that a developer might have deliberately set a timeout of 0, to immediately get an exception, and those you probably don’t want to catch. “Occurs when a request for a lock times out because of a blocking lock that is being held by a different resource. This event is identical to lock_timeout, but excludes cases in which the time-out value is 0, such as when lock probes are used. Time-outs with a duration of 0 do not necessarily indicate a blocking problem.

Long_io_detected. You might have seem messages in the log that an I/O operation took longer than 15 seconds. This event fires when that happens. I didn’t include this in the performance section since 15 seconds is such a ridiculous long time for an I/O operation and something is clearly not normal if this event fires. “Long IO detected

Lock_deadlock. Hopefully this is obvious. If you have lots of deadlocks, then you want to do something about it. Perhaps run a trace and catch the xml_deadlock_report event, and then pass this on to the developers. “Occurs when an attempt to acquire a lock is canceled for the victim of a deadlock.

Lock_escalation. When SQL server escalates a log to a higher level (table), then it first acquired lots is fine-grained locks, then the use the sledge by acquiring a table level lock. You don’t want this to happen too frequently, adding indexes or other measures can reduce lock escalations. “Occurs when a fine-grain lock is converted into a coarser-grain lock, such as when a row lock is converted into an object lock. Use this event to monitor how locks are being used on the server.

I also wanted to have the plan_affecting_convert event, to capture when some covers happens in a WHERE clause so that it makes it impossible to use an index seek. But SQL server seems to include anything with a conversion here, including conversion in the column list. This makes the even, IMO, useless.

Performance related events

Check the trace definitions for these. Some might happen a lot. If you aren’t interested in these, then just skip them. Perhaps you can’t do anything about it in the first place?

Bug in SSMS when using Extended Events

I’d like to share one bug that has been irritating me over a long time. It is very easy to handle, once you know what the problem is. I’ve seen it when I want to open a file XE target:

Here’s the relevant text in the message box, to facilitate searching:

The storage failed to initialize using the provided parameters. (Microsoft.SqlServer.XEventStorage)

Cannot view the function ‘fn_MSXe_read_event_stream’, because it does not exist or you do not have permission. (Microsoft SQL Server, Error: 15151)

The problem is that Object Explorer in SSMS tries to use that function from the current database, not the master database (where the function exists, so to speak). So the solution is to open Object Explorer making sure that the current database for Object Explorer is the master database.

This bug has been reported on UserVoice, April 2016. Go vote for it if you want to add your voice: https://feedback.azure.com/forums/908035-sql-server/suggestions/32902330-cannot-view-the-function-fn-msxe-read-event-strea


Tips for getting started with Extended Events

I know, I know. New habits are hard to learn. Many of us have been using SQL trace and the Profiler GUI for a very long time. And we know that we are supposed to move over to Extended Events (XE), but we postpone it for some later time. And then we give XE a try, and some thing doesn’t work as we want. So we go back to more familiar territories.

But XE has really grown on me over the last few years. I like to share the things that I initially didn’t like with XE, and how I overcame them. And also some other of my tips to make it easier to be productive with XE. I will deliberately minimize showing T-SQL and queries against the XE dynamic management views here. As you use XE more and more, you will probably use T-SQL to a higher degree. But this blog post is for those of you who want to “get into” XE and I find using a GUI is great as a starting point. Assuming the GUI is any good, that is. And I think the SSMS GUI is, for most parts.

Don’t use the ring buffer as target

This is possibly the single most important thing to remember. Some would argue that the limitation for the DMV where you read the data is limited to 4 MB as the reason, and I can appreciate that. But more important is that SSMS won’t parse the data for you. XE is XML based, meaning that when you look at the XE data you get XML. And SSMS does nothing to help you for the ring buffer. It will just show the XML as is!




To the left you have what SSMS show you for the ring buffer and to the right you see what you get when you click the hyperlinked XML: the raw XML.

Use the file as target

So the obvious target would then be to use a file instead:

Note that the wizard to create event session defaults to the ring buffer. So, if you use the wizard, you probably want to change that to a file instead.

Use the targets instead of “watch live data”

Sure, sometimes you want to see things rolling by, as they happens. But saving to a target and looking at that target instead of the live data is more powerful. See the section about sorting etc below.

Show fields as columns

Bu default SSMS only show the name and timestamp fields. But all you have to do is to right-click in a filed at the lower half and select to show that as a column.

Explore sorting, grouping and aggregations

For this you need to look at a trace target, not the live window (except for filtering), but you can display a target even if the trace is currently running. Then just explore and find out that you can easily sort, group and aggregate the data:

Use SSMS to explore events, columns, etc

There are so many things you can capture with XE. But it can also be a but daunting. In my SQL Server 2017 instance, I have 1511 events. Compare that to the 180 you have for the old trace engine! So, XE is definitely richer, but you probably won’t browse through all 1500 events to find whichever could be interesting for the task at hand. So, create an event session using either the wizard to the “real GUI”. Just type letters and the GUI will filter on those, as you type:

Don’t expect events etc to be documented

For the old trace engine, we could read about all events and columns in Books Online. The “documentation” for the XE objects is instead in the DMVs, like for instance sys.dm_xe_objects has a “description” column. You don’t have to use these DMVs directly, though. Follow my above advice and SSMS will pick up those descriptions for you:

Use the “XEvent Profiler” that was added in SSMS 17.3

MS has probably noticed that there is a hesitation to pickup on XE. In SSMS 17.3, they added the XEvent Profiler in SSMS. This is just a quick way to start an XE session with a live data window. Nothing fancy, really, but having it in the “root” of SSMS makes it very snappy to kick of that trace quickly. And you can also customize it. See my earlier blog-post on this.

Explore other targets

The Wizard only allow you to select from two target. The dreaded ring buffer and the file target. But there are other interesting targets. Like histogram, to for instance show how many SQL commands were submitted for each database. Or the event_counter which just counts how any of each events occurred, for instance how many attentions, deadlocks and log escalations have we had. There are other targets as well!

Watch out for that latency

By default you can wait 30 second after an event happened until it has been delivered to the target(s). This is fine for a trace that you run into a target and analyze that target later. But when you want to do something interactively it feels like ages. I’m thinking of the “watch live data” functionality. The Wizard doesn’t let you specify the value for the MAX_DISPATCH_LATENCY parameter (of the CREATE EVENT SESSION command). So, use the “real GUI” or just script for the wizard and add that parameter. Of course, the shorter latency, the higher overhead. So, use with care.

Note that filters and actions are added per event

You can for most cases think of actions as “extra columns” you want to add. Anyhow, XE is flexible in many ways. For instance you might want to filter one event on something, and some other event on something else. Or accept the overhead of collecting one action for one event but not the other. But the wizard doesn’t work this way. The filtering and actions you specify will be applied for all events in the trace. You can of course script from the wizard and modify that script. But the “real GUI” os more true in the sense that you specify filter and actions for each event.

Don’t be afraid to use T-SQL

You can of course automate and customize enormously when you start using T-SQL to a higher degree. As usual, you can script things in SSMS that can serve as a great starting point. Below is an example of a query that “parses” the XML for a ring buffer trace into columns in the result set:

event_data.value('(@name)[1]', 'varchar(50)') AS event_name
,event_data.value('(@timestamp)[1]', 'datetime2(0)') AS time_stamp
,event_data.value('(data[@name="statement"]/value)[1]','nvarchar(50)') AS statement_
,event_data.value('(data[@name="duration"]/value)[1]','bigint') / 1000 AS duration_ms
,event_data.value('(data[@name="cpu_time"]/value)[1]','bigint') /1000 AS cpu_ms
,event_data.value('(data[@name="logical_reads"]/value)[1]','bigint') AS logical_reads
,event_data.value('(data[@name="physical_reads"]/value)[1]','bigint') AS physical_reads
,event_data.value('(data[@name="row_count"]/value)[1]','bigint') AS row_count
,event_data.value('(action[@name="database_name"]/value)[1]','nvarchar(128)') AS database_name_
FROM #myXeData AS evts
CROSS APPLY StatementData.nodes('//RingBufferTarget/event') AS XEventData(event_data)
--ORDER BY time_stamp
ORDER BY duration_ms DESC

Note that I have saved the trace into a temp table. If you don’t, those XQuery operations will take ages.


I feel I could go on and on. But I’ll stop here. I might add something at a later time, if I come to think of something obvious. Feel free to add your tips in the comments.

Explaining Activity Monitor

This post is not about how to use the Activity Monitor (AM) tool in SQL Server Management Studio (SSMS) – there are loads of such posts written already. Also, it is not about dissing AM, you will find such posts as well.

What I want to do is to explain the information in AM, for instance what time span the information covers. I see lots of confusion about this, and incorrect assumptions can be made because of that. This aspect is typically not mentioned in other blog posts on AM that I have seen.

The SQL Server documentation is very vague on the topic. In fact, the most recent documentation article explaining the information in AM that I found is for SQL Server 2008 R2. And that documentation is at many places vague, or even flat out incorrect. For more recent versions of SQL Server, there’s not even an attempt to explain the information. (Please let me know if you find such official articles.)

I know that lots of people uses the Activity Monitor tool in SQL Server, especially the less experienced DBA. The more experienced DBA often uses other tools like Adam Machanic‘s sp_whoisactive (see this and this) , Brent Ozar’s sp_BlitzFirst, etc.

Say for instance that you had massive amount of I/O for one of your databases for the last day, up until a minute ago. I.e., the I/O load for the database varies a bit, but on average is very high. You look in AM which show this database as silent since you happened to have low I/O the last minute, and AM show some other database as being the one with high load.

So, we need to think about the time dimension here. AM does a refresh at certain intervals. By default it is every 10 seconds, but you can change that by right-clicking somewhere in AM and change in that context menu. Keep this in mind. It is important. We will refer to it as the most recent refresh interval, or snapshot.

I’m using SQL Server Management Studio (SSMS) 2016, and SQL Server 2016. It is possible that other version combination does other things. With the information in this blog post, you will be able to find and determine that for yourself. Please comment if you find important deviations, or perhaps just confirmations (like “SSMS 2012 does the same thing”).

I mainly used tracing to spy on the SQL submitted by AM.

The four top graphs:

“% Processor Time” is picked up directly from the OS (using WMI, I believe). Most probably a Performance Monitor counter in the end.

“Database I/O” is the sum of I/O for all database files performed since the last snapshot. This is fine since we intuitively understand that, because we have the trail of prior snapshot values displayed in the graph. The information comes from sys.dm_io_virtual_file_stats, doing a SUM over num_of_bytes_read + num_of_bytes_written, converted to MB.

“Batch Requests/sec”
 is the number of batches we have submitted to our SQL Server since the last snapshot. Again, this is pretty intuitive since we have a trail of snapshot values in the graph. The information is from the performance counter “Batch Requests/sec” picked up from sys.sysperfinfo (bad Microsoft, you should use sys.dm_os_performance_counters 🙂 ).

“Waiting Tasks”
 show how many that are waiting for something (a lock to be released, for instance). This is not as straight-forward as the others. The information comes from sys.dm_os_wait_stats UNION ALL with sys.dm_os_waiting_tasks.

The values are compared to those from the prior snapshot. However, a higher weight in that calculation will be given to the prior snapshot values if you have a short refresh interval. Say that you have a 1-second refresh interval. Then only a weight of 9% is from the current interval and 91% is from the prior interval. Since the current interval value will become the prior value for the next snapshot, a “trail” is kept back in time with a diminishing weight the longer back in time you go.

If you refresh every 10 seconds, then current interval weight is 60% and previous interval weight is 40%. It pretty quickly approaches 100% for current snapshot the longer refresh interval you are using. Hats off to Microsoft for so clearly documenting this in the temporary stored procedures that AM is using. It is in the source code, all you need to do is to grab it in a trace and read it. The name of the procedure is #am_generate_waitstats, and it is created when you open the AM window in SSMS.

Note that not all wait types are represented here. See the section below named ‘The “Resource Waits” pane’ for more information. The “Waiting Tasks” diagram and the “Resource Waits” pane shares some T-SQL code.

The “Processes” pane
This is pretty straight forward so I won’t spend much time on it here. It shows information about the sessions you have at the moment the snapshot is produced. It uses a query joining DMVs such as sys.dm_exec_sessions, sys.dm_exec_requests, sys.dm_os_tasks, etc. Go grab the query in a trace and paste into a query window if you want to dig into it.

The “Resource Waits” pane
The idea here is to show where SQL Server is waiting, “wait stats”.

It uses the same procedure as the “Waiting Tasks” diagram uses, #am_generate_waitstats, to get the information. See the above section for “Waiting Tasks” to understand the time dimension for this. For simplicity, we can say that it shows only wait stats for the past 30-60 seconds. This is important. Imagine that you had loads of a certain wait stats, but none just for the last minute. This pane can now fool you that you didn’t have any waits of that kind, just because you didn’t for the past minute. Note, though, that the “Cumulative Wait Time” column is the sum of wait in the group since SQL Server was re-started or since we last cleared the wait state (DBCC SQLPERF(“sys.dm_os_wait_stats”,CLEAR)).

In an attempt to be friendly, it will group and summarize wait stats into various groups. That would be fine if there were some documentation about which individual wait type is in each group. Also, some wait types are ignored. One of the ignored wait types is CXPACKET, another is THREADPOOL.

AM creates a table named #am_wait_types when you open the AM window, which it populates with various wait types and the group each wait stats is in. This table has a column named “ignore”. The two wait types I mentioned above has 1 in this “ignore” column. There are 35 rows which has 1 for the “ignore” column. To be fair, most are benign but the two which I immediately reacted on are the ones I mentioned above.

But hang on, how many rows do we have in this #am_wait_types table in total?  The answer is 263. Are there more than 263 wait types in 2016? You bet! I did a select from sys.dm_os_wait_stats and I got 875 rows. So, 633 of the wait types in 2016 are not at all considered by AM. That of course begs the question whether I found any interesting wait types that aren’t in #am_wait_types? I didn’t go through them all, but I glanced only quickly and for instance SOS_SCHEDULER_YIELD caught my attention. If you want to go through them, then I highly recommend Paul Randal’s wait types library. If you find anything that stands out, then please post a comment.

SELECT ws.wait_type FROM sys.dm_os_wait_stats AS ws
WHERE ws.wait_type NOT IN(
SELECT wt.wait_type FROM #am_wait_types AS wt WHERE wt.ignore = 0
ORDER BY wait_type

The “Data File I/O” pane
This shows I/O activity per database file since the last snapshot. Again, you could for instance have had lots of I/O for a database the last day, but if it was silent the past minute, then this dialog will potentially mislead you.

The “Recent Expensive Queries” pane
This shows the most expensive queries, based on what column you sort on, executed since the last snapshot. If you have, say, a 10 second snapshot interval, you will only see the queries executed during these 10 seconds. AM uses a procedure named #am_get_querystats to collect the information. There are a few things going on inside this procedure, but at the most basic level, it uses sys.dm_exec_query_stats and sys.dm_exec_requests to get queries from cache and currently executing queries. It then does some processing and store the result in temp tables so we later can sort on different columns depending on what metric we are interested in. I suggest that you spend some time with the source code if you want to dig deeper.

The “Active Expensive Queries” pane
This is very straight forward. It executes a query which uses sys.dm_exec_requests joined to a few other DMVs.

How to dig deeper
I thought about including snippets of AM’s source code, the list of wait stats grouping etc here. But I decided against that. It would litter this post, and the code might differ between releases and builds of SSMS. So, if you are interested in digging deeper, fire up your favorite trace tool (Extended Events, Profiler, Server-side trace, …) and catch the SQL submitted by AM.

When you open the AM window, it executes a few batches that creates procedures and tables that it will later use for each refresh interval. Here are the ones that I found:

  • The table #am_wait_types, which contains the wait types that AM bothers about, along with the grouping and which of those that are ignored. This is the one you want to investigate to see which wait types that are ignores by AM; either having 1 in the “ignore” column, or by not being in that table in the first place. Match against sys.dm_os_wait_stats.
  • The procedure #am_generate_waitstats which collects wait stats with some trail back in time, as explained above.
  • The table #am_dbfileio in which file I/O stats is stored.
  • The tables #am_request_countand and #am_fingerprint_stats_snapshots, used for query statistics.
  • The procedure #am_get_querystats, which collects and stores the query statistics.

At each refresh interval, you see 4 T-SQL batches submitted for the top 3 graphs that are T-SQL based (ignoring the “dead” graph that I have in SSMS 2016, and also ignoring “% Processor Time” since it is WMI-based).

If you have expanded the “Processes” pane, you also see a batch that collects that information at every refresh interval.

If you have expanded the “Resource waits” pane, you also see a batch that does a SELECT from the #am_resource_mon_snap table at every refresh interval, with grouping and SUM based on resource type.

If you have expanded the “Data File I/O” pane, you also see a batch that collects that information at every refresh interval.

If you have expanded the “Recent Expensive Queries” pane, you also see a batch that executes the #am_get_querystats procedure to collects that information. It is executed at refresh intervals, but not necessarily at every refresh interval. Check the source code for the procedure and you see that SM will execute this no more frequently than every 15 seconds.

If you have expanded the “Active Expensive Queries” pane, you also see a batch that executes a query to collects that information. It seems to be limited so it doesn’t execute more frequently than every 5 seconds (even with a shorter refresh interval).

The bottom line
As always, with understanding of the data we see, we have a chance to make information out of it. The Activity Monitor certainly has its quirks, but if you do feel like using a GUI for these type of things, I hope that you are better equipped now to interpret what it is you are seeing. Personally, I find “Resource Waits”, “Data File I/O” and “Recent Expensive Queries” less useful because of the time dimension handling. As for expensive queries, IMO nothing beats the Query Store in SQL Server 2016.

SSMS 17.3 has built-in Profiler

We all know that Profiler and its background functionality SQL Trace has been deprecated since 2012. Right?

And we have all switched to using Extended Events (XE), a long time ago, right?

No, the reality is that many of us still uses the old tracing infrastructure. This can be for various reasons, like:

  • Familiarity
  • Existing templates and stuff
  • Ease of use
  • Lack of functionality in the replacement


I have been gradually switching over to XE over the last few years. And I have been gradually becoming to like it more and more. It is a learning curve, for sure, but it is so superior and even the GUI is in my opinion more powerful if you are a) looking at a saved trace and b) use a trace file instead of the ring buffer. Just the ability to be able to sort, group and aggregate with a few clicks is for me great.

But enough about the general stuff, I’m actually going somewhere with this.

One functionally for which I haven’t been using XE much is to look at live data. I’m talking about scenarios such a “I wonder what happen when you press this button?”. Or, something I very frequently do is to look at the resource usage for SQL command from a trace instead of using commands such as SET STATISTICS IO ON. For these type of tasks, I often just fire up Profiler and use my own Profiler templates.

A very promising new functionality in SQL Server Manager Studio (SSMS) 17.3 is the new XE Profiler. This is a Profiler-like experience built-in to SSMS.

XE Profiler

Just double-click any of the two entries and you have a live trace window. This is built on the SSMS XE “Watch Live Data” functionality. There’s actually no magic going on here. What happens is that SSMS creates a trace session if it doesn’t exist, starts that session and opens a live data window for that trace session. There’s no target for the trace, live data doesn’t require a target. The event sessions that will be created are named:

  • Standard          QuickSessionStandard
  • TSQL               QuickSessionTSQL

The above corresponds to the Profiler templates with the same names. The really cool thing is that you can customize these sessions. I, for once, frequently want to see resource usage for queries. I modify QuickSessionTSQL to grab the completed events instead of the started events. Don’t worry if you mess it up – just delete the trace session and let SSMS re-create it for you next time you open a that Quick Session!

Here’s an example

IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name = 'QuickSessionTSQL')
ADD EVENT sqlserver.rpc_completed(
ADD EVENT sqlserver.sql_batch_completed(

When you Launch Session then by default it will show the columns that was relevant for the session definitions that shipped with SSMS. What you might want to do is to re-configure the columns that is shown.

To remove a column: Right-click on the column header that you don’t want to see and remove that column.

To add a column in the display: You can in the lower window, the “Details” section, right-click a field and “Show Column in Table”. Or you can in the column header in the top column header section right-click and “Choose Columns”.
Choose Columns 2Choose Columns

I might over time build some more “alternate” templates for XE Profiler, which I will add here.


Did we have recent autogrow?

I just read the question how to find out when autogrow in tempdb happened. And again I want to push for the default trace. The simple answer is that you already have this information available, just go and get it! (Unless you explicitly disabled the default trace, of course…)

Adjust the file name for your most recent default trace file in below query:

SELECT te.namet.DatabaseNamet.FileNamet.StartTimet.ApplicatioNname
FROM fn_trace_gettable('C:\Program Files\Microsoft SQL Server\MSSQL.3\MSSQL\LOG\log_331.trc', NULL) AS t
INNER JOIN sys.trace_events AS te ON t.EventClass te.trace_event_id
WHERE te.name LIKE '%Auto Grow'

Whats in the default trace?

As you probably know, there’s a trace running by default in SQL Server 2005. The directory for the trace file is the SQL Server log directory, and you can turn off and on this trace with sp_configure.

But how do we find out what events and columns are traced to this? We use a trace function and some trace catalog views:

The function fn_trace_geteventinfo returns what columns and events are captured by a configured trace. But we don’t want to see the column id and event id, we want the names. So we join this to the following functions:

Here’s the end result:

SELECT cat.name AS CategoryName, e.name AS EventName, c.name AS ColumnName
FROM fn_trace_geteventinfo(1) AS rt
INNER JOIN sys.trace_events AS e
ON rt.eventid = e.trace_event_id
INNER JOIN sys.trace_columns AS c
ON rt.columnid = c.trace_column_id
INNER JOIN sys.trace_categories AS cat
ON e.category_id = cat.category_id
ORDER BY CategoryName, EventName, ColumnName

And here’s one with only category and event:

SELECT DISTINCT cat.name AS CategoryName, e.name AS EventName
FROM fn_trace_geteventinfo(1) AS rt
INNER JOIN sys.trace_events AS e
ON rt.eventid = e.trace_event_id
INNER JOIN sys.trace_categories AS cat
ON e.category_id = cat.category_id
ORDER BY CategoryName, EventName