Explaining Activity Monitor

This post is not about how to use the Activity Monitor (AM) tool in SQL Server Management Studio (SSMS) – there are loads of such posts written already. Also, it is not about dissing AM, you will find such posts as well.

What I want to do is to explain the information in AM, for instance what time span the information covers. I see lots of confusion about this, and incorrect assumptions can be made because of that. This aspect is typically not mentioned in other blog posts on AM that I have seen.

The SQL Server documentation is very vague on the topic. In fact, the most recent documentation article explaining the information in AM that I found is for SQL Server 2008 R2. And that documentation is at many places vague, or even flat out incorrect. For more recent versions of SQL Server, there’s not even an attempt to explain the information. (Please let me know if you find such official articles.)

I know that lots of people uses the Activity Monitor tool in SQL Server, especially the less experienced DBA. The more experienced DBA often uses other tools like Adam Machanic‘s sp_whoisactive (see this and this) , Brent Ozar’s sp_BlitzFirst, etc.

Say for instance that you had massive amount of I/O for one of your databases for the last day, up until a minute ago. I.e., the I/O load for the database varies a bit, but on average is very high. You look in AM which show this database as silent since you happened to have low I/O the last minute, and AM show some other database as being the one with high load.

So, we need to think about the time dimension here. AM does a refresh at certain intervals. By default it is every 10 seconds, but you can change that by right-clicking somewhere in AM and change in that context menu. Keep this in mind. It is important. We will refer to it as the most recent refresh interval, or snapshot.

I’m using SQL Server Management Studio (SSMS) 2016, and SQL Server 2016. It is possible that other version combination does other things. With the information in this blog post, you will be able to find and determine that for yourself. Please comment if you find important deviations, or perhaps just confirmations (like “SSMS 2012 does the same thing”).

I mainly used tracing to spy on the SQL submitted by AM.

The four top graphs:

 
“% Processor Time” is picked up directly from the OS (using WMI, I believe). Most probably a Performance Monitor counter in the end.


“Database I/O” is the sum of I/O for all database files performed since the last snapshot. This is fine since we intuitively understand that, because we have the trail of prior snapshot values displayed in the graph. The information comes from sys.dm_io_virtual_file_stats, doing a SUM over num_of_bytes_read + num_of_bytes_written, converted to MB.


“Batch Requests/sec”
 is the number of batches we have submitted to our SQL Server since the last snapshot. Again, this is pretty intuitive since we have a trail of snapshot values in the graph. The information is from the performance counter “Batch Requests/sec” picked up from sys.sysperfinfo (bad Microsoft, you should use sys.dm_os_performance_counters 🙂 ).


“Waiting Tasks”
 show how many that are waiting for something (a lock to be released, for instance). This is not as straight-forward as the others. The information comes from sys.dm_os_wait_stats UNION ALL with sys.dm_os_waiting_tasks.

The values are compared to those from the prior snapshot. However, a higher weight in that calculation will be given to the prior snapshot values if you have a short refresh interval. Say that you have a 1-second refresh interval. Then only a weight of 9% is from the current interval and 91% is from the prior interval. Since the current interval value will become the prior value for the next snapshot, a “trail” is kept back in time with a diminishing weight the longer back in time you go.

If you refresh every 10 seconds, then current interval weight is 60% and previous interval weight is 40%. It pretty quickly approaches 100% for current snapshot the longer refresh interval you are using. Hats off to Microsoft for so clearly documenting this in the temporary stored procedures that AM is using. It is in the source code, all you need to do is to grab it in a trace and read it. The name of the procedure is #am_generate_waitstats, and it is created when you open the AM window in SSMS.

Note that not all wait types are represented here. See the section below named ‘The “Resource Waits” pane’ for more information. The “Waiting Tasks” diagram and the “Resource Waits” pane shares some T-SQL code.

The “Processes” pane
This is pretty straight forward so I won’t spend much time on it here. It shows information about the sessions you have at the moment the snapshot is produced. It uses a query joining DMVs such as sys.dm_exec_sessions, sys.dm_exec_requests, sys.dm_os_tasks, etc. Go grab the query in a trace and paste into a query window if you want to dig into it.

The “Resource Waits” pane
The idea here is to show where SQL Server is waiting, “wait stats”.

It uses the same procedure as the “Waiting Tasks” diagram uses, #am_generate_waitstats, to get the information. See the above section for “Waiting Tasks” to understand the time dimension for this. For simplicity, we can say that it shows only wait stats for the past 30-60 seconds. This is important. Imagine that you had loads of a certain wait stats, but none just for the last minute. This pane can now fool you that you didn’t have any waits of that kind, just because you didn’t for the past minute. Note, though, that the “Cumulative Wait Time” column is the sum of wait in the group since SQL Server was re-started or since we last cleared the wait state (DBCC SQLPERF(“sys.dm_os_wait_stats”,CLEAR)).

In an attempt to be friendly, it will group and summarize wait stats into various groups. That would be fine if there were some documentation about which individual wait type is in each group. Also, some wait types are ignored. One of the ignored wait types is CXPACKET, another is THREADPOOL.

AM creates a table named #am_wait_types when you open the AM window, which it populates with various wait types and the group each wait stats is in. This table has a column named “ignore”. The two wait types I mentioned above has 1 in this “ignore” column. There are 35 rows which has 1 for the “ignore” column. To be fair, most are benign but the two which I immediately reacted on are the ones I mentioned above.

But hang on, how many rows do we have in this #am_wait_types table in total?  The answer is 263. Are there more than 263 wait types in 2016? You bet! I did a select from sys.dm_os_wait_stats and I got 875 rows. So, 633 of the wait types in 2016 are not at all considered by AM. That of course begs the question whether I found any interesting wait types that aren’t in #am_wait_types? I didn’t go through them all, but I glanced only quickly and for instance SOS_SCHEDULER_YIELD caught my attention. If you want to go through them, then I highly recommend Paul Randal’s wait types library. If you find anything that stands out, then please post a comment.

SELECT ws.wait_type FROM sys.dm_os_wait_stats AS ws
WHERE ws.wait_type NOT IN(
SELECT wt.wait_type FROM #am_wait_types AS wt WHERE wt.ignore = 0
)
ORDER BY wait_type

The “Data File I/O” pane
This shows I/O activity per database file since the last snapshot. Again, you could for instance have had lots of I/O for a database the last day, but if it was silent the past minute, then this dialog will potentially mislead you.

The “Recent Expensive Queries” pane
This shows the most expensive queries, based on what column you sort on, executed since the last snapshot. If you have, say, a 10 second snapshot interval, you will only see the queries executed during these 10 seconds. AM uses a procedure named #am_get_querystats to collect the information. There are a few things going on inside this procedure, but at the most basic level, it uses sys.dm_exec_query_stats and sys.dm_exec_requests to get queries from cache and currently executing queries. It then does some processing and store the result in temp tables so we later can sort on different columns depending on what metric we are interested in. I suggest that you spend some time with the source code if you want to dig deeper.

The “Active Expensive Queries” pane
This is very straight forward. It executes a query which uses sys.dm_exec_requests joined to a few other DMVs.

How to dig deeper
I thought about including snippets of AM’s source code, the list of wait stats grouping etc here. But I decided against that. It would litter this post, and the code might differ between releases and builds of SSMS. So, if you are interested in digging deeper, fire up your favorite trace tool (Extended Events, Profiler, Server-side trace, …) and catch the SQL submitted by AM.

When you open the AM window, it executes a few batches that creates procedures and tables that it will later use for each refresh interval. Here are the ones that I found:

  • The table #am_wait_types, which contains the wait types that AM bothers about, along with the grouping and which of those that are ignored. This is the one you want to investigate to see which wait types that are ignores by AM; either having 1 in the “ignore” column, or by not being in that table in the first place. Match against sys.dm_os_wait_stats.
  • The procedure #am_generate_waitstats which collects wait stats with some trail back in time, as explained above.
  • The table #am_dbfileio in which file I/O stats is stored.
  • The tables #am_request_countand and #am_fingerprint_stats_snapshots, used for query statistics.
  • The procedure #am_get_querystats, which collects and stores the query statistics.

At each refresh interval, you see 4 T-SQL batches submitted for the top 3 graphs that are T-SQL based (ignoring the “dead” graph that I have in SSMS 2016, and also ignoring “% Processor Time” since it is WMI-based).

If you have expanded the “Processes” pane, you also see a batch that collects that information at every refresh interval.

If you have expanded the “Resource waits” pane, you also see a batch that does a SELECT from the #am_resource_mon_snap table at every refresh interval, with grouping and SUM based on resource type.

If you have expanded the “Data File I/O” pane, you also see a batch that collects that information at every refresh interval.

If you have expanded the “Recent Expensive Queries” pane, you also see a batch that executes the #am_get_querystats procedure to collects that information. It is executed at refresh intervals, but not necessarily at every refresh interval. Check the source code for the procedure and you see that SM will execute this no more frequently than every 15 seconds.

If you have expanded the “Active Expensive Queries” pane, you also see a batch that executes a query to collects that information. It seems to be limited so it doesn’t execute more frequently than every 5 seconds (even with a shorter refresh interval).

The bottom line
As always, with understanding of the data we see, we have a chance to make information out of it. The Activity Monitor certainly has its quirks, but if you do feel like using a GUI for these type of things, I hope that you are better equipped now to interpret what it is you are seeing. Personally, I find “Resource Waits”, “Data File I/O” and “Recent Expensive Queries” less useful because of the time dimension handling. As for expensive queries, IMO nothing beats the Query Store in SQL Server 2016.

SSMS 17.3 has built-in Profiler

We all know that Profiler and its background functionality SQL Trace has been deprecated since 2012. Right?

And we have all switched to using Extended Events (XE), a long time ago, right?

No, the reality is that many of us still uses the old tracing infrastructure. This can be for various reasons, like:

  • Familiarity
  • Existing templates and stuff
  • Ease of use
  • Lack of functionality in the replacement

 

I have been gradually switching over to XE over the last few years. And I have been gradually becoming to like it more and more. It is a learning curve, for sure, but it is so superior and even the GUI is in my opinion more powerful if you are a) looking at a saved trace and b) use a trace file instead of the ring buffer. Just the ability to be able to sort, group and aggregate with a few clicks is for me great.

But enough about the general stuff, I’m actually going somewhere with this.

One functionally for which I haven’t been using XE much is to look at live data. I’m talking about scenarios such a “I wonder what happen when you press this button?”. Or, something I very frequently do is to look at the resource usage for SQL command from a trace instead of using commands such as SET STATISTICS IO ON. For these type of tasks, I often just fire up Profiler and use my own Profiler templates.

A very promising new functionality in SQL Server Manager Studio (SSMS) 17.3 is the new XE Profiler. This is a Profiler-like experience built-in to SSMS.

XE Profiler

Just double-click any of the two entries and you have a live trace window. This is built on the SSMS XE “Watch Live Data” functionality. There’s actually no magic going on here. What happens is that SSMS creates a trace session if it doesn’t exist, starts that session and opens a live data window for that trace session. There’s no target for the trace, live data doesn’t require a target. The event sessions that will be created are named:

  • Standard          QuickSessionStandard
  • TSQL               QuickSessionTSQL

The above corresponds to the Profiler templates with the same names. The really cool thing is that you can customize these sessions. I, for once, frequently want to see resource usage for queries. I modify QuickSessionTSQL to grab the completed events instead of the started events. Don’t worry if you mess it up – just delete the trace session and let SSMS re-create it for you next time you open a that Quick Session!

Here’s an example

IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name = 'QuickSessionTSQL')
DROP EVENT SESSION QuickSessionTSQL ON SERVER
GO
CREATE EVENT SESSION QuickSessionTSQL ON SERVER
ADD EVENT sqlserver.rpc_completed(
ACTION(sqlserver.client_app_name,sqlserver.database_id,sqlserver.database_name,sqlserver.server_principal_name,sqlserver.session_id)),
ADD EVENT sqlserver.sql_batch_completed(
ACTION(sqlserver.client_app_name,sqlserver.database_id,sqlserver.database_name,sqlserver.server_principal_name,sqlserver.session_id))
WITH (MAX_MEMORY=8192 KB, MAX_DISPATCH_LATENCY=5 SECONDS, MAX_EVENT_SIZE=0 KB, MEMORY_PARTITION_MODE=PER_CPU)

When you Launch Session then by default it will show the columns that was relevant for the session definitions that shipped with SSMS. What you might want to do is to re-configure the columns that is shown.

To remove a column: Right-click on the column header that you don’t want to see and remove that column.

To add a column in the display: You can in the lower window, the “Details” section, right-click a field and “Show Column in Table”. Or you can in the column header in the top column header section right-click and “Choose Columns”.
Choose Columns 2Choose Columns

I might over time build some more “alternate” templates for XE Profiler, which I will add here.



        

Local Books Online, as of 2017 August

This seems to be a never ending story, having a locally installed Books Online for SQL Server, that is. Some of you recall that I have posted about this before. This post is about my experiences for the current state of Books Online. Things has changed since I last blogged this. Among other things, MS has re-worked the documentation, including moving the online bit from MSDN to docs.

Background: I have a new machine, and with that I of course want to have a local Books Online for SQL Server. Following any of my previous post didn’t quite work. Here is what I installed:

  • Windows 10, fresh installation
  • SQL Server 2016 (Db engine, SSIS, SSRS)
  • Patch above to current service pack and CU.
  • SQL Server Data Tools – BI
  • SSMS 17.2
  • Visual Studio 2017

There are of course other stuff on the machine as well, but I doubt that affect the SQL Server BOL experience. Here’s what I did to get local BOL inside SSMS:

  1. Help
  2. Add and Remove Help Content
  3. Under “SQL Server” header, I just added the bits that I found interesting. In my case everything.
  4. Update
  5. Now it took a long while, since I missed the VS installation a monster-sized doc for Framework. I later removed the VS bits.
  6. I was also afraid that I experienced a bug in Help Viewer, where it would hang. It took a long while with status something like removing temp files. But it ended just fine, I didn’t have to do the “kill, update help config file, and stuff”, like I did before.

Bottom line is that it worked just as I expected, or perhaps I should say “hoped for”. 🙂
(Well, almost. Some entries are doubled. I don’t actually bothered (much) about that so I just ignore that. For instance under “SQL Server Technical Documentation” I have two of both “Overview” and “Release Notes”.)

What more? Well, we probably want local BOL to be the “F1”-help:

  1. Help
  2. Set Help Preference
  3. Launch in Help Viewer

Also, your might want to create a shortcut (on desktop, start menu, or wherever), which points to:
C:\Program Files (x86)\Microsoft Help Viewer\v2.2\HlpViewer.exe” /catalogName SSMS17 /locale en-US

For reference, here are two prior posts on the subject. But don’t go there, they will confuse you. I only list them here as a reference, and *in case* you might end up with a hung help viewer etc…
Bol 2016 blog 1
Bol 2016 blog 2

Logging wait stats over time

We all know how valuable wait statistics can be when doing performance analysis. One thing I feel is missing in SQL Server is a trail of various measures, for instance wait stats. I really wish Microsoft could include something in this area, which can be used as a baseline. I recently fount this, from the Tiger Team. It looks promising, but I haven’t implemented it yet.

Anyhow, I have created a script that log wait stats over time. I keep a trail of logged value which is denser – for instance once per minute. And also a trail which is courser, for instance once per day.

The solution has a few tables and then you create a couple of Agent jobs. The logging and clean-up part is methinks pretty much done. The part that need more work is the analysis of the information. I.e., expect to find queries and views added. Suggestions are of course very welcome!

You find the stuff here.

Adjust autogrow setting for your database files

I very frequently see too small autogrow value for database files. Even the default in many cases. The defaults prior to SQL Server 2016 are 1 MB for data files and 10% for log files, adjusted to a little bit better values in 2016. One of the tings I often do when I start working on a SQL Server is to adjust the values to something more reasonable.

I have been wanting to automate this for a long time, but other things got in between. But now I have finally created my sp_dbfiles_autogrow procedure. (I’m sure I’m not the first one, but I wanted my own.) It creates and populates a table with the values to generate, based on the current size for the database file, and end result is that it outputs the ALTER DATABASE commands as text (using PRINT). I.e., it doesn’t really change anything, only generate the commands. You can find the procedure here. All comments and bug findings are of course welcome!

 

Keep more history for your Agent jobs

SQL Server Agent is in my opinion way too restrictive when it removes old job history information.

This information is store in the dbo.sysjobhistory table in the msdb database. Unfortunately, Agent will by default only keep 1000 rows in this table. I have on many occasions wanted to for instance check last execution of the weekly job. Or the trend for the weekly job, like how ling it takes to execute. Only to find that the information is no longer there.

I prefer to expand this to 50000 rows in total and 200 rows per job.

You can do this in the GUI: right-click Agent, Properties, History. Or just execute below SQL command on your SQL Servers:

EXEC msdb.dbo.sp_set_sqlagent_properties
@jobhistory_max_rows=50000
,@jobhistory_max_rows_per_job=200

Log Page Life Expectancy over time

You often see Page Life Expectancy referred to as an interesting performance monitor counter. And it can be! It indicates for how long a page is expected to stay in cache, from the time it was brought into cache.

But just looking at a snapshot value doesn’t say that much. It might be high, but that is because you haven’t had a high turnover of you pages for the past couple of hours. Or the other way around, you happen to look just after a very rare monster query. Furthermore, having a log can show you how PLE changes over time.

“The app was slow at two o’clock today.”

You look at the PLE log and you see how PLE dropped at that time, which indicates that something was executed that did lots of I/O. Or you don’t see PLE drop at all, and can assume that it probably was a blocking situation instead.

Some of you already have performance monitoring solutions that include PLE logging over time. For you that don’t, I created a very simple solution. It involved a table, a scheduled Agent job and queries to read the data. That’s it.

You find it at http://karaszi.com/log-page-life-expectancy-over-time .

Improvements and comments are of course appreciated. 🙂

Update for Books Online with functioning index (2016)

Good news, everybody! Microsoft recently released an update for the local Books Online, with a functioning index for the relational engine.

Earlier versions were missing those index bits, but not for everyone (a theory is that it depends on what server you hit when you downloaded the BOL bits). Basic steps to get them (see my earlier blog, linked to at the end of this, for details):

  • SSMS 2016
  • Help
  • Add and Remove Help Content. This opens Help Viewer 2.2 (HW).
  • If you have the old BOL bits, you should see that there is an update:
    • Remove them
    • Update
    • HW might hang at the end, force-finish it.
  • Add the relevant BOL bits (under Recommended Documentation), the ones under “SQL Server” is for 2012.
  • Update
  • HW might hang at the end, force-finish it.

There are other issues with BOL to hopefully be addressed. See my earlier blog post for details.

July SSMS 2016 changes transaction semantics

This can be a very nasty one if you’re not observant. The July update of SSMS now has the option to change XACT_ABORT setting for a query window using

  • Tools
  • Options
  • Query Execution
  • SQL Server
  • Advanced
  • SET XACT_ABORT
Now, exposing one more SET option can’t be bad, right? But Erland Sommarskog found out that this is checked by default. Now, using XACT_ABORT might now be a bad thing, au contraire. See this one of Erland’s series of error handle articles.
The problem is that it is a change. Quite of a sudden, your transaction semantics isn’t the same, and if you execute that script from somewhere else, or even an older SSMS, it doesn’t behave the same. For instance, if you don’t use TRY CATCH (which you should be using), then the batch will now stop and rollback on most errors, instead of allowing you to do the old eighties style IF @@ERROR <> 0 dance.
Personally, I SET my set commands instead of checking a checkbox in SSMS. But I don’t SET all set options there are, just the ones I want to change from default. I hope that this one can be reverted so it isn’t checked by default.

Books Online for SQL Server 2016

This one is for those of you who prefer to use a local SQL Server Documentation, a.k.a. Books Online (BOL), instead of using the web-based BOL. A local BOL is essential for me. Navigating the contents, switching between pages and searching is lightyears ahead and quicker in the local BOL.

(Update: Added bits about the missing index.)

(Update 2: there now is a functioning index.)

Erland Sommarskog (http://www.sommarskog.se/) is possibly even more adamant that I am about having a local BOL and he has been very helpful. Also thanks to Carla Sabotta and Jennifer Hubbard for the help and patience.

Getting a functional BOL for SQL Server 2016 is a bit more complicated than for 2014. In this earlier blog post I discussed how to get a proper Table of Contents for the 2014 BOL.

Some basics about the local BOL (from now on I call this BOL):

  • It uses HelpViewer.
  •    SQL Server 2012 and 2014 used HelpViewer 1.1.
  •    SQL server 2016 uses HelpViewer 2.2.
  • F1 from within SSMS will always take you to the web based BOL.
  • When you install the SQL Server 2016 engine (not SSMS) you get two program entries. Ignore these:
  •    “SQL Server Documentation” opens the old HelpViewer 1.1. There is no 2016 BOL for HelpViewer 1.1.
  •    “Manage Help Settings” takes us to the old Help Library Manager, which is for HelpViewer 1.1.

There is no stand-alone download for the 2016 books (which you would unpack on your harddrive, as we had to do for BOL 2014). You need to do it from inside the HelpViewer 2.2 program. Here’s how to do it:

From inside SQL Server Management Studio (SSMS):

  • “Help”
  • “Add and Remove Help Content”. This opens HelpViewer 2.2.
  • Make sure that “Online” is selected.
  • Add the SQL Server 2016 parts. Don’t do what I did and look for a “SQL Server 2016” header in bold typeface. The 2016 parts is right under your nose, under “Recommended Documentation”.
  • “Update”

Unfortunately there is currently a bug in HelpViewer 2.2 and it will freeze. See https://msdn.microsoft.com/library/mt654096.aspx on how to handle this. You only need to do these steps once, but you will probably need to end HelpViewer in the Task Manager whenever you update your content.

Now you should have the 2016 BOL. Selecting “Help”, “View Help” from SSMS will still open the web-based BOL, though (a web-browser inside SSMS). So to open the local BOL 2016 from inside SSMS, you will use:

  • “Help”
  • “Add and Remove Help Content”. (Yes, the same that you used to add the books.)

(Update) Note that the index isn’t there for the relational database engine, including T-SQL reference. If you for instance in the index type GROUP BY or @@SERVERNAME you will either get no hits, or be taken to the SQL Server 2012 documentation assuming you also installed that). We’ll see if MS will build an index for the relational database topics. My contacts at MS say that they do have an index, where I don’t. What do you see? Please let us know below if you, after installed BOL 2016 according to this blog post has a working index for the relational database engine. For instance if you get a 2016 hit for GROUP BY.(/Update)

(Update 2): I today, 2016-09-05, noticed that MS recently released an update for BOL 2016 and the index is there. Yay! (/Update2)

How about a program icon from which you can start BOL 2016 outside of SSMS? Create shortcut somewhere which point to below:
“C:\Program Files (x86)\Microsoft Help Viewer\v2.2\HlpViewer.exe” /catalogName SSMS16 /locale en-US

What if you don’t have SSMS installed, but still want a local BOL? There’s nothing from the SQL Server team to help you here. You would have to install HelpViewer 2.2 on your own (something I haven’t bothered to investigate).

See https://msdn.microsoft.com/en-US/library/mt736393(SQL.130).aspx and https://msdn.microsoft.com/en-us/library/mt703151(v=sql.1).aspx
(the later is a bit more generic).