Where’s that sys.messages management pack?

I’ve often thought that “somebody” should produce a list, and perhaps some ready-made TSQL code, for the SQL Server error messages we want to be alerted for. Something like a Management Pack, but not tied to any particular systems management software. Essentially: which error messages would we typically want to be alerted for, along with ready-made TSQL code which defines an alert (msdb.dbo.sp_add_alert, see also http://www.sqlskills.com/BLOGS/PAUL/post/Easy-monitoring-of-high-severity-errors-create-Agent-alerts.aspx) and operator for each of those.

Honestly, how many of you have gone through all SQL Server messages and determined which of them you want to be alerted for? Sure, we can decide to define alerts for all high-severity messages (severity 19 through 25). But unfortunately the severity classification isn’t as coherent as we would hope.

For instance, Paul Randal recently blogged about error 825, which has a severity as low as 10! Do we want to be alerted for 825? You bet! Check out Paul’s blog at http://www.sqlskills.com/BLOGS/PAUL/post/A-little-known-sign-of-impending-doom-error-825.aspx.
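
To make this concrete, below is a sketch of what such a ready-made alert definition could look like, using error 825 as the example. The operator name and e-mail address are made up; adapt them to your environment.

--Sketch: an operator and an alert for error 825 (operator name/address are made up)
USE msdb
EXEC dbo.sp_add_operator
    @name = N'DBA Team',
    @email_address = N'dba.team@example.com'

EXEC dbo.sp_add_alert
    @name = N'Error 825 - read-retry',
    @message_id = 825,
    @severity = 0, --must be 0 when @message_id is specified
    @include_event_description_in = 1 --include the eventlog text in the e-mail

EXEC dbo.sp_add_notification
    @alert_name = N'Error 825 - read-retry',
    @operator_name = N'DBA Team',
    @notification_method = 1 --1 = e-mail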

We can, however, make it a little bit easier for ourselves. Since Agent detects the message by sniffing the eventlog, we can immediately discard all messages which aren’t written to the eventlog (there’s no use defining an alert for something which the alerting mechanism never detects, right?):

SELECT message_id, severity, text
FROM sys.messages
WHERE language_id = 1033
AND is_event_logged = 1
ORDER BY severity

Now we are down to 699 messages. It used to be that we could configure whether a message should be written to the eventlog (sp_altermessage), but this functionality disappeared in 2005. I haven’t checked Connect, but my guess is that if such a wish/entry exists in the first place, it doesn’t have too many votes.

My suggestion is that we create some small community project somewhere where we add the messages we want to be alerted for. This can be as simple as a table with error numbers and some inserts, which we then use to generate our sp_add_alert commands – along the lines of the sketch below. Question is where to host this, so it can be maintained and refined over time? Or has it already been done by somebody…
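
A minimal sketch of that idea, assuming a simple one-column table (all object names are made up):

--Sketch: a community-maintained table of alert-worthy errors,
--used to generate one sp_add_alert call per error number
CREATE TABLE dbo.AlertWorthyErrors(message_id int PRIMARY KEY)
INSERT INTO dbo.AlertWorthyErrors(message_id) VALUES(823), (824), (825)

SELECT 'EXEC msdb.dbo.sp_add_alert @name = N''Error '
     + CAST(message_id AS varchar(10))
     + ''', @message_id = ' + CAST(message_id AS varchar(10))
     + ', @severity = 0, @include_event_description_in = 1'
FROM dbo.AlertWorthyErrors
ORDER BY message_id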

 

Troubleshooting login problems

I’ve been learning a few things about troubleshooting login problems over the last few months. I first want to say that I’ve learned most of this through other blogs, comments etc. At the end of this blog post you will find a URL list where you can look for more details.

It started a few months back when I read a blog post somewhere stating that we can use the state in a “login failed” error message to determine why the login failed. As you can imagine, there can be several reasons for login failure (the user doesn’t exist, the password doesn’t match, the Windows login name isn’t a trusted login etc.). For some reason, I didn’t save that URL, and of course I needed it some time later and couldn’t find it.

One place where you can find such a list of states is… Books Online. BOL 2008 documents every error number, so it is a matter of knowing what error number to search for: 18456. You can also search for “login failed” (pretty easy to remember); the first hit is a different page from the above, but with similar information. I just tried the same search in the 2005 BOL and had similar hits. This information might have been added to BOL 2005 in a more recent release, though.

Now, don’t be surprised if the client app only receives state 1 for the login failed messages. This is a good thing, and on purpose. We don’t want the system to disclose too much information to a malicious user (or app) about why a login fails. So what we do is look in the errorlog for the “real” state (the EventLog doesn’t seem to carry this information, for some reason). Below is an example from SQL Server 2005, SP2, with timestamps removed:

Error: 18456, Severity: 14, State: 5.
Login failed for user ‘gurka’. [CLIENT: <local machine>]

So, state 5 tells us, according to SQL Server 2005 BOL, that “User ID is not valid.”.

Now, doing the same in SQL Server 2008 was interesting. Looking in the EventLog, we do see an explanation as to why the login failed. We still don’t see the state in the EventLog, but since we have an explanation we might not need the state. And in the errorlog file, I had the message below:

Error: 18456, Severity: 14, State: 5.
Login failed for user ‘gurka’. Reason: Could not find a login matching the name provided. [CLIENT: <local machine>]

So, not only do we have a state, we also have a description of why the login failed. Is it pathetic that such a thing can make you happy? Perhaps it is… 😉
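
By the way, if you dig through the errorlog for these messages often, something like the below can save some scrolling. Note that xp_readerrorlog is undocumented (and its parameters have varied between versions), so treat this as a sketch:

--Sketch: search the current errorlog (0) for login failures
--Parameters: log number, log type (1 = SQL Server errorlog), search string
EXEC master.dbo.xp_readerrorlog 0, 1, N'18456'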

Here are some other blog posts on the subject:
http://blogs.msdn.com/sql_protocols/archive/2006/02/21/536201.aspx
http://jmkehayias.blogspot.com/2008/03/understanding-login-failed-error-18456.html

German translation of my updated datetime article

Frank Kalis has been kind enough to translate my datetime article (http://www.karaszi.com/SQLServer/info_datetime.asp) into German. I updated the article a while ago to reflect the new date and time related types in SQL Server 2008, and I just learned that Frank has now finished the translation of the updated article into German. Check it out at:

http://www.insidesql.org/beitraege/datetime/der-ultimative-guide-fuer-die-datetime-datentypen

What does RESTORE do?

I was replying to a newsgroup post today, explaining the restore process. I find some confusion in various places about what actually happens during a restore, and hopefully the points below can help a bit:

Fact: all backups contain log records. A log backup contains only log records (more later on bulk-logged recovery). The different types of database backup contain the log records that were produced while the backup was running – in addition to data pages.
The restore command can perform several things:
  1. Delete the existing database. This happens if you specify the REPLACE option.
  2. Create the database. This happens if the database name you specify in the RESTORE command doesn’t exist.
  3. Copy data pages to the same address in each database file as they had when the backup was produced. And of course also copy log records to the ldf file. The source for these copy operations is the backup file that you restore from.
  4. REDO. Roll forward transactions based on the log records copied in the step above.
  5. UNDO. Roll back all open transactions. Not performed if you specify NORECOVERY (more later on the STANDBY option). The database is now accessible, and no further backups (diff or log) can be restored.
A couple of special cases:
If you are in the bulk-logged recovery model, a log backup taken after you have performed minimally logged operations (since the last log backup) will contain data pages as well, in addition to log records. This type of backup cannot be performed if any of the data files are lost, and when you restore from this type of backup, you cannot do a point-in-time restore.
The STANDBY option of the RESTORE command does perform UNDO, but saves that work to a file you specify, so that the UNDO work can be undone when you restore a subsequent backup. A typical restore sequence, mapped to the steps above, is sketched below.
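
A minimal sketch, with made-up database and file names:

--Step 1: REPLACE drops the existing database; NORECOVERY skips UNDO (step 5)
RESTORE DATABASE MyDb
FROM DISK = 'C:\Backups\MyDb_full.bak'
WITH REPLACE, NORECOVERY

--Steps 3 and 4 (copy and REDO) repeat for each subsequent backup restored
RESTORE LOG MyDb
FROM DISK = 'C:\Backups\MyDb_log1.trn'
WITH NORECOVERY

--RECOVERY runs UNDO (step 5); the database becomes accessible
RESTORE LOG MyDb
FROM DISK = 'C:\Backups\MyDb_log2.trn'
WITH RECOVERY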
I think I managed to compress the topic pretty well, so unless you have worked a bit with backup and restore in SQL Server, you might want to read the above a couple of times. 🙂

Does the Resource Governor govern?

Two weeks ago, we did the “SQL 2008 Summit” roadshow here in Sweden. We did 4 cities around Sweden in 4 days (http://www.expertzone.se/sql2k8/). It was a bit exhausting, but even more fun – being able to travel and spend some time with persons I wish I could meet more often (Kalen), others I meet regularly but only at the workplace (Roger, Patrik, Anna), and yet other persons I just met (George, Per).

One of my presentations was on Resource Governor (RG), and I had this super-simple demo meant to show CPU throttling. I classified connections into one of two Workload Groups based on login name. One group used a Resource Pool with max CPU at 10% and the other a Resource Pool with max CPU at 90% (a setup along the lines of the sketch below). Since I have two CPU cores, I started two execution loops for each login. An execution loop uses SQLCMD to log in using the appropriate login ID and execute a proc which loops and outputs a counter using RAISERROR and NOWAIT (so we see something happening in the console).
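
Something like the below – pool, group, and login names are all made up, and the classifier function must be created in the master database:

--Sketch: two pools with different CPU caps, one workload group per pool
CREATE RESOURCE POOL PoolLow WITH (MAX_CPU_PERCENT = 10)
CREATE RESOURCE POOL PoolHigh WITH (MAX_CPU_PERCENT = 90)

CREATE WORKLOAD GROUP GroupLow USING PoolLow
CREATE WORKLOAD GROUP GroupHigh USING PoolHigh
GO

--Classifier function: route connections to a group based on login name
CREATE FUNCTION dbo.RgClassifier() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
  RETURN CASE SUSER_SNAME()
           WHEN N'LoginLow' THEN N'GroupLow'
           WHEN N'LoginHigh' THEN N'GroupHigh'
           ELSE N'default'
         END
END
GO

ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.RgClassifier)
ALTER RESOURCE GOVERNOR RECONFIGURE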

For two of my presentations it worked just as expected. For two presentations it didn’t: the CPU usage looked very, very strange – nowhere near what we expected. So, during the final day, I managed to spend some time with Mikael Colliander from MS Sweden. First we couldn’t reproduce the strange behavior, but after several restarts, re-boots etc. we could. We finally got to look at what scheduler each connection was using, and there was the answer. One connection (ending up in the pool with max 10% CPU) was alone on one scheduler, meaning alone on one CPU! The other three connections (one on max 10% CPU and two on max 90% CPU) were using the other CPU. So for the CPU where we had only one connection (belonging to the pool to be capped at 10% CPU) we just had no contention. That connection could use everything on that CPU, since nobody else was assigned to it.
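
To see for yourself which scheduler (and hence CPU) each session ends up on, a query along these lines does the trick:

--Sketch: scheduler assignment for currently executing user sessions
SELECT s.session_id, s.login_name, r.scheduler_id
FROM sys.dm_exec_sessions AS s
INNER JOIN sys.dm_exec_requests AS r ON r.session_id = s.session_id
WHERE s.is_user_process = 1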

Now that I understand why this happened, it doesn’t look that strange. But I think we need to be very careful when we monitor resource usage for our connections while using Resource Governor. The more CPUs we have, the less chance we will see the (perhaps expected) distribution of load.

Preparing for the SQL Summit 2008

I’ve started to produce my presentations for the “SQL Summit 2008”. It is always fun to dive down into the details of some particular feature. For instance, one of my presentations is about policy based management, which will also include a few words on configuration servers and on news in general in SSMS. So I took the time to look around in SSMS for new stuff, and I found surprisingly much. This is one of the upsides of speaking: you need to take the time to actually study a broader area around a subject, and then dive down into the details.

The SQL Summit 2008 will take place from Oct 6 to Oct 10, over 4 cities in Sweden (Umeå, Malmö, Göteborg and Stockholm). I’m happy to be in good company among other speakers like Per Westerlund, Roger Cevung, George Thörngren and Patrik Löwendahl. And I’m really happy that the keynote (and other presentations) will be delivered by my good friend Kalen Delaney. The slightly sad part is that one of my presentations is scheduled at the same time as one of Kalen’s. But who knows, perhaps my room will be empty <g>?

Check it out at SQL Summit 2008.

Error messages in 2008 Books Online

I just learned from Gail E. at MS that the system error messages are all documented in the SQL Server 2008 Books Online.

For instance, search for 823. See the hits? Now, for instance, select the “Errors 1 – 999” page. Press the “Sync with Table of Contents” button. Look to the left, in the “Contents” section. You now see that the error messages are divided into sections (1-999, 1000-1999, …). You also see where in BOL to find these sections:

SQL Server 2008 Books Online
  Database Engine
    Technical Reference
      Errors and Events Reference
        Database Engine Events and Errors
          System Error Messages

Also note that some messages have a hyperlink which takes us to a section where we can read more about that particular error message (823 is such an example).

Rebuild master in SQL Server 2008

You might have read an earlier blog entry about my problems removing Data Collector (DC) information in SQL Server 2008. I still haven’t found any info on how to do this, but my questions in the MVP group triggered some activity.

Bob Ward looked into how to rebuild the system databases. This was on my list to try (after removing the DC config and, if that didn’t work, rebuilding msdb). But Books Online had very little information on how to actually do the rebuild. Since there were quite a few changes in setup between 2005 and 2008, I didn’t feel like trial and error based on how we did this in 2005.

Bob helped me out with some information on how to do the rebuild, and it is very easy! I now have a bat file which rebuilds three instances – and it only takes 5 minutes. And even better: no installation media is required – and it also remembers the collation for my system databases!
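
For the curious, the core of such a bat file is one setup.exe invocation per instance. A sketch with made-up instance and account names – see Bob’s post for the details and caveats:

REM Sketch: rebuild the system databases for one instance
REM (/SAPWD is only needed for mixed authentication)
setup.exe /QUIET /ACTION=REBUILDDATABASE /INSTANCENAME=MyInstance /SQLSYSADMINACCOUNTS=MyDomain\DbaGroup /SAPWD=StrongPwd1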

Enough said. Check out Bob’s blog post at: http://blogs.msdn.com/psssql/archive/2008/08/29/how-to-rebuild-system-databases-in-sql-server-2008.aspx

Execution plan re-use, sp_executesql and TSQL variables

Let me start by saying that the contents of this post are not very advanced. If you have read the excellent paper “Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005”, https://technet.microsoft.com/en-us/library/cc966425.aspx, and understood it, you already know everything below, and much more…

I was reading a thread in the open newsgroups today (.tools, posted by Mike) about advice that the SQL Server 2005 Database Engine Tuning Advisor apparently gave. It advises replacing sp_executesql usage with declaring TSQL variables and using those in the WHERE clause. Translated to the Adventureworks database, it advises that instead of 1) below, we should use 2) below.

1)
EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM Person.Person
WHERE LastName = @P1',
N'@P1 nvarchar(50)', N'Diaz'

2)
DECLARE @P1 NVARCHAR(50)
SET @P1 = N'Diaz'
SELECT FirstName, LastName, PersonType, Title
FROM Person.Person
WHERE LastName = @P1

Now, I could not reproduce this (make DTA give me the same advice, to replace 1) with 2)). Perhaps I misunderstood the poster in the group; or it is because I’m running the SQL Server 2008 DTA and engine, I’m not looking in the right place for that advice, my data isn’t representative, I’m running the DTA with some other settings, etc. But say that DTA does indeed give such advice – why would it do that? To be honest, I don’t know. It can hardly have enough information to determine whether 1) or 2) is the best choice.

In short: say we have an index on the LastName column, and the name we look for can either be a very common name, like perhaps “Smith”, or a not so common name, like “Karaszi”. For the more common name, a table scan might be the best choice, whereas for the not-so-common name, an index seek is probably the best choice. OK, a very simple example, but it serves well for this discussion.

Back to the difference between 1) and 2). There are potentially very important differences between the two:

For 1), SQL Server will determine a plan based on the contents of the parameter when the plan is created. I.e., it can determine selectivity based on that, and decide whether it is a good idea to use an index or not. The plan is then cached and can be re-used. But what if the first execution passes in something which is very selective, while typical executions are not very selective? Or the other way around? I.e., the plan for one case might not be optimal for some other case. This is where we have new optimizer hints in 2005, like OPTIMIZE FOR and the RECOMPILE hints (one of them sketched below). I will not go into details about these here, since I’m focusing on the differences between 1) and 2). See the white paper I mentioned, and of course Books Online, for more details.
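
Just to show the shape of it (borrowing the demo table p created in the script further down), OPTIMIZE FOR builds the plan as if the parameter had the stated value, regardless of what is actually passed in:

EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1
OPTION (OPTIMIZE FOR (@P1 = N''Diaz''))',
N'@P1 nvarchar(50)', N'Gimmi'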

For 2), SQL Server (or rather: the optimizer) has no knowledge of the contents of the variable when the plan is produced. So it cannot use the statistics to determine selectivity. In the above case, it instead uses density (stored with the statistics, assuming such exist for the column). Density is basically 1 / (number of unique values in the column(s)). This might be a good representation for a typical last name, but perhaps not so good for a very common or a very uncommon last name. Now, in 2005, we have a hint for this situation as well (RECOMPILE, sketched below), but again, that is not the point here.
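
And a sketch of the corresponding hint for the variable case: with OPTION (RECOMPILE), the statement is compiled at execution time, when the variable’s value is known, so the optimizer can use the statistics rather than density:

DECLARE @P1 NVARCHAR(50)
SET @P1 = N'Gimmi'
SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1
OPTION (RECOMPILE)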

In order for DTA to give the best advice here, it would have to know the distribution of the data in that column, and also see many executions of the query, to determine whether “all” executions use a typical value (in which case sp_executesql might be better) or whether the stored density value is a good representation for “all” queries that are passed from the clients. I very much doubt that DTA has this level of sophistication. Basically, I don’t know why it gives this advice. There might be other aspects, like “avoid dynamic SQL” (whether that holds for this case we could argue in another post), but DTA is about performance, not best practices.

Bottom line is that these things are not simple, and we should be very cautious with “rules of thumb”.

Here’s some TSQL code to demonstrate the differences between 1) and 2). As always, only execute it after you have read and understood the code!

--Create a copy of the person table
--We will have lots of "Diaz" and very few "Gimmi"
IF OBJECT_ID('p') IS NOT NULL DROP TABLE p
GO
SELECT * INTO p
FROM Person.Person

CREATE INDEX x ON dbo.p(LastName)

--Create lots of Diaz
INSERT INTO p
SELECT  BusinessEntityID + 30000, PersonType, NameStyle, Title, FirstName, MiddleName, N'Diaz', Suffix, EmailPromotion, AdditionalContactInfo, Demographics, rowguid, ModifiedDate
FROM p

--Make sure we have up-to-date statistics
UPDATE STATISTICS p WITH FULLSCAN

--Verify execution plan and I/O cost
--for table scan with low selectivity
--and index seek with high selectivity
SET STATISTICS IO ON

DBCC FREEPROCCACHE
SELECT * FROM p WHERE LastName = 'Diaz'
--20183 rows, table scan, 7612 pages

DBCC FREEPROCCACHE
SELECT * FROM p WHERE LastName = 'Gimmi'
--1 row, index seek, 3 pages

------------------------------------------------------------
--sp_executesql alternative
------------------------------------------------------------

--Table scan will be used for both because of execution plan re-use
DBCC FREEPROCCACHE
EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1',
N'@P1 nvarchar(50)', N'Diaz'
--20183 rows, table scan, 7612 pages

EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1',
N'@P1 nvarchar(50)', N'Gimmi'
--1 row, table scan, 7612 pages

--Other way around
--Index search will be used for both because of execution plan re-use
DBCC FREEPROCCACHE
EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1',
N'@P1 nvarchar(50)', N'Gimmi'
--1 row, index seek, 3 pages

EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1',
N'@P1 nvarchar(50)', N'Diaz'
--20183 rows, index seek, 20291 pages

------------------------------------------------------------------
--Alternative using variable
------------------------------------------------------------------
DBCC FREEPROCCACHE
DECLARE @P1 NVARCHAR(50)
SET @P1 = N'Diaz'
SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1
GO
--20183 rows, index seek, 20291 pages

DBCC FREEPROCCACHE
DECLARE @P1 NVARCHAR(50)
SET @P1 = N'Gimmi'
SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1
GO
--1 row, index seek, 1 page

--Same plan even though very different selectivity
--and emptying plan cache in between

--Estimated 33 rows for both above.
--See if that is drawn from statistics density?

--Formula for density: 1/#OfUniqueValuesInColumn
SELECT 1./COUNT(DISTINCT LastName) FROM p
--0.00082918739

--Does that match density in index statistics?
DBCC SHOW_STATISTICS('p', 'x')
--Yes

--How many rows in the table?
SELECT COUNT(*) FROM p
--39944

--So how many rows would we estimate based on density?
SELECT 0.00082918739 * 39944
--Yep, 33 rows.

--I.e, for the variable alternative, SQL Server has no
--knowledge of contents of those variables so it must use density instead.

--Clean up:
IF OBJECT_ID('p') IS NOT NULL DROP TABLE p

Rebuilding msdb on SQL Server 2008

Because of the problems I had removing Data Collector information, I decided to rebuild msdb. You have probably heard about instmsdb.sql, but it was a long time since I actually used it. I asked about rebuilding in the MVP group, and Dan Jones (MS) pointed me to a blog post from Paul Randal on how to do this on SQL Server 2005. Here’s Paul’s blog post:

http://blogs.msdn.com/sqlserverstorageengine/archive/2006/06/06/619304.aspx

Since the above is for SQL Server 2005, I realized that it might not work smoothly on 2008. And it didn’t. Below are some of the things I discovered (also posted as a comment on Paul’s blog). Read them in light of Paul’s blog. I should also say that nowhere does Paul state that his instructions work on 2008. It was me taking a chance. 🙂

You need to add the startup parameter -s <instancename> if it is a named instance. Now, this I knew, but for the sake of other potential readers…
I initially started the instance from the Windows services applet, adding -T3608. That didn’t allow for detaching msdb. So I instead started it from an OS command prompt, also adding -c. This allowed me to detach msdb.
I then ran instmsdb, but that produced a number of errors. Here are a few comments about some of them:
* Complaints about xp_cmdshell. I did try enabling it first and then ran instmsdb again, but got the same result.
* A bunch of errors when creating various Data Collector objects. This wasn’t good, because cleaning up DC was the reason to rebuild msdb in the first place.
* 3 errors about sp_configure and -1 not being an allowed value (two for Agent XPs and one for xp_cmdshell).
Just for the sake of trying, I then connected to the instance using the SSMS Object Explorer. But now I got an error regarding Agent XPs when connecting. I tried explicitly enabling Agent XPs using sp_configure, but got the same error. When connected, there is no node in Object Explorer for Agent.
I took this as an indication that Agent isn’t healthy. Whether it was me doing something fishy or it isn’t as easy as just running insmsdb.sql for SQL Server 2008 – I don’t know. But I’m in for a rebuild of system databases. This isn’t that bad since it is a just a test machine. But these issues might serve as example why you want to follow Paul’s initial advice: always backup msdb (also on test machines).