Analyzing the errorlog

How often do you do this? Look over each message (type) in the errorlog file and determine whether this is something you want to act on. Sure, some (but not all) of you have some monitoring solution in place, but are you 100% confident that it really will notify for all messages that you might find interesting? That there isn’t even one little message hiding in there that you would find valuable knowing about? Or how about messages that you typically don’t are about, but knowing that you have a high frequency can be valuable information?

So, this boils down to actually reading the errorlog file. Some of you probably already have scripts and tool that makes this easier than just reading every simple message from top to bottom. I wanted to share how I do it, and this is why I wrote my Analyze SQL Server logs article. Check it out. And, feedback is always welcome!

List columns where collation doesn’t match database collation

Below script lists all database/table/column where the column collation doesn’t match the database collation. I just wrote it for a migration project and thought I’d share it. I’m sure lots of tings can be improved, but below worked just fine for me for a one-time execution on a number of servers.

IF OBJECT_ID('tempdb..#res') IS NOT NULL DROP TABLE #res
GO

DECLARE
@db sysname
,@sql nvarchar(2000)

CREATE TABLE #res(server_name sysname, db_name sysname, db_collation sysname, table_name sysname, column_name sysname, column_collation sysname)

DECLARE c CURSOR FOR
SELECT name FROM sys.databases WHERE NAME NOT IN('master', 'model', 'tempdb', 'msdb') AND state_desc = 'ONLINE'

OPEN c
WHILE 1 = 1
BEGIN
FETCH NEXT FROM c INTO @db
IF @@FETCH_STATUS <> 0
BREAK
SET @sql =
'SELECT
@@SERVERNAME AS server_name
,''' + @db + ''' AS db_name
,CAST(DATABASEPROPERTYEX(''' + @db + ''', ''Collation'') AS sysname) AS db_collation
,OBJECT_NAME(c.object_id, ' + CAST(DB_ID(@db) AS sysname) + ') AS table_name
,c.name AS column_name
,c.collation_name AS column_collation
FROM ' + QUOTENAME(@db) + '.sys.columns AS c
INNER JOIN ' + QUOTENAME(@db) + '.sys.tables AS t ON t.object_id = c.object_id
WHERE t.type = ''U''
AND c.collation_name IS NOT NULL
AND c.collation_name <> CAST(DATABASEPROPERTYEX(''' + @db + ''', ''Collation'') AS sysname)
'
--PRINT @sql
INSERT INTO #res
EXEC(@sql)
END
CLOSE c
DEALLOCATE c
SELECT * FROM #res

Send email after Agent job execution

After executing a job, I want to have an email with the following information:

  • Success or failure in the mail subject (not just in the body)
  • Success or failure and execution time for each job step in mail body
  • Have output file(s) for the job steps attached to email

Agent has none of above built-in so I created my own procedure for this. This is meant to be added as a final job step for your jobs. You find all details and the procedure here: http://karaszi.com/mailafterjob-send-email-after-agent-job-include-ouput-files. Feedback is of course very welcome.

Who owns your jobs?

Yep, another Agent blog. The topic of job owner has been discussed here before, but this is a specific situation, which potentially is a ticking bomb. First a quick recap of how agent does things:

When a job is owned by somebody who is sysadmin, then no impersonation is performed.

  • T-SQL jobsteps will be executed as Agent’s own Windows authenticated login.
  • Other jobsteps (which operates as the OS level) is done in Agent’s service account’s context.

Above is, assuming you didn’t ask for impersonation for the jobstep (Run As).

Last week, at a SQL Server admin class, Andreas Jarbéus from Swedish Match asked about job owned by Windows account and what happens when that person leaves the company. I was about to say “no problem, if the job owner is sysadmin”, but I felt there was a bit more to it. The scenario is:

  • The job owner is member of a Windows/AD group. Say you have a “SQL Server DBA” group in your AD.
  • This group is added as a login to your SQL Servers. The persons in that group do not have individual logins in SQL Server – only through this group.
  • This group is sysadmin. (I’m not saying that having jobs in general owned by sysadmins is a best practice. You can for instance have vendors installing some software and you don’t want to make them sysadmin. In such a case, clearly we don’t want the job to keep running if that person would disappear. So, I’m targeting jobs owned by sysadmin here, and I would bet you have bunch of those.)
  • Somebody in this group creates an Agent job.
  • The person in the group leaves the company and is removed from this Windows/AD group.

Above is a pretty common scenario. The thing now is that the job no longer works. And here’s the reason:

If the owner was somebody in Windows/AD who’s SID exists in SQL Server (the individual was added as a login to SQL Server), then the job will still work. This also applies for SQL Logins which might be disabled (like “sa” for instance). The reason for this is that the SID for the job owner exists in sys.server_principals and SQL Server will know that this SID is sysadmin. Since Agent won’t do impersonation for sysadmins, there’s no need to “go out to Windows” and see if this SID exists in Windows/AD. Remove the login, and the job will stop working, though – but you have at least don’t something explicitly inside your SQL Server to cause this (removed the login).

But when the owner’s SID don’t exist in sysadmin we have a problem. The problem is that the recorded owner of the job is the SID for the Windows user, even though that SID doesn’t exist in sys.server_principals. As long as this still exists in Windows/AD, the job will work just fine. Agent will run this job owned by ?, ask Windows who this SID is and see that this SID is a member of a group which exists as a login in SQL Server and that is sysadmin. Now, imagine what happen if the SID doesn’t exist in Windows anymore. The job is owned by ?, and that is all that SQL Server knows. The SID no longer exist in Windows/AD so you there’s nobody to tell SQL Server “Yeah, this used to be a sysadmin in SQL Server – through a group membership in Windows/AD”.

Even worse, the job seems to keep working until you re-start the SQL Server service, some caching going on. Makes it even more difficult to determine why the job suddenly stops working: “We deleted this Windows account 4.5 months ago.” This is easy to repro, and you don’t even need a domain (local Windows accounts and groups work just fine):

  • Create a Windows group for this.
  • Create a Windows account for this.
  • Add the Windows account to above group.
  • Add above group as a login to SQL Server.
  • Make this group sysadmin.
  • Create a job, with two jobsteps:
  • T-SQL which executes: PRINT ORIGINAL_LOGIN() + ‘ | ‘ + SUSER_SNAME() + ‘ | ‘ + USER_NAME()
  • CmdExec which executes: WHOAMI.EXE
  • (Above isn’t really necessary, but useful for testing other impersonation aspects of Agent)
  • Specify the Windows account as owner of the job.
  • Run the job, make sure it works fine.
  • Remove the Windows account from the Windows group.
  • Run the job and see it fails.

I’m sure this has been blogged before, but I was a bit surprised by it. I have now added a check for this in my “Check best practices” stored procedure I run every day… On the flip side, this doesn’t seem to happen for multi-server (MSX) jobs, the ownership seems to have special handling for these – you will see if you check the owner_sid in sysjobs.

Agent Alerts Management Pack updated

I have updated the Agent Alerts Management Pack, cleaned up the T-SQL code which creates the alerts. You can now just copy, paste and execute that code. Earlier, for some of the some of the alert definitions it just generated the calls to the query result window, and you had to copy this to a query window and execute it. Now you just copy the code and execute it. You find it here: http://karaszi.com/agent-alerts-management-pack.

Restricting memory using Resource Governor

You might know that Resource Governor (RG) can be used to allow you to affect resource allocation inside SQL Server. RG was introduced with SQL Server 2008 and requires Enterprise Edition or similar/higher. There are two things you can govern using RG: cpu and memory.

For introductory information on RG, see for instance these BOL topics.

A misconception I see from time to time is that the memory setting is to restrict the Buffer Pool, BP, (mainly used for caching pages from disk). This seems difficult since a page in BP has a cost, but isn’t owned by whoever brought it into cache or who referenced it last time. So, it seems difficult with current implementation of cache handling and ageing to govern the BP memory for RG. What RG does govern is “working memory” for a query. Think memory used for sorting, locks, hashing and such.

We just had such a discussion in the MSDN forums, and I decided to do a test to see if we could show that RG do not govern the BP. Brief outline of the script

  • Restrict the BP to 300 MB
  • Create two databases (a and b)
  • Create a table with size of 255MB in each database
  • Create two logins with name a and b, which will be used to execute queries
  • The logins end up with two different resource pools (aPool and bPool), where aPool has max mem 80% and bPool has max mem 20%
  • A loop which generates queries that are executed using xp_cmdshell and SQLCMD
  • The loop first scan over the first approx 9% of data, first login a in database a, then in login b in database b
  • Next iteration is then done over the following 9 % of data
  • After 11 iteration, we’ve done most data
  • Finally check how much memory in the BP each database is using

If RG were to govern the BP, we expect to see database a using lots of more pages than database b. That not what we are seeing. You will probably see that database b is using slightly more memory than a. That is because you done exactly the same queries over exactly the same data for each database and the most recent database you queried was database b (pushing out some of the pages with same cost that database a had in cache). You might want to execute the loop a few times to possibly see things even out a bit.

Seeing is believing, they say, and to me this show that Resource Governor do not govern the Buffer Pool.

If you believe there are faults in how the script does things, feel free to discuss. If you have some alternative script you want us to try, I suggest you open a thread in the MSDN forum (for example) and there post a script which is pretty immediately executable for us(possibly with modifications of file path and server name). Unless the changes you propose are so minor that can be immediately explained in your text.

Use script at own risk.


USE MASTER
SET NOCOUNT ON

--Configure Buffer Pool for max 300 MB memory
EXEC sp_configure 'max server memory', 300
RECONFIGURE

--Drop and create databases a and b
IF DB_ID('a') IS NOT NULL DROP DATABASE a
IF DB_ID('b') IS NOT NULL DROP DATABASE b
GO
CREATE DATABASE a
ON  PRIMARY
(NAME = N'a', FILENAME = N'R:\a.mdf' , SIZE = 250MB , FILEGROWTH = 5MB )
LOG ON
( NAME = N'a_log', FILENAME = N'R:\a_log.ldf' , SIZE = 30MB , FILEGROWTH = 5MB)

CREATE DATABASE b
ON  PRIMARY
(NAME = N'b', FILENAME = N'R:\b.mdf' , SIZE = 250MB , FILEGROWTH = 5MB )
LOG ON
( NAME = N'b_log', FILENAME = N'R:\b_log.ldf' , SIZE = 30MB , FILEGROWTH = 5MB)
GO
ALTER DATABASE a SET RECOVERY SIMPLE
ALTER DATABASE b SET RECOVERY SIMPLE

--Create tables and populate with data, approx 250 MB each
CREATE TABLE a..t(c1 INT IDENTITY PRIMARY KEY CLUSTERED, c2 INT, c3 CHAR(100))
GO
INSERT INTO a..t (c2, c3)
SELECT TOP (90000) ROW_NUMBER() OVER(ORDER BY a.OBJECT_ID), 'hello'
FROM sys.columns AS a CROSS JOIN sys.columns AS b
GO 25
CREATE TABLE b..t(c1 INT IDENTITY PRIMARY KEY CLUSTERED, c2 INT, c3 CHAR(100))
GO
INSERT INTO b..t (c2, c3)
SELECT TOP (90000) ROW_NUMBER() OVER(ORDER BY a.OBJECT_ID), 'hello'
FROM sys.columns AS a CROSS JOIN sys.columns AS b
GO 25

--Logins
IF SUSER_ID('a') IS NOT NULL DROP LOGIN a
IF SUSER_ID('b') IS NOT NULL DROP LOGIN b
GO
CREATE LOGIN a WITH PASSWORD = 'pwd'
CREATE LOGIN b WITH PASSWORD = 'pwd'

--Users
USE a
IF DATABASE_PRINCIPAL_ID('a') IS NOT NULL DROP USER a
GO
CREATE USER a
GO
GRANT SELECT ON t TO a
GO
USE b
IF DATABASE_PRINCIPAL_ID('b') IS NOT NULL DROP USER b
GO
CREATE USER b
GO
GRANT SELECT ON t TO b
GO

--Configure RG:
USE MASTER
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = NULL);
ALTER RESOURCE GOVERNOR RECONFIGURE;
IF EXISTS(SELECT * FROM sys.resource_governor_workload_groups WHERE name = 'aGroup')
DROP WORKLOAD GROUP aGroup
IF EXISTS(SELECT * FROM sys.resource_governor_workload_groups WHERE name = 'bGroup')
DROP WORKLOAD GROUP bGroup
IF EXISTS(SELECT * FROM sys.resource_governor_resource_pools WHERE name = 'aPool')
DROP RESOURCE POOL aPool
IF EXISTS(SELECT * FROM sys.resource_governor_resource_pools WHERE name = 'bPool')
DROP RESOURCE POOL bPool

CREATE RESOURCE POOL aPool WITH (MAX_MEMORY_PERCENT = 80);
CREATE RESOURCE POOL bPool WITH (MAX_MEMORY_PERCENT = 20);

CREATE WORKLOAD GROUP aGroup USING aPool;
CREATE WORKLOAD GROUP bGroup USING bPool;

IF OBJECT_ID ('dbo.MyClassifier') IS NOT NULL DROP FUNCTION dbo.MyClassifier;
GO

CREATE FUNCTION dbo.MyClassifier ()
RETURNS SYSNAME WITH SCHEMABINDING
AS
BEGIN
DECLARE @GroupName SYSNAME;
IF SUSER_SNAME() = 'a'
SET @GroupName = 'aGroup';
ELSE IF SUSER_SNAME() = 'b'
SET @GroupName = 'bGroup';
ELSE SET @GroupName = 'Default';
RETURN @GroupName;
END;
GO

ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.MyClassifier);
GO
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO

--Allow xp_cmdshell
EXEC sp_configure 'xp_cmdshell', 1
RECONFIGURE

--Execute queries as A and B to read bunch of data
--We have 2,250,000 in the tables
--We will construct this command: SELECT COUNT(c2) FROM a..t WHERE c1 BETWEEN 1 AND 200000
--The start and end values will differ, with ranges in 200,000 increments. 11 loops will add up to 2,000,000.
DECLARE @v INT = 1, @sql VARCHAR(2000)
WHILE @v <= 2200000
BEGIN
SET @sql = 'SQLCMD /Slocalhost\a /Ua /Ppwd /Q"SELECT COUNT(c2) FROM a..t WHERE c1 BETWEEN ' + CAST(@v AS VARCHAR(20)) + ' AND ' + CAST(@v + 199999 AS VARCHAR(20))+ '"'
EXEC xp_cmdshell @sql, no_output
SET @sql = 'SQLCMD /Slocalhost\a /Ub /Ppwd /Q"SELECT COUNT(c2) FROM b..t WHERE c1 BETWEEN ' + CAST(@v AS VARCHAR(20)) + ' AND ' + CAST(@v + 199999 AS VARCHAR(20))+ '"'
EXEC xp_cmdshell @sql, no_output
SET @v = @v + 200000
END

--Check how many pages from each db in BP:
SELECT
DB_NAME(b.database_id) AS database_name
,(COUNT(*) * 8192) / (1024 * 1024)) AS buffer_count_MB
FROM  sys.dm_os_buffer_descriptors AS b
GROUP BY  b.database_id
ORDER BY database_name

EXEC sp_configure 'xp_cmdshell', 0
RECONFIGURE

/*--CleanupDROP DATABASE aDROP DATABASE bALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = NULL);ALTER RESOURCE GOVERNOR RECONFIGURE;DROP WORKLOAD GROUP aGroupDROP WORKLOAD GROUP bGroupDROP RESOURCE POOL aPoolDROP RESOURCE POOL bPool*/

Are we Borg?

Is it time to stop remeber things? For real, this time?

Today I needed to find two pieces of SQL Server related information. One was straight foward, I posted it on #sqlhelp on twitter and a minute or so later I had the answer. The other was a bit more involved and I asked around in our MVP mail list – a couple of hours later I had bunch of suggestions and tips. These are only examples, it could just as well have been some web-forum, newsgroup, or some other forum. We’ve all had cases like this, but it made me think and reflect.

Travel back in time, say 20 years or so. Computers were hard to use. Well, not computers per se, but rather the software made them hard to use. (Without software computers are easy to use – they don’t do anything!) We were lucky if the software came with a printed manual. Whatever the quality of that manual.

Then came electronic help with the software. That was a big step forward. And it even had index and search!

With the Internet, things really started to pick up. Sure, there were BBS before that, but not broadly used. With Internet, we were given e-mail. Mail lists. Newsgroups. Altavista, Google and Bing (sounds like a supergroup). Web-forums. Facebook. Twitter. Etc. And not only that, we carry the Internet in our pockets, wherever we are.

So, this is what hit me today. I used to keep tips and tricks in a register. Nuggets I picked up, here and there. The past few years, I do less and less of this. There are so many great minds out there, who also are willing to lend a helping hand. So, almost when I realize I need some bit of information, that information is available. Sounds pretty much like the collective mind of the Borg to me. Perhaps not quite there yet, but give it a year or five.

But what worries me is: Where does that leave us, as human beings. Where is the difference between learning and remembering. Deducing versus asking and being given the information. I’m sure you can relate to this, at some level. For instance when we were forbidden to use pocket calculatos at school, because they were too powerful  Or the other way around.

To put it bluntly: If I don’t have to remember anything about SQL Server (the info is out there, right?), I won’t be a very profficient SQL Server person. I’m sure we can formulate lots of ways to differentiate between learning/creating/deducing and remembering. And, don’t get me wrong. I think the development is really interesting – and the community out there is way cool.

But, I just find the development a bit interesting… and worrying … and thrilling.

Upgrading from Express Edition to Standard Edition

Say you encounter an SQL Server which is Express Edition, and it really should have been some higher edition. Sounds familiar? It is common for me as a consultant to find plenty of SQL Servers at a customer’s site. Many of the databases in these will be moved (typically using backup and restore) to a “real” SQL Server. But in some cases, this might not be desirable. You want to convert the whole instance, from Express to a “real” SQL Server edition.

I’m attending a great SharePoint course for Daniel Bugday this week, and the topic of how SharePoint is installed was discussed. Selecting “Standalone”, SharePoint install will not only install all SharePoint parts on a single server, it will also install SQL Server Express Edition as the database engine. In short, SharePoint is one example where you might have an existing Express Edition which you want to make a “real” (higher) edition. There are other such scenarios, of course.

As of SQL Server 2008, you can let SQL Server’s setup program change from one edition to some other edition by just selecting “Maintenance”, “Edition Upgrade” in setup and specify the product key that came with the install media for the desired edition (which you have acquired first, of course). The cool thing here is that setup basically just registers the different product key and WHAM – you have the higher edition! This is what I wanted to test, and specifically going from Express to a higher edition. Below is my experience:

I have an Express edition which came with the SharePoint installation. It is SQL Server 2008 (not R2), with no tools (SSMS etc) installed. I want to make this a “real” Standard Edition. I have a Standard Edition 2008 installation media, with a product key.

The database engine
I first ran setup from the installed SQL Server (Start, All Programs, Microsoft SQL Server 2008, Configuration Tools, SQL Server Installation Center (64-bit)). I selected “Maintenance”, “Edition Upgrade” in setup and specify the product key that came with my standard Edition. This did not allow me to continue. I got an error (which is lost now) and I could not continue.
Next try was to run setup from the installation media (my standard Edition install media) instead. This did allow me to continue. The change was very quick (a couple of minutes). I then verified using SQLCMD that I now have Standard Edition. Great! The database engine done.

Tool-time
You might want to have the tools (a.k.a. Shared Components) as well on the machine. I refer to things such as SQL Server Management Studio, Integration Services etc. So, I just ran setup from the installation media and installed the desired shared components. Nothing strange here. I now have Management Studio, Books Online and whatever more I might want to have.

SQL Server Agent
SQL Server Express doesn’t come with SQL Server Agent, right? Not really. It comes with agent, but the service is disabled and you cannot start it even if you try. But what about after an edition upgrade? I could still not start Agent. However, checking SQLAGENT.OUT revealed the problem. The first thing agent does at startup is to login to SQL Server and verify that it is sysadmin. The SharePoint setup installed Agent service account as Network Service, which was added as a login in SQL Server, but the login isn’t sysadmin. Now, this is in general a good thing (don’t run your services with higher privileges than necessary), but in this case it causes Agent to not start. Some of the SharePoint services are configured to use Network Service, and apparently the server roles securityadmin and dbcreator is enough for SharePoint. So, this needs to be taken care of. I could use some other service account for Agent, and make sure that service account is sysadmin in SQL Server. For my test, though, it is OK to make the login in SQL Server for Network Service as sysadmin. After that was done, Agent started just as expected. And just for fun, I created a Maintennance Plan for all databases, which also worked just fine.

What about the rest?
Obviously, you might want to check other things, such as database options, file size, auto grow settings, maintenance for the databases etc. But that isn’t strictly related to the edition change, it is just general best practices. For my case, for instance, I might want to upgrade the SQL Server from 2008 to 2008 R2, while I’m at it. Specifically for SharePoint, it seems its setup program created the database files in the Hive (a SharePoint term for a folder in Program Files, Common Files), and you probably want to move the SharePoint database files to a more suitable location. But that is outside the scope for this post, and hopefully no sweat for your dba.

In short, my test show that is is easy to change from one edition to a different edition, even if you want to change from Express Edition.

Point-in-time restore of database backup?

SQL Server 2005 added the STOPAT option for the RESTORE DATABASE command. This sounds great – we can stop at some point in time during the database backup process was running! Or? No, we can’t. Here follows some tech stuff why not, and then what the option is really meant for:

A database backup includes all used extents and also all log records that were produced while the backup process was running (possibly older as well, to handle open transactions). When you restore such a backup, SQL Server will from backup media copy the data pages to mdf and ndf file(s), log records to ldf file(s) and then do REDO (roll forward the log records) and possibly also UNDO (rollback open transactions – this is skipped if you specify NORECOVERY option). Why does it need the log records? Because you can do modifications in the database while the database backup is running. After backup process was started (and before it finishes) you can both modify pages not yet copied to backup media, but also pages already copied to backup media. So, the data pages in themselves do not present a consistent state of the database. At restore time, SQL Server uses the log records included in the database backup to “sync” the modifications that were performed while the backup process were running. So, what you end up with is what the database looked like at the end time of the backup process. I’m sure this is well documented somewhere in Books Online so I won’t dive further into the tech stuff here.

So, what is the purpose of the STOPAT option for RESTORE DATABASE?

It is to give you an early warning, in case you try to restore a database backup which is too late for a subsequent STOPAT operation for a log restore. Here’s an example:

  1. 13:00 db backup
  2. 13:10 log backup
  3. 13:20 log backup
  4. 13:30 db backup
  5. 13:40 log backup
  6. 13:50 log backup

Say you now want to restore to 13:15. Intuitively, you say that you will restore 1, 2 and for 3 you do STOPAT 13:15. Yes, that is right.

But say you are under stress, and perhaps even used some tool which confuses things for you? So, you end up restore 4, 5 and for 6 you try STOPAT 13:15. Clearly, you can’t stopat 13:15 for backup number 6 (it only covers 13:40 to 13:50). I.e., restore of 6 gives you an error message. So, you have to re-start the restore process (1, 2 and 3), which could take a long time (if the backups are large). If you had specified STOPAT 13:15 when you restore backup 4, SQL Server would have given you an error immediately, so you wouldn’t have wasted valuable time restoring unnecessary backups!

That is the purpose of STOPAT for RESTORE DATABASE.