Keep more history for your Agent jobs

SQL Server Agent is in my opinion way too restrictive when it removes old job history information.

This information is store in the dbo.sysjobhistory table in the msdb database. Unfortunately, Agent will by default only keep 1000 rows in this table. I have on many occasions wanted to for instance check last execution of the weekly job. Or the trend for the weekly job, like how ling it takes to execute. Only to find that the information is no longer there.

I prefer to expand this to 50000 rows in total and 200 rows per job.

You can do this in the GUI: right-click Agent, Properties, History. Or just execute below SQL command on your SQL Servers:

EXEC msdb.dbo.sp_set_sqlagent_properties
@jobhistory_max_rows=50000
,@jobhistory_max_rows_per_job=200

Log Page Life Expectancy over time

You often see Page Life Expectancy referred to as an interesting performance monitor counter. And it can be! It indicates for how long a page is expected to stay in cache, from the time it was brought into cache.

But just looking at a snapshot value doesn’t say that much. It might be high, but that is because you haven’t had a high turnover of you pages for the past couple of hours. Or the other way around, you happen to look just after a very rare monster query. Furthermore, having a log can show you how PLE changes over time.

“The app was slow at two o’clock today.”

You look at the PLE log and you see how PLE dropped at that time, which indicates that something was executed that did lots of I/O. Or you don’t see PLE drop at all, and can assume that it probably was a blocking situation instead.

Some of you already have performance monitoring solutions that include PLE logging over time. For you that don’t, I created a very simple solution. It involved a table, a scheduled Agent job and queries to read the data. That’s it.

You find it at http://karaszi.com/log-page-life-expectancy-over-time .

Improvements and comments are of course appreciated. 🙂

Do you clean up your Database Mail log tables?

Database Mail has a couple of log tables in the msdb database. These can become large over time. I’ve seen MSDB databases over 1 GB in size, where normal size is less than 50 MB (heavy usage of old SSIS deployment model excluded).

Unfortunately Maintenance Plans do not have built-in functionality for this, nor does Ola Hallengren’s excellent maintenance solution ( http://ola.hallengren.com/ ). All you have to do is to schedule an agent job to be executed, say, every week, having one T-SQL jobstep containing: 

DECLARE @DeleteOlder DATETIME

SET @DeleteOlder = DATEADD(MONTH, -1, CURRENT_TIMESTAMP)

EXECUTE msdb.dbo.sysmail_delete_mailitems_sp @sent_before = @DeleteOlder

EXECUTE msdb.dbo.sysmail_delete_log_sp @logged_before = @DeleteOlder

Above removes mail history older than one month. Adjust to your liking, using the values in the DATEADD function.

As always, remember to comment your job and to specify appropriate database for the T-SQL jobstep (for documentation purposes, msdb in this case).

Wait random number of minutes

Why on earth would you want to do that? you ask. Say you have a job that is scheduled to start at the same time over a number of servers. This might be because you have an SQL Server Master/Target server environment (MSX/TSX) or you quite simply script a job and execute that script on several servers. You probably want to spread the load on your SAN and virtual machine host a bit. This is the exact reason I use this procedure. I frequently use MSX servers and I usually add a job step (executing this procedure) to wait a random number of minutes between 0 and 30.

You find the procedure here.

Resynchronizing a target server (MSX – TSX)

I often use SQL Server Agent master / target servers (MSX/TSX). I find it so convenient to create the job once and then just add whatever targetservers (TSX) should have this job. Especially when you later modify the job. Again, just modify it once. The usage of MSX in general, and how I use it, can easily become a series of blog posts in itself. But that is not the point here.

Sometimes a TSX goes out-of-sync with its master. I’ve never understood exactly under what circumstances, but it feels like it happened when you do things “too quickly”. Like change a job, push it out, and before the push has finished, you change it again. Or something like that. A TSX going out-of-sync doesn’t happen frequently. I’ve had it a handful of times. And every time, I have spent time searching etc. on how to fix it.

A couple of days ago, a client of mine had this case, and he had himself tracked down a possible way to fix this. We decided to go ahead with this, and it worked just fine. So, the purpose here is to document the (very easy) fix, for whenever this happens again. And for all of you out there who might benefit, of course. The error you see is something like:

[291] An unresolved problem exists with the download instructions (sysdownloadlist) for target server ‘Y’ at MSX ‘X’

X here is obviously the master and Y the target. And the solution was quite simply (in the msdb database):

EXEC dbo.sp_resync_targetserver, N’Y’

Are you using MSX/TSX? Have you had sync issues? How did you handle them?

Send email after Agent job execution

After executing a job, I want to have an email with the following information:

  • Success or failure in the mail subject (not just in the body)
  • Success or failure and execution time for each job step in mail body
  • Have output file(s) for the job steps attached to email

Agent has none of above built-in so I created my own procedure for this. This is meant to be added as a final job step for your jobs. You find all details and the procedure here: http://karaszi.com/mailafterjob-send-email-after-agent-job-include-ouput-files. Feedback is of course very welcome.

Who owns your jobs?

Yep, another Agent blog. The topic of job owner has been discussed here before, but this is a specific situation, which potentially is a ticking bomb. First a quick recap of how agent does things:

When a job is owned by somebody who is sysadmin, then no impersonation is performed.

  • T-SQL jobsteps will be executed as Agent’s own Windows authenticated login.
  • Other jobsteps (which operates as the OS level) is done in Agent’s service account’s context.

Above is, assuming you didn’t ask for impersonation for the jobstep (Run As).

Last week, at a SQL Server admin class, Andreas Jarbéus from Swedish Match asked about job owned by Windows account and what happens when that person leaves the company. I was about to say “no problem, if the job owner is sysadmin”, but I felt there was a bit more to it. The scenario is:

  • The job owner is member of a Windows/AD group. Say you have a “SQL Server DBA” group in your AD.
  • This group is added as a login to your SQL Servers. The persons in that group do not have individual logins in SQL Server – only through this group.
  • This group is sysadmin. (I’m not saying that having jobs in general owned by sysadmins is a best practice. You can for instance have vendors installing some software and you don’t want to make them sysadmin. In such a case, clearly we don’t want the job to keep running if that person would disappear. So, I’m targeting jobs owned by sysadmin here, and I would bet you have bunch of those.)
  • Somebody in this group creates an Agent job.
  • The person in the group leaves the company and is removed from this Windows/AD group.

Above is a pretty common scenario. The thing now is that the job no longer works. And here’s the reason:

If the owner was somebody in Windows/AD who’s SID exists in SQL Server (the individual was added as a login to SQL Server), then the job will still work. This also applies for SQL Logins which might be disabled (like “sa” for instance). The reason for this is that the SID for the job owner exists in sys.server_principals and SQL Server will know that this SID is sysadmin. Since Agent won’t do impersonation for sysadmins, there’s no need to “go out to Windows” and see if this SID exists in Windows/AD. Remove the login, and the job will stop working, though – but you have at least don’t something explicitly inside your SQL Server to cause this (removed the login).

But when the owner’s SID don’t exist in sysadmin we have a problem. The problem is that the recorded owner of the job is the SID for the Windows user, even though that SID doesn’t exist in sys.server_principals. As long as this still exists in Windows/AD, the job will work just fine. Agent will run this job owned by ?, ask Windows who this SID is and see that this SID is a member of a group which exists as a login in SQL Server and that is sysadmin. Now, imagine what happen if the SID doesn’t exist in Windows anymore. The job is owned by ?, and that is all that SQL Server knows. The SID no longer exist in Windows/AD so you there’s nobody to tell SQL Server “Yeah, this used to be a sysadmin in SQL Server – through a group membership in Windows/AD”.

Even worse, the job seems to keep working until you re-start the SQL Server service, some caching going on. Makes it even more difficult to determine why the job suddenly stops working: “We deleted this Windows account 4.5 months ago.” This is easy to repro, and you don’t even need a domain (local Windows accounts and groups work just fine):

  • Create a Windows group for this.
  • Create a Windows account for this.
  • Add the Windows account to above group.
  • Add above group as a login to SQL Server.
  • Make this group sysadmin.
  • Create a job, with two jobsteps:
  • T-SQL which executes: PRINT ORIGINAL_LOGIN() + ‘ | ‘ + SUSER_SNAME() + ‘ | ‘ + USER_NAME()
  • CmdExec which executes: WHOAMI.EXE
  • (Above isn’t really necessary, but useful for testing other impersonation aspects of Agent)
  • Specify the Windows account as owner of the job.
  • Run the job, make sure it works fine.
  • Remove the Windows account from the Windows group.
  • Run the job and see it fails.

I’m sure this has been blogged before, but I was a bit surprised by it. I have now added a check for this in my “Check best practices” stored procedure I run every day… On the flip side, this doesn’t seem to happen for multi-server (MSX) jobs, the ownership seems to have special handling for these – you will see if you check the owner_sid in sysjobs.

Agent Alerts Management Pack updated

I have updated the Agent Alerts Management Pack, cleaned up the T-SQL code which creates the alerts. You can now just copy, paste and execute that code. Earlier, for some of the some of the alert definitions it just generated the calls to the query result window, and you had to copy this to a query window and execute it. Now you just copy the code and execute it. You find it here: http://karaszi.com/agent-alerts-management-pack.

Using sa as owner for jobs and databases

This blog is not about avoiding logging in using the sa login. Hopefully we all know about this, and work towards avoidning this practice.

Instead I want to talk about using sa, but not to login (authenticate), but as owner for jobs and databases. I want keep these thing de-individualized – so we avoid things like person A leaving the company and we don’t dare to remove that login/Windows account. We can of course create some SQL login or Windows login especially for this purpose and use that. But sa is already there. Another advantage is that sa always has the same sid number (makes moving databases across instances a bit easier).

The way Agent work is that if the owner is member of sysadmin server role, then it won’t attempt any imersonation for the job steps. I.e., Agent won’t use SETUSER (2000 and earlier) or EXECUTE AS USER = (2005 or later). This means that Agent will never actually authenticate using sa (Agent will always authenticate using a Windoes authentication – and then verify that it is sysadmin). I.e., we can change password for sa, disable sa, or even run in Windows Only mode.

And, just to be obvious: If the job should be owned by some individual, in order for operating in a proper security context, then we should use that individual as owner and not sa!

How do you handle job and database ownership? Do you have situations where the owner does matter, details?

SQL Server Agent jobs and user contexts

In what user context does a job run? I recently found myself in a forum discussion and gave my stock reply, later realizing that I haven’t actually tested this for a long time (I used to demo this in class during 6.5 courses – when we actually had time for slight diversions). Lets start with my assumptions:

  1. Job owned by sysadmin, TSQL jobsteps:
    Agent log in to SQL Server using its own credential (windows authentication from service account) and execute the TSQL commands – no impersonation performed.
  2. Job owned by sysadmin, other jobsteps:
    Agent starts a process using its service account – no impersonation performed.
  3. Job owned by non-sysadmin, TSQL jobstep:
    Agent log in to SQL Server using its own credential (windows authentication from service account), then uses EXECUTE AS LOGIN = ‘<job_owner_login>’ to “impersonate” the login who owns the job. I.e., the TSQL commands will be executed in the security context of the job owner’s login.
  4. Job owned by non-sysadmin, other jobsteps:
    Agent starts a process using the service account as specified by the Agent proxy selected for this jobstep. (See Books Online “sp_add_proxy” and “CREATE CREDENTIAL” and this for more information about Agent Proxies and credentials in SQL Server.)

So, how to prove above? Lets use a combination of jobstep output file and a Profiler trace. The ouput files will be used to catch the login/user names, and the Profiler trace to verify what TSQL commands will be submitted by Agent (for TSQL jobstep). For the TSQL jobsteps, we execute:

SET NOCOUNT ON
SELECT
CAST(ORIGINAL_LOGIN() AS VARCHAR(20)) AS Original_login
,CAST(SUSER_SNAME() AS VARCHAR(20)) AS Effective_login
,CAST(USER_NAME() AS VARCHAR(20)) AS Db_user

And to test the other jobsteps, where we will be using a CmdExec jobstep as example (we are interested in what Windows account the process will be started as), and use whoami.exe which returns the windows user name. Each jobstep is configured to output the result to a file.

  • I change my real machine name to “Machine” in this text.
  • I’m not in a domain.
  • I’m logged in interactively (to Windows) as Windows account “Tibor”.
  • I’m logged in to SQL Server using Windows authentication.
  • I’m sysadmin.
  • My service account for both Agent and SQL Server services is “SQLService”.

Here’s the result from having the job owned by me (being sysadmin):

Original_login       Effective_login       Db_user
——————– ——————– ——————–
MACHINE\SqlService   MACHINE\SqlService   dbo

machine\sqlservice

As you see, no attempted impersonation performed. Steps 1 – 2 in above list confirmed.

We now going to test this for somebody who isn’t sysadmin:

  • Create SQL Server login named “Kalle”
  • Create user in msdb for above
  • Assign Kalle permission to create jobs (SQLAgentUser role in msdb)
  • Create account “MySqlProxy” in Windows.
  • Creade credential “c_MySqlProxy” in SQL Server for above.
  • Create proxy “p_MySqlProxy” for above, and allow login “Kalle” to use this for CmdExec jobsteps.

Now we modify the job and set the owner to Kalle, and also specify for the CmdExec jobstep to use the proxy p_MySqlProxy. Also, since non-sysadmins can’t use output files (I didn’t know that), we instead configure each step’s output to go to table. Execute job. No, finally we can check each job step output for result:

Original_login       Effective_login       Db_user
——————– ——————– ——————–
MACHINE\SqlService   Kalle                guest

machine\mysqlproxy

The first section above confirms that Agent uses it’s own login, but then changes login context to Kalle (verified by Profiler trace capturing the “EXECUTE AS LOGIN = N’Kalle’ WITH NO REVERT” command). The job step was configured to run in the master database; since I didn’t add Kalle as a user to master, you see the user name guest.

The second section verifies that The CmdExec step started a process using the mysqlproxy windows account, and specified using our SQL Server Agent Proxy and SQL Server credential object.

(I will not reply to support questions here. For support, I recommend you visit a forum, for instance this. Other discussions, comments and corrections are of course very welcome!)

Finally, I scripted out the job (as owned by me – being sysadmin), if you want to re-create it and play with it. Use at own risk, do apropriate modifications etc.

USE [msdb]
GO

/****** Object:  Job [CredTest]    Script Date: 09/19/2009 12:12:47 ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode 0
/****** Object:  JobCategory [[Uncategorized (Local)]]]    Script Date: 09/19/2009 12:12:47 ******/
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]’ AND category_class=1)
BEGIN
EXEC @ReturnCode msdb.dbo.sp_add_category @class=N’JOB’@type=N’LOCAL’@name=N'[Uncategorized (Local)]’
IF (@@ERROR <> OR @ReturnCode <> 0GOTO QuitWithRollback

END

DECLARE @jobId BINARY(16)
EXEC @ReturnCode =  msdb.dbo.sp_add_job @job_name=N’CredTest’,
@enabled=1,
@notify_level_eventlog=0,
@notify_level_email=0,
@notify_level_netsend=0,
@notify_level_page=0,
@delete_level=0,
@description=N’No description available.’,
@category_name=N'[Uncategorized (Local)]’,
@owner_login_name=N’MACHINE\Tibor’@job_id @jobId OUTPUT
IF (@@ERROR <> OR @ReturnCode <> 0GOTO QuitWithRollback
/****** Object:  Step [A_TSQL]    Script Date: 09/19/2009 12:12:48 ******/
EXEC @ReturnCode msdb.dbo.sp_add_jobstep @job_id=@jobId@step_name=N’A_TSQL’,
@step_id=1,
@cmdexec_success_code=0,
@on_success_action=3,
@on_success_step_id=0,
@on_fail_action=2,
@on_fail_step_id=0,
@retry_attempts=0,
@retry_interval=0,
@os_run_priority=0@subsystem=N’TSQL’,
@command=N’SET NOCOUNT ON
SELECT
CAST(ORIGINAL_LOGIN() AS VARCHAR(20)) AS Original_login
,CAST(SUSER_SNAME() AS VARCHAR(20)) AS Effective_user
,CAST(USER_NAME() AS VARCHAR(20)) AS Db_user’,
@database_name=N’master’,
@flags=8
IF (@@ERROR <> OR @ReturnCode <> 0GOTO QuitWithRollback
/****** Object:  Step [B_OS]    Script Date: 09/19/2009 12:12:48 ******/
EXEC @ReturnCode msdb.dbo.sp_add_jobstep @job_id=@jobId@step_name=N’B_OS’,
@step_id=2,
@cmdexec_success_code=0,
@on_success_action=1,
@on_success_step_id=0,
@on_fail_action=2,
@on_fail_step_id=0,
@retry_attempts=0,
@retry_interval=0,
@os_run_priority=0@subsystem=N’CmdExec’,
@command=N’whoami.exe’,
@flags=16
IF (@@ERROR <> OR @ReturnCode <> 0GOTO QuitWithRollback
EXEC @ReturnCode msdb.dbo.sp_update_job @job_id @jobId@start_step_id 1
IF (@@ERROR <> OR @ReturnCode <> 0GOTO QuitWithRollback
EXEC @ReturnCode msdb.dbo.sp_add_jobserver @job_id @jobId@server_name N'(local)’
IF (@@ERROR <> OR @ReturnCode <> 0GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
IF (@@TRANCOUNT 0ROLLBACK TRANSACTION
EndSave:

GO