Yep, another Agent blog. The topic of job owner has been discussed here before, but this is a specific situation, which potentially is a ticking bomb. First a quick recap of how agent does things:
When a job is owned by somebody who is sysadmin, then no impersonation is performed.
- T-SQL jobsteps will be executed as Agent’s own Windows authenticated login.
- Other jobsteps (which operates as the OS level) is done in Agent’s service account’s context.
Above is, assuming you didn’t ask for impersonation for the jobstep (Run As).
Last week, at a SQL Server admin class, Andreas Jarbéus from Swedish Match asked about job owned by Windows account and what happens when that person leaves the company. I was about to say “no problem, if the job owner is sysadmin”, but I felt there was a bit more to it. The scenario is:
- The job owner is member of a Windows/AD group. Say you have a “SQL Server DBA” group in your AD.
- This group is added as a login to your SQL Servers. The persons in that group do not have individual logins in SQL Server – only through this group.
- This group is sysadmin. (I’m not saying that having jobs in general owned by sysadmins is a best practice. You can for instance have vendors installing some software and you don’t want to make them sysadmin. In such a case, clearly we don’t want the job to keep running if that person would disappear. So, I’m targeting jobs owned by sysadmin here, and I would bet you have bunch of those.)
- Somebody in this group creates an Agent job.
- The person in the group leaves the company and is removed from this Windows/AD group.
Above is a pretty common scenario. The thing now is that the job no longer works. And here’s the reason:
If the owner was somebody in Windows/AD who’s SID exists in SQL Server (the individual was added as a login to SQL Server), then the job will still work. This also applies for SQL Logins which might be disabled (like “sa” for instance). The reason for this is that the SID for the job owner exists in sys.server_principals and SQL Server will know that this SID is sysadmin. Since Agent won’t do impersonation for sysadmins, there’s no need to “go out to Windows” and see if this SID exists in Windows/AD. Remove the login, and the job will stop working, though – but you have at least don’t something explicitly inside your SQL Server to cause this (removed the login).
But when the owner’s SID don’t exist in sysadmin we have a problem. The problem is that the recorded owner of the job is the SID for the Windows user, even though that SID doesn’t exist in sys.server_principals. As long as this still exists in Windows/AD, the job will work just fine. Agent will run this job owned by ?, ask Windows who this SID is and see that this SID is a member of a group which exists as a login in SQL Server and that is sysadmin. Now, imagine what happen if the SID doesn’t exist in Windows anymore. The job is owned by ?, and that is all that SQL Server knows. The SID no longer exist in Windows/AD so you there’s nobody to tell SQL Server “Yeah, this used to be a sysadmin in SQL Server – through a group membership in Windows/AD”.
Even worse, the job seems to keep working until you re-start the SQL Server service, some caching going on. Makes it even more difficult to determine why the job suddenly stops working: “We deleted this Windows account 4.5 months ago.” This is easy to repro, and you don’t even need a domain (local Windows accounts and groups work just fine):
- Create a Windows group for this.
- Create a Windows account for this.
- Add the Windows account to above group.
- Add above group as a login to SQL Server.
- Make this group sysadmin.
- Create a job, with two jobsteps:
- T-SQL which executes: PRINT ORIGINAL_LOGIN() + ‘ | ‘ + SUSER_SNAME() + ‘ | ‘ + USER_NAME()
- CmdExec which executes: WHOAMI.EXE
- (Above isn’t really necessary, but useful for testing other impersonation aspects of Agent)
- Specify the Windows account as owner of the job.
- Run the job, make sure it works fine.
- Remove the Windows account from the Windows group.
- Run the job and see it fails.
I’m sure this has been blogged before, but I was a bit surprised by it. I have now added a check for this in my “Check best practices” stored procedure I run every day… On the flip side, this doesn’t seem to happen for multi-server (MSX) jobs, the ownership seems to have special handling for these – you will see if you check the owner_sid in sysjobs.