Do you remember Database Gateways?

I replied to a question in a forum today about intercepting and changing a SQL query, before it reaches SQL Server. I started typing a reply, and realized that… wow… “Does this take me on a trip down memory lane, or what!”. So, just for fun, here is what I wrote (slightly modified):

There used to be products on the market, called “Database Gateways”, something that looks like one type of DBMS but actually accepts the SQL queries and submits them to some other type of DBMS. Basically an “in between” piece of software. For instance I used to work with such a product, which looked like a SQL Server but submitted the queries to IBM’s mainframe DB2.

That market pretty much disappeared around when ODBC was born. Yes, we are talking ancient here, around 1992.

There was a leftover from this time period in SQL Server, called “Open Data Services”, which was basically an API allowing you to write something that looks like a SQL Server. There was even a sample app which allowed you to intercept the SQL queries and then re-submit them to a SQL Server.

This API was in turn transformed over time to what became known as “extended stored procedures”, which in turn was deprecated with SQL Server 2005 and the SQL CLR support.

Mirroring: what happens if principal loses contact with both mirror and witness?

Imagine a database mirroring setup where you have two LANs with a WAN link in between. Let’s call them site A and site B. Say that the principal is currently in site A, and both mirror and witness are in site B. I.e., we are running database mirroring with a witness, and assuming we are running safety FULL (synchronous), we have automatic failover.

Now, what does failover really mean when it comes to mirroring? The simple answer is that the mirror will finish the recovery process (UNDO) and make the database available. So, what if the mirror and witness become isolated from the principal? In our setup, this would happen if site A is disconnected from site B. Well, the failover protocol is as follows:

If the mirror cannot contact the principal, it asks the witness whether it has contact with the principal. If the witness responds and says that it doesn’t have contact with the principal either, then they consider this a quorum. They are in the majority, and failover occurs. I.e., the mirror will make the database available (and essentially becomes the principal).

But think about the poor principal now. It is isolated and knows nothing about what is happening at the other end. The only logical thing to do is to make the database inaccessible. If that didn’t happen, we would have a case where both sites had the database available, and we wouldn’t want that!
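
If you want to watch these state transitions while playing with this, the sys.database_mirroring catalog view shows the role and state as seen from each partner. A minimal sketch (run on either partner; I’m only picking out the columns I find most useful here):

SELECT DB_NAME(database_id) AS database_name
      ,mirroring_role_desc          --PRINCIPAL or MIRROR
      ,mirroring_state_desc         --e.g. SYNCHRONIZED, DISCONNECTED
      ,mirroring_witness_state_desc --CONNECTED or DISCONNECTED
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL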

This is nothing new, and it is clearly stated in Ron Talmage’s great whitepaper on mirroring: http://technet.microsoft.com/en-us/library/cc917680.aspx.

This is easy to test. Set up mirroring (you can even do it between three instances on the same OS). Make sure you are running safety FULL and have a witness. Now check which Windows process ID the principal has:

SELECT SERVERPROPERTY('ProcessID')

Based on the above, you now use Task Manager to kill the *other* two SQL Servers. Make sure that SQL Server Agent isn’t running for those two, or else Agent will start those SQL Servers again. Management Studio will now show the database as “In Recovery”, and if you try to access it, for instance from a query window, you get the error below:

Msg 955, Level 14, State 1, Line 1
Database Adventureworks2008 is enabled for Database Mirroring,
but the database lacks quorum: the database cannot be opened.
Check the partner and witness connections if configured.
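
You can also check the state without provoking the error message (a small sketch, using the database name from the error above; I would expect something other than ONLINE for the isolated ex-principal):

SELECT DATABASEPROPERTYEX('Adventureworks2008', 'Status')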

Are log records removed from ldf file for rollbacks?

Seems like a simple enough question, right? This question (but more targeted, read on) was raised in an MCT forum. While the discussion was ongoing and I tried to come up with answers, I realized that this question is really several questions. First, what is a rollback? I can see three different types of rollbacks (there might be more, of course):

  1. Regular rollback, as in ROLLBACK TRAN (or lost/terminated connection)
  2. Rollback done by restore recovery. I.e., the end time of the backup included some transaction which wasn’t committed, and you restore using RECOVERY, so SQL Server needs to roll back this transaction (UNDO).
  3. Rollback done by crash (restart) recovery. I.e., SQL Server was shut down with some open transaction.

I’m going to try to show whether log records are removed or still present for these three types of rollback situations. I will use the fn_dblog function. This isn’t documented or supported, but search the internet and you will find how to use it. The result isn’t documented either, of course, so we have to guess a bit what the various values mean…
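
For reference, here is the shape of the fn_dblog calls I will use below. The two NULL arguments are the start and end LSN (NULL, NULL means the whole active log), and these are the columns I will be looking at (again, undocumented, so the column names are simply what they appear to be on my system):

SELECT [Current LSN], Operation, Context, [Transaction ID], AllocUnitName
FROM fn_dblog(NULL, NULL)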

The TSQL script has some common parts (the same part executed repeatedly, once for each test):

  1. Create a database
  2. Make sure it is in full recovery
  3. Do a database backup
  4. Create a table (myTable)
  5. Insert a row into myTable. This last operation generates 5 log records for the myTable table: one for the PFS page, two for IAM pages, one format page for the heap and the last one is a LOP_INSERT_ROWS.
  6. Start a transaction
  7. Insert one more row into myTable. We now have one more log record for myTable (LOP_INSERT_ROWS). Looking at the transaction id for this last insert, we see two log records (one LOP_BEGIN_XACT and the LOP_INSERT_ROWS). Note that this transaction is now open!

Here is the above mentioned first part of the script:

USE master
IF DB_ID('x') IS NOT NULL DROP DATABASE x
GO
CREATE DATABASE x
GO
ALTER DATABASE x SET RECOVERY FULL
BACKUP DATABASE x
TO DISK = 'C:\x.bak' WITH INIT

USE x
CREATE TABLE myTable(c1 INT IDENTITY)
INSERT INTO myTable DEFAULT VALUES

SELECT *
FROM fn_dblog(NULL, NULL) WHERE AllocUnitName LIKE '%myTable%'
--5 rows

BEGIN TRAN
INSERT INTO myTable DEFAULT VALUES

SELECT *
FROM fn_dblog(NULL, NULL) WHERE AllocUnitName LIKE '%myTable%'
--6 rows

SELECT * FROM fn_dblog(NULL, NULL)
WHERE [Transaction ID] = (SELECT TOP(1) [Transaction ID]
                          FROM fn_dblog(NULL, NULL)
                          WHERE AllocUnitName LIKE '%myTable%'
                          ORDER BY [Current LSN] DESC)
--2 rows

Now, on to the different cases: 

1. Regular rollback, as in ROLLBACK TRAN.
It seems reasonable to me that SQL Server will just add some “rollback transaction” log record here. So, let’s try that (continuing on the first part above)… We now have 7 log records for myTable, the added one being a LOP_DELETE_ROWS which undoes the previous insert. And for our transaction ID, we have 4 rows, the two added rows being a LOP_DELETE_ROWS (compensation log record) and a LOP_ABORT_XACT.

--Regular ROLLBACK
ROLLBACK TRAN

SELECT *
FROM fn_dblog(NULL, NULL) WHERE AllocUnitName LIKE '%myTable%'
--7 rows

SELECT * FROM fn_dblog(NULL, NULL)
WHERE [Transaction ID] = (SELECT TOP(1) [Transaction ID]
                          FROM fn_dblog(NULL, NULL)
                          WHERE AllocUnitName LIKE '%myTable%'
                          ORDER BY [Current LSN] DESC)
--4 rows

2. Restore recovery
Now, what does this really mean? Restoring a transaction log backup is a good example. SQL Server reads log records from the transaction log backup file and writes them into the LDF file. This is what we call the “data copy” phase. Then SQL Server performs REDO (a.k.a. roll forward). And finally, SQL Server performs UNDO (roll back), assuming we don’t do the restore using the NORECOVERY option. Restoring from a database backup isn’t any different, except that the log records are of course read from the database backup file.

Here it seems likely that SQL Server will wipe more or less everything from the LDF file as soon as the database is restored and brought online. Why? The log in this newly restored database can’t serve as a starting point for a restore operation for this database. You first need a database backup. So, no use hanging onto the log records either! Let’s see if we can verify that:

--Restore recovery

--Do this from a different connection
BACKUP LOG x TO DISK = 'C:\x.bak'
--We now have an open transaction in the database!

--Perform RESTORE (back on the original connection)
ROLLBACK
USE master
RESTORE DATABASE x FROM DISK = 'C:\x.bak' WITH REPLACE, NORECOVERY
RESTORE LOG x FROM DISK = 'C:\x.bak' WITH FILE = 2, RECOVERY

--Rollback was done, and database brought online.
--What log records do we have?

USE x
SELECT * FROM fn_dblog(NULL, NULL) WHERE AllocUnitName LIKE '%myTable%'
--0 rows

--Use the transaction ID from the earlier SELECT
SELECT * FROM fn_dblog(NULL, NULL)
WHERE [Transaction ID] = '0000:00000203'
--0 rows

So, SQL Server will remove user-defined stuff from the LDF file after restore recovery has been performed. Makes sense.

3. Crash recovery (a.k.a. restart or startup recovery)
I couldn’t really guess here. So, let’s give it a spin immediately and see:

--Crash recovery

--Do below from a separate connection
SHUTDOWN WITH NOWAIT

--Start SQL Server again and examine the log records:

USE x
SELECT * FROM fn_dblog(NULL, NULL) WHERE AllocUnitName LIKE '%myTable%'
--5 rows

--Use the transaction ID from the earlier SELECT
SELECT * FROM fn_dblog(NULL, NULL)
WHERE [Transaction ID] = '0000:00000204'
--0 rows

Hmm, so SQL Server removes the log records after the rollback was performed. Makes sense.

But this got me thinking some more. How can this be done… physically? SQL Server would just “reverse” the head of the log a bit. But what if we have later transactions from other connections, which have been committed? SQL Server can’t ignore those, of course. These need to be kept in the LDF file for subsequent log backups. OTOH, I doubt that SQL Server will somehow physically delete things “in the middle” of an LDF file. Time for yet another test:

--Crash recovery, part two

--Do below from a separate connection
--First something which creates more recent log records
USE x
CREATE TABLE y(c1 INT)
INSERT INTO y(c1) VALUES(1)
SHUTDOWN WITH NOWAIT

--Start SQL Server again and examine the log records:

USE x
SELECT * FROM fn_dblog(NULL, NULL) WHERE AllocUnitName LIKE '%myTable%'
--7 rows

--Use the transaction ID from the earlier SELECT
SELECT * FROM fn_dblog(NULL, NULL)
WHERE [Transaction ID] = (SELECT TOP(1) [Transaction ID]
                          FROM fn_dblog(NULL, NULL)
                          WHERE AllocUnitName LIKE '%myTable%'
                          ORDER BY [Current LSN] DESC)
--4 rows

Now the log records for our rolled back transaction are still there! So, just as when we did a regular rollback, SQL Server inserted a LOP_DELETE_ROWS to reflect the undo of the INSERT, and then a LOP_ABORT_XACT.

Conclusion
Isn’t it beautiful when it all makes sense? Here are my conclusions on whether log records are kept or removed from the transaction log file (LDF) for the various types of rollback scenarios:

  • Regular rollback. Log records are not removed. Compensation log records are logged, reflecting undo of the modifications, and then an LOP_ABORT_XACT is logged.
  • Restore recovery. Log records are removed.
  • Crash recovery. It depends. If the transaction is at the very head of the log, then those log records can be removed. If there are other, subsequent committed transactions, then compensation log records are logged, reflecting undo of the modifications, and then an LOP_ABORT_XACT is logged.

SQL Client config, 32 and 64 bit

Say you want to change the preferred netlib connection order. Or add a server alias.

You can do this using the “SQL Server Configuration Manager” program. But installing this on each client machine where you want to do the modification might not feel that attractive.

Another option is to use a tool like regmon while doing the config on a reference machine, sniff the registry modifications and then shoot these out to the client machines. This might be overkill, though.

Yet another option is to use the cliconfg.exe tool, which ships with Windows. This tool is already available on your machine. However, on a 64-bit machine, you need to consider whether the client app is a 32- or 64-bit app. The processor architecture of that app will determine where in the registry the app (client libraries) will look. Below are my results from working on an x64 XP installation.
64 bit (native): HKLM\SOFTWARE\Microsoft\MSSQLServer\Client\SuperSocketNetLib
32 bit (WOW): HKLM\SOFTWARE\WOW6432Node\Microsoft\MSSQLServer\Client\SuperSocketNetLib
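
As a side note: if you just want to inspect the current value, without any GUI tool, the undocumented xp_regread can read it from within SQL Server. A sketch (ProtocolOrder is the value name that holds the netlib order on my machine; undocumented, so use at your own risk):

EXEC master..xp_regread
 @rootkey = 'HKEY_LOCAL_MACHINE'
,@key = 'SOFTWARE\Microsoft\MSSQLServer\Client\SuperSocketNetLib'
,@value_name = 'ProtocolOrder'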

Razvan Socol was kind enough to enlighten me, a while ago, of the fact that running the 64-bit version of cliconfg.exe will modify the “64 bit” registry entry (as per above), and vice versa for the 32-bit version of cliconfg.exe. Razvan mentioned that starting cliconfg.exe from Run will start the 64-bit version, and starting it from a 32-bit app (Explorer, for instance, which I couldn’t find how to do, but I’m sure somebody will enlighten me) will start the 32-bit version.

The above made me wonder in which folder each file lives. Here are my findings (on the test machine I was using, a pretty clean XP x64 machine):

64 bit version of cliconfg.exe: C:\Windows\System32
32 bit version of cliconfg.exe: C:\Windows\SysWOW64

(And, before you ask, no, above is not a typo. There is some logic behind this. 🙂 )

Watch out for that autogrow bug

Under some circumstances, autogrow for database files can be set to some 12000 percent. I think this is limited to SQL Server 2005 and to databases upgraded from SQL Server 2000 (I didn’t bother to search; feel free to comment if you know). So, if you have a reasonably sized database and autogrow kicks in, you can do the maths and realize that pretty soon you are out of disk space.
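
By the way, if you just want a quick one-off check across all databases, sys.master_files gives you that in one query (a sketch; the proc below does the same per database via sys.database_files, which is what I actually schedule):

SELECT DB_NAME(database_id) AS db, name AS logical_file_name, growth
FROM sys.master_files
WHERE is_percent_growth = 1
AND growth > 20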

I wrote a proc, which I schedule, that checks for out-of-bounds values in sys.database_files. The proc generates a bunch of messages that it prints (handy if you have an Agent job with an output file) and also constructs an error message and does RAISERROR (handy if you have implemented alerting, for instance according to http://karaszi.com/agent-alerts-management-pack).

I prefer to schedule the proc below as an Agent job and use Agent alerts to notify me if we do have a db with autogrow out of whack. As always, don’t use code if you don’t understand it.

USE maint
GO

IF OBJECT_ID('check_autogrow_not_percent') IS NOT NULL DROP PROC check_autogrow_not_percent
GO

CREATE PROC check_autogrow_not_percent
AS
DECLARE
 @db sysname
,@sql NVARCHAR(2000)
,@file_logical_name sysname
,@file_phyname NVARCHAR(260)
,@growth VARCHAR(20)
,@did_exist bit
,@msg NVARCHAR(1800)
,@database_list NVARCHAR(1000)

SET @did_exist = CAST(0 AS bit)
SET @database_list = ''

--For each database
DECLARE dbs CURSOR FOR
 SELECT name FROM sys.databases
OPEN dbs
WHILE 1 = 1
BEGIN
  FETCH NEXT FROM dbs INTO @db
  IF @@FETCH_STATUS <> 0 BREAK

  SET @sql = 'DECLARE files CURSOR FOR
 SELECT CAST(growth AS varchar(20)), physical_name, name
 FROM ' + QUOTENAME(@db) + '.sys.database_files
 WHERE is_percent_growth = 1
 AND growth > 20'
  EXEC(@sql)
  OPEN files
  WHILE 1 = 1
  BEGIN
    FETCH NEXT FROM files INTO @growth, @file_phyname, @file_logical_name
    IF @@FETCH_STATUS <> 0 BREAK
    SET @did_exist = CAST(1 AS bit)
    SET @database_list = @database_list + '["' + @db + '": "' + @file_logical_name + '"]' + CHAR(13) + CHAR(10)
    SET @msg = 'Out-of-band autogrow in database "' + @db + '"' +
    ' with growth of ' + @growth +
    ', logical file name "' + @file_logical_name + '"' +
    ', physical file name "' + @file_phyname + '"' +
    '.'
    RAISERROR(@msg, 10, 1) WITH NOWAIT
  END
  CLOSE files
  DEALLOCATE files
END
CLOSE dbs
DEALLOCATE dbs
IF @did_exist = CAST(1 AS bit)
  BEGIN
   SET @msg = 'Databases with out-of-control autogrow in databases: ' + CHAR(13) + CHAR(10) + @database_list
   RAISERROR(@msg, 16, 1) WITH LOG
  END
GO
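
To try it out, just execute the proc (assuming you created it in a maint database, as per the USE above) and look at the messages, or at the Agent job output file if you schedule it:

EXEC maint.dbo.check_autogrow_not_percent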

Remove transaction log files

Say you happened to get too many transaction log (ldf) files. Can you remove log files from the database? Yes, but only if a file isn’t in use, and you cannot remove the first (“primary”) log file.

So, be prepared to investigate the virtual log file layout, using DBCC LOGINFO, to see whether a log file is in use or not. You can find information about how to investigate the virtual log file layout in my shrink article. The basic steps are a bit similar to shrinking a log file: investigate the virtual log file layout, back up the log, possibly shrink the file, try removing it. Do this as many times as it takes (lather, rinse, repeat).
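
If you haven’t used DBCC LOGINFO before, the call is as simple as it gets, and the column I look at is Status (the command is undocumented, so this is my reading of it: Status = 2 means the virtual log file is in use, Status = 0 means it is reusable):

DBCC LOGINFO(rDb)
--One row per virtual log file; FileId tells you which physical file each one belongs to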

Below is a script that, if you take the time to study it and play with it, will prepare you to remove transaction log files from a database. As always, don’t execute it if you don’t understand what it does!

USE master
IF DB_ID('rDb') IS NOT NULL DROP DATABASE rDb
GO

CREATE DATABASE rDb
ON
PRIMARY
( NAME = N'rDb', FILENAME = N'C:\rDb.mdf', SIZE = 50MB, FILEGROWTH = 1024KB )
LOG ON
 (NAME = N'rDb_log2', FILENAME = N'C:\rDb_log2.ldf', SIZE = 3MB, FILEGROWTH = 2MB)
,(NAME = N'rDb_log3', FILENAME = N'C:\rDb_log3.ldf', SIZE = 3MB, FILEGROWTH = 2MB)
,(NAME = N'rDb_log4', FILENAME = N'C:\rDb_log4.ldf', SIZE = 3MB, FILEGROWTH = 2MB)
GO

ALTER DATABASE rDb SET RECOVERY FULL
BACKUP DATABASE rDb TO DISK = 'C:\rDb.bak' WITH INIT
CREATE TABLE rDb..t(c1 INT IDENTITY, c2 CHAR(100))

INSERT INTO rDb..t
SELECT TOP(15000) 'hello'
FROM syscolumns AS a
CROSS JOIN syscolumns AS b

--Log is now about 46% full
DBCC SQLPERF(logspace)

--Check virtual log file layout
DBCC LOGINFO(rDb)
--See that file 4 isn't used at all (Status = 0 for all of file 4's rows)

--We can remove file 4, it isn't used
ALTER DATABASE rDb REMOVE FILE rDb_log4

--Check virtual log file layout
DBCC LOGINFO(rDb)

--Can't remove 3 since it is in use
ALTER DATABASE rDb REMOVE FILE rDb_log3

--What if we backup log?
BACKUP LOG rDb TO DISK = 'C:\rDb.bak'

--Check virtual log file layout
DBCC LOGINFO(rDb)
--3 is still in use (status = 2)

--Can't remove 3 since it is in use
ALTER DATABASE rDb REMOVE FILE rDb_log3

--Shrink 3
USE rDb
DBCC SHRINKFILE(rDb_log3)
USE master

--... and backup log?
BACKUP LOG rDb TO DISK = 'C:\rDb.bak'

--Check virtual log file layout
DBCC LOGINFO(rDb)
--3 is no longer in use

--Can now remove 3 since it is not in use
ALTER DATABASE rDb REMOVE FILE rDb_log3

--Check explorer, we're down to 1 log file

--What does sys.database_files say?
SELECT * FROM rDb.sys.database_files
--Seems the physical file is gone, but SQL Server considers the file offline

--A log backup fixes that:
BACKUP LOG rDb TO DISK = 'C:\rDb.bak'
SELECT * FROM rDb.sys.database_files

--Can never remove the first ("primary") log file
ALTER DATABASE rDb REMOVE FILE rDb_log2
--Note error message from above

sp_altermessage is back in business!

Just a quick note that we can again modify whether system messages are to go to the eventlog/errorlog. I.e., we can change the is_event_logged column in sys.messages. This is very valuable in general, and specifically if you want to define Agent alerts (for which Agent polls the eventlog). For instance:

SELECT * FROM sys.messages
WHERE message_id = 1205
AND language_id = 1033

Notice the value for the is_event_logged column. Now, run below:

EXEC sp_altermessage
 @message_id = 1205
,@parameter = 'WITH_LOG'
,@parameter_value = 'true'

Now, re-run the select statement and see that you have modified the behavior for the system message. Then re-run sp_altermessage with 'false' to reset it to the default.

The ability to modify this behavior for system messages was available prior to SQL Server 2005, but some re-architecting in 2005 removed the functionality. Kozloski informed me in this blog post that 2005 SP3 re-introduced the functionality, and obviously, as of SP1, the functionality is back in 2008 as well.


Is it only me or are "uncomfortable" answers largely ignored?

I do quite a bit of posting on forums, and I have noticed that in many cases, when you take the time to explain how something works, or why you should do something in a perhaps not-so-obvious way, that answer tends to be ignored and never replied to.

Instead, the original poster replies to the short answer, which perhaps ignored things like best practices, performance, readability etc. And very often I see an endless discussion based on that reply, where the original poster never gets a satisfactory solution, or perhaps even a working solution. And there I posted an elaboration which held the answer days ago! I also explained why solution X doesn’t work and solution Y only works in some circumstances, etc.

In many cases, we have taken the time to write articles or blog posts where we have predicted most misunderstandings and explained why you should do it this or that way, what drawbacks workaround A has, and also what drawbacks workaround B has. So, we post the URL to the article, only to find that it apparently was never read, because the following discussion in the thread goes on for days, wasting time elaborating on things that could be understood by spending some 10-20 minutes actually reading the article or blog post we pointed to.

I find this state mildly depressing. Have we reached the sad state where we don’t tend to be interested in why something works, how it works, or even understand what we are actually doing? Quick fixes, anyone? Only 10 cents each!

Is this something new? I’ve been on forums for more than 10 years now, so let me try to think… I believe that tendency has always been there, but perhaps it has been getting worse with time. What are your experiences? Or am I just having a bad day?

Where’s that sys.messages management pack?

I’ve often been thinking that “somebody” should produce a list, and perhaps some ready-made TSQL code, of the SQL Server error messages we want to be alerted for. Something like a Management Pack, but not tied to any particular sys mgmt software. Essentially: which error messages would we typically want to be alerted for, along with some ready-made TSQL code which defines an alert (msdb.dbo.sp_add_alert, see also http://www.sqlskills.com/BLOGS/PAUL/post/Easy-monitoring-of-high-severity-errors-create-Agent-alerts.aspx) and operator for each of those.

Honestly, how many of you have gone through all the SQL Server messages and determined which of those you want to be alerted for? Sure, we can decide to specify alerts for all high-severity messages (like 19, 20, 21, 22, 23, 24 and 25). But unfortunately, the severity classification isn’t as coherent as we would hope.

For instance, Paul Randal just recently blogged about error 825, which has a severity as low as 10! Do we want to be alerted for 825? You bet! Check out Paul’s blog at http://www.sqlskills.com/BLOGS/PAUL/post/A-little-known-sign-of-impending-doom-error-825.aspx.
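
If you do use the severity levels as a baseline, something like the loop below creates one alert per high severity level. This is only a sketch: the alert naming is my own convention, and you still need msdb.dbo.sp_add_notification to route the alerts to an operator.

DECLARE @sev INT, @name sysname
SET @sev = 19
WHILE @sev <= 25
BEGIN
  SET @name = N'Severity ' + CAST(@sev AS NVARCHAR(2))
  EXEC msdb.dbo.sp_add_alert @name = @name, @severity = @sev
  SET @sev = @sev + 1
END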

We can, however, make it a little bit easier for ourselves. Since Agent detects the message by sniffing the eventlog, we can immediately discard all messages which aren’t written to the eventlog (there’s no use defining an alert for something which the alerting mechanism never detects, right?):

SELECT *
FROM sys.messages
WHERE language_id = 1033
AND is_event_logged = 1
ORDER BY severity

Now we are down to 699 messages. It used to be that we could configure whether a message should be written to eventlog (sp_altermessage), but this functionality disappeared in 2005. I haven’t checked Connect, but my guess is that if there is such a wish/entry in the first place, it doesn’t have too many votes.

My suggestion is that we create some small community project somewhere where we add what messages we want to be alerted for. This can be as simple as a table with error numbers and some insert, and then we use this to generate our sp_add_alert commands based on that. Question is where to do this, so it can be maintained and refined over time? Or if it has already been done by somebody…
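
Just to make the idea concrete, here is roughly what I have in mind, as a sketch (the table name and alert naming convention are made up): a table holding the message numbers we care about, and a query that generates the sp_add_alert calls from it.

CREATE TABLE AlertMessages(message_id INT PRIMARY KEY)
INSERT INTO AlertMessages(message_id) VALUES(825)
INSERT INTO AlertMessages(message_id) VALUES(1205)

SELECT 'EXEC msdb.dbo.sp_add_alert @name = N''Error '
     + CAST(message_id AS VARCHAR(10)) + ''', @message_id = '
     + CAST(message_id AS VARCHAR(10)) + ';'
FROM AlertMessages

Run the generated commands, add sp_add_notification for each, and you have the beginnings of that “management pack”.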


Troubleshooting login problems

I’ve been learning a few things about troubleshooting login problems over the last few months. I first want to say that I’ve learned most of this through other blogs, comments etc. At the end of this blog post you will find a URL list where you can look for more details.

It started a few months back when I read a blog somewhere stating that we can use the state in a “login failed” error message to determine why the login failed. As you can imagine, there can be several reasons for a login failure (the user doesn’t exist, the pwd doesn’t match, the Windows login name isn’t a trusted login, etc.). For some reason I didn’t save that URL, and of course I needed it some time later and couldn’t find it.

One place where you can find such a list of states is… Books Online. BOL 2008 documents every error number, so it is a matter of knowing what error number to search for: 18456. You can also search for “login failed” (pretty easy to remember); the first hit is a different page (from the above) but with similar information. I just tried the same search in 2005 BOL and had similar hits. This information might have been added to BOL 2005 in a more recent BOL 2005 release, though.

Now, don’t be surprised if the client app only receives state 1 for the login failed messages. This is a good thing, and on purpose. We don’t want the system to disclose too much information to a malicious user (or app) about why a login fails. So what we do is look in the errorlog for the “real” state (the eventlog doesn’t seem to carry this information, for some reason). Below is an example from SQL Server 2005, SP2, with timestamps removed:

Error: 18456, Severity: 14, State: 5.
Login failed for user 'gurka'. [CLIENT: <local machine>]

So, state 5 tells us, according to SQL Server 2005 BOL, that “User ID is not valid.”.
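
By the way, instead of opening the errorlog file by hand, you can search it from a query window using the undocumented xp_readerrorlog (a sketch; the parameters are undocumented and vary between versions, but here 0 means the current log, 1 means the SQL Server log as opposed to the Agent log, and the string is what to search for):

EXEC master..xp_readerrorlog 0, 1, N'Login failed'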

Now, doing the same in SQL Server 2008 was interesting. Looking in the eventlog, we do see an explanation as to why the login failed. We still don’t see the state in the eventlog, but we have an explanation, so we might not need the state. And in the errorlog file, I had the message below:

Error: 18456, Severity: 14, State: 5.
Login failed for user 'gurka'. Reason: Could not find a login matching the name provided. [CLIENT: <local machine>]

So, not only do we have a state, we also have a description of why the login failed. Is it pathetic that such a thing can make you happy? Perhaps it is… 😉

Here are some other blog posts on the subject:
http://blogs.msdn.com/sql_protocols/archive/2006/02/21/536201.aspx
http://jmkehayias.blogspot.com/2008/03/understanding-login-failed-error-18456.html