Backup compression in SQL Server 2008

Having a few moments to spare, I decided to give this a spin.

Specifying that the backup should be compressed is really simple: just add COMPRESSION to the WITH clause of the BACKUP command. For example:

BACKUP DATABASE Adventureworks
TO DISK = 'C:\Advc.bak'
WITH INIT, COMPRESSION

For fun, I compared backup file size and backup time with and without compression. I ran the below after priming the cache (not that it should matter, since backup reads pages from disk; see http://sqlblog.com/blogs/tibor_karaszi/archive/2007/08/03/does-backup-utilize-pages-in-cache.aspx). I also deleted the backup files, if they existed, before execution.

DECLARE @dt datetime

SET @dt = GETDATE()
BACKUP DATABASE Adventureworks
TO DISK = 'C:\Adv.bak'
WITH INIT
SELECT DATEDIFF(ms, @dt, GETDATE())

SET @dt = GETDATE()
BACKUP DATABASE Adventureworks
TO DISK = 'C:\Advc.bak'
WITH INIT, COMPRESSION
SELECT DATEDIFF(ms, @dt, GETDATE())

Size of the backup file is 129 MB vs. 35 MB. Of course, the compression ratio varies depending on what type of data there is in the database (string data tends to compress better than other data, for instance). Time to execute the commands was 7.5 seconds vs. 3.8 seconds. Now, this is a virtual machine on VPC and Adventureworks is a tiny database, but at least we can see clear potential for savings here.

So how do we know if a database backup was compressed? If you’ve used RESTORE HEADERONLY, you probably noticed this annoying column called “Compressed”. Well, this is the first time we see a value of “1” in that column.
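A quick way to check, using the compressed backup file from the example above:

RESTORE HEADERONLY FROM DISK = 'C:\Advc.bak'
--Look at the Compressed column: 1 means the backup is compressed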

It seems we cannot mix compressed and non-compressed backups on the same file, though. (Not that I often have multiple backups on the same file.) If I append a backup using COMPRESSION to a file where there already is a non-compressed backup, I get an error. And if I do a backup without COMPRESSION to a file where there are compressed backups, the new backup will be compressed (even without saying COMPRESSION). Something to look out for if you have several backups on the same backup files.

So what about the RESTORE command? Well, there’s nothing to say, really. You don’t have to specify in the RESTORE command that the backup was compressed. Time for RESTORE was 10.4 vs. 6.3 seconds (with the destination database already existing). I expect the difference to be bigger on a real installation with a realistic database size.
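Just to show it, a restore of the compressed backup from above looks like any other restore (this is only a sketch; REPLACE is there because the destination database already exists and may not be strictly needed in your case):

RESTORE DATABASE Adventureworks
FROM DISK = 'C:\Advc.bak'
WITH REPLACE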

Owner/schema qualifying object names

It isn’t easy to remember all the details regarding owner/schema-qualifying object names. There are many variables involved, such as:

  • Version of SQL Server
  • Qualifying the owner when you call the proc (EXEC dbo.procname)
  • Qualifying object references inside the proc code
  • Qualifying object names if you aren’t using procedures
  • Whether the user is the same as the owner of the proc/table
  • Default schema for the user

So, I decided to give it a spin for different combinations and investigate both profiler events and the number of plans in cache. I won’t post all the details here; that would be too much to write down. You can use the scripts at the end to do your own findings. I did not investigate the differences regarding compile locks. Anyhow, here are my conclusions:

  • I could not produce any recompile (SP:Recompile or SQL:StmtRecompile) event for any combination.
  • I did find SP:CacheMiss events on 2000 when you execute a proc and don’t qualify the proc name (for 2005 I always got those events). Then a subsequent SP:CacheHit will follow.
  • For straight SQL (no procedures) I noticed that each user gets its own plan when you don’t owner-qualify the table name. This makes sense. An interesting aspect on 2005 was that if you specify a default schema for the user (and two users have the same default schema), then the users will share the plan (basically, the default schema becomes the “owner” of the plan); see the sketch just after these conclusions.
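As a rough illustration of that last point, using the Kalle and Olle users from the script further down (2005 syntax; this is just a sketch):

--Give both users the same default schema (2005)
ALTER USER Kalle WITH DEFAULT_SCHEMA = dbo
ALTER USER Olle WITH DEFAULT_SCHEMA = dbo
GO
--If Kalle and Olle now both run this non-qualified statement,
--they should end up sharing one plan in syscacheobjects
SELECT c1 FROM t WHERE c1 = 34 AND c2 = 'Hello'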

Below are the scripts I used:

--Login as sysadmin:
USE master
IF DB_ID('myTestDb') IS NOT NULL DROP DATABASE myTestDb
IF EXISTS(SELECT * FROM syslogins WHERE name = 'Kalle') EXEC sp_droplogin 'Kalle'
IF EXISTS(SELECT * FROM syslogins WHERE name = 'Olle') EXEC sp_droplogin 'Olle'
GO
EXEC sp_addlogin 'Kalle', '*hjk&6f' EXEC sp_addlogin 'Olle', '*hjk&6f'
CREATE DATABASE myTestDb
GO
USE myTestDb
EXEC sp_grantdbaccess 'Kalle' EXEC sp_grantdbaccess 'Olle'
GO
CREATE TABLE dbo.t(c1 int identity PRIMARY KEY, c2 char(30))
INSERT INTO dbo.t SELECT TOP 1000 'hello' FROM sysobjects a CROSS JOIN sysobjects b
CREATE INDEX ix1 ON t(c1) --index name (ix1) assumed
GO
CREATE PROC dbo.p AS SELECT c1 FROM t WHERE c1 = 34 AND c2 = 'Hello'
GO
CREATE PROC dbo.p_q AS SELECT c1 FROM dbo.t WHERE c1 = 34 AND c2 = 'Hello'
GO
GRANT EXECUTE ON dbo.p TO Kalle, Olle
GRANT EXECUTE ON dbo.p_q TO Kalle, Olle
GRANT SELECT ON t TO Kalle, Olle

--Number of plans in cache, run after executions of proc
SELECT OBJECT_NAME(objid), sqluid, *
FROM master..syscacheobjects
WHERE dbid = DB_ID()
AND cacheobjtype = 'Compiled Plan'
AND sql NOT LIKE '%PSTATMAN%'

--Run this three times, logged in as sysadmin (dbo), Kalle and Olle
USE myTestDb
GO
EXEC dbo.p
GO
EXEC dbo.p_q
GO
EXEC p
GO
EXEC p_q
GO
SELECT c1 FROM t WHERE c1 = 34 AND c2 = 'Hello'
GO
SELECT c1 FROM dbo.t WHERE c1 = 34 AND c2 = 'Hello'
GO
USE master

Sorted views…

Here’s one that pops up regularly. Something like:

“My views are no longer ordered in SQL Server 2005. I have ORDER BY in the view, but when I select from it, the rows are not returned according to the ORDER BY.”

Since I tend to post a reply often enough to this type of post, I decided to put it in writing once and for all, so I can point to this blog post. (Laziness is a virtue 😉 .)

A view is by definition not sorted. A view is supposed to behave like a table (and thanks to that we have the same language to operate against views as we have to operate against tables). This is why ORDER BY is not by itself allowed in a view definition.

It is allowed to have ORDER BY if you also have TOP, but the purpose of the ORDER BY is now to make sure that the correct rows are returned (the 5 most expensive books, for instance), not to return them in any particular order.

At some point in time, the “workaround” to have TOP 100 PERCENT popped up. The thinking was that we are now allowed to have ORDER BY, which makes the view “sorted”. It doesn’t. However, SQL Server 2000’s optimizer wasn’t as smart as 2005’s optimizer, so for simple queries you often found that data was returned according to your ORDER BY.

SQL Server 2005’s optimizer is smarter, and if it finds TOP 100 PERCENT and ORDER BY it realizes that neither of these operations affects *which* rows to return, so both operations are removed from the execution plan. That is smart, since they don’t affect the data anyhow. Remember that a view is by definition not sorted.

Some developers have relied on the 2000 behavior to create “sorted views”. This behavior was never documented, just a side effect of the execution plan, and was never guaranteed.

So what do we do? Well, the same as when we run queries against a table: we have ORDER BY when we read data from the view!
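For example (the table and view names below are made up):

CREATE VIEW dbo.ExpensiveBooks
AS
SELECT TOP 5 Title, Price
FROM dbo.Books
ORDER BY Price DESC --Here ORDER BY only determines *which* rows the view returns
GO
--The ORDER BY that gives you sorted rows goes in the query that reads from the view
SELECT Title, Price
FROM dbo.ExpensiveBooks
ORDER BY Price DESC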

Aside: The query builder which you can use when you create a view has a very strange behavior. It allows you to check a “sort” checkbox, and it will then add ORDER BY and TOP 100 PERCENT to the SELECT statement. According to the above explanation, this is kind of silly, but I’m hoping that MS will tidy this up for the next version of SQL Server.

Find table and index name for fragmented indexes

Got this question from a newsgroup today. The answer is pretty simple: just use the dynamic management function sys.dm_db_index_physical_stats. I’m posting this here mostly so I have somewhere to refer to when asked this question…

I prefer to have a helper function to get the index name:

CREATE FUNCTION dbo.index_name (@object_id int, @index_id int)
RETURNS sysname
AS
BEGIN
RETURN
(SELECT name FROM sys.indexes WHERE object_id = @object_id AND index_id = @index_id)
END;
GO

And then a simple query:

SELECT
 OBJECT_NAME(object_id) AS tblName
 ,dbo.index_name(object_id, index_id) AS ixName
 ,avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL)
WHERE avg_fragmentation_in_percent > 20
AND index_type_desc IN('CLUSTERED INDEX', 'NONCLUSTERED INDEX')

Then you just adjust the search clause to your liking. One hint is to exclude indexes with few pages (the page_count column).
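For instance, something like below (the thresholds are just examples; adjust them to your liking):

SELECT
 OBJECT_NAME(object_id) AS tblName
 ,dbo.index_name(object_id, index_id) AS ixName
 ,avg_fragmentation_in_percent
 ,page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL)
WHERE avg_fragmentation_in_percent > 20
AND page_count > 1000 --Exclude small indexes
AND index_type_desc IN('CLUSTERED INDEX', 'NONCLUSTERED INDEX')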

 

 

 

TSQL script to do backup like Maintenance Plans

This is a frequently asked question, both in forums and in class:

How can I, through TSQL, create new backup files for each backup and have the date and time in the file names?

The reason for the above is most often that you want a higher degree of customization than maintenance plans allow for. An obvious such customization in 2005 is to add the CHECKSUM option to the BACKUP command. I know there already exist several scripts like mine “out there”; this is just my simple version, so I remember where to point whenever I get this question next. 🙂

Since I probably have some bugs in it, I have it on my website. This way I can fix the bugs as needed. You are most welcome to report bugs in the script if you spot any! And here’s the URL:

http://karaszi.com/tsql-script-to-create-backup-file-including-database-name-date-and-time
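Just to show the idea, here is a simplified sketch (this is not the script behind the URL above; the path is made up and you would adjust names and options to your needs):

DECLARE @fileName varchar(300)
SET @fileName = 'C:\Backup\Adventureworks_' +
 REPLACE(REPLACE(REPLACE(CONVERT(varchar(19), GETDATE(), 120), '-', ''), ':', ''), ' ', '_') +
 '.bak'

--New file for each backup, with date and time in the name, and CHECKSUM added
BACKUP DATABASE Adventureworks
TO DISK = @fileName
WITH INIT, CHECKSUM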

Xp_cmdshell and permissions

This blog post was inspired by a newsgroup discussion. The question basically is:
What do you need to do in order to use xp_cmdshell?

Note that there are obvious security implications of doing this. (I’m not recommending usage of xp_cmdshell in general; this is a technical blog post!) We first need to think about what happens here, at an architectural level:

Somebody who has logged in to SQL Server executes xp_cmdshell. For this, SQL Server needs to spawn a process in Windows. A process in Windows needs to execute under a Windows user account.

So, what Windows account is used? If the SQL Server login who is executing xp_cmdshell is sysadmin, then SQL Server will use the service account (it will not “pretend to be somebody else”). But if the login isn’t sysadmin, then we need to configure which Windows account is to be used (using sp_xp_cmdshell_proxy_account). Note that this configuration is the same for all non-sysadmins!

But there’s a little bit more to it. Below is an outline of what needs to be done. Steps 2 and 3 are only needed if the one who is to execute xp_cmdshell isn’t sysadmin. Note that the steps don’t have to be performed in the order listed below.

  1. We need to allow usage of xp_cmdshell in general (on 2005). Use “Surface Area Configuration” or sp_configure for this.
  2. We need to have a user in the master database which has execute permission on xp_cmdshell. If you are uncertain about the difference between logins and users, you should read up about it in BOL.
  3. We need to say what Windows account should be used when a non-sysadmin user is executing xp_cmdshell.

So, here’s the TSQL script that does all of the above:

--1, allow xp_cmdshell
--(it is an advanced option, so you may need to enable 'show advanced options' first)
EXEC sp_configure 'xp_cmdshell', 1
RECONFIGURE
GO

 

--2, grant permission to xp_cmdshell
USE master
CREATE LOGIN JohnDoe WITH PASSWORD = 'jlkw#.6('

--Note, we are in the master database!!!
CREATE USER JohnDoe FROM LOGIN JohnDoe

--Run as login JohnDoe
EXECUTE AS login = 'JohnDoe'
--Below fails, no execute permission on xp_cmdshell
EXEC xp_cmdshell 'DIR C:\*.*'
REVERT
GO

--Note, we are in the master database!!!
GRANT EXECUTE ON xp_cmdshell TO JohnDoe

--Try again
EXECUTE AS login = 'JohnDoe'
--Execution of xp_cmdshell is allowed.
--But I haven't configured the proxy account...
EXEC xp_cmdshell 'DIR C:\*.*'
REVERT
GO

 

--3, specify the proxy account for non-sysadmins
--Replace obvious parts!
EXEC sp_xp_cmdshell_proxy_account 'Domain\WinAccount', 'pwd'
EXECUTE AS login = 'JohnDoe'
--Execution of xp_cmdshell is allowed.
--And executes successfully!!!
EXEC xp_cmdshell 'DIR C:\*.*'
REVERT
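As an aside, the proxy account configuration ends up as a credential in the master database, so you can check what is currently configured with something like below (as far as I know, the credential is named ##xp_cmdshell_proxy_account##):

SELECT name, credential_identity
FROM sys.credentials
WHERE name = '##xp_cmdshell_proxy_account##'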

 

--Cleanup
EXEC sp_xp_cmdshell_proxy_account null

DROP USER JohnDoe
DROP LOGIN JohnDoe
EXEC sp_configure 'xp_cmdshell', 0
RECONFIGURE

 

Bulk logged recovery model and log backups

This post is inspired by a discussion about the bulk logged recovery model and log backups. Some basics:

In bulk logged recovery model, some operations can be minimally logged, like:

  • SELECT INTO
  • Index create, rebuild and drop
  • Bulk loading of data

The interesting thing is that you can do a log backup after such a minimally logged operation. This log backup will contain both log records and also the data and index pages affected by the minimally logged operation.

You cannot do a point in time restore using STOPAT or similar for such a log backup (a log backup containing both log records and data pages), for obvious reasons. You can do a point in time restore of subsequent log backups, however, assuming that log backup doesn’t contain data pages (no bulk logged operation was done during the time span that the log backup covers).
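Just to illustrate what such a point in time restore could look like (the database name, file names and time below are made up):

--Hypothetical point in time restore sequence
RESTORE DATABASE myDb FROM DISK = 'C:\myDb.bak' WITH NORECOVERY, REPLACE
RESTORE LOG myDb FROM DISK = 'C:\myDb_log1.trn' WITH NORECOVERY
--Below only works if this log backup contains no bulk logged (minimally logged) changes
RESTORE LOG myDb FROM DISK = 'C:\myDb_log2.trn'
WITH RECOVERY, STOPAT = '2008-01-15 12:30:00'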

You cannot do a log backup while in bulk logged mode if a bulk logged operation has occurred and the data files are lost. This is also pretty obvious, since SQL Server cannot go and grab the data pages it needs; the data files are lost!

But what about if we have several filegroups, we are in bulk logged recovery model and have performed a bulk logged operation, but the bulk logged operation doesn’t affect the data file which is lost? Theoretically, SQL Server could now allow such a log backup. So, let’s give it a try:

In the script below, we have two filegroups: PRIMARY and SEC. We do a bulk logged operation which affects only PRIMARY, and then delete the file for the secondary filegroup. Then we try a log backup. This fails, so the answer is that all data files need to be accessible in order to perform a log backup (in bulk logged recovery model and after a minimally logged operation). The reason becomes more obvious if we first look at the messages from such a successful log backup:

Processed 72 pages for database 'BulkBackup', file 'BulkBackup' on file 1.
Processed 8 pages for database 'BulkBackup', file 'BulkBackup2' on file 1.
Processed 58 pages for database 'BulkBackup', file 'BulkBackup_log' on file 1.
BACKUP LOG successfully processed 138 pages in 0.098 seconds (11.530 MB/sec).

The file BulkBackup2 is a file which is unaffected by the bulk logged operation. But SQL Server still goes and grabs 8 pages from that file. The number “8” makes me suspect that this is the file header (the first 8 pages of each file), needed for some reason. (I get the same error with both NO_TRUNCATE and CONTINUE_AFTER_ERROR.)

The findings are also implied by Books Online. Here’s a quote:

“If the database is damaged, for example, if the database does not start, a tail-log backup succeeds only if the log files are undamaged, the database is in a state that supports tail-log backups, and the database does not contain any bulk-logged changes.”

The above does not talk about individual files; it addresses the whole database.

Finally, here’s the script. Read it so you understand it before you execute it. It will drop a database named BulkBackup, if such exists. You can change the recovery model to FULL if you want to verify that a log backup is indeed possible for a damaged database (in general terms…).

USE master
IF DB_ID('BulkBackup') IS NOT NULL DROP DATABASE BulkBackup
GO
CREATE DATABASE BulkBackup
ON
PRIMARY
(NAME = N'BulkBackup', FILENAME = N'C:\BulkBackup.mdf', SIZE = 20MB, FILEGROWTH = 10MB),
FILEGROUP SEC
(NAME = N'BulkBackup2', FILENAME = N'C:\BulkBackup2.ndf', SIZE = 20MB, FILEGROWTH = 10MB)
LOG ON
(NAME = N'BulkBackup_log', FILENAME = N'C:\BulkBackup_log.ldf', SIZE = 20MB, FILEGROWTH = 10MB)
GO
ALTER DATABASE BulkBackup SET RECOVERY BULK_LOGGED
ALTER DATABASE BulkBackup SET AUTO_CLOSE ON

USE BulkBackup
CREATE TABLE t(c1 int identity, c2 char(5)) ON SEC
INSERT INTO t SELECT TOP 10000 'hello' FROM syscolumns a CROSS JOIN syscolumns b
CREATE TABLE tSec(c1 int identity, c2 char(5)) ON "PRIMARY"
INSERT INTO tSec SELECT TOP 10000 'hello' FROM syscolumns a CROSS JOIN syscolumns b
BACKUP DATABASE BulkBackup TO DISK = 'nul'

--Bulk logged operation
SELECT * INTO tSec2 FROM tSec

--Verify log backup possible
--after bulk logged operation
BACKUP LOG BulkBackup TO DISK = 'nul'

--Bulk logged operation again
SELECT * INTO tSec3 FROM tSec

--Crash secondary database file
USE master
GO
WAITFOR DELAY '00:00:05'
EXEC sp_configure 'xp_cmdshell', 1
RECONFIGURE
EXEC xp_cmdshell 'DEL C:\BulkBackup2.ndf'
EXEC sp_configure 'xp_cmdshell', 0
RECONFIGURE

--Can we now do a log backup?
BACKUP LOG BulkBackup TO DISK = 'nul' WITH NO_TRUNCATE
--Apparently not. The error message is:

--BACKUP LOG is not possible because bulk logged changes exist in
--the database and one or more filegroups are unavailable.

--Cleanup:
GO
IF DB_ID('BulkBackup') IS NOT NULL DROP DATABASE BulkBackup

 

Is statistics over non-indexed columns updated by index rebuild?

Short answer: no.

This question came up today in the MCT discussion group. My gut instinct said no, but I wanted to test it to be certain. But first a brief background:

You can rebuild an index using DBCC DBREINDEX or, in 2005 (and now preferred), ALTER INDEX … REBUILD. Rebuilding an index internally creates a new index and, when that has been done, drops the old index.

So it is pretty obvious that we also get new statistics for that index (based on all data, not sampled, just as when we do CREATE INDEX). As an aside, reorganizing does *not* update the statistics…

But what about statistics over non-indexed columns? SQL Server can create these by itself, assuming you didn’t turn off that database option. They are named something like _WA_Sys. And you can also create these explicitly using CREATE STATISTICS.
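In 2005 you can see both kinds in sys.stats, and the auto-created ones (the _WA_Sys ones) are flagged. A quick way to list them (just a sketch):

SELECT OBJECT_NAME(object_id) AS tblName, name, auto_created, user_created
FROM sys.stats
WHERE auto_created = 1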

A few words about the script below: I wanted to use the STATS_DATE function to retrieve the datetime for when the statistics were built/updated. But STATS_DATE doesn’t seem to work on statistics only; it expects an id for an index… So, this is why I use DBCC SHOW_STATISTICS instead. And, unfortunately, DBCC SHOW_STATISTICS only displays the statistics build time with minute precision. This is why I have a WAITFOR of > 1 minute in between the operations.

I got the same result whether I rebuilt a clustered or a nonclustered index on the table, or even when specifying ALL indexes. Script:

USE tempdb
SET NOCOUNT ON
IF OBJECT_ID('t') IS NOT NULL DROP TABLE t
CREATE TABLE t(c1 int identity, c2 char(5))
INSERT INTO t (c2)
SELECT TOP 10000 'Hello' FROM syscolumns a, syscolumns b

CREATE CLUSTERED INDEX x1 ON t(c2)
CREATE STATISTICS s1 ON t(c1)

SELECT ' ' AS "Before mass modification"
DBCC SHOW_STATISTICS('t', 'x1') WITH STAT_HEADER, NO_INFOMSGS
DBCC SHOW_STATISTICS('t', 's1') WITH STAT_HEADER, NO_INFOMSGS

WAITFOR DELAY '00:01:02'

INSERT INTO t (c2)
SELECT TOP 10000 'Hi' FROM syscolumns a, syscolumns b
SELECT ' ' AS "Before index rebuild"
DBCC SHOW_STATISTICS('t', 'x1') WITH STAT_HEADER, NO_INFOMSGS
DBCC SHOW_STATISTICS('t', 's1') WITH STAT_HEADER, NO_INFOMSGS

WAITFOR DELAY '00:01:02'

--ALTER INDEX x1 ON t REBUILD
ALTER INDEX ALL ON t REBUILD
SELECT ' ' AS "After index rebuild"
DBCC SHOW_STATISTICS('t', 'x1') WITH STAT_HEADER, NO_INFOMSGS
DBCC SHOW_STATISTICS('t', 's1') WITH STAT_HEADER, NO_INFOMSGS