What does RESTORE do?

I was replying to a newsgroup post today, explaining the restore process. I find some confusion in various places about what actually happens during restore, so hopefully the summary below can help a bit:

Fact: all backups contain log records. A log backup contains only log records (more later on bulk-logged recovery). The various types of database backup contain the log records that were produced while the backup was running – in addition to data pages.
The RESTORE command can perform several things:
  1. Delete the existing database. This happens if you specify the REPLACE option.
  2. Create the database. This happens if the database name you specify in the RESTORE command doesn’t exist.
  3. Copy data pages to the same address in each database file as they had when the backup was produced. And of course also copy log records to the ldf file. The source for these copy operations is the backup file that you restore from.
  4. REDO. Roll forward transactions based on the log records copied in the step above.
  5. UNDO. Roll back all open transactions. Not performed if you specify NORECOVERY (more later on the STANDBY option). The database is now accessible, and no further backups can be restored (diff or log backups).
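The steps above can be sketched in T-SQL. This is a minimal sketch with hypothetical database and file names; NORECOVERY skips the UNDO step so that further backups can be restored, while RECOVERY performs it:

```sql
-- Hypothetical names. REPLACE drops the existing database (step 1);
-- the restore then copies pages and log records and runs REDO (steps 3-4).
-- NORECOVERY skips UNDO (step 5), so more backups can be restored on top.
RESTORE DATABASE MyDb FROM DISK = 'C:\MyDb_full.bak'
WITH REPLACE, NORECOVERY

RESTORE LOG MyDb FROM DISK = 'C:\MyDb_log1.trn'
WITH NORECOVERY

-- RECOVERY performs UNDO; the database becomes accessible and no
-- further diff or log backups can be restored.
RESTORE LOG MyDb FROM DISK = 'C:\MyDb_log2.trn'
WITH RECOVERY
```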
A couple of special cases:
If you are in the bulk-logged recovery model, then a log backup performed after a minimally logged operation (since the last log backup) will also contain data pages, in addition to log records. This type of backup cannot be performed if any of the data files are lost. When you restore from this type of backup, you cannot do a point-in-time restore.
The STANDBY option of the RESTORE command does perform UNDO, but saves that work to a file you specify, so that the UNDO work can be reversed when you restore a subsequent backup.
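As a sketch (hypothetical database and file names), a standby restore looks like this:

```sql
-- UNDO is performed, but the undo work is saved to the standby file,
-- so it can be reversed when a subsequent backup is restored.
RESTORE DATABASE MyDb FROM DISK = 'C:\MyDb_full.bak'
WITH STANDBY = 'C:\MyDb_undo.dat'

-- The database is now accessible read-only. Restoring the next log
-- backup first reverses the saved UNDO work, then applies the log:
RESTORE LOG MyDb FROM DISK = 'C:\MyDb_log1.trn'
WITH STANDBY = 'C:\MyDb_undo.dat'
```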
I think I managed to compress the topic pretty well, so unless you have worked a bit with backup and restore in SQL Server, you might want to read the above a couple of times. 🙂

Backup compression in SQL Server 2008

Having a few moments to spare, I decided to give this a spin.

Specifying for the backup to be compressed is really simple. Just specify COMPRESSION in the WITH clause of the BACKUP command. For example:

BACKUP DATABASE Adventureworks
TO DISK = 'C:\Advc.bak'
WITH INIT, COMPRESSION

For fun, I compared backup file size and backup time between compressing and not compressing. I ran the script below after priming the cache (not that it should matter, since backup reads pages from disk; see http://sqlblog.com/blogs/tibor_karaszi/archive/2007/08/03/does-backup-utilize-pages-in-cache.aspx). I also deleted the backup files, if they existed, before execution.

DECLARE @dt datetime

SET @dt = GETDATE()
BACKUP DATABASE Adventureworks
TO DISK = 'C:\Adv.bak'
WITH INIT
SELECT DATEDIFF(ms, @dt, GETDATE())

SET @dt = GETDATE()
BACKUP DATABASE Adventureworks
TO DISK = 'C:\Advc.bak'
WITH INIT, COMPRESSION
SELECT DATEDIFF(ms, @dt, GETDATE())

The size of the backup file is 129 MB vs. 35 MB. Of course, the compression ratio varies depending on what type of data there is in the database (string data tends to compress better than other data, for instance). Time to execute the commands was 7.5 seconds vs. 3.8 seconds. Now, this is a virtual machine on VPC and Adventureworks is a tiny database. But at least we can see clear potential for savings here.

So how do we know if a database backup was compressed? If you’ve used RESTORE HEADERONLY, you probably noticed this annoying column called “Compressed”. Well, this is the first time we see a value of “1” in that column.
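For example, to check the backup sets in the file from the earlier example:

```sql
-- One row per backup set in the file; the Compressed column shows
-- 1 for a compressed backup and 0 otherwise.
RESTORE HEADERONLY FROM DISK = 'C:\Advc.bak'
```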

It seems we cannot mix compressed and non-compressed backups in the same backup file, though. (Not that I often have multiple backups in the same file.) If I append a backup using COMPRESSION to a file where there already is a non-compressed backup, I get an error. And if I do a backup without COMPRESSION to a file where there are compressed backups, the new backup will be compressed (even without specifying COMPRESSION). Something to look out for if you keep several backups in the same backup file.

So what about the RESTORE command? Well, there’s nothing to say, really. You don’t have to specify in the RESTORE command that the backup was compressed. Time for RESTORE was 10.4 vs. 6.3 seconds (with the destination database already existing). I expect the difference to be bigger on a real installation with a realistic database size.

TSQL script to do backup like Maintenance Plans

This is a frequently asked question, both in forums and in class:

How can I, through TSQL, create a new backup file for each backup and have date and time in the file names?

The reason for the above is most often that you want a higher degree of customization than maintenance plans allow for. An obvious such option in 2005 is to add the CHECKSUM option to the BACKUP command. I know there already exist several scripts like mine “out there”; this is just my simple version, so I remember where to point whenever I get this question next. 🙂

Since I probably have some bugs in it, I keep it on my website. This way I can fix the bugs as needed. You are most welcome to report bugs in the script if you spot any! And here’s the URL:

http://karaszi.com/tsql-script-to-create-backup-file-including-database-name-date-and-time
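For illustration, here is a minimal sketch of the idea (not the script from the URL above): build a file name from the database name plus the current date and time, then back up to it.

```sql
-- Minimal sketch; the real script at the URL above is more complete.
DECLARE @db sysname
DECLARE @file nvarchar(260)
SET @db = 'Adventureworks'
-- Style 126 gives yyyy-mm-ddThh:mi:ss; strip the separators for the file name
SET @file = N'C:\' + @db + N'_'
  + REPLACE(REPLACE(REPLACE(CONVERT(char(19), GETDATE(), 126), '-', ''), ':', ''), 'T', '_')
  + N'.bak'
-- E.g. C:\Adventureworks_20080115_103000.bak
BACKUP DATABASE @db TO DISK = @file WITH INIT, CHECKSUM
```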

Bulk logged recovery model and log backups

This post is inspired from a discussion about bulk logged recovery model and log backup. Some basics:

In bulk logged recovery model, some operations can be minimally logged, like:

  • SELECT INTO
  • Index create, rebuild and drop
  • Bulk loading of data

The interesting thing is that you can do a log backup after such a minimally logged operation. This log backup will contain both log records and the data and index pages affected by the minimally logged operation.

You cannot do a point-in-time restore using STOPAT or similar for such a log backup (a log backup containing both log records and data pages), for obvious reasons. You can do a point-in-time restore of subsequent log backups, however, assuming that log backup doesn’t contain data pages (no bulk-logged operation was done during the time span that the log backup covers).
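A sketch of such a point-in-time restore (hypothetical names and timestamp):

```sql
-- Hypothetical names. STOPAT works only against log backups that
-- contain no minimally logged (bulk-logged) operations.
RESTORE DATABASE MyDb FROM DISK = 'C:\MyDb_full.bak' WITH NORECOVERY
RESTORE LOG MyDb FROM DISK = 'C:\MyDb_log1.trn'
WITH STOPAT = '2008-01-15T10:30:00', RECOVERY
```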

You cannot do a log backup while in bulk-logged mode if a bulk-logged operation has occurred and the data files are lost. This is also pretty obvious, since SQL Server cannot go and grab the data pages it needs – the data files are lost!

But what if we have several file groups, are in bulk-logged recovery model and have performed a bulk-logged operation, but the bulk-logged operation doesn’t affect the data file which is lost? Theoretically, SQL Server could now allow such a log backup. So, let’s give it a try:

In the script below, we have two file groups: PRIMARY and SEC. We do a bulk-logged operation which affects only PRIMARY, and then delete the file for the secondary file group. Then we try a log backup. This fails, so the answer is that all data files need to be accessible in order to perform a log backup (in bulk-logged recovery model and after a minimally logged operation). The reason becomes more obvious if we first look at the messages from a successful log backup:

Processed 72 pages for database 'BulkBackup', file 'BulkBackup' on file 1.
Processed 8 pages for database 'BulkBackup', file 'BulkBackup2' on file 1.
Processed 58 pages for database 'BulkBackup', file 'BulkBackup_log' on file 1.
BACKUP LOG successfully processed 138 pages in 0.098 seconds (11.530 MB/sec).

The file BulkBackup2 is a file which is unaffected by the bulk-logged operation. But SQL Server still goes and grabs 8 pages from that file. The number “8” makes me suspect that this is the file header (the first 8 pages of each file), needed for some reason. (I get the same error with both NO_TRUNCATE and CONTINUE_AFTER_ERROR.)

The findings are also implied by Books Online. Here’s a quote:

“If the database is damaged, for example, if the database does not start, a tail-log backup succeeds only if the log files are undamaged, the database is in a state that supports tail-log backups, and the database does not contain any bulk-logged changes.”

The above does not talk about individual files; it addresses the whole database.

Finally, here’s the script. Read it so you understand it before you execute it. It will drop a database named BulkBackup, if one exists. If you want, you can change the recovery model to FULL to verify that a log backup is indeed possible for a damaged database (in general terms…).

USE master
IF DB_ID('BulkBackup') IS NOT NULL DROP DATABASE BulkBackup
GO
CREATE DATABASE BulkBackup
ON
PRIMARY
(NAME = N'BulkBackup', FILENAME = N'C:\BulkBackup.mdf' , SIZE = 20MB , FILEGROWTH = 10MB),
FILEGROUP SEC
(NAME = N'BulkBackup2', FILENAME = N'C:\BulkBackup2.ndf' , SIZE = 20MB , FILEGROWTH = 10MB)
LOG ON
(NAME = N'BulkBackup_log', FILENAME = N'C:\BulkBackup_log.ldf' , SIZE = 20MB , FILEGROWTH = 10MB)
GO
ALTER DATABASE BulkBackup SET RECOVERY BULK_LOGGED
ALTER DATABASE BulkBackup SET AUTO_CLOSE ON

USE BulkBackup
CREATE TABLE t(c1 int identity, c2 char(5)) ON SEC
INSERT INTO t SELECT TOP 10000 'hello' FROM syscolumns a CROSS JOIN syscolumns b
CREATE TABLE tSec(c1 int identity, c2 char(5)) ON "PRIMARY"
INSERT INTO tSec SELECT TOP 10000 'hello' FROM syscolumns a CROSS JOIN syscolumns b
BACKUP DATABASE BulkBackup TO DISK = 'nul'

--Bulk logged operation
SELECT * INTO tSec2 FROM tSec

--Verify log backup possible
--after bulk logged operation
BACKUP LOG BulkBackup TO DISK = 'nul'

--Bulk logged operation again
SELECT * INTO tSec3 FROM tSec

--Crash secondary database file
USE master
GO
WAITFOR DELAY '00:00:05'
EXEC sp_configure 'xp_cmdshell', 1
RECONFIGURE
EXEC xp_cmdshell 'DEL C:\BulkBackup2.ndf'
EXEC sp_configure 'xp_cmdshell', 0
RECONFIGURE

--Can we now do a log backup?
BACKUP LOG BulkBackup TO DISK = 'nul' WITH NO_TRUNCATE
--Apparently not. The error message is:

--BACKUP LOG is not possible because bulk logged changes exist in
--the database and one or more filegroups are unavailable.

--Cleanup:
GO
IF DB_ID('BulkBackup') IS NOT NULL DROP DATABASE BulkBackup

 

Does BACKUP utilize pages in cache?

I’m having a conversation with Tony Rogerson (see his great blog at http://sqlblogcasts.com/blogs/tonyrogerson/) about whether or not BACKUP will grab pages from cache (the Buffer Pool, BP) or always read them from disk.

It seems natural to grab the pages in the BP to eliminate some physical I/O. But thinking a bit more, I realized that the data in the BP might not be organized in a suitable way. As far as I know, there’s no overall extent structure (map) for the BP. And BACKUP is done on an extent basis (the extents don’t have to be sequential on the backup media, though). So, how would SQL Server orchestrate this? It needs to lay out each extent on the backup media, but reading the BP it would get something like page 5 from extent 78, then page 2 from extent 456, then page 3 from extent 8964, then page 2 from extent 78.

Anyhow, seeing is believing, so I created a repro for this (see the code at the end of this post).

My conclusion is that backup does not read pages from the BP; it will read all pages (extents) from disk. My test showed no difference in elapsed time whether or not the data was in cache, and the same for I/O.

The conversation is still ongoing, and anyone is free to point out flaws in the script or different findings.

USE master
GO
IF DB_ID('testBpBackup') IS NOT NULL DROP DATABASE testBpBackup
GO
CREATE DATABASE testBpBackup
ON  PRIMARY
(NAME = N'testBpBackup_Data'
,FILENAME = N'C:\testBpBackup_Data.mdf'
,SIZE = 1GB
,MAXSIZE = UNLIMITED
,FILEGROWTH = 50MB )
LOG ON
(NAME = N'testBpBackup_Log'
,FILENAME = N'C:\testBpBackup_Log.ldf'
,SIZE = 100MB
,MAXSIZE = UNLIMITED
,FILEGROWTH = 10MB)
GO
USE testBpBackup
GO
IF OBJECT_ID('t') IS NOT NULL DROP TABLE t
CREATE TABLE t (c1 int identity, c2 char(500))
INSERT INTO t
SELECT TOP 1000000 'Hello' FROM syscolumns a, syscolumns b, syscolumns c
EXEC sp_spaceused 't' --Approx 500MB
GO

--Execute below in one batch

--support variables
DECLARE @t datetime, @io_before bigint, @crap int

--Backup with data in bp
SET @crap = (SELECT COUNT(*) FROM t)
SET @t = CURRENT_TIMESTAMP
SET @io_before = (SELECT num_of_reads FROM sys.dm_io_virtual_file_stats(DB_ID(), 1))

BACKUP DATABASE testBpBackup TO DISK = 'nul'
SELECT
DATEDIFF(s, @t, CURRENT_TIMESTAMP) AS "seconds with data in BP",
(SELECT num_of_reads FROM sys.dm_io_virtual_file_stats(DB_ID(), 1)) - @io_before AS "I/O with data in BP"

--Backup with data not in bp
DBCC DROPCLEANBUFFERS
SET @t = CURRENT_TIMESTAMP
SET @io_before = (SELECT num_of_reads FROM sys.dm_io_virtual_file_stats(DB_ID(), 1))

BACKUP DATABASE testBpBackup TO DISK = 'nul'
SELECT
DATEDIFF(s, @t, CURRENT_TIMESTAMP) AS "seconds with data NOT in BP",
(SELECT num_of_reads FROM sys.dm_io_virtual_file_stats(DB_ID(), 1)) - @io_before AS "I/O with data NOT in BP"

 

Online restore and filegroups

Is data available for a partitioned table if one of the partitions isn’t available? Instead of reading BOL, I wanted to test this. Here is an outline of what I did (you will find the TSQL code at the end of this post):

  • Create a database over several file groups
  • Create a partitioned table
  • Insert some rows
  • Database backup
  • Insert some rows
  • Log backup
  • Insert some rows
  • Log backup
  • Insert some rows
  • Database restore of only one file group
  • The answer is yes. If the query is written so that the non-available partition is eliminated (either at optimization or at run time), we can access the available partitions.
  • Restore of the log backups. The whole table is now available.

TSQL code. Note that this will drop a database named PartOnLineRestore, if one exists:

SET NOCOUNT ON
USE master
IF DB_ID('PartOnLineRestore') IS NOT NULL DROP DATABASE PartOnLineRestore
GO
CREATE DATABASE PartOnLineRestore
ON
PRIMARY
(NAME = N'PartOnLineRestore', FILENAME = N'C:\PartOnLineRestore.mdf', SIZE = 3MB ),
FILEGROUP FG1
(NAME = N'FG1File', FILENAME = N'C:\FG1File.ndf', SIZE = 3MB),
FILEGROUP FG2
(NAME = N'FG2File', FILENAME = N'C:\FG2File.ndf', SIZE = 3MB ),
FILEGROUP FG3
(NAME = N'FG3File', FILENAME = N'C:\FG3File.ndf', SIZE = 3MB )
LOG ON
(NAME = N'PartOnLineRestore_log', FILENAME = N'C:\PartOnLineRestore_log.ldf', SIZE = 3MB)
GO
USE PartOnLineRestore

CREATE PARTITION FUNCTION PF1 (int) AS RANGE RIGHT FOR VALUES (1,2);

CREATE PARTITION SCHEME PS1 AS PARTITION PF1 TO (FG1, FG2, FG3);

IF OBJECT_ID('t') IS NOT NULL DROP TABLE t
CREATE TABLE t (a int, b int, c char(300) DEFAULT 'hello');

--Populate table with some rows
DECLARE @i int; SET @i=0;
WHILE @i<10000
BEGIN
SET @i=@i+1;
INSERT INTO t (a, b) VALUES (5*RAND(),100*RAND())
END;

--Partition table
CREATE CLUSTERED INDEX x ON t(a) ON PS1(a)

--Do some backups
USE master
ALTER DATABASE PartOnLineRestore SET RECOVERY FULL
BACKUP DATABASE PartOnLineRestore TO DISK = 'C:\PartOnLineRestore.bak' WITH INIT
INSERT INTO PartOnLineRestore..t (a, b) VALUES (5*RAND(),100*RAND())
BACKUP LOG PartOnLineRestore TO DISK = 'C:\PartOnLineRestore.bak' WITH NOINIT
INSERT INTO PartOnLineRestore..t (a, b) VALUES (5*RAND(),100*RAND())
BACKUP LOG PartOnLineRestore TO DISK = 'C:\PartOnLineRestore.bak' WITH NOINIT
INSERT INTO PartOnLineRestore..t (a, b) VALUES (5*RAND(),100*RAND())

--Filegroup restore
RESTORE DATABASE PartOnLineRestore FILEGROUP = 'FG1' FROM DISK = 'C:\PartOnLineRestore.bak'

--Can we access the table?
SELECT * FROM PartOnLineRestore..t
--Error message

SELECT * FROM PartOnLineRestore..t WHERE a = 0
--Error message

SELECT * FROM PartOnLineRestore..t WHERE a = 1
--OK, partition elimination at compile time

DECLARE @a int
SET @a = 1
SELECT * FROM PartOnLineRestore..t WHERE a = @a
--OK, partition elimination at run time

--Restore of log backups:
RESTORE LOG PartOnLineRestore FROM DISK = 'C:\PartOnLineRestore.bak' WITH FILE = 2
RESTORE LOG PartOnLineRestore FROM DISK = 'C:\PartOnLineRestore.bak' WITH FILE = 3
BACKUP LOG PartOnLineRestore TO DISK = 'C:\PartOnLineRestore.bak' WITH NOINIT
RESTORE LOG PartOnLineRestore FROM DISK = 'C:\PartOnLineRestore.bak' WITH FILE = 4

--All data available?
SELECT * FROM PartOnLineRestore..t WHERE a = 0
--OK

SELECT * FROM PartOnLineRestore..t
--OK

--Cleanup:
DROP DATABASE PartOnLineRestore

Having fun with new RESTORE options in 2005

You might know that we have some new options for backup, and especially restore, in SQL Server 2005. The script below was inspired by a newsgroup post regarding not being able to restore an old read-only filegroup backup into a new database.

Here is what I wanted to try:

  • Setting a filegroup to read-only, doing a backup and then forgetting about this filegroup. This was not an option in earlier releases. In 7.0 and 2000, restoring a file or filegroup backup meant that you also needed to restore all subsequent log backups. Not very convenient if it is Dec 10 and the backup was performed Jan 01. We can do this now, but the filegroup must have been read-only all the time since the backup of that filegroup was performed.
  • Piecemeal restore. This allows you to first restore the primary filegroup, and immediately after let the users into the database. You can then, at your leisure, restore the other filegroups. This allows the users to get into the database more quickly. They don’t have to wait 10 hours for you to restore that large history table, which is only used for reporting anyhow.
  • Doing the above into a new database name. Say that you had to rebuild the old server and only have the backup files available.

Below is the script. Note that it will drop the databases x and x2, if they exist:

USE master
GO
IF DB_ID('x') IS NOT NULL DROP DATABASE x
GO
IF DB_ID('x2') IS NOT NULL DROP DATABASE x2
GO
CREATE DATABASE [x] ON  PRIMARY
( NAME = N'x', FILENAME = N'C:\x.mdf' , SIZE = 3072KB , FILEGROWTH = 1024KB ),
FILEGROUP [STUFF]
( NAME = N'x_stuff', FILENAME = N'C:\x_stuff.ndf' , SIZE = 3072KB , FILEGROWTH = 1024KB )
LOG ON
( NAME = N'x_log', FILENAME = N'C:\x_log.ldf' , SIZE = 3072KB , FILEGROWTH = 10%)
GO

CREATE TABLE x..t(c1 int) ON "PRIMARY"
INSERT INTO x..t VALUES(1)
CREATE TABLE x..t_stuff(c1 int) ON STUFF
INSERT INTO x..t_stuff VALUES(1)
GO
ALTER DATABASE x MODIFY FILEGROUP STUFF READONLY
GO
BACKUP DATABASE x FILEGROUP = 'STUFF' TO DISK = 'C:\x_fg_STUFF.BAK' WITH INIT
GO
INSERT INTO x..t VALUES(2)
GO
BACKUP DATABASE x READ_WRITE_FILEGROUPS TO DISK = 'C:\x_fg_rw.BAK' WITH INIT
GO

--Restore the primary (read/write) filegroup
RESTORE DATABASE x2 FROM DISK = 'C:\x_fg_rw.BAK'
WITH
PARTIAL
,MOVE 'x' TO 'C:\x2.mdf'
,MOVE 'x_stuff' TO 'C:\x2_stuff.ndf'
,MOVE 'x_log' TO 'C:\x2_log.ldf'
,REPLACE

SELECT * FROM x2..t --OK
GO
SELECT * FROM x2..t_stuff --Fails, as expected
GO

--Restore the other filegroup
RESTORE DATABASE x2 FROM DISK = 'C:\x_fg_STUFF.BAK'
WITH
MOVE 'x' TO 'C:\x2.mdf'
,MOVE 'x_stuff' TO 'C:\x2_stuff.ndf'
,MOVE 'x_log' TO 'C:\x2_log.ldf'
GO

SELECT * FROM x2..t --OK
SELECT * FROM x2..t_stuff --OK