Is statistics over non-indexed columns updated by index rebuild?

Short answer: no.

This question came up today in the MCT discussion group. My gut instinct said no, but I wanted to test it to be certain. But first a brief background:

You can rebuild an index using DBCC DBREINDEX or in 2005 (and now preferred) ALTER INDEX … REBUILD. Rebuilding an index internally creates a new index and when that has been done, drops the old index.

So it is pretty obvious that we also get new statistics for that index (based on all data, not sampled, just as when we do CREATE INDEX). As an aside, reorganizing does *not* update the statistics…

But what about statistics over non-indexed columns? SQL Server can create this by itself, assuming you didn’t turn off this database option. These are named something like _WA_sys. And you can also create these explicitly usinf CREATE STATISTICS.

A few words about below script: I wanted to use the STATS_DATE function to retrieve datetime for when the statistics was built/updated. But STATS_DATE doesn’t seem to work on statistics only; it expect an id for an index… So, this is why I use DBCC SHOW_STATISTICS instead. And, unfortunately, DBCC SHOW_STATISTICS only display the statistics build time with minute precision. This is why I have a WAITFOR with > 1 minute in between the operations.

I got the same resuld whether or not I rebuild a clustered or non-clustered index on the table or even when specifying ALL indexes. Script:

USE tempdb
SET NOCOUNT ON
IF OBJECT_ID(‘t’) IS NOT NULL DROP TABLE t
CREATE TABLE t(c1 int identity, c2 char(5))
INSERT INTO t (c2)
SELECT TOP 10000 ‘Hello’ FROM syscolumns a, syscolumns b

CREATE CLUSTERED INDEX x1 ON t(c2)
CREATE STATISTICS s1 ON t(c1)

SELECT ‘ ‘ AS “Before mass modification”
DBCC SHOW_STATISTICS(‘t’, ‘x1’) WITH STAT_HEADER, NO_INFOMSGS
DBCC SHOW_STATISTICS(‘t’, ‘s1′) WITH STAT_HEADER, NO_INFOMSGS

WAITFOR DELAY ’00:01:02’

INSERT INTO t (c2)
SELECT TOP 10000 ‘Hi’ FROM syscolumns a, syscolumns b
SELECT ‘ ‘ AS “Before index rebuild”
DBCC SHOW_STATISTICS(‘t’, ‘x1’) WITH STAT_HEADER, NO_INFOMSGS
DBCC SHOW_STATISTICS(‘t’, ‘s1′) WITH STAT_HEADER, NO_INFOMSGS

WAITFOR DELAY ’00:01:02’

–ALTER INDEX x1 ON t REBUILD
ALTER INDEX ALL ON t REBUILD
SELECT ‘ ‘ AS “After index rebuild”
DBCC SHOW_STATISTICS(‘t’, ‘x1’) WITH STAT_HEADER, NO_INFOMSGS
DBCC SHOW_STATISTICS(‘t’, ‘s1’) WITH STAT_HEADER, NO_INFOMSGS

 

 

Is ‘YYYY-MM-DD’ always treated as year, month, date?

Most of you already know the answer to this question: no.

I won’t recap the rules for interpretation of the current datetime literals, as I have an article on the subject: http://karaszi.com/the-ultimate-guide-to-the-datetime-datatypes

However, SQL Server 2008 introduces a number of new date, datetime and time datatypes. One interesting aspect of these is that interpretation of the ANSI SQL datetime format, ‘YYYY-MM-DD’ is independent of language and datetime settings.

SQL Server MVP Steve Kass opened up the discussion (along with a connect suggestion) that it is time for the old datetime datatype also always treat this format as year, month, date. In general, I’m all for a consistent treatment of this datetime formatting, and that should indeed be the most “natural” one and also adhere to ANSI SQL. However, I don’t feel qualified to estimate if or how much backwards compatibility problems this will cause. Are any of you using format ‘YYYY-DD-MM’ format, for instance?

Here’s a script that show you whether or not SQL Server treats ‘nnnn-nn-nn’ as ‘yyyy-mm-dd’ or something else. For fun, if you have July CTP of SQL Server 2008, you can run the same script and change the datatypes to datetime2 and compare the difference.

SET NOCOUNT ON
USE tempdb
IF OBJECT_ID(‘langdf’) IS NOT NULL DROP TABLE langdf
CREATE TABLE langdf(lang sysname, dt datetime, correct bit)

DECLARE @lang sysname, @sql nvarchar(4000)
DECLARE c CURSOR FOR SELECT alias FROM master.dbo.syslanguages
OPEN c
WHILE 1 = 1
BEGIN
FETCH NEXT FROM c INTO @lang
IF @@FETCH_STATUS <> 0 BREAK
SET @sql =
‘SET LANGUAGE “‘ +
@lang + ‘”
INSERT INTO langdf
SELECT
”’ + @lang + ”’
,CAST(”2007-02-09” AS datetime)
,CASE WHEN CAST(”2007-02-09” AS datetime) = ”20070209” THEN 1 ELSE 0 END’
PRINT @sql
EXEC(@sql)
END
DEALLOCATE c

SELECT
COUNT(NULLIF(correct, 0)) AS “correct interpretation”
,COUNT(NULLIF(correct, 1)) AS “incorrect interpretation”
FROM langdf

SELECT * FROM langdf

Does BACKUP utilize pages in cache?

I’m having a conversation with Tony Rogerson (see his great blog at: http://sqlblogcasts.com/blogs/tonyrogerson/) about whether or not BACKUP will grab paged from cache (Buffer Pool, BP) or always read them from disk.

It seems natural to grab the pages in BP to eliminate some physical I/O. But thinking a bit more, I realized that the data in BP might not be organized in a suitable way. As far as I know, there’s no overall extent structure (map) for the BP. And BACKUP is done at extent basis (extents doesn’t have to be sequental on backup media, though). So, how would SQL Server orchestrate this? It need to lay each extent in backup media, but reading BP, it will get something like page 5 from extent 78, then page 2 from page 456, then page 3 from page 8964, then page 2 from extent 78.

Anyhow, seeing is believeing, so I created a repro for this (see code at end of this blog).

My conclusion is that backup does not read pages from BP, it will read all pages (extents) from disk. My test showed no different in lapse time whether or not data is in cache and same for I/O.

The conversation is still ongoing, and anyone is free to point out flaws in the script or different findings.

USE master
GO
IF DB_ID(‘testBpBackup’) IS NOT NULL DROP DATABASE testBpBackup
GO
CREATE DATABASE testBpBackup
ON  PRIMARY
(NAME = N’testBpBackup_Data’
,FILENAME = N’C:\testBpBackup_Data.mdf’
,SIZE = 1GB
,MAXSIZE = UNLIMITED
,FILEGROWTH = 50MB )
LOG ON
(NAME = N’testBpBackup_Log’
,FILENAME = N’C:\testBpBackup_Log.ldf’
,SIZE = 100MB
,MAXSIZE = UNLIMITED
,FILEGROWTH = 10MB)
GO
USE testBpBackup
GO
IF OBJECT_ID(‘t’) IS NOT NULL DROP TABLE t
CREATE TABLE t (c1 int identity, c2 char(500))
INSERT INTO t
SELECT TOP 1000000 ‘Hello’ FROM syscolumns a, syscolumns b, syscolumns c
EXEC sp_spaceused ‘t’ –Approx 500MB
GO

–Execute below in one batch

–support variables
DECLARE @t datetime, @io_before bigint, @crap int

–Backup with data in bp
SET @crap = (SELECT COUNT(*) FROM t)
SET @t = CURRENT_TIMESTAMP
SET @io_before = (SELECT num_of_reads FROM sys.dm_io_virtual_file_stats( DB_ID(), 1))

BACKUP DATABASE testBpBackup TO DISK = ‘nul’
SELECT
DATEDIFF(s, @t, CURRENT_TIMESTAMP) AS “seconds with data in BP”,
(SELECT num_of_reads FROM sys.dm_io_virtual_file_stats( DB_ID(), 1)) – @io_before AS “I/O with data in BP”

–Backup with data not in bp
DBCC DROPCLEANBUFFERS
SET @t = CURRENT_TIMESTAMP
SET @io_before = (SELECT num_of_reads FROM sys.dm_io_virtual_file_stats( DB_ID(), 1))

BACKUP DATABASE testBpBackup TO DISK = ‘nul’
SELECT
DATEDIFF(s, @t, CURRENT_TIMESTAMP) AS “seconds with data NOT in BP”,
(SELECT num_of_reads FROM sys.dm_io_virtual_file_stats( DB_ID(), 1)) – @io_before AS “I/O with data NOT in BP”

 

No Notification Services in SQL Server 2008

If you have been wondering where Notification Services (NS) is in the prior CTPs of SQL Server 2008, you now have the answer. NS will not be carried forward to SQL Server 2008. Here’s a quote from July CTP readme:

“SQL Server Notification Services will not be included as a component of SQL Server 2008, but will continue to be supported as part of the SQL Server 2005 product support life-cycle. Moving forward, support for key notification scenarios will be incorporated into SQL Server Reporting Services. Existing Reporting Services functionality, such as data driven subscriptions, addresses some of the notification requirements. Features to support additional notification scenarios may be expected in future releases. ”

So, now we know.

I’ve been delivering training of SQL Server 2005 for a while now, and one of the courses has a module on NS. Over time, I gradually adapted to the fact that almost no-one was interested in NS and nowadays I just give a brief overwiew of what NS is, and the basics of creating an NS solution. Funny thing is that when you “get” NS, you realize that it is a neat piece of infrastructural software, taking writing some code off your hands. I guess that there hasn’t been enough interest in NS to carry it forward to next release, but that it pure speculation from my side…

Are execution plans for functions cached?

Obviously, were talking about multi-statement functions, since in-line functions are just views in the end.

My gut feeling for this is “yes”, but I wanted to be absolutely certain. So here goes:

  1. Create a function in Adventureworks
  2. Use that function in a SELECT statement
  3. Check if a plan exists in the plan cache for above

USE Adventureworks
GO

IF OBJECT_ID(‘fn’) IS NOT NULL DROP FUNCTION fn
GO

CREATE FUNCTION fn(@rg uniqueidentifier)
RETURNS @tbl TABLE(SalesOrderDetailID int NOT NULL PRIMARY KEY, OrderQty smallint NOT NULL)
AS
BEGIN
INSERT INTO @tbl(SalesOrderDetailID, OrderQty)
SELECT SalesOrderDetailID, OrderQty FROM sod WHERE rowguid = @rg
RETURN
END
GO

SELECT * FROM fn(‘80667840-F962-4EE3-96E0-AECA108E0D4F’)
GO

SELECT cp.cacheobjtype, cp.plan_handle, qp.query_plan
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE qp.objectid  = OBJECT_ID(‘fn’)

IF OBJECT_ID(‘fn’) IS NOT NULL DROP FUNCTION fn

Note the execution plan in XML format picked up from sys.dm_exec_query_plan. If you executed the query in grid format, you can cklick on it, and then save the XML as a file. Rename the file extension to .sqlplan and open that file in SSMS. You can now see the plan for this function graphically.

Online restore and filegroups

Is data available for a partitioned table if one of the partitions isn’t available? Instead of reading BOL, I wanted to test this. Here is an outline of what I did (you find TSQL code at the end of this post):

  • Create a database over several file groups
  • Create a partitioned table
  • Inserted some rows
  • Database backup
  • Inserted some rows
  • Log backup
  • Inserted some rows
  • Log backup
  • Inserted some rows
  • Database restore of only one file group
  • Answer is yes. If the query is written so that the non-available partition is eliminated (either optimize or run time) we can access the available partitions.
  • Restore of the log backups
    Whole table is now available

TSQL code. Note that this will drop a database named PartOnLineRestore if such exist:

SET NOCOUNT ON
USE master
IF DB_ID(‘PartOnLineRestore’) IS NOT NULL DROP DATABASE PartOnLineRestore
GO
CREATE DATABASE PartOnLineRestore
ON
PRIMARY
(NAME = N’PartOnLineRestore’, FILENAME = N’C:\PartOnLineRestore.mdf’, SIZE = 3MB ),
FILEGROUP FG1
(NAME = N’FG1File’, FILENAME = N’C:\FG1File.ndf’, SIZE = 3MB),
FILEGROUP FG2
(NAME = N’FG2File’, FILENAME = N’C:\FG2File.ndf’, SIZE = 3MB ),
FILEGROUP FG3
(NAME = N’FG3File’, FILENAME = N’C:\FG3File.ndf’, SIZE = 3MB )
LOG ON
(NAME = N’PartOnLineRestore_log’, FILENAME = N’C:\PartOnLineRestore_log.ldf’, SIZE = 3MB)
GO
USE PartOnLineRestore

CREATE PARTITION FUNCTION PF1 (int) AS RANGE RIGHT FOR VALUES (1,2);

CREATE PARTITION SCHEME PS1 AS PARTITION PF1  TO (FG1, FG2, FG3);

IF OBJECT_ID(‘t’) IS NOT NULL DROP TABLE t
CREATE TABLE t (a int, b int, c char(300) default ‘hello’);

— Populate table with some rows
DECLARE @i int; SET @i=0;
WHILE @i<10000
BEGIN
SET @i=@i+1;
INSERT INTO t (a, b) VALUES (5*RAND(),100*RAND())
END;

–Partition table
CREATE CLUSTERED INDEX x ON t(a) ON PS1(a)

–Do some backups
USE master
ALTER DATABASE PartOnLineRestore SET RECOVERY FULL
BACKUP DATABASE PartOnLineRestore TO DISK = ‘C:\PartOnLineRestore.bak’ WITH INIT
INSERT INTO PartOnLineRestore..t (a, b) VALUES (5*RAND(),100*RAND())
BACKUP LOG PartOnLineRestore TO DISK = ‘C:\PartOnLineRestore.bak’ WITH NOINIT
INSERT INTO PartOnLineRestore..t (a, b) VALUES (5*RAND(),100*RAND())
BACKUP LOG PartOnLineRestore TO DISK = ‘C:\PartOnLineRestore.bak’ WITH NOINIT
INSERT INTO PartOnLineRestore..t (a, b) VALUES (5*RAND(),100*RAND())

–Filegroup restore
RESTORE DATABASE PartOnLineRestore FILEGROUP = ‘FG1’ FROM DISK = ‘C:\PartOnLineRestore.bak’

–Can we access the table?
SELECT * FROM PartOnLineRestore..t
–Error message

SELECT * FROM PartOnLineRestore..t WHERE a = 0
–Error message

SELECT * FROM PartOnLineRestore..t WHERE a = 1
–OK, partition elimination at compile time

DECLARE @a int
SET @a = 1
SELECT * FROM PartOnLineRestore..t WHERE a = @a
–OK, partition elimination at run time

–Restore of log backups:
RESTORE LOG PartOnLineRestore FROM DISK = ‘C:\PartOnLineRestore.bak’ WITH FILE = 2
RESTORE LOG PartOnLineRestore FROM DISK = ‘C:\PartOnLineRestore.bak’ WITH FILE = 3
BACKUP LOG PartOnLineRestore TO DISK = ‘C:\PartOnLineRestore.bak’ WITH NOINIT
RESTORE LOG PartOnLineRestore FROM DISK = ‘C:\PartOnLineRestore.bak’ WITH FILE = 4

–All data vailable?
SELECT * FROM PartOnLineRestore..t WHERE a = 0
–OK

SELECT * FROM PartOnLineRestore..t
–OK

–Cleanup:
DROP DATABASE PartOnLineRestore

Whats in the default trace?

As you probably know, there’s a trace running by default in SQL Server 2005. The directory for the trace file is the SQL Server log directory, and you can turn off and on this trace with sp_configure.

But how do we find out what events and columns are traced to this? We use a trace function and some trace catalog views:

The function fn_trace_geteventinfo returns what columns and events are captured by a configured trace. But we don’t want to see the column id and event id, we want the names. So we join this to the following functions:
sys.trace_events
sys.trace_categories
sys.trace_columns

Here’s the end result:

SELECT cat.name AS CategoryName, e.name AS EventName, c.name AS ColumnName
FROM fn_trace_geteventinfo(1) AS rt
INNER JOIN sys.trace_events AS e
ON rt.eventid = e.trace_event_id
INNER JOIN sys.trace_columns AS c
ON rt.columnid = c.trace_column_id
INNER JOIN sys.trace_categories AS cat
ON e.category_id = cat.category_id
ORDER BY CategoryName, EventName, ColumnName

And here’s one with only category and event:

SELECT DISTINCT cat.name AS CategoryName, e.name AS EventName
FROM fn_trace_geteventinfo(1) AS rt
INNER JOIN sys.trace_events AS e
ON rt.eventid = e.trace_event_id
INNER JOIN sys.trace_categories AS cat
ON e.category_id = cat.category_id
ORDER BY CategoryName, EventName

Leaking roof and file shrinking

Shrinking and growing database files cost you resources and have other side effects as well. I have a technical description  http://karaszi.com/why-you-want-to-be-restrictive-with-shrink-of-database-files if you want to get into details about it. Sure, you can have valid reasons to shrink, after archiving lots of data and needing to free some disk space for example. But even after pointing to my article I see people doing this at a regular basis, hence this analogy. The analogy uses transaction log file as example, but the same basic principles applies to data files:

Say you have a leaking roof at the office. You need to put a bucket for the water under the leak (the bucket is your database file, a log file, .ldf, for this analogy).

Ever so often, you need to empty the bucket (backup log if in full or bulk logged recovery model, or when checkpoint occurs if you are in simple recovery model). Emptying the bucket doesn’t make the bucket smaller. The bucket still consumes the same amount of cubic cm of your office space.

You can of course have one of the office workers (the machine resources) take an angle grinder and make the bucker physically smaller when you empty it. This consumes resources. But you would now have an office worker to use a welder, perhaps every 10 minutes or every hour to make the bucket larger, as you have more and more water in the bucket.

I don’t know anyone who had a leaking roof, but if I were in that situation, I sure wouldn’t make this bucket larger and smaller all the time. I’d use a bucket with a size necessary to accommodate the water that builds up between when I empty the bucket. This is how I handle my database files as well.

Having fun with new RESTORE options in 2005

You might know that we have some new options for backup and especially restore in SQL Server 2005. Below script was inspired by a newsgroup post regarding not being able to restore an old readonly filegroup backup into a new database.

Here is what I wanted to try:

  • Setting a filegroup to read only, do a backup and then forget about this filegroup. This was not an option in earlier releases. In 7.0 and 2000, restoring a file or filegroup backup meant that you also need to restore all subsequent log backups. Not very convenient if it is Dec 10 and the backup was performend Jan 01. We can do this now, but the filegroup must have been readonly all the time since the backup of that filegroup was performed.
  • Piecemal restore. This allow you to first restore the primary filegroup, and immediately after let the users in to the database. You can now, at your leisure, restore the other filegroups. This allow for the users to more quickly get into the database. They don’t have to wait 10 hours for you to restore that large history table, which is only used for reporting anyhow.
  • Do above into a new database name. Say that you had to rebuild the old server and only have the backup files available.

Below is the script. Note that it will drop the databases X and X2, if such exists:

USE master
GO
IF DB_ID(‘x’) IS NOT NULL DROP DATABASE x
GO
IF DB_ID(‘x2′) IS NOT NULL DROP DATABASE x2
GO
CREATE DATABASE [x] ON  PRIMARY
( NAME = N’x’, FILENAME = N’C:\x.mdf’ , SIZE = 3072KB , FILEGROWTH = 1024KB ),
FILEGROUP [STUFF]
( NAME = N’x_stuff’, FILENAME = N’C:\x_stuff.ndf’ , SIZE = 3072KB , FILEGROWTH = 1024KB )
LOG ON
( NAME = N’x_log’, FILENAME = N’C:\x_log.ldf’ , SIZE = 3072KB , FILEGROWTH = 10%)
GO

CREATE TABLE x..t(c1 int) ON “PRIMARY”
INSERT INTO x..t VALUES(1)
CREATE TABLE x..t_stuff(c1 int) ON STUFF
INSERT INTO x..t_stuff VALUES(1)
GO
ALTER DATABASE x MODIFY FILEGROUP STUFF READONLY
GO
BACKUP DATABASE x FILEGROUP = ‘STUFF’ TO DISK = ‘C:\x_fg_STUFF.BAK’ WITH INIT
GO
INSERT INTO x..t VALUES(2)
GO
BACKUP DATABASE x READ_WRITE_FILEGROUPS TO DISK = ‘C:\x_fg_rw.BAK’ WITH INIT
GO

–Restore the primary (read/write) filegroup
RESTORE DATABASE x2 FROM DISK = ‘C:\x_fg_rw.BAK’
WITH
PARTIAL
,MOVE ‘x’ TO ‘C:\x2.mdf’
,MOVE ‘x_stuff’ TO ‘C:\x2_stuff.mdf’
,MOVE ‘x_log’ TO ‘C:\x2_log.mdf’
,REPLACE

SELECT * FROM x2..t –OK
GO
SELECT * FROM x2..t_stuff –Fails, as expected
GO

–Restore the other filegroup
RESTORE DATABASE x2 FROM DISK = ‘C:\x_fg_STUFF.BAK’
WITH
MOVE ‘x’ TO ‘C:\x2.mdf’
,MOVE ‘x_stuff’ TO ‘C:\x2_stuff.mdf’
,MOVE ‘x_log’ TO ‘C:\x2_log.mdf’
GO

SELECT * FROM x2..t –OK
SELECT * FROM x2..t_stuff –OK