Execution plan re-use, sp_executesql and TSQL variables

Let me start by saying that the content of this post is not very advanced. If you have read the excellent paper “Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005”, https://technet.microsoft.com/en-us/library/cc966425.aspx, and understood it, you already know what is covered below, and much more…

I was reading a thread in the open newsgroups today (.tools, posted by Mike) about a piece of advice that the SQL Server 2005 Database Engine Tuning Advisor apparently gave. It advises replacing sp_executesql usage with declaring TSQL variables and using those in the WHERE clause. Translated to the Adventureworks database, it advises that instead of 1) below, we should use 2) below.

1)
EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM Person.Person
WHERE LastName = @P1',
N'@P1 nvarchar(50)', 'Diaz'

2)
DECLARE @P1 NVARCHAR(50)
SET @P1 = 'Diaz'
SELECT FirstName, LastName, PersonType, Title
FROM Person.Person
WHERE LastName = @P1

Now, I could not reproduce this (make DTA give me the same advice, to replace 1) with 2)). Perhaps I misunderstood the poster in the group, or it is because I’m running the SQL Server 2008 DTA and engine, I’m not looking in the right place for that advice, my data isn’t representative, I’m running the DTA with some other settings, etc. But say that DTA does indeed give such advice, why would it do that? To be honest, I don’t know. It can hardly have enough information to determine whether 1) or 2) is the best choice.

In short: say we have an index on the LastName column and the name we look for can either be a very common name, like perhaps “Smith”, or a not so common name, like “Karaszi”. For the more common name, a table scan might be the best choice, whereas for the not-so-common name, an index seek is probably the best choice. OK, a very simple example, but it serves well for this discussion.

Back to the difference between 1) and 2). There are potentially very important differences between the two:

For 1), SQL Server will determine a plan based on the contents of the parameter when the plan is created. I.e., it can determine selectivity based on that value and decide whether or not it is a good idea to use an index. The plan is then cached and can be re-used. But what if the first execution passes in something which is very selective, while a typical execution is not very selective? Or the other way around? I.e., the plan for one case might not be optimal for some other case. This is where we have new optimizer hints in 2005 like OPTIMIZE FOR and the RECOMPILE hints. I will not go into details about these here, since I’m focusing on the differences between 1) and 2). See the white paper I mentioned, and of course Books Online, for more details.
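
Just to show what I mean (a quick sketch only, using the same query as in 1); see Books Online for the details), the hints go inside the statement that sp_executesql executes:

--Sketch: ask for a plan optimized for a typical (here: common) value,
--regardless of what is actually passed in at this execution
EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM Person.Person
WHERE LastName = @P1
OPTION (OPTIMIZE FOR (@P1 = N''Diaz''))',
N'@P1 nvarchar(50)', 'Gimmi'

--Sketch: pay a compilation at each execution and get a plan for each value
EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM Person.Person
WHERE LastName = @P1
OPTION (RECOMPILE)',
N'@P1 nvarchar(50)', 'Gimmi'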

For 2), SQL Server (or rather: the optimizer) has no knowledge of the contents of the variable when the plan is produced. So it cannot use the statistics histogram to determine selectivity. In the above case, it instead uses density (stored with the statistics, assuming such exist for the column). Density is basically 1 / (number of unique values in the column(s)). This might be a good representation for a typical lastname, but perhaps not so good for a very common or a very uncommon lastname. Now, in 2005, we have hints for these situations as well (RECOMPILE), but again, that is not the point here.

In order for DTA to give the best advice here, it would have to know the distribution of the data for that column and also see many executions of the query, to determine whether “all” executions use a typical value (sp_executesql might then be better) or whether the stored density value is a good representation for “all” queries that are passed from the clients. I very much doubt that DTA has this level of sophistication. Basically, I don’t know why it gives this advice. There might be other aspects, like “avoid dynamic SQL” (whether that holds for this case is something we could argue about in another post), but DTA is about performance, not best practices.

Bottom line is that these things are not simple and we should be very cautious with “rules of thumb”.

Here’s some TSQL code to demonstrate the differences between 1) and 2). As always, only execute it after you have read and understood the code!

--Create a copy of the person table
--We will have lots of "Diaz" and very few "Gimmi"
IF OBJECT_ID('p') IS NOT NULL DROP TABLE p
GO
SELECT * INTO p
FROM Person.Person

CREATE INDEX x ON dbo.p(LastName)

--Create lots of Diaz
INSERT INTO p
SELECT BusinessEntityID + 30000, PersonType, NameStyle, Title, FirstName, MiddleName, N'Diaz', Suffix, EmailPromotion, AdditionalContactInfo, Demographics, rowguid, ModifiedDate
FROM p

--Make sure we have up-to-date statistics
UPDATE STATISTICS p WITH FULLSCAN

--Verify execution plan and I/O cost
--for table scan with low selectivity
--and index seek with high selectivity
SET STATISTICS IO ON

DBCC FREEPROCCACHE
SELECT * FROM p WHERE LastName = 'Diaz'
--20183 rows, table scan, 7612 pages

DBCC FREEPROCCACHE
SELECT * FROM p WHERE LastName = 'Gimmi'
--1 row, index seek, 3 pages

------------------------------------------------------------
--sp_executesql alternative
------------------------------------------------------------

--Table scan will be used for both because of execution plan re-use
DBCC FREEPROCCACHE
EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1',
N'@P1 nvarchar(50)', 'Diaz'
--20183 rows, table scan, 7612 pages

EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1',
N'@P1 nvarchar(50)', 'Gimmi'
--1 row, table scan, 7612 pages

--Other way around
--Index seek will be used for both because of execution plan re-use
DBCC FREEPROCCACHE
EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1',
N'@P1 nvarchar(50)', 'Gimmi'
--1 row, index seek, 3 pages

EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1',
N'@P1 nvarchar(50)', 'Diaz'
--20183 rows, index seek, 20291 pages

------------------------------------------------------------------
--Alternative using variable
------------------------------------------------------------------
DBCC FREEPROCCACHE
DECLARE @P1 NVARCHAR(50)
SET @P1 = 'Diaz'
SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1
GO
--20183 rows, index seek, 20291 pages

DBCC FREEPROCCACHE
DECLARE @P1 NVARCHAR(50)
SET @P1 = 'Gimmi'
SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1
GO
--1 row, index seek, 1 page

--Same plan even though very different selectivity
--and emptying plan cache in between

--Estimated 33 rows for both above.
--See if that is drawn from statistics density?

--Formula for density: 1/#OfUniqueValuesInColumn
SELECT 1./COUNT(DISTINCT LastName) FROM p
--0.00082918739

--Does that match density in index statistics?
DBCC SHOW_STATISTICS('p', 'x')
--Yes

--How many rows in the table?
SELECT COUNT(*) FROM p
--39944

--So how many rows would we estimate based on density?
SELECT 0.00082918739 * 39944
--Yep, 33 rows.

--I.e., for the variable alternative, SQL Server has no
--knowledge of the contents of those variables, so it must use density instead.

--Clean up:
IF OBJECT_ID('p') IS NOT NULL DROP TABLE p

Are inserts quicker to heap or clustered tables?

Is it quicker and/or lower overhead to insert into a heap vs. a clustered table?
I don’t know. So I decided to do a test. Some background information first:

The test was inspired by a sidebar with Gert-Jan Strik in the open newsgroups. Basically, I expressed that a heap doesn’t automatically carry lower overhead… just because it is a heap. Now, heaps vs. clustered tables is a huge topic with many aspects. I will not cover anything else here except inserts into a heap vs. a table which is clustered on an ever-increasing key. No other indexes. There will be no fragmentation. I do not cover searches, covering indexes etc. Only the pure insert aspect. OK? Good!

One might think that a heap has lower overhead because it is a … heap. But hang on for a second and think about what happens when you do an insert:

Heap:
SQL Server needs to find where the row should go. For this it uses one or more IAM pages for the heap, and it cross-references these to one or more PFS pages for the database file(s). IMO, there is potential for a noticeable overhead here. And even more: with many users hammering the same table I can imagine blocking (waits) against the PFS and possibly also the IAM pages.

Clustered table:
Now, this is dead simple. SQL Server navigates the clustered index tree and finds where the row should go. Since this is an ever-increasing index key, each row will go to the end of the table (the end of the linked list).

The result:
So what is the conclusion? I did several executions of the code at the end of this post, with some variations. Basically, there was no or very little difference with only one user, i.e., no contention on the GAM or PFS pages. This was pretty consistent for the three scenarios below:

  1. Insert with subselect. I.e., this inserts lots of rows in the same statement.
  2. Insert in a loop (one insert and one row per iteration), many rows in the same transaction.
  3. Insert in a loop, one row per transaction.

Now, the difference between 2 and 3 is important.
With many transactions, we incur the overhead of the forced log write at commit *for each row*. I.e., much more overhead against the transaction log. And indeed, the timings between 2 and 3 for one of my executions (10000 rows) showed that 2 took on average 650 ms where the same number for 3 was 5600 ms. This is about 9 times longer! Now, this was more or less expected, but another important aspect comes in when we have several users. With many users, we might run into blocking on the PFS and IAM pages. Also, with several users it is meaningless to do it all in one transaction, since we would block and essentially single-thread the code anyhow. I.e., the only relevant measure when we run many users is the loop construction where each row is its own transaction (3).
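
To make the difference between 2 and 3 concrete, here is a minimal sketch (my own illustration, using the HeapLoop table from the code at the end; the test procedures there have the corresponding BEGIN TRAN/COMMIT lines commented out):

--Scenario 3: each INSERT is its own transaction, so the log is force-flushed once per row
DECLARE @i int = 1
WHILE @i <= 10000
BEGIN
 INSERT INTO HeapLoop (c2) VALUES(2)
 SET @i = @i + 1
END
GO

--Scenario 2: the same loop inside one transaction, so the forced log flush happens at COMMIT
BEGIN TRAN
DECLARE @i int = 1
WHILE @i <= 10000
BEGIN
 INSERT INTO HeapLoop (c2) VALUES(2)
 SET @i = @i + 1
END
COMMIT
GO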

There was indeed a noticeable difference when I executed several inserts in parallel and had each insert in its own transaction (for the clustered table vs. the heap table).

Some numbers:
I did 4 repeated tests and calculated the average execution time for inserting 10000 rows per thread. With 6 parallel threads I had 22 seconds for the clustered table and 29 seconds for the heap table. With 10 threads I had 31 seconds for the clustered table and 42 seconds for the heap table.

I didn’t find a performance difference of more than a couple of percent for batch inserts, when I single-threaded (only one thread pumping inserts), or when I had all inserts in the loop as one transaction.

Now, I would need lots more time to run exhaustive tests, but my interpretation is that with many users doing inserts, there is a noticeable overhead for the heap vs. clustering on an increasing key.

The code:
Note that for parallel executions, I recommend starting the DoTheInserts procedure using SQLCMD, a BAT file and START (see the sketch just below). As always, read the code carefully (so you understand it) and execute at your own risk.
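
Something along these lines (a rough sketch only; server name and number of sessions are placeholders you would adjust):

REM Start a number of parallel sessions, each running DoTheInserts
REM -E = Windows authentication, -S = server, -d = database
START sqlcmd -E -S localhost -d TestDb -Q "EXEC DoTheInserts"
START sqlcmd -E -S localhost -d TestDb -Q "EXEC DoTheInserts"
START sqlcmd -E -S localhost -d TestDb -Q "EXEC DoTheInserts"
START sqlcmd -E -S localhost -d TestDb -Q "EXEC DoTheInserts"
START sqlcmd -E -S localhost -d TestDb -Q "EXEC DoTheInserts"
START sqlcmd -E -S localhost -d TestDb -Q "EXEC DoTheInserts"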

------------------------------------------------------------
--Create the database etc.
------------------------------------------------------------
USE master SET NOCOUNT ON
GO
IF DB_ID('TestDb') IS NOT NULL DROP DATABASE TestDb
GO
--Make files large enough so that inserts don't cause autogrow
CREATE DATABASE TestDb
ON  PRIMARY
(NAME = 'TestDb', FILENAME = 'C:\TestDb.mdf', SIZE = 300MB, FILEGROWTH = 50MB)
LOG ON
(NAME = 'TestDb_log', FILENAME = 'C:\TestDb_log.ldf', SIZE = 200MB, FILEGROWTH = 100MB)
GO
--Full recovery to avoid effect of system caused log truncation
ALTER DATABASE TestDb SET RECOVERY FULL
BACKUP DATABASE TestDb TO DISK = 'nul'
USE TestDb

--Execution time log table
IF OBJECT_ID('TimeLogger') IS NOT NULL DROP TABLE TimeLogger
GO
CREATE TABLE TimeLogger
(
SomeId int identity
,spid int
,TableStructure varchar(10) CHECK (TableStructure IN ('heap', 'clustered'))
,InsertType varchar(20) CHECK (InsertType IN('one statement', 'loop'))
,ExecutionTimeMs int
)
GO

IF OBJECT_ID('RowsToInsert') IS NOT NULL DROP TABLE RowsToInsert
CREATE TABLE RowsToInsert(#rows int)
GO

--Support procedures
IF OBJECT_ID('CreateTables') IS NOT NULL DROP PROC CreateTables
GO
CREATE PROC CreateTables AS
IF OBJECT_ID('HeapLoop') IS NOT NULL DROP TABLE HeapLoop
CREATE TABLE HeapLoop(c1 int identity, c2 int DEFAULT 2, c3 datetime DEFAULT GETDATE(), c4 char(200) DEFAULT 'g')
IF OBJECT_ID('ClusteredLoop') IS NOT NULL DROP TABLE ClusteredLoop
CREATE TABLE ClusteredLoop(c1 int identity, c2 int DEFAULT 2, c3 datetime DEFAULT GETDATE(), c4 char(200) DEFAULT 'g')
CREATE CLUSTERED INDEX x ON ClusteredLoop(c1)
IF OBJECT_ID('HeapOneStatement') IS NOT NULL DROP TABLE HeapOneStatement
CREATE TABLE HeapOneStatement(c1 int identity, c2 int DEFAULT 2, c3 datetime DEFAULT GETDATE(), c4 char(200) DEFAULT 'g')
IF OBJECT_ID('ClusteredOneStatement') IS NOT NULL DROP TABLE ClusteredOneStatement
CREATE TABLE ClusteredOneStatement(c1 int identity, c2 int DEFAULT 2, c3 datetime DEFAULT GETDATE(), c4 char(200) DEFAULT 'g')
CREATE CLUSTERED INDEX x ON ClusteredOneStatement(c1)
GO

IF OBJECT_ID('TruncateTables') IS NOT NULL DROP PROC TruncateTables
GO
CREATE PROC TruncateTables AS
TRUNCATE TABLE HeapLoop
TRUNCATE TABLE ClusteredLoop
TRUNCATE TABLE HeapOneStatement
TRUNCATE TABLE ClusteredOneStatement
GO

IF OBJECT_ID('DoBefore') IS NOT NULL DROP PROC DoBefore
GO
CREATE PROC DoBefore AS
BACKUP LOG TestDb TO DISK = 'nul'
CHECKPOINT
GO

IF OBJECT_ID('iHeapLoop') IS NOT NULL DROP PROC iHeapLoop
GO
CREATE PROC iHeapLoop @rows int AS
DECLARE @i int = 1
WHILE @i <= @rows
BEGIN
INSERT INTO HeapLoop (c2) VALUES(2)
SET @i = @i + 1
END
GO

IF OBJECT_ID('iClusteredLoop') IS NOT NULL DROP PROC iClusteredLoop
GO
CREATE PROC iClusteredLoop @rows int AS
DECLARE @i int = 1
WHILE @i <= @rows
BEGIN
INSERT INTO ClusteredLoop (c2) VALUES(2)
SET @i = @i + 1
END
GO

IF OBJECT_ID('iHeapOneStatement') IS NOT NULL DROP PROC iHeapOneStatement
GO
CREATE PROC iHeapOneStatement @rows int AS
INSERT INTO HeapOneStatement (c2)
SELECT TOP(@rows) 2 FROM syscolumns a CROSS JOIN syscolumns b
GO

IF OBJECT_ID('iClusteredOneStatement') IS NOT NULL DROP PROC iClusteredOneStatement
GO
CREATE PROC iClusteredOneStatement @rows int AS
INSERT INTO ClusteredOneStatement (c2)
SELECT TOP(@rows) 2 FROM syscolumns a CROSS JOIN syscolumns b
GO

--Proc to do the inserts
IF OBJECT_ID('DoTheInserts') IS NOT NULL DROP PROC DoTheInserts
GO
CREATE PROC DoTheInserts
AS
DECLARE @dt datetime, @NumberOfRowsToInsert int
SET @NumberOfRowsToInsert = (SELECT #rows FROM RowsToInsert)
EXEC DoBefore --Batch allocation, heap:
SET @dt = GETDATE()
EXEC iHeapOneStatement @rows = @NumberOfRowsToInsert
INSERT INTO TimeLogger (spid, TableStructure, InsertType, ExecutionTimeMs)
VALUES(@@SPID, 'heap', 'one statement', DATEDIFF(ms, @dt, GETDATE()))

EXEC DoBefore --Batch allocation, clustered:
SET @dt = GETDATE()
EXEC iClusteredOneStatement @rows = @NumberOfRowsToInsert
INSERT INTO TimeLogger (spid, TableStructure, InsertType, ExecutionTimeMs)
VALUES(@@SPID, 'clustered', 'one statement', DATEDIFF(ms, @dt, GETDATE()))

EXEC DoBefore --Single allocations, heap:
SET @dt = GETDATE()
--BEGIN TRAN
EXEC iHeapLoop @rows = @NumberOfRowsToInsert
--COMMIT
INSERT INTO TimeLogger (spid, TableStructure, InsertType, ExecutionTimeMs)
VALUES(@@SPID, 'heap', 'loop', DATEDIFF(ms, @dt, GETDATE()))

EXEC DoBefore --Single allocations, clustered
SET @dt = GETDATE()
--BEGIN TRAN
EXEC iClusteredLoop @rows = @NumberOfRowsToInsert
--COMMIT
INSERT INTO TimeLogger (spid, TableStructure, InsertType, ExecutionTimeMs)
VALUES(@@SPID, 'clustered', 'loop', DATEDIFF(ms, @dt, GETDATE()))
GO

--Run the tests
EXEC CreateTables
TRUNCATE TABLE TimeLogger
TRUNCATE TABLE RowsToInsert INSERT INTO RowsToInsert VALUES(10000)

--<Below can be executed over several connections>
EXEC DoTheInserts
EXEC DoTheInserts
EXEC DoTheInserts
EXEC DoTheInserts
--</Below can be executed over several connections>

--How did we do?
SELECT COUNT(*) AS NumberOfExecutions, TableStructure, InsertType, AVG(ExecutionTimeMs) AS AvgMs
FROM TimeLogger WITH(NOLOCK)
GROUP BY TableStructure, InsertType
ORDER BY InsertType, TableStructure

--Verify that there is no fragmentation
SELECT
OBJECT_NAME(OBJECT_ID) AS objName
,index_type_desc
,avg_fragmentation_in_percent AS frag
,page_count AS #pages
,record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED')
WHERE OBJECT_NAME(OBJECT_ID) <> 'TimeLogger' AND index_level = 0

 

Finally got Kalen’s new Query Tuning book…

I know this book has been out for a little while now, but I didn’t get it until now. This one I’ve been looking forward to for a long time. A quick look in the book is very promising (as expected).

The full name of the book, btw, is “Inside Microsoft SQL Server 2005 Query Tuning and Optimization” from MS press. The title says it all, I guess. 🙂

Non-trusted constraints and performance

(See my part 1 article about non-trusted constraints in general: http://sqlblog.com/blogs/tibor_karaszi/archive/2008/01/12/non-trusted-constraints.aspx)

The optimizer is a pretty smart little animal. It can even use constraints to eliminate some data access or part of a query. That assumes however that the constraint is trusted. For example, for a check constraint:

USE Adventureworks
SET STATISTICS IO ON
--Also check the execution plan
SELECT AVG(OrderQty) FROM Sales.SalesOrderDetail
SELECT AVG(OrderQty) FROM Sales.SalesOrderDetail WHERE UnitPrice < 0

Note that the table isn’t even accessed for the second query. SQL Server knows that there can be no rows in the table where UnitPrice < 0, because there is a constraint:

SELECT OBJECT_NAME(parent_object_id) AS table_name, name, definition
FROM sys.check_constraints
WHERE parent_object_id = OBJECT_ID('sales.salesorderdetail')
ORDER BY table_name

Notice the constraint CK_SalesOrderDetail_UnitPrice with the condition ([UnitPrice]>=(0.00)). But what if the constraint isn’t trusted?

ALTER TABLE Sales.SalesOrderDetail NOCHECK CONSTRAINT CK_SalesOrderDetail_UnitPrice
SELECT AVG(OrderQty) FROM Sales.SalesOrderDetail WHERE UnitPrice < 0

Now we do have data access. And even if we re-enable the constraint, it will be non-trusted, and SQL Server cannot know for sure that no row violates the condition:

ALTER TABLE Sales.SalesOrderDetail CHECK CONSTRAINT CK_SalesOrderDetail_UnitPrice
SELECT AVG(OrderQty) FROM Sales.SalesOrderDetail WHERE UnitPrice < 0

We need to enable the constraint WITH CHECK to make sure it is trusted:

ALTER TABLE Sales.SalesOrderDetail WITH CHECK CHECK CONSTRAINT CK_SalesOrderDetail_UnitPrice
SELECT AVG(OrderQty) FROM Sales.SalesOrderDetail WHERE UnitPrice < 0

Performance can also be improved with foreign key constraints. For example, we have a foreign key which states that we cannot have an order detail row for an order which doesn’t exist in the orders table. Now, consider below (check out both I/O and execution plan):

SELECT sd.SalesOrderID, sd.CarrierTrackingNumber
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesOrderDetail AS sd
ON s.SalesOrderID = sd.SalesOrderID
WHERE sd.OrderQty = 20

We didn’t return any columns from the SalesOrderHeader table, and since SQL Server knows that each row in the SalesOrderDetail table has a corresponding row in the SalesOrderHeader table, there’s no need to access the SalesOrderHeader table at all. But if the constraint isn’t trusted:

ALTER TABLE Sales.SalesOrderDetail NOCHECK CONSTRAINT FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID

SELECT sd.SalesOrderID, sd.CarrierTrackingNumber
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesOrderDetail AS sd
ON s.SalesOrderID = sd.SalesOrderID
WHERE sd.OrderQty = 20

Notice that now the SalesOrderHeader table *is* accessed, because the constraint is disabled. And, as above, it isn’t enough to just enable it; we need to enable it WITH CHECK to make it trusted again. OK, we might ask why we even bother with the join if we only need columns from the referencing table in the first place. But consider this view:

CREATE VIEW myView AS
SELECT
 sd.SalesOrderID, s.RevisionNumber, s.DueDate, sd.CarrierTrackingNumber
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesOrderDetail AS sd
ON s.SalesOrderID = sd.SalesOrderID

The view accesses columns from both tables. But it is quite possible that we have users of this view who are only interested in columns from the SalesOrderDetail table:

SELECT SalesOrderID, CarrierTrackingNumber
FROM myView

If the constraint is not trusted, then both tables are accessed (with the join operation, of course). But if we make sure that the constraint is trusted, then only the SalesOrderDetail table is accessed and no join is performed:

ALTER TABLE Sales.SalesOrderDetail WITH CHECK CHECK CONSTRAINT FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID

SELECT SalesOrderID, CarrierTrackingNumber
FROM myView

So, bottom line is that you should be careful with non-trusted constraints. Obviously because you might have data in the table which violates the constraint, but also for performance reasons. It is easy to check whether you have any non-trusted constraints in the database:

SELECT OBJECT_NAME(parent_object_id) AS table_name, name
FROM sys.check_constraints
WHERE is_not_trusted = 1
UNION ALL
SELECT OBJECT_NAME(parent_object_id) AS table_name, name
FROM sys.foreign_keys
WHERE is_not_trusted = 1
ORDER BY table_name

I should add that to me, the purpose of a constraint is to make sure I have valid data. The performance aspect is just a bonus. But I admit that I’m pretty impressed by the optimizer in some cases. 🙂

 

Owner/schema qualifying object names

It isn’t easy to remember all the details regarding object/schema qualifying object names. There are many variables involved, such as:

  • Version of SQL Server
  • Qualifying the owner when you call the proc (EXEC dbo.procname)
  • Qualifying object references inside the proc code
  • Qualifying object names if you aren’t using procedures
  • Whether the user is the same as the owner of the proc/table
  • Default schema for the user

So, I decided to give it a spin for different combinations and investigate both profiler events and the number of plans in cache. I won’t post all details here; that would be too much to write down. You can use the scripts at the end to do your own findings. I did not investigate the differences regarding compile locks. Anyhow, here are my conclusions:

  • I could not produce any recompile (SP:Recompile or SQL:StmtRecompile) event for any combination.
  • I did find SP:CacheMiss events on 2000 when you execute a proc and don’t qualify the proc name (on 2005 I always got those events). Then a subsequent SP:CacheHit will follow.
  • For straight SQL (no procedures) I noticed that each user gets its own plan when you don’t owner-qualify the table name. This makes sense. An interesting aspect on 2005 was that if you specify a default schema for the user (and two users have the same default schema), then the users will share the plan (basically the default schema becomes the “owner” of the plan); see the sketch right after this list.
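
A minimal sketch of what I mean (my addition; this is not in the test scripts below, which use sp_grantdbaccess and no default schema):

--Give both users the same default schema (2005), then re-run the
--non-qualified SELECT as Kalle and Olle and check the plan count again
USE myTestDb
ALTER USER Kalle WITH DEFAULT_SCHEMA = dbo
ALTER USER Olle WITH DEFAULT_SCHEMA = dbo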

Below are the scripts I used:

--Login as sysadmin:
USE master
IF DB_ID('myTestDb') IS NOT NULL DROP DATABASE myTestDb
IF EXISTS(SELECT * FROM syslogins WHERE name = 'Kalle') EXEC sp_droplogin 'Kalle'
IF EXISTS(SELECT * FROM syslogins WHERE name = 'Olle') EXEC sp_droplogin 'Olle'
GO
EXEC sp_addlogin 'Kalle', '*hjk&6f' EXEC sp_addlogin 'Olle', '*hjk&6f'
CREATE DATABASE myTestDb
GO
USE myTestDb
EXEC sp_grantdbaccess 'Kalle' EXEC sp_grantdbaccess 'Olle'
GO
CREATE TABLE dbo.t(c1 int identity PRIMARY KEY, c2 char(30))
INSERT INTO dbo.t SELECT TOP 1000 'hello' FROM sysobjects a CROSS JOIN sysobjects b
CREATE INDEX x ON t(c1)
GO
CREATE PROC dbo.p AS SELECT c1 FROM t WHERE c1 = 34 AND c2 = 'Hello'
GO
CREATE PROC dbo.p_q AS SELECT c1 FROM dbo.t WHERE c1 = 34 AND c2 = 'Hello'
GO
GRANT EXECUTE ON dbo.p TO Kalle, Olle
GRANT EXECUTE ON dbo.p_q TO Kalle, Olle
GRANT SELECT ON t TO Kalle, Olle

--Number of plans in cache, run after executions of proc
SELECT OBJECT_NAME(objid), sqluid, *
FROM master..syscacheobjects
WHERE dbid = DB_ID()
AND cacheobjtype = 'Compiled Plan'
AND sql NOT LIKE '%PSTATMAN%'

--Run this three times, logged in as sysadmin (dbo), Kalle and Olle
USE myTestDb
GO
EXEC dbo.p
GO
EXEC dbo.p_q
GO
EXEC p
GO
EXEC p_q
GO
SELECT c1 FROM t WHERE c1 = 34 AND c2 = 'Hello'
GO
SELECT c1 FROM dbo.t WHERE c1 = 34 AND c2 = 'Hello'
GO
USE master

Find table and index name for fragmented indexes

Got this question from a newsgroup today. The answer is pretty simple, just use the dynamic management view sys.dm_db_index_physical_stats. I’m posting this here mostly so I have somewhere to refer to when asked this question…

I prefer to have a helper function to get the index name:

CREATE FUNCTION dbo.index_name (@object_id int, @index_id int)
RETURNS sysname
AS
BEGIN
RETURN
(SELECT name FROM sys.indexes WHERE object_id = @object_id AND index_id = @index_id)
END;
GO

And then a simple query:

SELECT
OBJECT_NAME(object_id) AS tblName
,dbo.index_name(object_id, index_id) AS ixName
,avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL)
WHERE avg_fragmentation_in_percent > 20
AND index_type_desc IN('CLUSTERED INDEX', 'NONCLUSTERED INDEX')

Then you just adjust the search clause to your liking. One hint is to exclude indexes with few pages (the page_count column), as in the sketch below.
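
For instance (the 1000-page threshold is just an illustration, pick your own):

SELECT
OBJECT_NAME(object_id) AS tblName
,dbo.index_name(object_id, index_id) AS ixName
,avg_fragmentation_in_percent
,page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL)
WHERE avg_fragmentation_in_percent > 20
AND index_type_desc IN('CLUSTERED INDEX', 'NONCLUSTERED INDEX')
AND page_count > 1000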

 

 

 

Is statistics over non-indexed columns updated by index rebuild?

Short answer: no.

This question came up today in the MCT discussion group. My gut instinct said no, but I wanted to test it to be certain. But first a brief background:

You can rebuild an index using DBCC DBREINDEX or in 2005 (and now preferred) ALTER INDEX … REBUILD. Rebuilding an index internally creates a new index and when that has been done, drops the old index.

So it is pretty obvious that we also get new statistics for that index (based on all data, not sampled, just as when we do CREATE INDEX). As an aside, reorganizing does *not* update the statistics…
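
(Just to spell that out as a sketch, using the table t and index x1 from the script below: after a reorganize you would have to refresh the statistics yourself, for instance like this.)

--Reorganize does not touch the statistics...
ALTER INDEX x1 ON t REORGANIZE
--...so update them separately if you want them fresh
UPDATE STATISTICS t WITH FULLSCAN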

But what about statistics over non-indexed columns? SQL Server can create these by itself, assuming you didn’t turn off that database option (auto create statistics). They are named something like _WA_Sys_… And you can also create them explicitly using CREATE STATISTICS.
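
If you want to see which such auto-created statistics you have, something like below should do (a quick sketch):

--Auto-created column statistics (the _WA_Sys_* ones) in the current database
SELECT OBJECT_NAME(object_id) AS tblName, name
FROM sys.stats
WHERE auto_created = 1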

A few words about the script below: I wanted to use the STATS_DATE function to retrieve the datetime for when the statistics were built/updated. But STATS_DATE doesn’t seem to work on statistics only; it expects an id for an index… So this is why I use DBCC SHOW_STATISTICS instead. And, unfortunately, DBCC SHOW_STATISTICS only displays the statistics build time with minute precision. This is why I have a WAITFOR of slightly more than 1 minute between the operations.

I got the same result whether I rebuilt a clustered or a non-clustered index on the table, or even when specifying ALL indexes. Script:

USE tempdb
SET NOCOUNT ON
IF OBJECT_ID('t') IS NOT NULL DROP TABLE t
CREATE TABLE t(c1 int identity, c2 char(5))
INSERT INTO t (c2)
SELECT TOP 10000 'Hello' FROM syscolumns a, syscolumns b

CREATE CLUSTERED INDEX x1 ON t(c2)
CREATE STATISTICS s1 ON t(c1)

SELECT ' ' AS "Before mass modification"
DBCC SHOW_STATISTICS('t', 'x1') WITH STAT_HEADER, NO_INFOMSGS
DBCC SHOW_STATISTICS('t', 's1') WITH STAT_HEADER, NO_INFOMSGS

WAITFOR DELAY '00:01:02'

INSERT INTO t (c2)
SELECT TOP 10000 'Hi' FROM syscolumns a, syscolumns b
SELECT ' ' AS "Before index rebuild"
DBCC SHOW_STATISTICS('t', 'x1') WITH STAT_HEADER, NO_INFOMSGS
DBCC SHOW_STATISTICS('t', 's1') WITH STAT_HEADER, NO_INFOMSGS

WAITFOR DELAY '00:01:02'

--ALTER INDEX x1 ON t REBUILD
ALTER INDEX ALL ON t REBUILD
SELECT ' ' AS "After index rebuild"
DBCC SHOW_STATISTICS('t', 'x1') WITH STAT_HEADER, NO_INFOMSGS
DBCC SHOW_STATISTICS('t', 's1') WITH STAT_HEADER, NO_INFOMSGS

 

 

Does BACKUP utilize pages in cache?

I’m having a conversation with Tony Rogerson (see his great blog at: http://sqlblogcasts.com/blogs/tonyrogerson/) about whether or not BACKUP will grab pages from cache (the buffer pool, BP) or always read them from disk.

It seems natural to grab the pages in the BP to eliminate some physical I/O. But thinking a bit more, I realized that the data in the BP might not be organized in a suitable way. As far as I know, there’s no overall extent structure (map) for the BP. And BACKUP is done on an extent basis (the extents don’t have to be sequential on the backup media, though). So, how would SQL Server orchestrate this? It needs to lay out each extent on the backup media, but reading the BP, it would get something like page 5 from extent 78, then page 2 from extent 456, then page 3 from extent 8964, then page 2 from extent 78.

Anyhow, seeing is believing, so I created a repro for this (see the code at the end of this post).

My conclusion is that backup does not read pages from the BP; it reads all pages (extents) from disk. My test showed no difference in elapsed time whether or not the data was in cache, and the same for I/O.

The conversation is still ongoing, and anyone is free to point out flaws in the script or different findings.

USE master
GO
IF DB_ID('testBpBackup') IS NOT NULL DROP DATABASE testBpBackup
GO
CREATE DATABASE testBpBackup
ON  PRIMARY
(NAME = N'testBpBackup_Data'
,FILENAME = N'C:\testBpBackup_Data.mdf'
,SIZE = 1GB
,MAXSIZE = UNLIMITED
,FILEGROWTH = 50MB )
LOG ON
(NAME = N'testBpBackup_Log'
,FILENAME = N'C:\testBpBackup_Log.ldf'
,SIZE = 100MB
,MAXSIZE = UNLIMITED
,FILEGROWTH = 10MB)
GO
USE testBpBackup
GO
IF OBJECT_ID('t') IS NOT NULL DROP TABLE t
CREATE TABLE t (c1 int identity, c2 char(500))
INSERT INTO t
SELECT TOP 1000000 'Hello' FROM syscolumns a, syscolumns b, syscolumns c
EXEC sp_spaceused 't' --Approx 500MB
GO

--Execute below in one batch

--Support variables
DECLARE @t datetime, @io_before bigint, @crap int

--Backup with data in BP
SET @crap = (SELECT COUNT(*) FROM t)
SET @t = CURRENT_TIMESTAMP
SET @io_before = (SELECT num_of_reads FROM sys.dm_io_virtual_file_stats(DB_ID(), 1))

BACKUP DATABASE testBpBackup TO DISK = 'nul'
SELECT
DATEDIFF(s, @t, CURRENT_TIMESTAMP) AS "seconds with data in BP",
(SELECT num_of_reads FROM sys.dm_io_virtual_file_stats(DB_ID(), 1)) - @io_before AS "I/O with data in BP"

--Backup with data not in BP
DBCC DROPCLEANBUFFERS
SET @t = CURRENT_TIMESTAMP
SET @io_before = (SELECT num_of_reads FROM sys.dm_io_virtual_file_stats(DB_ID(), 1))

BACKUP DATABASE testBpBackup TO DISK = 'nul'
SELECT
DATEDIFF(s, @t, CURRENT_TIMESTAMP) AS "seconds with data NOT in BP",
(SELECT num_of_reads FROM sys.dm_io_virtual_file_stats(DB_ID(), 1)) - @io_before AS "I/O with data NOT in BP"

 

Are execution plans for functions cached?

Obviously, we’re talking about multi-statement functions, since in-line functions are just views in the end.
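
(To make that side note concrete, here is a sketch of an in-line variant of the function used below; fn_inline is just a name for this sketch. It has no plan of its own but is expanded into the referencing query, just like a view.)

CREATE FUNCTION fn_inline(@rg uniqueidentifier)
RETURNS TABLE
AS
RETURN
(SELECT SalesOrderDetailID, OrderQty FROM Sales.SalesOrderDetail WHERE rowguid = @rg)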

My gut feeling for this is “yes”, but I wanted to be absolutely certain. So here goes:

  1. Create a function in Adventureworks
  2. Use that function in a SELECT statement
  3. Check if a plan exists in the plan cache for above

USE Adventureworks
GO

IF OBJECT_ID('fn') IS NOT NULL DROP FUNCTION fn
GO

CREATE FUNCTION fn(@rg uniqueidentifier)
RETURNS @tbl TABLE(SalesOrderDetailID int NOT NULL PRIMARY KEY, OrderQty smallint NOT NULL)
AS
BEGIN
INSERT INTO @tbl(SalesOrderDetailID, OrderQty)
SELECT SalesOrderDetailID, OrderQty FROM Sales.SalesOrderDetail WHERE rowguid = @rg
RETURN
END
GO

SELECT * FROM fn('80667840-F962-4EE3-96E0-AECA108E0D4F')
GO

SELECT cp.cacheobjtype, cp.plan_handle, qp.query_plan
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE qp.objectid = OBJECT_ID('fn')

IF OBJECT_ID('fn') IS NOT NULL DROP FUNCTION fn

Note the execution plan in XML format, picked up from sys.dm_exec_query_plan. If you executed the query with grid output, you can click on the plan XML and then save it as a file. Rename the file extension to .sqlplan and open that file in SSMS. You can now see the plan for this function graphically.