Execution plan re-use, sp_executesql and TSQL variables

Let me start by saying that the contents of this post is not very advanced. If you have read the excellent paper “Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005”, https://technet.microsoft.com/en-us/library/cc966425.aspx  and understood it, you would already know below, and much more…

I was reading a thread in the open newsgroups today (.tools, posted by Mike), about an advice that the SQL Server 2005 Database Engine Tuning Advisor apparently gave. It advices to replace sp_executesql usage with declaring TSQL variables and using those in the WHERE clause. Translated to the Adventureworks database, it advices that instead of 1) below, we should use 2) below.

1)
EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM Person.Person
WHERE LastName = @P1',
N'@P1 nvarchar(50)''Diaz'

2)
DECLARE @P1 NVARCHAR(50)
SET @P1 'Diaz'
SELECT FirstNameLastNamePersonTypeTitle
FROM Person.Person
WHERE LastName @P1

Now, I could not reproduce this (make DTA give me the same advice, to replace 1) with 2) ). Perhaps I misunderstood the poster in the group, it is because I’m running SQL Server 2008 DTA and engine, I’m not looking in the right place for that advice, my data isn’t representative, I’m running the DTA with some other settings, etc.. But say that DTA does indeed give such an advice, would would it do that? To be honest, I don’t know. It can hardly have enough information to determine whether 1) or 2) is the best choice.

In short: Say we have an index on the LastName column and the name we look for can either be a very common name, like perhaps “Smith”, or a not so common name, like “Karaszi”. For the more common name, a table scan might be the best choice, where for the not-so-common name, an index seek it probably best thing. OK, a very simple example, but serves well for this discussion.

Back to the difference between 1) and 2). There are potentially very important differences between the two:

For 1), SQL Server will determine a plan based on the contents of the parameter when the plan is created. I.e., it can determine selectivity based on that and determine whether it is a good idea to use an index or not. The plan is then cached and can be re-used. But what if we for the first execution pass in something which is very selective, but typically we are not very selective? Or the other way around? I,e, the plan for one case might not be optimal for some other case. This is where we have new optimizer hints in 2005 like OPTIMIZE FOR and the RECOMPILE hints. I will not go into details about these here, since I’m focusing on the differences between 1) and 2). See the white paper I mentioned, and of course Books Online, for more details.

For 2), SQL Server (or rather: the optimizer) has no knowledge of the contents of the variable when the plan is produced. So it cannot use the statistics to determine selectivity. In above case, it instead uses density (stored with the statistics, assuming such exists for the column). Density is basically the 1/ number of unique values for the column(s). This might be a good representation for a typical lastname, but perhaps not too good for a very common or a very uncommon lastname. Now, in 2005, we have hints for these situations as well (RECOMPILE), but again, that is not the point here.

In order for DTA to give the best advice here, it would have to know about the distribution over the data for that column and also have many executions of that query to see if “all” executions are using a typical value (sp_executesql might be better) or if the stored density value is a good representation for “all” queries that are passed from the clients. I very much doubt that DTA has this level of sophistication. Basically, I don’t know why it advices this. There might be other aspects, like “avoid dynamic SQL” (which whether that holds for this case we could argue in another post), but DTA is about performance, not best practices.

Bottom line is that these things are not simple and we should be very cautious with “rules of thumbs”.

Here’s some TSQL code to demonstrate the differences between 1) and 2). As always, only execute after you read and understood the code!

--Create a copy of the person table
--We will have lots of "Diaz" and very few "Gimmi"
IF OBJECT_ID('p'IS NOT NULL DROP TABLE p
GO
SELECT INTO p
FROM Person.Person

CREATE INDEX ON dbo.p(LastName)

--Create lots of Diaz
INSERT INTO p
SELECT  BusinessEntityID 30000PersonTypeNameStyleTitleFirstNameMiddleNameN'Diaz'SuffixEmailPromotionAdditionalContactInfoDemographicsrowguidModifiedDate
FROM p

--Make sure we have up-to-date statistics
UPDATE STATISTICS WITH FULLSCAN

--Verify execution plan and I/O cost
--for table scan with low selectivity
--and index seek with high selectivity
SET STATISTICS IO ON

DBCC FREEPROCCACHE
SELECT FROM WHERE LastName 'Diaz'
--20183 rows, table scan, 7612 pages

DBCC FREEPROCCACHE
SELECT FROM WHERE LastName 'Gimmi'
--1 row, index seek, 3 pages

------------------------------------------------------------
--sp_execute alternative
------------------------------------------------------------

--Table scan will be used for both because of execution plan re-use
DBCC FREEPROCCACHE
EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1',
N'@P1 nvarchar(50)''Diaz'
--20183 rows, table scan, 7612 pages

EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1',
N'@P1 nvarchar(50)''Gimmi'
--1 row, table scan, 7612 pages

--Other way around
--Index search will be used for both because of execution plan re-use
DBCC FREEPROCCACHE
EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1',
N'@P1 nvarchar(50)''Gimmi'
--1 row, index seek, 3 pages

EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1',
N'@P1 nvarchar(50)''Diaz'
--20183 rows, index seek, 20291 pages

------------------------------------------------------------------
--Alternative using variable
------------------------------------------------------------------
DBCC FREEPROCCACHE
DECLARE @P1 NVARCHAR(50)
SET @P1 'Diaz'
SELECT FirstNameLastNamePersonTypeTitle
FROM p
WHERE LastName @P1
GO
--20183 rows, index seek, 20291 pages

DBCC FREEPROCCACHE
DECLARE @P1 NVARCHAR(50)
SET @P1 'Gimmi'
SELECT FirstNameLastNamePersonTypeTitle
FROM p
WHERE LastName @P1
GO
--1 rows, index seek, 1 pages

--Same plan even though very different selectivity
--and emptying plan cache in between

--Estimated 33 rows for both above.
--See if that is drawn from statistics density?

--Formula for density: 1/#OfUniqueValuesInColumn
SELECT 1./COUNT(DISTINCT LastNameFROM p
--0.00082918739

--Does that match density in index statistics?
DBCC SHOW_STATISTICS('p''x')
--Yes

--How many rows in the table?
SELECT COUNT(*) FROM p
--39944

--So how many rows would we estimate based on density?
SELECT 0.00082918739 39944
--Yep, 33 rows.

--I.e, for the variable alternative, SQL Server has no
--knowledge of contents of those variables so it must use density instead.

--Clean up:
IF OBJECT_ID('p'IS NOT NULL DROP TABLE p

Leave a Reply

Your email address will not be published.