Why can’t we have column alias in ORDER BY?

You probably think something like “What! Of course we can!”. And so did I when I read the title for a post about this in a forum. Let us first establish that you can have a column alias in ORDER BY:

[sql]SELECT
p.ProductID AS pid
,p.ListPrice AS pr
FROM Production.Product AS p
ORDER BY pr[/sql]

The problem was that a table name was also in the ORDER BY clause. And this is when it becomes weird. The problem isn’t the column alias, it is the table alias. Basically the desire is to write something like:

SELECT
p.ProductID AS pid
,p.ListPrice AS pr
FROM Production.Product AS p
ORDER BY p.pr

Above was apparently allowed in SQL Server 2005 2000 (I never did it myself, however). (Update: Erland Sommarskog caught my error, this was allowed in 2000, not 2005.) In order to understand why above is backwards, we need to consider the logical query flow. The first part of a query which is executed is the FROM clause (logically, the optimizer can do as it pleases as long as we get the same result). In above example, I renamed the table (for the rest of the query) from “Production.Product” to “p”. That means that we cannot refer to “Product” or “Production.Product” anymore in the query. However, SQL has imperfections and the topic of this blog is an attempt from MS to rectify such earlier imperfections (some of them), i.e., improving the product.

Continuing the logical query flow (WHERE, GROUP BY, HAVING), we towards the end find the SELECT clause. Here is where you can rename the columns (a.k.a. column alias) for the rest of the query. After the SELECT list, it is meaningless to talk about different tables (where the data originally comes from), all we have is the result of the query (so far). This is why, we can in the ORDER BY refer to a column alias (generally); ORDER BY comes logically after the SELECT list.

However, the original problem was that one also wanted to include a table alias, which shouldn’t be allowed. So, why is it allowed, in the first place – consider it makes no sense from a logical standpoint? For that we need to look as somethining weird, which is allowed (both in T-SQL and ANSI SQL) and that can be somewhat useful:

SELECT
ProductID AS pid
,ListPrice AS pr
FROM Production.Product
ORDER BY ModifiedDate

We don’t need to return something to the client app in order to sort over it. Sure, that can be useful, but from a logical perspective, this is spooky. We sort over something which doesn’t exist! So, the imperfections of the SQL language is the reason things starts to break down. Basically, what we do is to sort over something as if we would have included it in the SELECT list. We are saying, in the ORDER BY clause something like

“Hey, dbms! Can you please pretend we included Product.ModifiedDate in the SELECT list so we can sort over it. Bur don’t return it to the client app, I don’t want to see that column.”

Where it really breaks down is when we combine table and column alias. Let me repeat the problematic query:

SELECT
p.ProductID AS pid
,p.ListPrice AS pr
FROM Production.Product AS p
ORDER BY p.pr

The problem is that it is meaningless to specify a table alias here, the column does exist in the SELECT list. The only reason we can refer to something table-something at all, is to let the DBMS pretend we included something additional from the base tables, so we could sort over it. Such as (allowed):

SELECT
ProductID AS pid
,ListPrice AS pr
FROM Production.Product AS p
ORDER BY p.ModifiedDate

One could of course argue that below would be more logical (not allowed):

SELECT
ProductID AS pid
,ListPrice AS pr
FROM Production.Product AS p
ORDER BY Product.ModifiedDate

Arguing that one form of two above is more logical than the other is pretty irrelevant, houever. Both are weird constructs, and the language is what it is. Anyone feel like checking the ANSI SQL standard which of above two (first, second or both) are allowed/preferred? No me, I’m off to grab some breakfast!

 

Adding a PK online?

I just read in a forum about a user who want to replikate a table, but the table doesn’t have a PK. The table is pretty large, and having the table not available while adding the PK is undesireable. The table has a clustered index already, and there are other columns which are known to be unique (presence of unique indexes).

What I wanted to test is whether we can just add the PK constraint using the ONLINE option. Show answer is “yes”. We can’t turn a unique index into a PK using some meta-data only operation, unfortunately. That would be the easiest step. But we can add a unique constraint using the ONLINE option – there’s even an example syntax for this in BOL. We can then remove the pre-existing unique index using ONLINE. Since we are using ONLINE, we need to be on Enterprise or Developer Edition.

I wanted to test this, and below is my test script:

USE tempdb
SET NOCOUNT ON
GO

IF OBJECT_ID('t'IS NOT NULL DROP TABLE t
GO
CREATE TABLE t(c1 INT NOT NULL, c2 CHAR(100))
CREATE UNIQUE CLUSTERED INDEX ON t(c1)

INSERT INTO t
SELECT TOP(5000000ROW_NUMBER() OVER(ORDER BY a.id), 'x'
FROM syscolumns AS a
CROSS JOIN syscolumns AS b
CROSS JOIN syscolumns AS c
GO

-----------------------------------------------------
--Now try to add a PK "online"...:
-----------------------------------------------------

--Add a nullable identity?
ALTER TABLE ADD c3 INT IDENTITY NULL
--Msg 8147, Level 16, State 1, Line 1
--Could not create IDENTITY attribute on nullable column 'c3', table 't'.
GO

--Add a PK using ONLINE?
--Prepare a new connection with following INSERTs
--to verify it can run simultaneously:
--INSERT INTO t(c1, c2) VALUES(5000001, 't')
--INSERT INTO t(c1, c2) VALUES(5000001, 't')
--INSERT INTO t(c1, c2) VALUES(5000002, 't')
--GO
--INSERT INTO t(c1, c2) VALUES(5000003, 't')

--Above prepared? OK, execute below and jump to
--other window to verify it is online
ALTER TABLE ADD CONSTRAINT PK_t PRIMARY KEY NONCLUSTERED (c1WITH(ONLINE = ON)
GO

--Verify the indexes using my own sp_indexinfo
EXEC sp_indexinfo 't'

Is there an overhead to RPC events?

I recently read a discussion whether RPC events add overhead compared to just submitting text. I got curious and did some testing, which I want to share.

Background, SQL events
Using most APIs, if you just submit a query to SQL Server, you will get what we call an SQL event (for instance in Profiler an SQL:BatchCompleted). The client app submits some text, SQL Server parses the text, and either creates an execution pland or re-uses a cached plan (if such exists, based on checksum of the query text submitted). Now, this is very straight forward. But there’s also problems, mainly risk for SQL Injection and bad query plan re-use. Your query probably has a WHERE clause, and if you just concatenate the values you will have in the WHERE clause, then you will submit different text each time. Different text means different execution plans – bad plan re-use. Also, if you have some malicious user or are under hacker-attack, then somebody can inject some query or part of query inside what you think would be only the search value (this is a classic).

Stored procedures as RPC events
Now, consider if we are using stored procedures. We might want to pass the proc name, and the values for each parameter, where we pass the parameter values as binary data. Do this right (for instance in ADO, you configure your command object’s CommandType property as being CommandType.StoredProcedure (instead of CommandType.Text which is default). This causes the batch be shown in Profiler as an RPC event, for instance RPC:Completed. So, consider executing the same procedure either by constructing a string such as “EXEC myProc @p1 = ‘20080223’, @p2 = ‘Stockholm'” as text (SQL event), or passing it as a stored procedure using parameter objects (RPC event). We expect RPC event to be more efficient, right? We’ll get to that in a moment.

Queries as RPC events
Now, we can also submit queries, without using stored procedures, so they are passed as RPC events. We do this by adding at least one parameter to the command object. This causes ADO.NET to use an RPC event and execute sp_executesql, parameterizing accoring to the parameter object for the command object. This causes your code to be safe from SQL injection and will allow for better plan re-use. I.e., in most cases a very good thing – second best to use stored procedures!

Non-parameterized queries as RPC events
Now, consider if we don’t want to parameterize our queries (we don’t have a WHERE clause, we just don’t want to, we use some programming layer which doesn’t allow for it, or we are just plan curious – like now). Can we still get RPC events? Yes, by adding a dummy parameter to the command object. We don’t have to use this command object in the query – just the existence causes an RPC event instead of SQL event. This allow us to compare SQL events to RPC event and look at only the overhead for the sp_executesql part.

Overhead for using sp_executesql
Below is some VB.NET code which does just that. For the first two, we use exactly the same query multiple executions and compare timing for doing them as SQL events or RPC events. For the second two, we change what produceID we search for (but not parameterizing the search condition) which causes different execution plans for each execution. I verified by counting execution plans that the first two uses the same execution pland and the second two generates a new plan for each iteration in the loop. Here’s the VB.NET code:

Imports System.Data
Imports System.Data.SqlClient
Module Module1

Sub Main()

Dim As Int32iterations As Int32 20000
Dim tickcount As Int64
Dim ProductID As Int32 43664
Dim ProdStatus As Int16
Dim sql As String

sql "SELECT status FROM Sales.SalesOrderHeader WHERE SalesOrderID = "
Dim cn As New SqlConnection("SERVER=localhost\a;DATABASE=AdventureWorks2008;INTEGRATED SECURITY=TRUE")
cn.Open()

Dim utilCmd As New SqlCommand("DBCC FREEPROCCACHE"cn)
utilCmd.ExecuteNonQuery()

'Definitions
'**********************************************************************************************
'Same query every execution, SQL:BatchCompleted
Dim cmdAllText As New SqlCommand(sql "43664"cn)
cmdAllText.CommandType CommandType.Text

'Same query every execution, RPC:Completed
Dim cmdParmText As New SqlCommand(sql "43664"cn)
cmdParmText.CommandType CommandType.Text
cmdParmText.Parameters.Add
("@dummy"SqlDbType.Int)
cmdParmText.Parameters("@dummy").Value -9999

'Different query every execution, SQL:BatchCompleted
Dim cmdAllText2 As New SqlCommand()
cmdAllText2.Connection cn
cmdAllText2.CommandType 
CommandType.Text

'Different query every execution, RPC:Completed
Dim cmdParmText2 As New SqlCommand()
cmdParmText2.Connection cn
cmdParmText2.CommandType 
CommandType.Text
cmdParmText2.Parameters.Add
("@dummy"SqlDbType.Int)
cmdParmText2.Parameters("@dummy").Value -9999

'Execution
'**********************************************************************************************
Console.WriteLine("Same query every execution, SQL:BatchCompleted")
tickcount Environment.TickCount
For To iterations
ProdStatus 
cmdAllText.ExecuteScalar()
Next
tickcount Environment.TickCount tickcount
Call PrintOuput(iterationstickcount)

Console.WriteLine("Same query every execution, RPC:Completed")
tickcount Environment.TickCount
For To iterations
ProdStatus 
cmdParmText.ExecuteScalar()
Next
tickcount Environment.TickCount tickcount
Call PrintOuput(iterationstickcount)

Console.WriteLine("Different query every execution, SQL:BatchCompleted")
tickcount Environment.TickCount
For To iterations
cmdAllText2.CommandText 
sql i.ToString()
ProdStatus cmdAllText2.ExecuteScalar()
Next
tickcount Environment.TickCount tickcount
Call PrintOuput(iterationstickcount)

Console.WriteLine("Different query every execution, RPC:Completed")
tickcount Environment.TickCount
For To iterations
cmdParmText2.CommandText 
sql i.ToString()
ProdStatus cmdParmText2.ExecuteScalar()
Next
tickcount Environment.TickCount tickcount
Call PrintOuput(iterationstickcount)

Console.ReadLine()

End Sub

Sub PrintOuput(ByVal iterations As Int32ByVal tickcount As Int64)
Console.WriteLine("Number of executions: " iterations.ToString)
Console.WriteLine("Total exec time (ms): " tickcount)
Console.WriteLine("Exec time per query (ms): " tickcount iterations)
Console.WriteLine("")
End Sub

End Module

And here’s a typical execution result from above code:

Same query every execution, SQL:BatchCompleted
Number of executions: 20000
Total exec time (ms): 1357
Exec time per query (ms): 0,06785

Same query every execution, RPC:Completed
Number of executions: 20000
Total exec time (ms): 1513
Exec time per query (ms): 0,07565

Different query every execution, SQL:BatchCompleted
Number of executions: 20000
Total exec time (ms): 2402
Exec time per query (ms): 0,1201

Different query every execution, RPC:Completed
Number of executions: 20000
Total exec time (ms): 14446
Exec time per query (ms): 0,7223

As you can see, there is an overhead to use RPC events and sp_executesql. I.e., if we don’t parameterize our command object, then we only pay with no gain. The overhead is definitely noticeable were we generate a new plan for each execution. An interesting aspect is that if I remove the call to DBCC FREEPROCCACHE, (so we get new plan for each iteration in the loop, but plans can be re-used from prior execution of the VB code), then this is much cheaper and closer to the others (about 5.7 seconds instead of 14 seconds). So, we do pay extra for sp_executesql especially when we generate a new plan.

Is above a weird example?
Yes, it is, and I want to emphasize that. Typically, you will use sp_executesql when you actually parameterize your queries. That will cause better plan re-use (and protect from SQL injection). Above is only to show whether sp_executesql has a cost – without considering the benefits.

How about using stored procedures?
Say you are using stored procedures, and wonder how different ways to execute a procedure compares? Let’s have a look. We can do it three ways (probably more, but below are the ones I’m interested in):

  1. Build the proc command and pass it as a string. No parameter objects, no sp_executesql. You are still open to SQL injection. Don’t do this, we only want to show it for comparsion reasons. Profiler will show an SQL:BatchCompleted event with TextData something like “EXEC dbo.GetOrderStatusRes @SalesOrderID = 43664”.
  2. Use parameters for your command objects, but you forget to define the command object of type CommandType.StoredProcedure. Profiler will show an RPC:Completed event with TextData something like “exec sp_executesql N’EXEC dbo.GetOrderStatusRes @SalesOrderID = @theID’,N’@theID int’,@theID=43664”.
  3. The right way. Define the command object as of type CommandType.StoredProcedure, and of course add parameter object to the command object. Profiler will show an RPC:Completed event with TextData something like:
    “declare @p2 tinyint
    set @p2=5
    exec dbo.GetOrderStatusOut @SalesOrderID=43664,@Status=@p2 output
    select @p2”
    Note that the Profiler tool adds the declare, set and select parts, it is not part of what is submitted from the client app. The client app actually submit the paramaters as binary values. Profiler add this to be nice to us, so we can copy that text and execute it, read it, etc.

Anyhow, here’s the VB code:

Imports System.Data
Imports System.Data.SqlClient
Module Module1

Sub Main()

Dim As Int32iterations As Int32 20000
Dim tickcount As Int64
Dim ProductID As Int32 43664
Dim ProdStatus As Int16
Dim sql As String

Dim cn As New SqlConnection("SERVER=localhost\a;DATABASE=AdventureWorks2008;INTEGRATED SECURITY=TRUE")
cn.Open()

'Create the procedures we will execute
Dim utilCmd As New SqlCommand(""cn)
utilCmd.CommandText "IF OBJECT_ID('dbo.GetOrderStatusRes') IS NOT NULL DROP PROC dbo.GetOrderStatusRes" vbCrLf
utilCmd.CommandText 
+"IF OBJECT_ID('dbo.GetOrderStatusOut') IS NOT NULL DROP PROC dbo.GetOrderStatusOut"
utilCmd.ExecuteNonQuery()
utilCmd.CommandText "CREATE PROC dbo.GetOrderStatusRes @SalesOrderID int AS SELECT status FROM Sales.SalesOrderHeader WHERE SalesOrderID = @SalesOrderID"
utilCmd.ExecuteNonQuery()
utilCmd.CommandText "CREATE PROC dbo.GetOrderStatusOut @SalesOrderID int, @status tinyint OUT AS SET @status = (SELECT status FROM Sales.SalesOrderHeader WHERE SalesOrderID = @SalesOrderID)"
utilCmd.ExecuteNonQuery()

'Run each once so it is in cache
utilCmd.CommandText "EXEC dbo.GetOrderStatusRes @SalesOrderID = 43664"
utilCmd.ExecuteScalar()
utilCmd.CommandText "DECLARE @s tinyint EXEC dbo.GetOrderStatusOut @SalesOrderID = 43664, @status = @s OUT"
utilCmd.ExecuteNonQuery()

utilCmd.CommandText "DBCC FREEPROCCACHE"

'Definitions
'**********************************************************************************************
'Exec proc as string, SQL:BatchCompleted
Dim cmdAllText As New SqlCommand("EXEC dbo.GetOrderStatusRes @SalesOrderID = " ProductID.ToString(), cn)
cmdAllText.CommandType CommandType.Text

'Parameterize, but still as text, RPC:Completed with sp_executesql
Dim cmdParmText As New SqlCommand("EXEC dbo.GetOrderStatusRes @SalesOrderID = @theID"cn)
cmdParmText.CommandType CommandType.Text
cmdParmText.Parameters.Add
("@theID"SqlDbType.Int)
cmdParmText.Parameters("@theID").Value ProductID

'Parameterize as stored procedure, RPC:Completed
Dim cmdRpc As New SqlCommand("dbo.GetOrderStatusOut"cn)
cmdRpc.CommandType CommandType.StoredProcedure
Dim prmSalesOrderID As SqlParameter = New SqlParameter("@SalesOrderID"SqlDbType.Int)
prmSalesOrderID.Value ProductID
cmdRpc.Parameters.Add
(prmSalesOrderID)
Dim prmStatus As SqlParameter = New SqlParameter("@Status"SqlDbType.TinyInt)
prmStatus.Direction ParameterDirection.Output
cmdRpc.Parameters.Add(prmStatus)

'Execution
'**********************************************************************************************
utilCmd.ExecuteNonQuery()
Console.ReadLine()

Console.WriteLine("Exec proc as string, SQL:BatchCompleted")
tickcount Environment.TickCount
For To iterations
ProdStatus 
cmdAllText.ExecuteScalar()
Next
tickcount Environment.TickCount tickcount
Call PrintOuput(iterationstickcount)
Console.ReadLine()

utilCmd.ExecuteNonQuery()
Console.WriteLine("Parameterize, but still as text, RPC:Completed with sp_executesql")
tickcount Environment.TickCount
For To iterations
ProdStatus 
cmdParmText.ExecuteScalar()
Next
tickcount Environment.TickCount tickcount
Call PrintOuput(iterationstickcount)
Console.ReadLine()

utilCmd.ExecuteNonQuery()
Console.WriteLine("Parameterize as stored procedure, RPC:Completed")
tickcount Environment.TickCount
For To iterations
cmdRpc.ExecutteNonQuery
()
Next
tickcount Environment.TickCount tickcount
Call PrintOuput(iterationstickcount)

Console.ReadLine()

End Sub
Sub 
PrintOuput(ByVal iterations As Int32ByVal tickcount As Int64)
Console.WriteLine("Number of executions: " iterations.ToString)
Console.WriteLine("Total exec time (ms): " tickcount)
Console.WriteLine("Exec time per query (ms): " tickcount iterations)
Console.WriteLine("")
End Sub

End Module

And here’s the result from a typical execution:

Exec proc as string, SQL:BatchCompleted
Number of executions: 20000
Total exec time (ms): 1810
Exec time per query (ms): 0,0905

Parameterize, but still as text, RPC:Completed with sp_executesql
Number of executions: 20000
Total exec time (ms): 1700
Exec time per query (ms): 0,085

Parameterize as stored procedure, RPC:Completed
Number of executions: 20000
Total exec time (ms): 1388
Exec time per query (ms): 0,0694

We can see that doing it the “proper” way is cheapest, but there’s not a big difference between the three. The first alternative is not good, though, since we aren’t protected from SQL injection. And since you then will be using parameter object anyhow, just go ahead and define the CommandType as stored procedure while you’re at it.

Be careful with constraints calling UDFs

You might just not get what you think. I would be surprised if this hasn’t been blogged already, but if so, it would be worth repeating. Here’s the deal (example from a forum,, slightly re-worked):

I want the values in one column to be unique, assuming the value in another column is 1. Can I use an UDF for that?

On the surface, yes. You can write an UDF to wich you pass the value which should be conditionally unique and in that UDF check how many rows has this value AND othercolumn = 1. If more than 1 row, then function returns 0, else 1 (or something else to signal “OK” or “Not OK”). Now, you can call this function in a CHECK constraint. Something like CHECK(myFunction(uniqueCol) = 1). this will on the surface do its job, as long as you INSERT into the table. But if you update a row and only set the otherColumn for some row from 0 to 1, then the check constraint will not be checked. The optimizer is smart enough to understand that the update doesn’t change anything that we refer to in our CHECK constraint, so why bother checking the constraint? End result here is that the constraint doesn’t do what we want it to do. Use a trigger instead (or some other method). Here’s a repro:

USE tempdb
GO
IF OBJECT_ID('t'IS NOT NULL DROP TABLE t
IF OBJECT_ID('t_uq'IS NOT NULL DROP FUNCTION t_uq
GO

CREATE TABLE t(c0 INTc1 NVARCHAR(50), c2 bit)
GO

CREATE FUNCTION t_uq(@c1 NVARCHAR(50))
RETURNS bit
AS
BEGIN
DECLARE 
@ret bit
IF (SELECT COUNT(*) FROM WHERE c1 @c1 AND c2 1) > 1
SET @ret 0
ELSE
SET 
@ret 1
RETURN @ret
END
GO

ALTER TABLE ADD CONSTRAINT t_c CHECK(dbo.t_uq(c11)

INSERT INTO t(c0c1c2VALUES(1'a'0--OK
INSERT INTO t(c0c1c2VALUES(2'a'0--OK
INSERT INTO t(c0c1c2VALUES(3'b'1--OK
INSERT INTO t(c0c1c2VALUES(4'b'1--Fails

--So far so good, but watch now:

UPDATE SET c2 WHERE c0 2
--No error, the constraint doesn't do its job!

--We have invalid data:
SELECT FROM t

German translation of my updated datetime article

Frank Kalis has been kind enough to translate my datetime article (http://www.karaszi.com/SQLServer/info_datetime.asp) to German. I updated my article a while ago to reflect the new date and time related types in SQL Server 2008, and I just learned that Frank now has finished the translation of my updated article to to German. Check it out at:

http://www.insidesql.org/beitraege/datetime/der-ultimative-guide-fuer-die-datetime-datentypen

Execution plan re-use, sp_executesql and TSQL variables

Let me start by saying that the contents of this post is not very advanced. If you have read the excellent paper “Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005”, https://technet.microsoft.com/en-us/library/cc966425.aspx  and understood it, you would already know below, and much more…

I was reading a thread in the open newsgroups today (.tools, posted by Mike), about an advice that the SQL Server 2005 Database Engine Tuning Advisor apparently gave. It advices to replace sp_executesql usage with declaring TSQL variables and using those in the WHERE clause. Translated to the Adventureworks database, it advices that instead of 1) below, we should use 2) below.

1)
EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM Person.Person
WHERE LastName = @P1',
N'@P1 nvarchar(50)''Diaz'

2)
DECLARE @P1 NVARCHAR(50)
SET @P1 'Diaz'
SELECT FirstNameLastNamePersonTypeTitle
FROM Person.Person
WHERE LastName @P1

Now, I could not reproduce this (make DTA give me the same advice, to replace 1) with 2) ). Perhaps I misunderstood the poster in the group, it is because I’m running SQL Server 2008 DTA and engine, I’m not looking in the right place for that advice, my data isn’t representative, I’m running the DTA with some other settings, etc.. But say that DTA does indeed give such an advice, would would it do that? To be honest, I don’t know. It can hardly have enough information to determine whether 1) or 2) is the best choice.

In short: Say we have an index on the LastName column and the name we look for can either be a very common name, like perhaps “Smith”, or a not so common name, like “Karaszi”. For the more common name, a table scan might be the best choice, where for the not-so-common name, an index seek it probably best thing. OK, a very simple example, but serves well for this discussion.

Back to the difference between 1) and 2). There are potentially very important differences between the two:

For 1), SQL Server will determine a plan based on the contents of the parameter when the plan is created. I.e., it can determine selectivity based on that and determine whether it is a good idea to use an index or not. The plan is then cached and can be re-used. But what if we for the first execution pass in something which is very selective, but typically we are not very selective? Or the other way around? I,e, the plan for one case might not be optimal for some other case. This is where we have new optimizer hints in 2005 like OPTIMIZE FOR and the RECOMPILE hints. I will not go into details about these here, since I’m focusing on the differences between 1) and 2). See the white paper I mentioned, and of course Books Online, for more details.

For 2), SQL Server (or rather: the optimizer) has no knowledge of the contents of the variable when the plan is produced. So it cannot use the statistics to determine selectivity. In above case, it instead uses density (stored with the statistics, assuming such exists for the column). Density is basically the 1/ number of unique values for the column(s). This might be a good representation for a typical lastname, but perhaps not too good for a very common or a very uncommon lastname. Now, in 2005, we have hints for these situations as well (RECOMPILE), but again, that is not the point here.

In order for DTA to give the best advice here, it would have to know about the distribution over the data for that column and also have many executions of that query to see if “all” executions are using a typical value (sp_executesql might be better) or if the stored density value is a good representation for “all” queries that are passed from the clients. I very much doubt that DTA has this level of sophistication. Basically, I don’t know why it advices this. There might be other aspects, like “avoid dynamic SQL” (which whether that holds for this case we could argue in another post), but DTA is about performance, not best practices.

Bottom line is that these things are not simple and we should be very cautious with “rules of thumbs”.

Here’s some TSQL code to demonstrate the differences between 1) and 2). As always, only execute after you read and understood the code!

--Create a copy of the person table
--We will have lots of "Diaz" and very few "Gimmi"
IF OBJECT_ID('p'IS NOT NULL DROP TABLE p
GO
SELECT INTO p
FROM Person.Person

CREATE INDEX ON dbo.p(LastName)

--Create lots of Diaz
INSERT INTO p
SELECT  BusinessEntityID 30000PersonTypeNameStyleTitleFirstNameMiddleNameN'Diaz'SuffixEmailPromotionAdditionalContactInfoDemographicsrowguidModifiedDate
FROM p

--Make sure we have up-to-date statistics
UPDATE STATISTICS WITH FULLSCAN

--Verify execution plan and I/O cost
--for table scan with low selectivity
--and index seek with high selectivity
SET STATISTICS IO ON

DBCC FREEPROCCACHE
SELECT FROM WHERE LastName 'Diaz'
--20183 rows, table scan, 7612 pages

DBCC FREEPROCCACHE
SELECT FROM WHERE LastName 'Gimmi'
--1 row, index seek, 3 pages

------------------------------------------------------------
--sp_execute alternative
------------------------------------------------------------

--Table scan will be used for both because of execution plan re-use
DBCC FREEPROCCACHE
EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1',
N'@P1 nvarchar(50)''Diaz'
--20183 rows, table scan, 7612 pages

EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1',
N'@P1 nvarchar(50)''Gimmi'
--1 row, table scan, 7612 pages

--Other way around
--Index search will be used for both because of execution plan re-use
DBCC FREEPROCCACHE
EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1',
N'@P1 nvarchar(50)''Gimmi'
--1 row, index seek, 3 pages

EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM p
WHERE LastName = @P1',
N'@P1 nvarchar(50)''Diaz'
--20183 rows, index seek, 20291 pages

------------------------------------------------------------------
--Alternative using variable
------------------------------------------------------------------
DBCC FREEPROCCACHE
DECLARE @P1 NVARCHAR(50)
SET @P1 'Diaz'
SELECT FirstNameLastNamePersonTypeTitle
FROM p
WHERE LastName @P1
GO
--20183 rows, index seek, 20291 pages

DBCC FREEPROCCACHE
DECLARE @P1 NVARCHAR(50)
SET @P1 'Gimmi'
SELECT FirstNameLastNamePersonTypeTitle
FROM p
WHERE LastName @P1
GO
--1 rows, index seek, 1 pages

--Same plan even though very different selectivity
--and emptying plan cache in between

--Estimated 33 rows for both above.
--See if that is drawn from statistics density?

--Formula for density: 1/#OfUniqueValuesInColumn
SELECT 1./COUNT(DISTINCT LastNameFROM p
--0.00082918739

--Does that match density in index statistics?
DBCC SHOW_STATISTICS('p''x')
--Yes

--How many rows in the table?
SELECT COUNT(*) FROM p
--39944

--So how many rows would we estimate based on density?
SELECT 0.00082918739 39944
--Yep, 33 rows.

--I.e, for the variable alternative, SQL Server has no
--knowledge of contents of those variables so it must use density instead.

--Clean up:
IF OBJECT_ID('p'IS NOT NULL DROP TABLE p

Code page backgrounder, courtesy of Erland Sommarskog

While browsing through the programming newsgroup today, I came across a post from Erland Sommarskog – a short backgrounder about code pages and collations. I’ve never seen code pages described so coherent and with so few words, so I asked Erland if I could quote his text in my blog (no, Erland doesn’t blog 🙂 ). So below quoted text is with Erland’s kind permission.

For those of you who want to know more about Erland or read some of his great deep-dive articles, check out http://www.sommarskog.se/.

“To start with, if we should be picky, there are no ASCII characters >= 128.
There are however lot of other character sets that defines this area.

Way back in the 80s vendors started to explore the area 128-255, and
about each vendor come with its character set(s). The contribution
from the IBM/Microsoft combo that ran MS-DOS was a number of code
pages, of which 437 was of their oldest. Later, they realized that
they did not support all languages in Western Europe, and they defined
CP850 which served Western Europe better.

Meanwhile, HP had Roman-8 and Digital had their DEC Multinational Character
Set. Eventually, ISO settled on composing a standard, and they worked
from DEC MCS – or DEC were smart to work from the ISO drafts, I don’t know
which. This resulted in ISO-8859 a family or originally eight 8-bit
character sets, which recently evolved into 15 sets.

By the time Microsoft divorced from IBM, they abandoned CP437 and
CP850 as the character set for Windows, and went with ISO-8859, at
least for Western Europe. Except that they added some printable
characters in the range 128-159 where Latin-1 has only control characters.
This became CodePage 1252, and CP1252 is the code page typically
used for 8-bit Windows applications on a computer installed in Western
Europe or the Americas. However, CP437/CP850 still lives on Windows
today; the command-line windows uses a so-called OEM character set which
is one of these.

If you have a Windows program that uses CP1252, and the server collation
is CP437, the client API will convert the data for you, so if you pass
for instance Ö which has character code 216 in CP1252, the byte that
gets stored in SQL Server will be another. When you retrieve data,
data will be converted in the other direction. However, since CP1252
and CP437 does not include the same characters, the conversion may
not be roundtrip. For instance, Å may not be in CP437, so an Å from
CP1252 will become A, and will remain A when you retrieve it.

<TiborComment>Here I removed a section which was only relevant for the newsgroup thread in question</TiborComment>

Finally, all collations have 255 characters for varchar, and at least
65535 characters for nvarchar.”

For those of you who want to dive deep in collations and such topics, check out http://msdn.microsoft.com/en-us/library/bb330962.aspx.

Non-trusted constraints and performance

(See my part 1 article about non-trusted constraints in general: http://sqlblog.com/blogs/tibor_karaszi/archive/2008/01/12/non-trusted-constraints.aspx)

The optimizer is a pretty smart little animal. It can even use constraints to eliminate some data access or part of a query. That assumes however that the constraint is trusted. For example, for a check constraint:

USE Adventureworks
SET STATISTICS IO ON
--Also check the execution plan
SELECT AVG(OrderQtyFROM Sales.SalesOrderDetail
SELECT AVG(OrderQtyFROM Sales.SalesOrderDetail WHERE UnitPrice 0

Note that the table isn’t even accessed for the second query. SQL Server know that there can be no rows in the table where UnitPrice > 0 because there is a constraint:

SELECT OBJECT_NAME(parent_object_idAS table_namenamedefinition
FROM sys.check_constraints
WHERE parent_object_id OBJECT_ID('sales.salesorderdetail')
ORDER BY table_name

Notice the constraint CK_SalesOrderDetail_UnitPrice with the condition ([UnitPrice]>=(0.00)). But what if the constraint isn’t trusted?

ALTER TABLE Sales.SalesOrderDetail NOCHECK CONSTRAINT CK_SalesOrderDetail_UnitPrice
SELECT AVG(OrderQtyFROM Sales.SalesOrderDetail WHERE UnitPrice 0

Now we do have data access. And even if we enable the constraint, it will be non-trusted and SQL Server cannot know for sure that no row violates the condition:

ALTER TABLE Sales.SalesOrderDetail CHECK CONSTRAINT CK_SalesOrderDetail_UnitPrice
SELECT AVG(OrderQtyFROM Sales.SalesOrderDetail WHERE UnitPrice 0

We need to enable the constraint WITH CHECK to make sure it is trusted:

ALTER TABLE Sales.SalesOrderDetail WITH CHECK CHECK CONSTRAINT CK_SalesOrderDetail_UnitPrice
SELECT AVG(OrderQtyFROM Sales.SalesOrderDetail WHERE UnitPrice 0

Performance can also be improved with foreign key constraints. For example, we have a foreign key which states that we cannot have an order detail row for an order which doesn’t exist in the orders table. Now, consider below (check out both I/O and execution plan):

SELECT sd.SalesOrderIDsd.CarrierTrackingNumber
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesOrderDetail AS sd
ON s.SalesOrderID sd.SalesOrderID
WHERE sd.OrderQty 20

We didn’t return any columns from the SalesOrderHeader table, and since SQL Server know that each row in the SalesOrderDetail table has a corresponding row in the SalesOrderHeader table, there’s no need to access the SalesOrderHeader table at all. But if the constraint isn’t trusted:

ALTER TABLE Sales.SalesOrderDetail NOCHECK CONSTRAINT FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID

SELECT sd.SalesOrderIDsd.CarrierTrackingNumber
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesOrderDetail AS sd
ON s.SalesOrderID sd.SalesOrderID
WHERE sd.OrderQty 20

Notice that now the SalesOrderTable *is* accessed, because the constraint is disabled. As, as with above, it isn’t enough to enable it, we need to enable it WITH CHECK to make it trusted again. OK, we might say that why even bother with the join if we only need columns from the referencing table in the first place. But consider this view:

CREATE VIEW myView AS
SELECT 
sd.SalesOrderIDs.RevisionNumbers.DueDatesd.CarrierTrackingNumber
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesOrderDetail AS sd
ON s.SalesOrderID sd.SalesOrderID

The view accesses columns from both tables. But it is quite possible that we have users of this view who are only intered in columns from the SalesOrderDetail table:

SELECT SalesOrderIDCarrierTrackingNumber
FROM myView

If the constraint is not trusted, then both tables are accessed (with the join operation, of course). But if we make sure that the constraint is trusted, then only the SalesOrderDetail table is accessed and no join is performed:

ALTER TABLE Sales.SalesOrderDetail WITH CHECK CHECK CONSTRAINT FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID

SELECT SalesOrderIDCarrierTrackingNumber
FROM myView

So, bottom line is that you should be careful with non-trusted constraints. Obviously because you might have data in the table which violates the constraint, but also for performance reasons. It is easy to check whether you have any non-trusted constraints in the database:

SELECT OBJECT_NAME(parent_object_idAS table_namename
FROM sys.check_constraints
WHERE is_not_trusted 1
UNION ALL
SELECT OBJECT_NAME(parent_object_idAS table_namename
FROM sys.foreign_keys
WHERE is_not_trusted 1
ORDER BY table_name

I should add that to me, the purpose of constraint to make sure I have valid data. The performance aspect is just a bonus. But I admitt that I’m pretty impressed by the optimizer in some cases. 🙂

 

Non-trusted constraints

A discussion in the newsgroups before the holidays lead to trusted constraints and performance. This inspired me to blog about it, but I decided to have some vacation first :-). Instead of having one long article, I’ll do a two-part. This one is about non-trusted constraints in general, and another one will follow about non-trusted constraints and performance.

As you might know, we can disable check and foreign key constraints. This can be done when we create the constraint, for instance:

USE tempdb
CREATE TABLE t1(c1 int)
INSERT INTO t1(c1VALUES(-1)
GO
ALTER TABLE t1 WITH NOCHECK ADD CONSTRAINT CK_t1_c1 CHECK(c1 0)

Above allow us to add the constraint even though we have rows that violates the constraint. The constraint isn’t disabled, but we (can) have rows in the table that violates the constraint – the constraint isn’t trusted. We can also disable an existing constraint:

USE tempdb
CREATE TABLE t2(c1 INT CONSTRAINT CK_t2_c1 CHECK(c1 0) )
INSERT INTO t2(c1VALUES(1)
GO
ALTER TABLE t2 NOCHECK CONSTRAINT CK_t2_c1
GO
INSERT INTO t2(c1VALUES(-1)

Again, we now have rows in the table that violates the constraint. For the first example, the constraint is enabled, but we didn’t check for existing rows when we added the constraint. If we try to add a row which violates the constraint, we get an error message. For the second example, the constraint isn’t even enabled. We can enable a disabled constraint:

ALTER TABLE t2 CHECK CONSTRAINT CK_t2_c1
GO
INSERT INTO t2(c1VALUES(-1)

The immediate above INSERT command will fail with an error message. The constraint in table t2 is now enabled. But the constraint for both table t1 and table t2 are non-trusted. For table t1, we added the constraint with existing data, and told SQL Server to not check existing data. SQL Server cannot trust this constraint. For table t2, we disabled the constraint, added data, then enabled the constraint. SQL Server cannot trust the constraint because we might have added data which violates the constraint while the constraint was disabled. There’s an easy way to check whether you have non-tructed constraints. For instance, for check constraints:

SELECT OBJECT_NAME(parent_object_idAS table_namename
FROM sys.check_constraints
WHERE is_not_trusted 1

Now, can we turn a non-trusted constraint into a trusted constraint? Yes. But we first need to get rid of offending data:

DELETE FROM t1 WHERE c1 0
DELETE FROM t2 WHERE c1 0

And now we want to make sure the constraints are trusted:

ALTER TABLE t1 WITH CHECK CHECK CONSTRAINT CK_t1_c1
ALTER TABLE t2 WITH CHECK CHECK CONSTRAINT CK_t2_c1

There’s no typo above. “WITH CHECK” is validate the data, and “CHECK CONSTRAINT” is to enable the constraint.

So, why would we want to bother with disabling and non-trusted constraints? The purpose of constraints is to make sure that we have consistent data. Why would we want to break this in the first place? Well, rarely. But here are a coule of possible scenarios where one could consider disabling constraints:

  • We run some batch operation once a week. With constraint enabled, this takes 5 hours. With constraints disabled, it takes 1 hour. We “know” that the batch is written in a way so that it doesn’t violate any of our constraints.
  • We want to expand the domain of allowable values for a column. Today we allow values Y and N. We will also allow value U for the column. This is implemented as a check constraint. We remove the existing constraint and add the new one (which also allow for U). We know that no of the existing rows cannot violate the new constraint since we expand the domain of allowable values. Adding the new constraint with NOCHECK is much faster.

Above examples might seem a bit … constructed. I haven’t encountered much non-trusted constraints in reality, but it has happended. And my initial goal was to talk about non-trusted constraints and performance, and this will come in the following blog post.

Owner/schema qualifying object names

It isn’t easy to remember all the details regarding object/schema qualifying object names. There are many variables involved, such as:

  • Version of SQL Server
  • Qualifying the owner when you call the proc (EXEC dbo.procname)
  • Qualifying object references inside the proc code
  • Qualifying object names if you aren’t using procedures
  • Whether the user is the same as the owner of the proc/table
  • Default schema for the user

So, I decided to give it a spin for different combinations and investigate both profiler events and also number of plans in cache. I won’t post all details here, that would be too much to write down. You can use the scripts at the end to do your own findings. I did not investigate the differences regarding compile locks. Anyhow, here are my conclusions:

  • I could not produce any recompile (SP:Recompile or SQLStmtRecompile) event for any combbination.
  • I did find SP:CacheMiss events on 2000 when you execute a proc and don’t qualify the proc name (for 2005 I always got those events). Then a subsequent SP_CacheHit will follow.
  • For straight SQL (no procedures) I noticed that each user get its separate plan when you don’t owner-qualify the table name. This makes sense. An interesting aspect on 2005 was that if you specify a default schema for the user (and two users has the same default schema), then the users will share the plan (basically the default schema becomes the “owner” of the plan).

Below are the scripts I used:

--Login as sysadmin:
USE master
IF DB_ID('myTestDb'IS NOT NULL DROP DATABASE myTestDb
IF EXISTS(SELECT FROM syslogins WHERE name 'Kalle'EXEC sp_droplogin 'Kalle'
IF EXISTS(SELECT FROM syslogins WHERE name 'Olle'EXEC sp_droplogin 'Olle'
GO
EXEC sp_addlogin 'Kalle''*hjk&6f' EXEC sp_addlogin 'Olle''*hjk&6f'
CREATE DATABASE myTestDb
GO
USE myTestDb
EXEC sp_grantdbaccess 'Kalle' EXEC sp_grantdbaccess 'Olle'
GO
CREATE TABLE dbo.t(c1 int identity PRIMARY KEYc2 char(30))
INSERT INTO dbo.t SELECT TOP 1000 'hello' FROM sysobjects a CROSS JOIN sysobjects b
CREATE INDEX ON t(c1)
GO
CREATE PROC dbo.p AS SELECT c1 FROM WHERE c1 34 AND c2 'Hello'
GO
CREATE PROC dbo.p_q AS SELECT c1 FROM dbo.t WHERE c1 34 AND c2 'Hello'
GO
GRANT EXECUTE ON dbo.p TO KalleOlle
GRANT EXECUTE ON dbo.p_q TO KalleOlle
GRANT SELECT ON TO KalleOlle

--Number of plans in cache, run after executions of proc
SELECT OBJECT_NAME(objid), sqluid, *
FROM master..syscacheobjects
WHERE dbid DB_ID()
AND 
cacheobjtype 'Compiled Plan'
AND sql NOT LIKE '%PSTATMAN%'

--Run this three times, logged in as sysadmin (dbo), Kalle and Olle
USE myTestDb
GO
EXEC dbo.p
GO
EXEC dbo.p_q
GO
EXEC p
GO
EXEC p_q
GO
SELECT c1 FROM WHERE c1 34 AND c2 'Hello'
GO
SELECT c1 FROM dbo.t WHERE c1 34 AND c2 'Hello'
GO
USE master