Non-trusted constraints and performance

(See my part 1 article about non-trusted constraints in general: http://sqlblog.com/blogs/tibor_karaszi/archive/2008/01/12/non-trusted-constraints.aspx)

The optimizer is a pretty smart little animal. It can even use constraints to eliminate some data access or part of a query. That assumes however that the constraint is trusted. For example, for a check constraint:

USE Adventureworks
SET STATISTICS IO ON
--Also check the execution plan
SELECT AVG(OrderQtyFROM Sales.SalesOrderDetail
SELECT AVG(OrderQtyFROM Sales.SalesOrderDetail WHERE UnitPrice 0

Note that the table isn’t even accessed for the second query. SQL Server know that there can be no rows in the table where UnitPrice > 0 because there is a constraint:

SELECT OBJECT_NAME(parent_object_idAS table_namenamedefinition
FROM sys.check_constraints
WHERE parent_object_id OBJECT_ID('sales.salesorderdetail')
ORDER BY table_name

Notice the constraint CK_SalesOrderDetail_UnitPrice with the condition ([UnitPrice]>=(0.00)). But what if the constraint isn’t trusted?

ALTER TABLE Sales.SalesOrderDetail NOCHECK CONSTRAINT CK_SalesOrderDetail_UnitPrice
SELECT AVG(OrderQtyFROM Sales.SalesOrderDetail WHERE UnitPrice 0

Now we do have data access. And even if we enable the constraint, it will be non-trusted and SQL Server cannot know for sure that no row violates the condition:

ALTER TABLE Sales.SalesOrderDetail CHECK CONSTRAINT CK_SalesOrderDetail_UnitPrice
SELECT AVG(OrderQtyFROM Sales.SalesOrderDetail WHERE UnitPrice 0

We need to enable the constraint WITH CHECK to make sure it is trusted:

ALTER TABLE Sales.SalesOrderDetail WITH CHECK CHECK CONSTRAINT CK_SalesOrderDetail_UnitPrice
SELECT AVG(OrderQtyFROM Sales.SalesOrderDetail WHERE UnitPrice 0

Performance can also be improved with foreign key constraints. For example, we have a foreign key which states that we cannot have an order detail row for an order which doesn’t exist in the orders table. Now, consider below (check out both I/O and execution plan):

SELECT sd.SalesOrderIDsd.CarrierTrackingNumber
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesOrderDetail AS sd
ON s.SalesOrderID sd.SalesOrderID
WHERE sd.OrderQty 20

We didn’t return any columns from the SalesOrderHeader table, and since SQL Server know that each row in the SalesOrderDetail table has a corresponding row in the SalesOrderHeader table, there’s no need to access the SalesOrderHeader table at all. But if the constraint isn’t trusted:

ALTER TABLE Sales.SalesOrderDetail NOCHECK CONSTRAINT FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID

SELECT sd.SalesOrderIDsd.CarrierTrackingNumber
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesOrderDetail AS sd
ON s.SalesOrderID sd.SalesOrderID
WHERE sd.OrderQty 20

Notice that now the SalesOrderTable *is* accessed, because the constraint is disabled. As, as with above, it isn’t enough to enable it, we need to enable it WITH CHECK to make it trusted again. OK, we might say that why even bother with the join if we only need columns from the referencing table in the first place. But consider this view:

CREATE VIEW myView AS
SELECT 
sd.SalesOrderIDs.RevisionNumbers.DueDatesd.CarrierTrackingNumber
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesOrderDetail AS sd
ON s.SalesOrderID sd.SalesOrderID

The view accesses columns from both tables. But it is quite possible that we have users of this view who are only intered in columns from the SalesOrderDetail table:

SELECT SalesOrderIDCarrierTrackingNumber
FROM myView

If the constraint is not trusted, then both tables are accessed (with the join operation, of course). But if we make sure that the constraint is trusted, then only the SalesOrderDetail table is accessed and no join is performed:

ALTER TABLE Sales.SalesOrderDetail WITH CHECK CHECK CONSTRAINT FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID

SELECT SalesOrderIDCarrierTrackingNumber
FROM myView

So, bottom line is that you should be careful with non-trusted constraints. Obviously because you might have data in the table which violates the constraint, but also for performance reasons. It is easy to check whether you have any non-trusted constraints in the database:

SELECT OBJECT_NAME(parent_object_idAS table_namename
FROM sys.check_constraints
WHERE is_not_trusted 1
UNION ALL
SELECT OBJECT_NAME(parent_object_idAS table_namename
FROM sys.foreign_keys
WHERE is_not_trusted 1
ORDER BY table_name

I should add that to me, the purpose of constraint to make sure I have valid data. The performance aspect is just a bonus. But I admitt that I’m pretty impressed by the optimizer in some cases. 🙂

 

Non-trusted constraints

A discussion in the newsgroups before the holidays lead to trusted constraints and performance. This inspired me to blog about it, but I decided to have some vacation first :-). Instead of having one long article, I’ll do a two-part. This one is about non-trusted constraints in general, and another one will follow about non-trusted constraints and performance.

As you might know, we can disable check and foreign key constraints. This can be done when we create the constraint, for instance:

USE tempdb
CREATE TABLE t1(c1 int)
INSERT INTO t1(c1VALUES(-1)
GO
ALTER TABLE t1 WITH NOCHECK ADD CONSTRAINT CK_t1_c1 CHECK(c1 0)

Above allow us to add the constraint even though we have rows that violates the constraint. The constraint isn’t disabled, but we (can) have rows in the table that violates the constraint – the constraint isn’t trusted. We can also disable an existing constraint:

USE tempdb
CREATE TABLE t2(c1 INT CONSTRAINT CK_t2_c1 CHECK(c1 0) )
INSERT INTO t2(c1VALUES(1)
GO
ALTER TABLE t2 NOCHECK CONSTRAINT CK_t2_c1
GO
INSERT INTO t2(c1VALUES(-1)

Again, we now have rows in the table that violates the constraint. For the first example, the constraint is enabled, but we didn’t check for existing rows when we added the constraint. If we try to add a row which violates the constraint, we get an error message. For the second example, the constraint isn’t even enabled. We can enable a disabled constraint:

ALTER TABLE t2 CHECK CONSTRAINT CK_t2_c1
GO
INSERT INTO t2(c1VALUES(-1)

The immediate above INSERT command will fail with an error message. The constraint in table t2 is now enabled. But the constraint for both table t1 and table t2 are non-trusted. For table t1, we added the constraint with existing data, and told SQL Server to not check existing data. SQL Server cannot trust this constraint. For table t2, we disabled the constraint, added data, then enabled the constraint. SQL Server cannot trust the constraint because we might have added data which violates the constraint while the constraint was disabled. There’s an easy way to check whether you have non-tructed constraints. For instance, for check constraints:

SELECT OBJECT_NAME(parent_object_idAS table_namename
FROM sys.check_constraints
WHERE is_not_trusted 1

Now, can we turn a non-trusted constraint into a trusted constraint? Yes. But we first need to get rid of offending data:

DELETE FROM t1 WHERE c1 0
DELETE FROM t2 WHERE c1 0

And now we want to make sure the constraints are trusted:

ALTER TABLE t1 WITH CHECK CHECK CONSTRAINT CK_t1_c1
ALTER TABLE t2 WITH CHECK CHECK CONSTRAINT CK_t2_c1

There’s no typo above. “WITH CHECK” is validate the data, and “CHECK CONSTRAINT” is to enable the constraint.

So, why would we want to bother with disabling and non-trusted constraints? The purpose of constraints is to make sure that we have consistent data. Why would we want to break this in the first place? Well, rarely. But here are a coule of possible scenarios where one could consider disabling constraints:

  • We run some batch operation once a week. With constraint enabled, this takes 5 hours. With constraints disabled, it takes 1 hour. We “know” that the batch is written in a way so that it doesn’t violate any of our constraints.
  • We want to expand the domain of allowable values for a column. Today we allow values Y and N. We will also allow value U for the column. This is implemented as a check constraint. We remove the existing constraint and add the new one (which also allow for U). We know that no of the existing rows cannot violate the new constraint since we expand the domain of allowable values. Adding the new constraint with NOCHECK is much faster.

Above examples might seem a bit … constructed. I haven’t encountered much non-trusted constraints in reality, but it has happended. And my initial goal was to talk about non-trusted constraints and performance, and this will come in the following blog post.

Owner/schema qualifying object names

It isn’t easy to remember all the details regarding object/schema qualifying object names. There are many variables involved, such as:

  • Version of SQL Server
  • Qualifying the owner when you call the proc (EXEC dbo.procname)
  • Qualifying object references inside the proc code
  • Qualifying object names if you aren’t using procedures
  • Whether the user is the same as the owner of the proc/table
  • Default schema for the user

So, I decided to give it a spin for different combinations and investigate both profiler events and also number of plans in cache. I won’t post all details here, that would be too much to write down. You can use the scripts at the end to do your own findings. I did not investigate the differences regarding compile locks. Anyhow, here are my conclusions:

  • I could not produce any recompile (SP:Recompile or SQLStmtRecompile) event for any combbination.
  • I did find SP:CacheMiss events on 2000 when you execute a proc and don’t qualify the proc name (for 2005 I always got those events). Then a subsequent SP_CacheHit will follow.
  • For straight SQL (no procedures) I noticed that each user get its separate plan when you don’t owner-qualify the table name. This makes sense. An interesting aspect on 2005 was that if you specify a default schema for the user (and two users has the same default schema), then the users will share the plan (basically the default schema becomes the “owner” of the plan).

Below are the scripts I used:

--Login as sysadmin:
USE master
IF DB_ID('myTestDb'IS NOT NULL DROP DATABASE myTestDb
IF EXISTS(SELECT FROM syslogins WHERE name 'Kalle'EXEC sp_droplogin 'Kalle'
IF EXISTS(SELECT FROM syslogins WHERE name 'Olle'EXEC sp_droplogin 'Olle'
GO
EXEC sp_addlogin 'Kalle''*hjk&6f' EXEC sp_addlogin 'Olle''*hjk&6f'
CREATE DATABASE myTestDb
GO
USE myTestDb
EXEC sp_grantdbaccess 'Kalle' EXEC sp_grantdbaccess 'Olle'
GO
CREATE TABLE dbo.t(c1 int identity PRIMARY KEYc2 char(30))
INSERT INTO dbo.t SELECT TOP 1000 'hello' FROM sysobjects a CROSS JOIN sysobjects b
CREATE INDEX ON t(c1)
GO
CREATE PROC dbo.p AS SELECT c1 FROM WHERE c1 34 AND c2 'Hello'
GO
CREATE PROC dbo.p_q AS SELECT c1 FROM dbo.t WHERE c1 34 AND c2 'Hello'
GO
GRANT EXECUTE ON dbo.p TO KalleOlle
GRANT EXECUTE ON dbo.p_q TO KalleOlle
GRANT SELECT ON TO KalleOlle

--Number of plans in cache, run after executions of proc
SELECT OBJECT_NAME(objid), sqluid, *
FROM master..syscacheobjects
WHERE dbid DB_ID()
AND 
cacheobjtype 'Compiled Plan'
AND sql NOT LIKE '%PSTATMAN%'

--Run this three times, logged in as sysadmin (dbo), Kalle and Olle
USE myTestDb
GO
EXEC dbo.p
GO
EXEC dbo.p_q
GO
EXEC p
GO
EXEC p_q
GO
SELECT c1 FROM WHERE c1 34 AND c2 'Hello'
GO
SELECT c1 FROM dbo.t WHERE c1 34 AND c2 'Hello'
GO
USE master

Sorted views…

Here’s one that pops up regurarly. Something like:

“My views are no longer ordered in SQL Server 2005. I have ORDER BY in the view, but when I select from it, the rows are not returned according to the ORDER BY.”

Since I tend to post a reply often enough to above type of posts, I decided to put it in writing once and for all, so I can point to this blog post. (Laziness is a virtue 😉 .)

A view is by definition not sorted. A view is supposed to behave like a table (and thanks to that we have the same language to operate against views as we have to operate against tables). This is why ORDER BY is not by itself allowed in a view definition.

It is allowed to have ORDER BY if you also have TOP, but the purpose of ORDER BY is now to make sure that the correct rows are returned (5 most expensive books, for instance), not in any particular order.

At some point in time, the “workaround” to have TOP 100 PERCENT popped up. The thinking was that we now are allowed to have ORDER BY, which makes the view “sorted”. It doesn’t. However, SQL Server 2000’s optimizer wasn’t as smart as 2005’s optimizer, so for simple queries, you often found that data was returned according to your ORDER BY.

SQL Server 2005’s optimizer is smarter, and if it finds TOP 100 PERCENT and ORDER BY it realizes that both these operations doesn’t affect *which* rows to return, so both operations are removed from the execution plan. That is smart, since these doesn’t affect the data anyhow. Remeber that a view is by definition not sorted.

Some developers has relied on the 2000 behavior to create “sorted views”. This behavior was never documented, just a side effect of the execution plan, and was never guaranteed.

So what do we do? Well, same as when we run queries against a table. We have ORDER BY when we read data from the view!

Aside: The query builder which you can use when you create a view has a very strange behavior. It allow you to check a “sort” checkbox, and it will then add ORDER BY and TOP 100 PERCENT to the SELECT statement. According to above explanation, this is kind of silly, but I’m hoping that MS will tidy this up for the next version of SQL Server.

Is ‘YYYY-MM-DD’ always treated as year, month, date?

Most of you already know the answer to this question: no.

I won’t recap the rules for interpretation of the current datetime literals, as I have an article on the subject: http://karaszi.com/the-ultimate-guide-to-the-datetime-datatypes

However, SQL Server 2008 introduces a number of new date, datetime and time datatypes. One interesting aspect of these is that interpretation of the ANSI SQL datetime format, ‘YYYY-MM-DD’ is independent of language and datetime settings.

SQL Server MVP Steve Kass opened up the discussion (along with a connect suggestion) that it is time for the old datetime datatype also always treat this format as year, month, date. In general, I’m all for a consistent treatment of this datetime formatting, and that should indeed be the most “natural” one and also adhere to ANSI SQL. However, I don’t feel qualified to estimate if or how much backwards compatibility problems this will cause. Are any of you using format ‘YYYY-DD-MM’ format, for instance?

Here’s a script that show you whether or not SQL Server treats ‘nnnn-nn-nn’ as ‘yyyy-mm-dd’ or something else. For fun, if you have July CTP of SQL Server 2008, you can run the same script and change the datatypes to datetime2 and compare the difference.

SET NOCOUNT ON
USE tempdb
IF OBJECT_ID(‘langdf’) IS NOT NULL DROP TABLE langdf
CREATE TABLE langdf(lang sysname, dt datetime, correct bit)

DECLARE @lang sysname, @sql nvarchar(4000)
DECLARE c CURSOR FOR SELECT alias FROM master.dbo.syslanguages
OPEN c
WHILE 1 = 1
BEGIN
FETCH NEXT FROM c INTO @lang
IF @@FETCH_STATUS <> 0 BREAK
SET @sql =
‘SET LANGUAGE “‘ +
@lang + ‘”
INSERT INTO langdf
SELECT
”’ + @lang + ”’
,CAST(”2007-02-09” AS datetime)
,CASE WHEN CAST(”2007-02-09” AS datetime) = ”20070209” THEN 1 ELSE 0 END’
PRINT @sql
EXEC(@sql)
END
DEALLOCATE c

SELECT
COUNT(NULLIF(correct, 0)) AS “correct interpretation”
,COUNT(NULLIF(correct, 1)) AS “incorrect interpretation”
FROM langdf

SELECT * FROM langdf