Code page backgrounder, courtesy of Erland Sommarskog

While browsing through the programming newsgroup today, I came across a post from Erland Sommarskog – a short backgrounder about code pages and collations. I’ve never seen code pages described so coherent and with so few words, so I asked Erland if I could quote his text in my blog (no, Erland doesn’t blog ūüôā ). So below quoted text is with Erland’s kind permission.

For those of you who want to know more about Erland or read some of his great deep-dive articles, check out

“To start with, if we should be picky, there are no ASCII characters >= 128.
There are however lot of other character sets that defines this area.

Way back in the 80s vendors started to explore the area 128-255, and
about each vendor come with its character set(s). The contribution
from the IBM/Microsoft combo that ran MS-DOS was a number of code
pages, of which 437 was of their oldest. Later, they realized that
they did not support all languages in Western Europe, and they defined
CP850 which served Western Europe better.

Meanwhile, HP had Roman-8 and Digital had their DEC Multinational Character
Set. Eventually, ISO settled on composing a standard, and they worked
from DEC MCS – or DEC were smart to work from the ISO drafts, I don’t know
which. This resulted in ISO-8859 a family or originally eight 8-bit
character sets, which recently evolved into 15 sets.

By the time Microsoft divorced from IBM, they abandoned CP437 and
CP850 as the character set for Windows, and went with ISO-8859, at
least for Western Europe. Except that they added some printable
characters in the range 128-159 where Latin-1 has only control characters.
This became CodePage 1252, and CP1252 is the code page typically
used for 8-bit Windows applications on a computer installed in Western
Europe or the Americas. However, CP437/CP850 still lives on Windows
today; the command-line windows uses a so-called OEM character set which
is one of these.

If you have a Windows program that uses CP1252, and the server collation
is CP437, the client API will convert the data for you, so if you pass
for instance √Ė which has character code 216 in CP1252, the byte that
gets stored in SQL Server will be another. When you retrieve data,
data will be converted in the other direction. However, since CP1252
and CP437 does not include the same characters, the conversion may
not be roundtrip. For instance, √Ö may not be in CP437, so an √Ö from
CP1252 will become A, and will remain A when you retrieve it.

<TiborComment>Here I removed a section which was only relevant for the newsgroup thread in question</TiborComment>

Finally, all collations have 255 characters for varchar, and at least
65535 characters for nvarchar.”

For those of you who want to dive deep in collations and such topics, check out

Non-trusted constraints and performance

(See my part 1 article about non-trusted constraints in general:

The optimizer is a pretty smart little animal. It can even use constraints to eliminate some data access or part of a query. That assumes however that the constraint is trusted. For example, for a check constraint:

USE Adventureworks
--Also check the execution plan
SELECT AVG(OrderQty) FROM Sales.SalesOrderDetail
SELECT AVG(OrderQty) FROM Sales.SalesOrderDetail WHERE UnitPrice < 0

Note that the table isn’t even accessed for the second query. SQL Server know that there can be no rows in the table where UnitPrice > 0 because there is a constraint:

SELECT OBJECT_NAME(parent_object_id) AS table_name, name, definition
FROM sys.check_constraints
WHERE parent_object_id = OBJECT_ID('sales.salesorderdetail')
ORDER BY table_name

Notice the constraint CK_SalesOrderDetail_UnitPrice with the condition ([UnitPrice]>=(0.00)). But what if the constraint isn’t trusted?

ALTER TABLE Sales.SalesOrderDetail NOCHECK CONSTRAINT CK_SalesOrderDetail_UnitPrice
SELECT AVG(OrderQty) FROM Sales.SalesOrderDetail WHERE UnitPrice < 0

Now we do have data access. And even if we enable the constraint, it will be non-trusted and SQL Server cannot know for sure that no row violates the condition:

ALTER TABLE Sales.SalesOrderDetail CHECK CONSTRAINT CK_SalesOrderDetail_UnitPrice
SELECT AVG(OrderQty) FROM Sales.SalesOrderDetail WHERE UnitPrice < 0

We need to enable the constraint WITH CHECK to make sure it is trusted:

ALTER TABLE Sales.SalesOrderDetail WITH CHECK CHECK CONSTRAINT CK_SalesOrderDetail_UnitPrice
SELECT AVG(OrderQty) FROM Sales.SalesOrderDetail WHERE UnitPrice < 0

Performance can also be improved with foreign key constraints. For example, we have a foreign key which states that we cannot have an order detail row for an order which doesn’t exist in the orders table. Now, consider below (check out both I/O and execution plan):

SELECT sd.SalesOrderID, sd.CarrierTrackingNumber
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesOrderDetail AS sd
ON s.SalesOrderID = sd.SalesOrderID
WHERE sd.OrderQty > 20

We didn’t return any columns from the SalesOrderHeader table, and since SQL Server know that each row in the SalesOrderDetail table has a corresponding row in the SalesOrderHeader table, there’s no need to access the SalesOrderHeader table at all. But if the constraint isn’t trusted:

ALTER TABLE Sales.SalesOrderDetail NOCHECK CONSTRAINT FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID

SELECT sd.SalesOrderID, sd.CarrierTrackingNumber
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesOrderDetail AS sd
ON s.SalesOrderID = sd.SalesOrderID
WHERE sd.OrderQty > 20

Notice that now the SalesOrderTable *is* accessed, because the constraint is disabled. As, as with above, it isn’t enough to enable it, we need to enable it WITH CHECK to make it trusted again. OK, we might say that why even bother with the join if we only need columns from the referencing table in the first place. But consider this view:

sd.SalesOrderID, s.RevisionNumber, s.DueDate, sd.CarrierTrackingNumber
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesOrderDetail AS sd
ON s.SalesOrderID = sd.SalesOrderID

The view accesses columns from both tables. But it is quite possible that we have users of this view who are only intered in columns from the SalesOrderDetail table:

SELECT SalesOrderID, CarrierTrackingNumber
FROM myView

If the constraint is not trusted, then both tables are accessed (with the join operation, of course). But if we make sure that the constraint is trusted, then only the SalesOrderDetail table is accessed and no join is performed:

ALTER TABLE Sales.SalesOrderDetail WITH CHECK CHECK CONSTRAINT FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID

SELECT SalesOrderID, CarrierTrackingNumber
FROM myView

So, bottom line is that you should be careful with non-trusted constraints. Obviously because you might have data in the table which violates the constraint, but also for performance reasons. It is easy to check whether you have any non-trusted constraints in the database:

SELECT OBJECT_NAME(parent_object_id) AS table_name, name
FROM sys.check_constraints
WHERE is_not_trusted = 1
SELECT OBJECT_NAME(parent_object_id) AS table_name, name
FROM sys.foreign_keys
WHERE is_not_trusted = 1
ORDER BY table_name

I should add that to me, the purpose of constraint to make sure I have valid data. The performance aspect is just a bonus. But I admitt that I’m pretty impressed by the optimizer in some cases. ūüôā


Non-trusted constraints

A discussion in the newsgroups before the holidays lead to trusted constraints and performance. This inspired me to blog about it, but I decided to have some vacation first :-). Instead of having one long article, I’ll do a two-part. This one is about non-trusted constraints in general, and another one will follow about non-trusted constraints and performance.

As you might know, we can disable check and foreign key constraints. This can be done when we create the constraint, for instance:

USE tempdb
CREATE TABLE t1(c1 int)

Above allow us to add the constraint even though we have rows that violates the constraint. The constraint isn’t disabled, but we (can) have rows in the table that violates the constraint – the constraint isn’t trusted. We can also disable an existing constraint:

USE tempdb

Again, we now have rows in the table that violates the constraint. For the first example, the constraint is enabled, but we didn’t check for existing rows when we added the constraint. If we try to add a row which violates the constraint, we get an error message. For the second example, the constraint isn’t even enabled. We can enable a disabled constraint:


The immediate above INSERT command will fail with an error message. The constraint in table t2 is now enabled. But the constraint for both table t1 and table t2 are non-trusted. For table t1, we added the constraint with existing data, and told SQL Server to not check existing data. SQL Server cannot trust this constraint. For table t2, we disabled the constraint, added data, then enabled the constraint. SQL Server cannot trust the constraint because we might have added data which violates the constraint while the constraint was disabled. There’s an easy way to check whether you have non-tructed constraints. For instance, for check constraints:

SELECT OBJECT_NAME(parent_object_id) AS table_name, name
FROM sys.check_constraints
WHERE is_not_trusted = 1

Now, can we turn a non-trusted constraint into a trusted constraint? Yes. But we first need to get rid of offending data:


And now we want to make sure the constraints are trusted:


There’s no typo above. “WITH CHECK” is validate the data, and “CHECK CONSTRAINT” is to enable the constraint.

So, why would we want to bother with disabling and non-trusted constraints? The purpose of constraints is to make sure that we have consistent data. Why would we want to break this in the first place? Well, rarely. But here are a coule of possible scenarios where one could consider disabling constraints:

  • We run some batch operation once a week. With constraint enabled, this takes 5 hours. With constraints disabled, it takes 1 hour. We “know” that the batch is written in a way so that it doesn’t violate any of our constraints.
  • We want to expand the domain of allowable values for a column. Today we allow values Y and N. We will also allow value U for the column. This is implemented as a check constraint. We remove the existing constraint and add the new one (which also allow for U). We know that no of the existing rows cannot violate the new constraint since we expand the domain of allowable values. Adding the new constraint with NOCHECK is much faster.

Above examples might seem a bit … constructed. I haven’t encountered much non-trusted constraints in reality, but it has happended. And my initial goal was to talk about non-trusted constraints and performance, and this will come in the following blog post.

Owner/schema qualifying object names

It isn’t easy to remember all the details regarding object/schema qualifying object names. There are many variables involved, such as:

  • Version of SQL Server
  • Qualifying the owner when you call the proc (EXEC dbo.procname)
  • Qualifying object references inside the proc code
  • Qualifying object names if you aren’t using procedures
  • Whether the user is the same as the owner of the proc/table
  • Default schema for the user

So, I decided to give it a spin for different combinations and investigate both profiler events and also number of plans in cache. I won’t post all details here, that would be too much to write down. You can use the scripts at the end to do your own findings. I did not investigate the differences regarding compile locks. Anyhow, here are my conclusions:

  • I could not produce any recompile (SP:Recompile or SQLStmtRecompile) event for any combbination.
  • I did find SP:CacheMiss events on 2000 when you execute a proc and don’t qualify the proc name (for 2005 I always got those events). Then a subsequent SP_CacheHit will follow.
  • For straight SQL (no procedures) I noticed that each user get its separate plan¬†when you don’t owner-qualify the table name. This makes sense. An interesting aspect on 2005 was that if you specify a default schema for the user (and two users has the same default schema), then the users will share the plan (basically the default schema becomes the “owner” of the plan).

Below are the scripts I used:

--Login as sysadmin:
USE master
IF EXISTS(SELECT * FROM syslogins WHERE name = 'Kalle') EXEC sp_droplogin 'Kalle'
IF EXISTS(SELECT * FROM syslogins WHERE name = 'Olle') EXEC sp_droplogin 'Olle'
EXEC sp_addlogin 'Kalle', '*hjk&6f' EXEC sp_addlogin 'Olle', '*hjk&6f'
USE myTestDb
EXEC sp_grantdbaccess 'Kalle' EXEC sp_grantdbaccess 'Olle'
CREATE TABLE dbo.t(c1 int identity PRIMARY KEY, c2 char(30))
INSERT INTO dbo.t SELECT TOP 1000 'hello' FROM sysobjects a CROSS JOIN sysobjects b
CREATE PROC dbo.p AS SELECT c1 FROM t WHERE c1 = 34 AND c2 = 'Hello'
CREATE PROC dbo.p_q AS SELECT c1 FROM dbo.t WHERE c1 = 34 AND c2 = 'Hello'
GRANT EXECUTE ON dbo.p TO Kalle, Olle
GRANT EXECUTE ON dbo.p_q TO Kalle, Olle

--Number of plans in cache, run after executions of proc
SELECT OBJECT_NAME(objid), sql, uid, *
FROM master..syscacheobjects
WHERE dbid = DB_ID()
cacheobjtype = 'Compiled Plan'

--Run this three times, logged in as sysadmin (dbo), Kalle and Olle
USE myTestDb
EXEC dbo.p
EXEC dbo.p_q
EXEC p_q
SELECT c1 FROM t WHERE c1 = 34 AND c2 = 'Hello'
SELECT c1 FROM dbo.t WHERE c1 = 34 AND c2 = 'Hello'
USE master

Sorted views…

Here’s one that pops up regurarly. Something like:

“My views are no longer ordered in SQL Server 2005. I have ORDER BY in the view, but when I select from it, the rows are not returned according to the ORDER BY.”

Since I tend to post a reply often enough to above type of posts, I decided to put it in writing once and for all, so I can point to this blog post. (Laziness is a virtue ūüėČ .)

A view is by definition not sorted. A view is supposed to behave like a table (and thanks to that we have the same language to operate against views as we have to operate against tables). This is why ORDER BY is not by itself allowed in a view definition.

It is allowed to have ORDER BY if you also have TOP, but the purpose of ORDER BY is now to make sure that the correct rows are returned (5 most expensive books, for instance), not in any particular order.

At some point in time, the “workaround” to have TOP 100 PERCENT popped up. The thinking was that we now are allowed to have ORDER BY, which makes the view “sorted”. It doesn’t. However, SQL Server 2000’s optimizer wasn’t as smart as 2005’s optimizer, so for simple queries, you often found that data was returned according to your ORDER BY.

SQL Server 2005’s optimizer is smarter, and if it finds TOP 100 PERCENT and ORDER BY it realizes that both these operations doesn’t affect *which* rows to return, so both operations are removed from the execution plan. That is smart, since these doesn’t affect the data anyhow. Remeber that a view is by definition not sorted.

Some developers has relied on the 2000 behavior to create “sorted views”. This behavior was never documented, just a side effect of the execution plan, and was never guaranteed.

So what do we do? Well, same as when we run queries against a table. We have ORDER BY when we read data from the view!

Aside: The query builder which you can use when you create a view has a very strange behavior. It allow you to check a “sort” checkbox, and it will then add ORDER BY and TOP 100 PERCENT to the SELECT statement. According to above explanation, this is kind of silly, but I’m hoping that MS will tidy this up for the next version of SQL Server.

Is ‘YYYY-MM-DD’ always treated as year, month, date?

Most of you already know the answer to this question: no.

I won’t recap the rules for interpretation of the current datetime literals, as I have an article on the subject:

However, SQL Server 2008 introduces a number of new date, datetime and time datatypes. One interesting aspect of these is that interpretation of the ANSI SQL datetime format, ‘YYYY-MM-DD’ is independent of language and datetime settings.

SQL Server MVP Steve Kass opened up the discussion (along with a connect suggestion) that it is time for the old datetime datatype also always treat this format as year, month, date. In general, I’m all for a consistent treatment of this datetime formatting, and that should indeed be the most “natural” one and also adhere to ANSI SQL. However, I don’t feel qualified to estimate if or how much backwards compatibility problems this will cause. Are any of you using format ‘YYYY-DD-MM’ format, for instance?

Here’s a script that show you whether or not SQL Server treats ‘nnnn-nn-nn’ as ‘yyyy-mm-dd’ or something else. For fun, if you have July CTP of SQL Server 2008, you can run the same script and change the datatypes to datetime2 and compare the difference.

USE tempdb
CREATE TABLE langdf(lang sysname, dt datetime, correct bit)

DECLARE @lang sysname, @sql nvarchar(4000)
DECLARE c CURSOR FOR SELECT alias FROM master.dbo.syslanguages
WHILE 1 = 1
SET @sql =
@lang + ‘”
”’ + @lang + ”’
,CAST(”2007-02-09” AS datetime)
,CASE WHEN CAST(”2007-02-09” AS datetime) = ”20070209” THEN 1 ELSE 0 END’
PRINT @sql

COUNT(NULLIF(correct, 0)) AS “correct interpretation”
,COUNT(NULLIF(correct, 1)) AS “incorrect interpretation”
FROM langdf

SELECT * FROM langdf