Code page backgrounder, courtesy of Erland Sommarskog

While browsing through the programming newsgroup today, I came across a post from Erland Sommarskog – a short backgrounder about code pages and collations. I’ve never seen code pages described so coherent and with so few words, so I asked Erland if I could quote his text in my blog (no, Erland doesn’t blog ūüôā ). So below quoted text is with Erland’s kind permission.

For those of you who want to know more about Erland or read some of his great deep-dive articles, check out http://www.sommarskog.se/.

“To start with, if we should be picky, there are no ASCII characters >= 128.
There are however lot of other character sets that defines this area.

Way back in the 80s vendors started to explore the area 128-255, and
about each vendor come with its character set(s). The contribution
from the IBM/Microsoft combo that ran MS-DOS was a number of code
pages, of which 437 was of their oldest. Later, they realized that
they did not support all languages in Western Europe, and they defined
CP850 which served Western Europe better.

Meanwhile, HP had Roman-8 and Digital had their DEC Multinational Character
Set. Eventually, ISO settled on composing a standard, and they worked
from DEC MCS – or DEC were smart to work from the ISO drafts, I don’t know
which. This resulted in ISO-8859 a family or originally eight 8-bit
character sets, which recently evolved into 15 sets.

By the time Microsoft divorced from IBM, they abandoned CP437 and
CP850 as the character set for Windows, and went with ISO-8859, at
least for Western Europe. Except that they added some printable
characters in the range 128-159 where Latin-1 has only control characters.
This became CodePage 1252, and CP1252 is the code page typically
used for 8-bit Windows applications on a computer installed in Western
Europe or the Americas. However, CP437/CP850 still lives on Windows
today; the command-line windows uses a so-called OEM character set which
is one of these.

If you have a Windows program that uses CP1252, and the server collation
is CP437, the client API will convert the data for you, so if you pass
for instance √Ė which has character code 216 in CP1252, the byte that
gets stored in SQL Server will be another. When you retrieve data,
data will be converted in the other direction. However, since CP1252
and CP437 does not include the same characters, the conversion may
not be roundtrip. For instance, √Ö may not be in CP437, so an √Ö from
CP1252 will become A, and will remain A when you retrieve it.

<TiborComment>Here I removed a section which was only relevant for the newsgroup thread in question</TiborComment>

Finally, all collations have 255 characters for varchar, and at least
65535 characters for nvarchar.”

For those of you who want to dive deep in collations and such topics, check out http://msdn.microsoft.com/en-us/library/bb330962.aspx.

SQL Server 2008 and Visual Studio 2008

You probably already know that SQL Server 2008 RTM’d (yesterday). You need to be careful when installing SQL Server 2008 if you also have Visual Studio 2008 installed. It all has to do with the version of the framework that each product requires. Denis already blogged about it here: http://sqlblog.com/blogs/denis_gobo/archive/2008/08/07/8261.aspx.

I just found out that the SQL Server 2008 release notes has been updated with some extra information about this. Check it out at:

http://download.microsoft.com/download/4/9/e/49eeb41a-a769-4520-80d6-671b8ae2bd06/SQLServer2008ReleaseNotes.htm

Read the section below the “Before you install” title. It is worth the 3 minutes it take to read it!!!

Analogy between SQL Server and operating systems

With SQL Server 2008 released, I was thinking back of earlier versions of SQL Server. And I decided to compare them to the MS operating systems. Not a point-in-time comparsion, like “SQL Server version x was released year a, which was the same year that OS y was released.”. I’m thinking more of the feel you have for the product. Why would anyone want to do that? I don’t know – for fun, perhaps? While writing below I realized that the comparsions/analogies worked better the older the product is. Perhaps a product need to be obsolete for us to have the sentimental feeling required for this type of comparsion? Anyhow, here goes:

SQL Server 1.x <-> DOS
(I do know it ran on OS/2, but again this is more about how you feel for the product.)
I know, perhaps not all fair, but think about it. We are talking about command-line environments, or at the best some full-screen character based applications (like edit.exe or saf.exe). And installation was floppy based where the product did fit on a couple of floppies.

SQL Server 4.x <-> OS/2 or Windows pre-95
I can’t decide here.
OS/2 had the merit that it wasn’t a bad OS, but almost no apps were developed for it (think back to version 1.2 and 1.3 and what it was at the time – and what it could have been), and it wasn’t a fun environment to work in. Windows pre-95 had the merit of being a GUI which, sort of, brought multitasking to the desktop – but what about robustness?
Same goes for SQL server 4.x. It was revolutionary in some sense, like: Imagine fitting a real RDBMS in a PC? Now smaller businesses can start using “real” RDBMSs. But OTOH, it was very unpolished. Remember the GUI tools? They were really Windows apps where some conversion tool converted them for OS/2.
So, I think it is a draw between OS/2 and Windows pre-95.

SQL Server 6.x <-> Windows NT 3.x or Windows 9x
Again, I can’t decide.
In one way, SQL Server 6.x was MS first “own” release. But OTOH, the Sybase code base was still there. MS mainly did tool stuff, along with some engine stuff (like ANSI SQL compliance). But it wasn’t a re-write of the engine.
This can compare to Windows 9x – the DOS heritage was still there, in some sense.
If you compare SQL Server 6.x to Windows NT 3.x you can also see similarities. NT 3.x was the first versions of the new revolutionary OS from MS. But it still looked like … old Windows – something you might compare with SQL Server 6.x enterprise Manager.

SQL Server 7.0 <-> NT 4
I was originally going to put Windows 2000 here, but after thinking a while, I decide for NT 4.
7.0 was the first version of the new architecture. A lot happened, where the engine was all re-written. New stuff was introduced (Profiler, DTS, Olap server). So, at the engine level, we basically got a more modern look-and-feel.
To some extent NT 4 was similar. You got a new GUI (adopted from Windows 9x). The revolution was that you now had an *stable* OS which you also could run as your desktop OS. I bet that many of you (computer nerds)/readers preferred NT 4 instead of Windows 9x at that time. I did. There were some architectural news in the OS as well, like the device driver model (some stuff were moved to kernel mode – if my memory serves me).

SQL Server 2000 <-> Windows 2000
Seems too easy, but think about it.
SQL Server 2000 was when the new architecture matured. IMO, a great release at that time. OK, some would argue that it didn’t happened that much between 7.0 and 2000, but maturing and polish of the new architecture is a major thing to me.
Windows 2000 can also be seen as becoming mature – ready to be used in masses. OK, there were some revolutionary new stuff like AD, but you can’t expect the analogy to fit 100%. ūüėČ

SQL Server 2005 <-> Vista
Hmm, is my analogy breaking down here?
I was originally going to put Windows 2003 here. But that was a bit too much going chronologically hand-in-hand.
And I think that XP is a bit unfair (perhaps XP would be a better fit for SQL Server 2000?).
But 2005 did have lots and lots of changes and new features. And so did Vista. Vista has a rather slow adaption rate, and I have the same feeling for SQL Server 2005. Many people seems to wait for Vista+, a perhaps more cleaned-up OS? And some seem to be waiting for SQL Server 2008, even though perhaps not for the same reasons.

SQL Server 2008 <-> Vista +
This was unavoidable, considering how we got here. I won’t dwell into this, since it is too early to say how we feel about these releases in 10 years from now… 

Now why on earth did I write this post? Well, I have been doing some 6 full installations and some 12 database engine installations of SQL Server 2008 the last two days – so I’ve had a lot of time on my hands. ūüôā

 

Fed up with hunting physical index details?

I am. I find myself endlessly hunting for index information when working against the various SQL Servers I come in contact with. And, sure, the information is there. You just need to go and get it. This generally means that I start with sp_helpindex. Then some SELECT from sys.indexes. Then some more against sys.partitions and sys.allocation units (we want space usage stats as well). And perhaps general usage stats (sys.dm_index_usage_stats). (I sometimes might even use the GUI (SSMS) reports and index dialog – but you might already know that I’m not much of a GUI person.)

The good news with all this is that I learn to use these catalog and dynamic management views. Bad news is that it is kind of … boring to do the same thing again and again.

This is why wrote sp_indexinfo. You might have your own index information procedures (which you wrote yourself or found on the Internet). If not, you are welcome to use this one. I aim to improve it over time, so suggestions are welcome. Possible improvements include:

  1. Make it a function. Functions are nice since we can order the results, aggregate, and basically do whatever we want to when we SELECT from the function. But for this I need to find out how we install a user-defined global system function – there’s no supported way to do this. I’m not sure I want to go there…
  2. Reurn missing index information as well. For this we probably want two resultsets, and only return missing index information when we targeted *one* table (no wildcards). If we do this, then function is out since a function can only return *one* result set.

If you care to give it a spin, please let me know. I just wrote the procedure, so I haven’t tested it much yet. If you do find bugs, please leave a comment and I will incorporate into the source (let me know if you want to be acknowledged). Any comments are welcome.

You find the proc at: http://karaszi.com/spindexinfo-enhanced-index-information-procedure

Import/Export Wizard and indexes, keys etc

Here’s one which I’ve seen much confusion about. Something like “I copy tables from database A to database B but I don’t get any indexes, keys, constraints¬†etc. This worked in 2000. Why?”.

We need to first understand that that Import/Export Wizard just sit on top if DTS/SSIS. In DTS, you had three choices in the Wizard:

  1. Copy data from one of more tables or views
  2. Write a query to specify the data to transfer
  3. Copy objects and data between SQL Server databases

Now, read above and think of the actual words, they are very descriptive. The thing is that in 2005 (or 2008), option 3 isn’t available in the wizard anymore. So what many users end up doing is to use option 1 above. But this was never designed to carry over indexes, keys, triggers etc. This is designed to be generic and work with (more or less) and type of data source (SQL Server, Oracle, DB2 etc).

So the question is how we get the “Copy objects and data between SQL Server databases” functionality in 2005 or 2008? We create an SSIS package in BIDS and select the task type “Transfer SQL Server Objects Task”. (For you long-timers: this is the old “Object Transfer” functionality which has existed since the version 4 days, and been exposed in various places in the tools over the years.)

I just now played with this a bit and you probably need to work it a bit to get it do exactly what you want. I didn’t get it to work correctly for the Adventureworks database since it thought that the Emplyee table exists in the dbo schema (judgning by error message and Profiler trace). This was even though the GUI clearly show the proper schema. When I selected pubs as source, it worked better (after changing some options).

However, the point of this post is not to troubleshoot possible bugs or quirks in the “Transfer SQL Server Objects Task” task. Now is morning after Swedish midsummer and I’m too tired for that ūüôā . The point is where to find the functionality that copies not only raw table definition and data – but also indexes, key, triggers etc.

Did we have recent autogrow?

I just read the question how to find out when autogrow in tempdb happened. And again I want to push for the default trace. The simple answer is that you already have this information available, just go and get it! (Unless you explicitly disabled the default trace, of course…)

Adjust the file name for your most recent default trace file in below query:

SELECT te.name, t.DatabaseName, t.FileName, t.StartTime, t.ApplicatioNname
FROM fn_trace_gettable('C:\Program Files\Microsoft SQL Server\MSSQL.3\MSSQL\LOG\log_331.trc', NULL) AS t
INNER JOIN sys.trace_events AS te ON t.EventClass = te.trace_event_id
WHERE te.name LIKE '%Auto Grow'
ORDER BY StartTime ASC

Endpoints, Netlibs, IPC and stuff…

Its been a while since my last post. No special reason, just a combination of lot of work and I didn’t feel I had something pressing to say…

This topic is basically on how the client app communicates with SQL Server. Not the API level (like ADO or ODBC), or the packet level (TDS), but in between. Basicaly we’re talking IPC, Inter Process Communication – in a SQL Server context:

Here’s how I understand it (I probably gonna get some points wrong and you are all welcome to correct me):

There are network protocols, such as:

  • TCP/IP (has routing functionality of course)
  • NetBEUI (very limited, if any, routing functionality)
  • IPX (the original protocol for Novell networks)
  • SNA (mainly used in IBM mainframe and such environments)

A network protocol is of little use if we can’t send data back and fort between application over that network protocol. So, there are APIs to facilitate IPC:

  • NETBIOS (originally developed for NetBEUI, but is also supported over IP (requires WINS or LMHOST for name resolution))
  • Sockets (not available for NetBEUI AFAIK, only TCP/IP)
  • Named Pipes (built on top of NETBIOS)
  • RPC (implemented and available over both NetBEUI and IP)
  • SPX (as I understand it, the API over IPX)
  • APPC (program-to-program protocol over SNA)

When MS released “their” SQL Server, they needed a way for the client app to communicate to the server. They decided to go for Named Pipes and developed what we call “netlib”. I.e., the MS deveopers used the Named Pipes API (which is similar to reading and writing to a file from the programmers perspective) when developing the Named Pipes netlib.

Over time, new netlibs were developed, where in SQL Server 2000, this culminated in below list:

  • Shared Memory (only for local connections, obviously) 
  • Named Pipes
  • Sockets
  • RPC
  • VIA
  • SPX

There was never a netlib deveoped directly on top of NETBIOS, but indirectly through Named Pipes. Named Pipes uses NETBIOS, which available over IP, and hence is routable. In 2005, the list has shrunk to:

  • Shared Memory 
  • Named Pipes
  • Sockets
  • VIA

And I have a feeling that in the end Named Pipes will go away. I don’t have any experience with VIA, but I believe that it is closer to the metal than Sockets so it might stick around for dedicated AppServer-to-SqlServer networks.

So, what does above have to do with Endpoints? Well, MS are categorizing netlibs as endpoint nowadays. This makes sense since the netlibs are a “way in” to SQL Server, as are HTTP, Service Broker and Database Mirroring endpoints.

Finally got Kalen’s new Query Tuning book…

I know this book has been out for a little while now, but I didn’t get it until now. This one I’ve been looking forward to for a long time. A quick look in the book is very promising (as expected).

The full name of the book, btw, is “Inside Microsoft SQL Server 2005 Query Tuning and Optimization” from MS press. The title says it all, I guess. ūüôā

Non-trusted constraints and performance

(See my part 1 article about non-trusted constraints in general: http://sqlblog.com/blogs/tibor_karaszi/archive/2008/01/12/non-trusted-constraints.aspx)

The optimizer is a pretty smart little animal. It can even use constraints to eliminate some data access or part of a query. That assumes however that the constraint is trusted. For example, for a check constraint:

USE Adventureworks
SET STATISTICS IO ON
--Also check the execution plan
SELECT AVG(OrderQty) FROM Sales.SalesOrderDetail
SELECT AVG(OrderQty) FROM Sales.SalesOrderDetail WHERE UnitPrice < 0

Note that the table isn’t even accessed for the second query. SQL Server know that there can be no rows in the table where UnitPrice > 0 because there is a constraint:

SELECT OBJECT_NAME(parent_object_id) AS table_name, name, definition
FROM sys.check_constraints
WHERE parent_object_id = OBJECT_ID('sales.salesorderdetail')
ORDER BY table_name

Notice the constraint CK_SalesOrderDetail_UnitPrice with the condition ([UnitPrice]>=(0.00)). But what if the constraint isn’t trusted?

ALTER TABLE Sales.SalesOrderDetail NOCHECK CONSTRAINT CK_SalesOrderDetail_UnitPrice
SELECT AVG(OrderQty) FROM Sales.SalesOrderDetail WHERE UnitPrice < 0

Now we do have data access. And even if we enable the constraint, it will be non-trusted and SQL Server cannot know for sure that no row violates the condition:

ALTER TABLE Sales.SalesOrderDetail CHECK CONSTRAINT CK_SalesOrderDetail_UnitPrice
SELECT AVG(OrderQty) FROM Sales.SalesOrderDetail WHERE UnitPrice < 0

We need to enable the constraint WITH CHECK to make sure it is trusted:

ALTER TABLE Sales.SalesOrderDetail WITH CHECK CHECK CONSTRAINT CK_SalesOrderDetail_UnitPrice
SELECT AVG(OrderQty) FROM Sales.SalesOrderDetail WHERE UnitPrice < 0

Performance can also be improved with foreign key constraints. For example, we have a foreign key which states that we cannot have an order detail row for an order which doesn’t exist in the orders table. Now, consider below (check out both I/O and execution plan):

SELECT sd.SalesOrderID, sd.CarrierTrackingNumber
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesOrderDetail AS sd
ON s.SalesOrderID = sd.SalesOrderID
WHERE sd.OrderQty > 20

We didn’t return any columns from the SalesOrderHeader table, and since SQL Server know that each row in the SalesOrderDetail table has a corresponding row in the SalesOrderHeader table, there’s no need to access the SalesOrderHeader table at all. But if the constraint isn’t trusted:

ALTER TABLE Sales.SalesOrderDetail NOCHECK CONSTRAINT FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID

SELECT sd.SalesOrderID, sd.CarrierTrackingNumber
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesOrderDetail AS sd
ON s.SalesOrderID = sd.SalesOrderID
WHERE sd.OrderQty > 20

Notice that now the SalesOrderTable *is* accessed, because the constraint is disabled. As, as with above, it isn’t enough to enable it, we need to enable it WITH CHECK to make it trusted again. OK, we might say that why even bother with the join if we only need columns from the referencing table in the first place. But consider this view:

CREATE VIEW myView AS
SELECT 
sd.SalesOrderID, s.RevisionNumber, s.DueDate, sd.CarrierTrackingNumber
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesOrderDetail AS sd
ON s.SalesOrderID = sd.SalesOrderID

The view accesses columns from both tables. But it is quite possible that we have users of this view who are only intered in columns from the SalesOrderDetail table:

SELECT SalesOrderID, CarrierTrackingNumber
FROM myView

If the constraint is not trusted, then both tables are accessed (with the join operation, of course). But if we make sure that the constraint is trusted, then only the SalesOrderDetail table is accessed and no join is performed:

ALTER TABLE Sales.SalesOrderDetail WITH CHECK CHECK CONSTRAINT FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID

SELECT SalesOrderID, CarrierTrackingNumber
FROM myView

So, bottom line is that you should be careful with non-trusted constraints. Obviously because you might have data in the table which violates the constraint, but also for performance reasons. It is easy to check whether you have any non-trusted constraints in the database:

SELECT OBJECT_NAME(parent_object_id) AS table_name, name
FROM sys.check_constraints
WHERE is_not_trusted = 1
UNION ALL
SELECT OBJECT_NAME(parent_object_id) AS table_name, name
FROM sys.foreign_keys
WHERE is_not_trusted = 1
ORDER BY table_name

I should add that to me, the purpose of constraint to make sure I have valid data. The performance aspect is just a bonus. But I admitt that I’m pretty impressed by the optimizer in some cases. ūüôā

 

Non-trusted constraints

A discussion in the newsgroups before the holidays lead to trusted constraints and performance. This inspired me to blog about it, but I decided to have some vacation first :-). Instead of having one long article, I’ll do a two-part. This one is about non-trusted constraints in general, and another one will follow about non-trusted constraints and performance.

As you might know, we can disable check and foreign key constraints. This can be done when we create the constraint, for instance:

USE tempdb
CREATE TABLE t1(c1 int)
INSERT INTO t1(c1) VALUES(-1)
GO
ALTER TABLE t1 WITH NOCHECK ADD CONSTRAINT CK_t1_c1 CHECK(c1 > 0)

Above allow us to add the constraint even though we have rows that violates the constraint. The constraint isn’t disabled, but we (can) have rows in the table that violates the constraint – the constraint isn’t trusted. We can also disable an existing constraint:

USE tempdb
CREATE TABLE t2(c1 INT CONSTRAINT CK_t2_c1 CHECK(c1 > 0) )
INSERT INTO t2(c1) VALUES(1)
GO
ALTER TABLE t2 NOCHECK CONSTRAINT CK_t2_c1
GO
INSERT INTO t2(c1) VALUES(-1)

Again, we now have rows in the table that violates the constraint. For the first example, the constraint is enabled, but we didn’t check for existing rows when we added the constraint. If we try to add a row which violates the constraint, we get an error message. For the second example, the constraint isn’t even enabled. We can enable a disabled constraint:

ALTER TABLE t2 CHECK CONSTRAINT CK_t2_c1
GO
INSERT INTO t2(c1) VALUES(-1)

The immediate above INSERT command will fail with an error message. The constraint in table t2 is now enabled. But the constraint for both table t1 and table t2 are non-trusted. For table t1, we added the constraint with existing data, and told SQL Server to not check existing data. SQL Server cannot trust this constraint. For table t2, we disabled the constraint, added data, then enabled the constraint. SQL Server cannot trust the constraint because we might have added data which violates the constraint while the constraint was disabled. There’s an easy way to check whether you have non-tructed constraints. For instance, for check constraints:

SELECT OBJECT_NAME(parent_object_id) AS table_name, name
FROM sys.check_constraints
WHERE is_not_trusted = 1

Now, can we turn a non-trusted constraint into a trusted constraint? Yes. But we first need to get rid of offending data:

DELETE FROM t1 WHERE c1 < 0
DELETE FROM t2 WHERE c1 < 0

And now we want to make sure the constraints are trusted:

ALTER TABLE t1 WITH CHECK CHECK CONSTRAINT CK_t1_c1
ALTER TABLE t2 WITH CHECK CHECK CONSTRAINT CK_t2_c1

There’s no typo above. “WITH CHECK” is validate the data, and “CHECK CONSTRAINT” is to enable the constraint.

So, why would we want to bother with disabling and non-trusted constraints? The purpose of constraints is to make sure that we have consistent data. Why would we want to break this in the first place? Well, rarely. But here are a coule of possible scenarios where one could consider disabling constraints:

  • We run some batch operation once a week. With constraint enabled, this takes 5 hours. With constraints disabled, it takes 1 hour. We “know” that the batch is written in a way so that it doesn’t violate any of our constraints.
  • We want to expand the domain of allowable values for a column. Today we allow values Y and N. We will also allow value U for the column. This is implemented as a check constraint. We remove the existing constraint and add the new one (which also allow for U). We know that no of the existing rows cannot violate the new constraint since we expand the domain of allowable values. Adding the new constraint with NOCHECK is much faster.

Above examples might seem a bit … constructed. I haven’t encountered much non-trusted constraints in reality, but it has happended. And my initial goal was to talk about non-trusted constraints and performance, and this will come in the following blog post.