Import/Export Wizard and indexes, keys etc

Here’s one I’ve seen much confusion about. Something like: “I copy tables from database A to database B, but I don’t get any indexes, keys, constraints etc. This worked in 2000. Why?”

We first need to understand that the Import/Export Wizard just sits on top of DTS/SSIS. In DTS, you had three choices in the Wizard:

  1. Copy data from one or more tables or views
  2. Write a query to specify the data to transfer
  3. Copy objects and data between SQL Server databases

Now, read the above and think about the actual words; they are very descriptive. The thing is that in 2005 (or 2008), option 3 isn’t available in the wizard anymore. So what many users end up doing is using option 1 above. But that option was never designed to carry over indexes, keys, triggers etc. It is designed to be generic and work with (more or less) any type of data source (SQL Server, Oracle, DB2 etc).

So the question is how to get the “Copy objects and data between SQL Server databases” functionality in 2005 or 2008. We create an SSIS package in BIDS and select the task type “Transfer SQL Server Objects Task”. (For you long-timers: this is the old “Object Transfer” functionality, which has existed since the version 4 days and has been exposed in various places in the tools over the years.)

I just played with this a bit, and you will probably need to work it a bit to get it to do exactly what you want. I didn’t get it to work correctly for the Adventureworks database, since it thought that the Employee table exists in the dbo schema (judging by the error message and a Profiler trace). This was even though the GUI clearly shows the proper schema. When I selected pubs as the source, it worked better (after changing some options).

However, the point of this post is not to troubleshoot possible bugs or quirks in the “Transfer SQL Server Objects Task” task. It is the morning after Swedish midsummer and I’m too tired for that 🙂 . The point is where to find the functionality that copies not only the raw table definition and data, but also indexes, keys, triggers etc.

Did we have recent autogrow?

I just read a question about how to find out when autogrow happened in tempdb. And again I want to push for the default trace. The simple answer is that you already have this information available; just go and get it! (Unless you explicitly disabled the default trace, of course…)

Adjust the file name to your most recent default trace file in the query below:

SELECT te.name, t.DatabaseName, t.FileName, t.StartTime, t.ApplicationName
FROM fn_trace_gettable('C:\Program Files\Microsoft SQL Server\MSSQL.3\MSSQL\LOG\log_331.trc', NULL) AS t
INNER JOIN sys.trace_events AS te ON t.EventClass = te.trace_event_id
WHERE te.name LIKE '%Auto Grow'
ORDER BY StartTime ASC
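
If you don’t want to hard-code the file name, you should be able to find the currently active default trace file programmatically. A minimal sketch, assuming SQL Server 2005 or later where the sys.traces catalog view is available:

--Path of the currently active default trace file
SELECT path
FROM sys.traces
WHERE is_default = 1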

Endpoints, Netlibs, IPC and stuff…

It’s been a while since my last post. No special reason, just a combination of lots of work and not feeling I had anything pressing to say…

This topic is basically about how the client app communicates with SQL Server. Not the API level (like ADO or ODBC), or the packet level (TDS), but in between. Basically we’re talking IPC, Inter Process Communication, in a SQL Server context:

Here’s how I understand it (I’ll probably get some points wrong and you are all welcome to correct me):

There are network protocols, such as:

  • TCP/IP (has routing functionality of course)
  • NetBEUI (very limited, if any, routing functionality)
  • IPX (the original protocol for Novell networks)
  • SNA (mainly used in IBM mainframe and such environments)

A network protocol is of little use if we can’t send data back and forth between applications over that protocol. So, there are APIs to facilitate IPC:

  • NETBIOS (originally developed for NetBEUI, but is also supported over IP (requires WINS or LMHOST for name resolution))
  • Sockets (not available for NetBEUI AFAIK, only TCP/IP)
  • Named Pipes (built on top of NETBIOS)
  • RPC (implemented and available over both NetBEUI and IP)
  • SPX (as I understand it, the API over IPX)
  • APPC (program-to-program protocol over SNA)

When MS released “their” SQL Server, they needed a way for the client app to communicate with the server. They decided to go for Named Pipes and developed what we call a “netlib”. I.e., the MS developers used the Named Pipes API (which, from the programmer’s perspective, is similar to reading and writing to a file) when developing the Named Pipes netlib.

Over time, new netlibs were developed, culminating in the following list in SQL Server 2000:

  • Shared Memory (only for local connections, obviously) 
  • Named Pipes
  • Sockets
  • RPC
  • VIA
  • SPX

There was never a netlib developed directly on top of NETBIOS, only indirectly through Named Pipes. Named Pipes uses NETBIOS, which is available over IP, and is hence routable. In 2005, the list has shrunk to:

  • Shared Memory 
  • Named Pipes
  • Sockets
  • VIA

And I have a feeling that in the end Named Pipes will go away. I don’t have any experience with VIA, but I believe that it is closer to the metal than Sockets so it might stick around for dedicated AppServer-to-SqlServer networks.

So, what does the above have to do with Endpoints? Well, MS categorizes netlibs as endpoints nowadays. This makes sense, since the netlibs are a “way in” to SQL Server, as are HTTP, Service Broker and Database Mirroring endpoints.
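
If you are curious which endpoints exist on a 2005 instance, the sys.endpoints catalog view should show them, TDS (netlib-based) endpoints included. A minimal sketch:

--List endpoints, including the TDS endpoints for the netlibs
SELECT name, protocol_desc, type_desc, state_desc
FROM sys.endpoints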

Finally got Kalen’s new Query Tuning book…

I know this book has been out for a little while now, but I didn’t get it until now. This is one I’ve been looking forward to for a long time. A quick look through the book is very promising (as expected).

The full name of the book, btw, is “Inside Microsoft SQL Server 2005 Query Tuning and Optimization” from MS Press. The title says it all, I guess. 🙂

Non-trusted constraints and performance

(See my part 1 article about non-trusted constraints in general: http://sqlblog.com/blogs/tibor_karaszi/archive/2008/01/12/non-trusted-constraints.aspx)

The optimizer is a pretty smart little animal. It can even use constraints to eliminate some data access or part of a query. This assumes, however, that the constraint is trusted. For example, for a check constraint:

USE Adventureworks
SET STATISTICS IO ON
--Also check the execution plan
SELECT AVG(OrderQty) FROM Sales.SalesOrderDetail
SELECT AVG(OrderQty) FROM Sales.SalesOrderDetail WHERE UnitPrice < 0

Note that the table isn’t even accessed for the second query. SQL Server knows that there can be no rows in the table where UnitPrice < 0, because there is a constraint:

SELECT OBJECT_NAME(parent_object_id) AS table_name, name, definition
FROM sys.check_constraints
WHERE parent_object_id = OBJECT_ID('sales.salesorderdetail')
ORDER BY table_name

Notice the constraint CK_SalesOrderDetail_UnitPrice with the condition ([UnitPrice]>=(0.00)). But what if the constraint isn’t trusted?

ALTER TABLE Sales.SalesOrderDetail NOCHECK CONSTRAINT CK_SalesOrderDetail_UnitPrice
SELECT AVG(OrderQty) FROM Sales.SalesOrderDetail WHERE UnitPrice < 0

Now we do have data access. And even if we enable the constraint, it will be non-trusted and SQL Server cannot know for sure that no row violates the condition:

ALTER TABLE Sales.SalesOrderDetail CHECK CONSTRAINT CK_SalesOrderDetail_UnitPrice
SELECT AVG(OrderQty) FROM Sales.SalesOrderDetail WHERE UnitPrice < 0

We need to enable the constraint WITH CHECK to make sure it is trusted:

ALTER TABLE Sales.SalesOrderDetail WITH CHECK CHECK CONSTRAINT CK_SalesOrderDetail_UnitPrice
SELECT AVG(OrderQty) FROM Sales.SalesOrderDetail WHERE UnitPrice < 0

Performance can also be improved with foreign key constraints. For example, say we have a foreign key which states that we cannot have an order detail row for an order which doesn’t exist in the orders table. Now, consider the below (check out both I/O and execution plan):

SELECT sd.SalesOrderID, sd.CarrierTrackingNumber
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesOrderDetail AS sd
ON s.SalesOrderID = sd.SalesOrderID
WHERE sd.OrderQty > 20

We didn’t return any columns from the SalesOrderHeader table, and since SQL Server knows that each row in the SalesOrderDetail table has a corresponding row in the SalesOrderHeader table, there’s no need to access the SalesOrderHeader table at all. But if the constraint isn’t trusted:

ALTER TABLE Sales.SalesOrderDetail NOCHECK CONSTRAINT FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID

SELECT sd.SalesOrderID, sd.CarrierTrackingNumber
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesOrderDetail AS sd
ON s.SalesOrderID = sd.SalesOrderID
WHERE sd.OrderQty > 20

Notice that now the SalesOrderHeader table *is* accessed, because the constraint is disabled. And, as above, it isn’t enough to enable it; we need to enable it WITH CHECK to make it trusted again. OK, you might ask why we even bother with the join if we only need columns from the referencing table in the first place. But consider this view:

CREATE VIEW myView AS
SELECT 
sd.SalesOrderID, s.RevisionNumber, s.DueDate, sd.CarrierTrackingNumber
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesOrderDetail AS sd
ON s.SalesOrderID = sd.SalesOrderID

The view accesses columns from both tables. But it is quite possible that we have users of this view who are only interested in columns from the SalesOrderDetail table:

SELECT SalesOrderID, CarrierTrackingNumber
FROM myView

If the constraint is not trusted, then both tables are accessed (with the join operation, of course). But if we make sure that the constraint is trusted, then only the SalesOrderDetail table is accessed and no join is performed:

ALTER TABLE Sales.SalesOrderDetail WITH CHECK CHECK CONSTRAINT FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID

SELECT SalesOrderID, CarrierTrackingNumber
FROM myView

So, the bottom line is that you should be careful with non-trusted constraints. Obviously because you might have data in the table which violates the constraint, but also for performance reasons. It is easy to check whether you have any non-trusted constraints in the database:

SELECT OBJECT_NAME(parent_object_id) AS table_name, name
FROM sys.check_constraints
WHERE is_not_trusted = 1
UNION ALL
SELECT OBJECT_NAME(parent_object_id) AS table_name, name
FROM sys.foreign_keys
WHERE is_not_trusted = 1
ORDER BY table_name

I should add that to me, the purpose of a constraint is to make sure I have valid data. The performance aspect is just a bonus. But I admit that I’m pretty impressed by the optimizer in some cases. 🙂

Non-trusted constraints

A discussion in the newsgroups before the holidays led to the subject of trusted constraints and performance. This inspired me to blog about it, but I decided to have some vacation first :-). Instead of having one long article, I’ll make it a two-parter. This one is about non-trusted constraints in general, and another will follow about non-trusted constraints and performance.

As you might know, we can disable check and foreign key constraints. This can be done when we create the constraint, for instance:

USE tempdb
CREATE TABLE t1(c1 int)
INSERT INTO t1(c1) VALUES(-1)
GO
ALTER TABLE t1 WITH NOCHECK ADD CONSTRAINT CK_t1_c1 CHECK(c1 > 0)

The above allows us to add the constraint even though we have rows that violate it. The constraint isn’t disabled, but we (can) have rows in the table that violate the constraint – the constraint isn’t trusted. We can also disable an existing constraint:

USE tempdb
CREATE TABLE t2(c1 INT CONSTRAINT CK_t2_c1 CHECK(c1 > 0) )
INSERT INTO t2(c1) VALUES(1)
GO
ALTER TABLE t2 NOCHECK CONSTRAINT CK_t2_c1
GO
INSERT INTO t2(c1) VALUES(-1)

Again, we now have rows in the table that violate the constraint. For the first example, the constraint is enabled, but we didn’t check existing rows when we added the constraint. If we try to add a row that violates the constraint, we get an error message. For the second example, the constraint isn’t even enabled. We can enable a disabled constraint:

ALTER TABLE t2 CHECK CONSTRAINT CK_t2_c1
GO
INSERT INTO t2(c1) VALUES(-1)

The immediately above INSERT command will fail with an error message. The constraint on table t2 is now enabled. But the constraints for both table t1 and table t2 are non-trusted. For table t1, we added the constraint with existing data and told SQL Server not to check existing data. SQL Server cannot trust this constraint. For table t2, we disabled the constraint, added data, then enabled the constraint. SQL Server cannot trust the constraint because we might have added data which violates the constraint while it was disabled. There’s an easy way to check whether you have non-trusted constraints. For instance, for check constraints:

SELECT OBJECT_NAME(parent_object_id) AS table_name, name
FROM sys.check_constraints
WHERE is_not_trusted = 1

Now, can we turn a non-trusted constraint into a trusted constraint? Yes. But we first need to get rid of offending data:

DELETE FROM t1 WHERE c1 < 0
DELETE FROM t2 WHERE c1 < 0

And now we want to make sure the constraints are trusted:

ALTER TABLE t1 WITH CHECK CHECK CONSTRAINT CK_t1_c1
ALTER TABLE t2 WITH CHECK CHECK CONSTRAINT CK_t2_c1

There’s no typo above. “WITH CHECK” is to validate the data, and “CHECK CONSTRAINT” is to enable the constraint.

So, why would we want to bother with disabled and non-trusted constraints? The purpose of constraints is to make sure that we have consistent data. Why would we want to break this in the first place? Well, rarely. But here are a couple of possible scenarios where one could consider disabling constraints:

  • We run some batch operation once a week. With constraints enabled, this takes 5 hours. With constraints disabled, it takes 1 hour. We “know” that the batch is written in a way that it doesn’t violate any of our constraints.
  • We want to expand the domain of allowable values for a column. Today we allow the values Y and N. We will also allow the value U. This is implemented as a check constraint. We remove the existing constraint and add a new one (which also allows U). We know that none of the existing rows can violate the new constraint, since we are expanding the domain of allowable values. Adding the new constraint with NOCHECK is much faster (see the sketch after this list).
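
A minimal sketch of that second scenario, with made-up table, column and constraint names:

--Re-create the constraint so it also allows 'U', without scanning existing rows.
--Note that the new constraint will be non-trusted.
ALTER TABLE dbo.SomeTable DROP CONSTRAINT CK_SomeTable_Flag
ALTER TABLE dbo.SomeTable WITH NOCHECK
ADD CONSTRAINT CK_SomeTable_Flag CHECK (Flag IN ('Y', 'N', 'U'))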

The above examples might seem a bit … constructed. I haven’t encountered many non-trusted constraints in the real world, but it has happened. And my initial goal was to talk about non-trusted constraints and performance, which will come in the following blog post.

Backup compression in SQL Server 2008

Having a few moments to spare, I decided to give this a spin.

Specifying for the backup to be compressed is really simple. Just specify COMPRESSION in the WITH clause of the BACKUP command. For example:

BACKUP DATABASE Adventureworks
TO DISK = 'C:\Advc.bak'
WITH INIT, COMPRESSION

For fun, I compared backup file size and backup time between compressing and not compressing. I ran the below after priming the cache (not that it should matter, since backup reads pages from disk, see http://sqlblog.com/blogs/tibor_karaszi/archive/2007/08/03/does-backup-utilize-pages-in-cache.aspx). I also deleted the backup files, if they existed, before execution.

DECLARE @dt datetime

SET @dt = GETDATE()
BACKUP DATABASE Adventureworks
TO DISK = 'C:\Adv.bak'
WITH INIT
SELECT DATEDIFF(ms, @dt, GETDATE())

SET @dt = GETDATE()
BACKUP DATABASE Adventureworks
TO DISK = 'C:\Advc.bak'
WITH INIT, COMPRESSION
SELECT DATEDIFF(ms, @dt, GETDATE())

Size of the backup file is 129 MB vs. 35 MB. Of course, the compression ratio varies depending on what type of data there is in the database (string data tend to compress better than other data, for instance). Times to execute the commands were 7.5 seconds vs. 3.8 seconds. Now, this is a virtual machine on VPC and Adventureworks is a tiny database. But at least we can see clear potential for savings here.
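
If you prefer to get the numbers from the server instead of the file system, msdb should have them as well. A sketch, assuming the compressed_backup_size column that was added to backupset in 2008:

--Most recent backups of the database, with both sizes
SELECT TOP (2) backup_finish_date, backup_size, compressed_backup_size
FROM msdb.dbo.backupset
WHERE database_name = 'Adventureworks'
ORDER BY backup_finish_date DESC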

So how do we know if a database backup was compressed? If you’ve used RESTORE HEADERONLY, you probably noticed this annoying column called “Compressed”. Well, this is the first time we see a value of “1” in that column.
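
For example, against the compressed backup file from above:

RESTORE HEADERONLY
FROM DISK = 'C:\Advc.bak'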

It seems we cannot mix compressed and non-compressed backups in the same file, though. (Not that I often have multiple backups in the same file.) If I append a backup using COMPRESSION to a file where there already is a non-compressed backup, I get an error. And if I do a backup without COMPRESSION to a file where there are compressed backups, the new backup will be compressed (even without specifying COMPRESSION). Something to look out for if you have several backups in the same backup file.
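
A sketch of the append scenario, reusing the file names from above (NOINIT appends to the file instead of overwriting it):

--Appending a compressed backup to a file holding a non-compressed one
--gave me an error in my tests
BACKUP DATABASE Adventureworks
TO DISK = 'C:\Adv.bak'
WITH NOINIT, COMPRESSION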

So what about the RESTORE command? Well, there’s nothing to say, really. You don’t have to specify in the RESTORE command that the backup was compressed. Time for RESTORE was 10.4 vs. 6.3 seconds (with the destination database already existing). I expect the difference to be bigger on a real installation with a realistic database size.
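
In other words, restoring the compressed backup looks just like any other restore. Something like:

--REPLACE since the destination database already exists
RESTORE DATABASE Adventureworks
FROM DISK = 'C:\Advc.bak'
WITH REPLACE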

Owner/schema qualifying object names

It isn’t easy to remember all the details regarding object/schema qualifying object names. There are many variables involved, such as:

  • Version of SQL Server
  • Qualifying the owner when you call the proc (EXEC dbo.procname)
  • Qualifying object references inside the proc code
  • Qualifying object names if you aren’t using procedures
  • Whether the user is the same as the owner of the proc/table
  • Default schema for the user

So, I decided to give it a spin for different combinations and investigate both Profiler events and the number of plans in cache. I won’t post all the details here; that would be too much to write down. You can use the scripts at the end to do your own findings. I did not investigate the differences regarding compile locks. Anyhow, here are my conclusions:

  • I could not produce any recompile (SP:Recompile or SQLStmtRecompile) event for any combination.
  • I did find SP:CacheMiss events on 2000 when you execute a proc and don’t qualify the proc name (on 2005 I always got those events). A subsequent SP:CacheHit will then follow.
  • For straight SQL (no procedures) I noticed that each user gets its own plan when you don’t owner-qualify the table name. This makes sense. An interesting aspect on 2005 was that if you specify a default schema for the user (and two users have the same default schema), then the users will share the plan (basically the default schema becomes the “owner” of the plan).
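
To illustrate that last point, something like the below should do it on 2005, using the Kalle and Olle users from the scripts at the end (a sketch):

--Give both users the same default schema, so their unqualified
--references resolve the same way and the plan can be shared
ALTER USER Kalle WITH DEFAULT_SCHEMA = dbo
ALTER USER Olle WITH DEFAULT_SCHEMA = dbo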

Below are the scripts I used:

--Login as sysadmin:
USE master
IF DB_ID('myTestDb') IS NOT NULL DROP DATABASE myTestDb
IF EXISTS(SELECT * FROM syslogins WHERE name = 'Kalle') EXEC sp_droplogin 'Kalle'
IF EXISTS(SELECT * FROM syslogins WHERE name = 'Olle') EXEC sp_droplogin 'Olle'
GO
EXEC sp_addlogin 'Kalle', '*hjk&6f' EXEC sp_addlogin 'Olle', '*hjk&6f'
CREATE DATABASE myTestDb
GO
USE myTestDb
EXEC sp_grantdbaccess 'Kalle' EXEC sp_grantdbaccess 'Olle'
GO
CREATE TABLE dbo.t(c1 int identity PRIMARY KEY, c2 char(30))
INSERT INTO dbo.t SELECT TOP 1000 'hello' FROM sysobjects a CROSS JOIN sysobjects b
CREATE INDEX x ON t(c1)
GO
CREATE PROC dbo.p AS SELECT c1 FROM t WHERE c1 = 34 AND c2 = 'Hello'
GO
CREATE PROC dbo.p_q AS SELECT c1 FROM dbo.t WHERE c1 = 34 AND c2 = 'Hello'
GO
GRANT EXECUTE ON dbo.p TO Kalle, Olle
GRANT EXECUTE ON dbo.p_q TO Kalle, Olle
GRANT SELECT ON t TO Kalle, Olle

--Number of plans in cache, run after executions of proc
SELECT OBJECT_NAME(objid), sql, uid, *
FROM master..syscacheobjects
WHERE dbid = DB_ID()
AND 
cacheobjtype = 'Compiled Plan'
AND sql NOT LIKE '%PSTATMAN%'

--Run this three times, logged in as sysadmin (dbo), Kalle and Olle
USE myTestDb
GO
EXEC dbo.p
GO
EXEC dbo.p_q
GO
EXEC p
GO
EXEC p_q
GO
SELECT c1 FROM t WHERE c1 = 34 AND c2 = 'Hello'
GO
SELECT c1 FROM dbo.t WHERE c1 = 34 AND c2 = 'Hello'
GO
USE master

Sorted views…

Here’s one that pops up regularly. Something like:

“My views are no longer ordered in SQL Server 2005. I have ORDER BY in the view, but when I select from it, the rows are not returned according to the ORDER BY.”

Since I tend to post a reply often enough to this type of post, I decided to put it in writing once and for all, so I can point to this blog post. (Laziness is a virtue 😉 .)

A view is by definition not sorted. A view is supposed to behave like a table (and thanks to that we have the same language to operate against views as we have to operate against tables). This is why ORDER BY is not by itself allowed in a view definition.

It is allowed to have ORDER BY if you also have TOP, but the purpose of the ORDER BY is now to make sure that the correct rows are returned (the 5 most expensive books, for instance), not to return them in any particular order.
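
A minimal sketch of such a view, with made-up table and view names:

--ORDER BY is legal here because it defines which rows TOP returns,
--not the order of the result
CREATE VIEW dbo.FiveMostExpensiveBooks AS
SELECT TOP (5) title, price
FROM dbo.Books
ORDER BY price DESC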

At some point in time, the “workaround” of having TOP 100 PERCENT popped up. The thinking was that since we are now allowed to have ORDER BY, the view becomes “sorted”. It doesn’t. However, SQL Server 2000’s optimizer wasn’t as smart as 2005’s, so for simple queries you often found that data was returned according to your ORDER BY.

SQL Server 2005’s optimizer is smarter, and if it finds TOP 100 PERCENT and ORDER BY it realizes that neither operation affects *which* rows to return, so both are removed from the execution plan. That is smart, since they don’t affect the data anyhow. Remember that a view is by definition not sorted.

Some developers have relied on the 2000 behavior to create “sorted views”. This behavior was never documented; it was just a side effect of the execution plan, and was never guaranteed.

So what do we do? Well, the same as when we run queries against a table: we use ORDER BY when we read data from the view!
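
Continuing the made-up example from above:

--Sort in the query that reads from the view, not in the view itself
SELECT title, price
FROM dbo.FiveMostExpensiveBooks
ORDER BY price DESC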

Aside: The query builder which you can use when you create a view has a very strange behavior. It allows you to check a “sort” checkbox, and it will then add ORDER BY and TOP 100 PERCENT to the SELECT statement. According to the above explanation, this is kind of silly, but I’m hoping that MS will tidy this up for the next version of SQL Server.