Execution plan re-use, sp_executesql and TSQL variables

Let me start by saying that the contents of this post are not very advanced. If you have read the excellent paper “Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005”, https://technet.microsoft.com/en-us/library/cc966425.aspx, and understood it, you already know what follows, and much more…

I was reading a thread in the open newsgroups today (.tools, posted by Mike) about a piece of advice that the SQL Server 2005 Database Engine Tuning Advisor apparently gave: to replace sp_executesql usage with declaring T-SQL variables and using those in the WHERE clause. Translated to the Adventureworks database, it advises that instead of 1) below, we should use 2) below.

--1) sp_executesql
EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM Person.Person
WHERE LastName = @P1',
N'@P1 nvarchar(50)', 'Diaz'

--2) T-SQL variable
DECLARE @P1 nvarchar(50)
SET @P1 = 'Diaz'
SELECT FirstName, LastName, PersonType, Title
FROM Person.Person
WHERE LastName = @P1

Now, I could not reproduce this (make DTA give me the same advice, to replace 1) with 2)). Perhaps I misunderstood the poster in the group, or it is because I’m running the SQL Server 2008 DTA and engine, I’m not looking in the right place for that advice, my data isn’t representative, I’m running DTA with some other settings, etc. But say that DTA does indeed give such advice: why would it do that? To be honest, I don’t know. It can hardly have enough information to determine whether 1) or 2) is the best choice.

In short: say we have an index on the LastName column, and the name we look for can be either a very common name, like perhaps “Smith”, or a not so common name, like “Karaszi”. For the more common name, a table scan might be the best choice, whereas for the not-so-common name, an index seek is probably the best choice. OK, a very simple example, but it serves well for this discussion.

Back to the difference between 1) and 2). There are potentially very important differences between the two:

For 1), SQL Server will determine a plan based on the contents of the parameter when the plan is created. I.e., it can determine selectivity based on that value and decide whether it is a good idea to use an index or not. The plan is then cached and can be re-used. But what if the first execution passes in something very selective, while typical executions are not very selective? Or the other way around? I.e., the plan for one case might not be optimal for some other case. This is where we have new optimizer hints in 2005, like OPTIMIZE FOR and the RECOMPILE hints. I will not go into details about these here, since I’m focusing on the differences between 1) and 2). See the white paper I mentioned, and of course Books Online, for more details.
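To make the hint names above concrete, here is a minimal sketch (not part of the original demo; the OPTION clause syntax is standard 2005 T-SQL) of how 1) could be combined with those hints:

```sql
--OPTIMIZE FOR: the plan is compiled as if @P1 were N'Diaz', regardless of the actual value passed
EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM Person.Person
WHERE LastName = @P1
OPTION (OPTIMIZE FOR (@P1 = N''Diaz''))',
N'@P1 nvarchar(50)', 'Smith'

--RECOMPILE: a fresh plan per execution, sniffing the current value each time
EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM Person.Person
WHERE LastName = @P1
OPTION (RECOMPILE)',
N'@P1 nvarchar(50)', 'Smith'
```

OPTIMIZE FOR trades the risk of a bad sniffed plan for a plan tuned to one chosen value; RECOMPILE trades plan re-use for a per-execution optimal plan.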

For 2), SQL Server (or rather: the optimizer) has no knowledge of the contents of the variable when the plan is produced. So it cannot use the statistics to determine selectivity. In the above case, it instead uses density (stored with the statistics, assuming such exist for the column). Density is basically 1 / (number of unique values in the column(s)). This might be a good representation for a typical last name, but perhaps not so good for a very common or a very uncommon one. Now, in 2005 we have hints for these situations as well (RECOMPILE), but again, that is not the point here.
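As a hedged aside (the index name below is the one AdventureWorks ships with on Person.Person; adjust it to whatever statistics exist on your column), you can inspect the density the optimizer falls back on:

```sql
--The "All density" value for the first key column (LastName) is what the
--optimizer multiplies by the table's row count when it cannot sniff the value
DBCC SHOW_STATISTICS ('Person.Person', 'IX_Person_LastName_FirstName_MiddleName')
WITH DENSITY_VECTOR
```

The demo code at the end of this post does the same calculation by hand for the dbo.p test table.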

In order for DTA to give the best advice here, it would have to know about the distribution of the data in that column, and also see many executions of the query to determine whether “all” executions use a typical value (sp_executesql might be better) or whether the stored density value is a good representation for “all” queries passed from the clients. I very much doubt that DTA has this level of sophistication. Basically, I don’t know why it advises this. There might be other aspects, like “avoid dynamic SQL” (whether that holds for this case we could argue in another post), but DTA is about performance, not best practices.

Bottom line is that these things are not simple and we should be very cautious with “rules of thumb”.

Here’s some T-SQL code to demonstrate the differences between 1) and 2). As always, only execute it after you have read and understood the code!

--Create a copy of the Person table
--We will have lots of "Diaz" and very few "Gimmi"
SELECT *
INTO dbo.p
FROM Person.Person

CREATE INDEX x ON dbo.p(LastName)

--Create lots of Diaz
INSERT INTO dbo.p
SELECT BusinessEntityID + 30000, PersonType, NameStyle, Title, FirstName, MiddleName
,N'Diaz', Suffix, EmailPromotion, AdditionalContactInfo, Demographics, rowguid, ModifiedDate
FROM Person.Person

--Make sure we have up-to-date statistics
UPDATE STATISTICS dbo.p WITH FULLSCAN

--Verify execution plan and I/O cost
--for table scan with low selectivity
--and index seek with high selectivity
SET STATISTICS IO ON

SELECT FirstName, LastName, PersonType, Title FROM dbo.p WHERE LastName = 'Diaz'
--20183 rows, table scan, 7612 pages

SELECT FirstName, LastName, PersonType, Title FROM dbo.p WHERE LastName = 'Gimmi'
--1 row, index seek, 3 pages

--sp_executesql alternative

--Table scan will be used for both because of execution plan re-use
EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM dbo.p
WHERE LastName = @P1',
N'@P1 nvarchar(50)', 'Diaz'
--20183 rows, table scan, 7612 pages

EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM dbo.p
WHERE LastName = @P1',
N'@P1 nvarchar(50)', 'Gimmi'
--1 row, table scan, 7612 pages

--Other way around
DBCC FREEPROCCACHE

--Index seek will be used for both because of execution plan re-use
EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM dbo.p
WHERE LastName = @P1',
N'@P1 nvarchar(50)', 'Gimmi'
--1 row, index seek, 3 pages

EXEC sp_executesql N'SELECT FirstName, LastName, PersonType, Title
FROM dbo.p
WHERE LastName = @P1',
N'@P1 nvarchar(50)', 'Diaz'
--20183 rows, index seek, 20291 pages

--Alternative using a variable
DECLARE @P1 nvarchar(50)

SET @P1 = 'Diaz'
SELECT FirstName, LastName, PersonType, Title
FROM dbo.p
WHERE LastName = @P1
--20183 rows, index seek, 20291 pages

SET @P1 = 'Gimmi'
SELECT FirstName, LastName, PersonType, Title
FROM dbo.p
WHERE LastName = @P1
--1 row, index seek, 1 page

--Same plan even though very different selectivity,
--even when emptying the plan cache in between
DBCC FREEPROCCACHE

--Estimated 33 rows for both above.
--Is that drawn from the statistics' density?

--Formula for density: 1 / #OfUniqueValuesInColumn
SELECT 1.0 / COUNT(DISTINCT LastName) FROM dbo.p

--Does that match the density in the index statistics?
DBCC SHOW_STATISTICS ('dbo.p', 'x') WITH DENSITY_VECTOR

--How many rows in the table?
SELECT COUNT(*) FROM dbo.p

--So how many rows would we estimate based on density?
SELECT 0.00082918739 * 39944
--Yep, 33 rows.

--I.e., for the variable alternative, SQL Server has no
--knowledge of the contents of those variables, so it must use density instead.

--Clean up:
DROP TABLE dbo.p

Rebuilding msdb on SQL Server 2008

Because of the problems I had removing Data Collector, I decided to rebuild msdb. You have probably heard about instmsdb.sql, but it had been a long time since I actually used it. I asked about rebuilding in the MVP group, and Dan Jones (MS) pointed me to a blog post from Paul Randal on how to do this on SQL Server 2005. Here’s Paul’s blog post:


Since the above is for SQL Server 2005, I realized that it might not work smoothly on 2008. And it didn’t. Below are some of the things I discovered (also posted as a comment on Paul’s blog). Read them in light of Paul’s post. I should also say that nowhere does Paul state that his instructions work on 2008. It was me taking a chance. 🙂

You need to add startup parameter -s <instancename> if it is a named instance. Now, this I knew, but for the sake of other potential readers…
I initially started the instance from the Windows Services applet by adding -T3608. That didn’t allow for detaching msdb. So I started it from an OS command prompt and also added -c. This allowed me to detach msdb.
I now ran instmsdb, but that produced a number of errors. Here are a few comments about some of them:
* Complaints about xp_cmdshell. I tried enabling it first and then ran instmsdb again, but got the same result.
* A bunch of errors when creating various Data Collector objects. This wasn’t good, because cleaning up DC was the reason to rebuild msdb in the first place.
* Three errors about sp_configure, where -1 wasn’t an allowed value (two for Agent XPs and one for xp_cmdshell).
Just for the sake of trying, I then connected to the instance using SSMS Object Explorer. But now I got an error regarding Agent XPs when connecting. I tried to explicitly enable Agent XPs using sp_configure, but got the same error. When connected, there’s no node in Object Explorer for Agent.
I took this as an indication that Agent isn’t healthy. Whether it was me doing something fishy, or it isn’t as easy as just running instmsdb.sql on SQL Server 2008 – I don’t know. But I’m in for a rebuild of the system databases. This isn’t that bad, since it is just a test machine. But these issues might serve as an example of why you want to follow Paul’s initial advice: always back up msdb (also on test machines).

We’ve come a long way …

For various reasons I decided that I want virtual machines with older (pre-2008) SQL Server versions available on my various machines. For me, virtualization (in this case VPC) is great:

  • I rarely use these installs, most often I just boot it and check some detail.
  • I don’t have to litter the host OS.
  • I don’t pay anything (performance) in the host OS, except for some disk. The overhead for an extra XP install is some 1.5 GB which nowadays isn’t that bad.

So I made several copies of my XP VPC folder (I don’t do diff drives, for various reasons).

And then I started installing SQL Server 2000 (I already had VPCs with 2005). I do work with 2000 now and then, but I mainly use SSMS to connect to 2000. So it was a bit of a flashback to play around with EM again.

Next was 7.0. OK, 7.0 didn’t look that different from 2000…

Installing 6.5 was more fun. I had forgotten, for instance, that SQL Server Agent was called “SQL Executive” back then. Also, Enterprise Manager was a totally different tool compared to 7.0/2000.

I decided to skip 6.0, since the 6.5 BOL is basically the 6.0 BOL with an added “What’s new” section. So having the above 6.5 VPC also covers 6.0 for me.

The most interesting part was 4.21a for NT:

I first realized I had made a mistake when copying the files from diskettes to CD – I had put all the files in the same directory. Setup expects a folder structure like DISK1, DISK2, etc. And since I don’t have the diskettes anymore, how would I know which files go in which folder? What I ended up doing was to copy the setup files locally (a whopping 4.4 MB!) and modify SETUP.INF. Interestingly enough, I did figure out how to modify the INF file successfully. Imagine doing that today – without knowing anything about the installation…

Anyhow, installation was successful and I checked out what tools we had. Hehe, this is where the nostalgia kicked in. I already have an OS/2 VPC with SQL Server 1.1, but I can barely navigate that OS nowadays. And there were no GUIs at all with SQL Server 1.x. Since I hadn’t seen SQL Server 4.x for many, many years, I think this was more fun than re-living 1.1.

What strikes you is of course the tools. Looking at the engine (using “Books Online”) you mainly see that a lot of today’s functionality wasn’t there, of course. But using the GUI makes it much more apparent what was there and what wasn’t. And of course the whole feel of the GUIs was totally different.

The help file start page has some 9 buttons, for various sections like Datatypes, Expressions, LIKE and Wildcards, Transact-SQL Statements etc. No tree-like structure…

The release notes explain, for instance, that extended stored procedures were a new thing, and with them came SQL Mail.

What we nowadays call SQL Server Agent was called “SQL Monitor”.

The “SQL Administrator Win32” tool had some very rudimentary dialogs for looking at “devices”, “DB”, “Login”, etc. There are some dialogs available from the menus, like “Database Consistency Check” and “Configure SQL Server”. I could not find, for instance, where to schedule backups with SQL Monitor…

The “SQL Object Manager Win32” tool wasn’t actually that bad. The “Objects” window lists one row per object in the database, and you can double-click a row to “edit” the object. Interestingly enough, I believe this is the first version where the tools had “Generate Script” functionality, for instance. Hehe, there’s even a GUI for assisting in creating a SELECT statement, with rows allowing you to type text for the WHERE clause, the ORDER BY clause, etc.

There’s a separate tool called “SQL Transfer Manager”, whose functionality has over the years been exposed in various places (EM, DTS, SSIS, DMO, SMO, etc.).

Back to reality. Firing up SSMS 2008, I realize how much has changed… The engine has so much more functionality. Perhaps only, say, 10-15% of what we have today was also in, say, 4.x – if even that. Not to mention things like SSAS, SSIS, RS, etc. So, even though it was fun nostalgia to fire up an old version, I really enjoy being where we are today. 🙂

Missing F8 or ctrl-N in SSMS 2008?

Short story: Turn on 2000 keyboard layout and then back to Standard layout.

Long story:

This topic has been discussed in both the MCT (MS Certified Trainer) as well as MVP groups. Also, see http://sqlblog.com/blogs/andy_leonard/archive/2008/08/08/sql-server-2008-management-studio-function-keys.aspx, including the comments.

The mystery seems to be that in some cases you do have F8 and Ctrl-N in the Standard layout, whereas in other cases you don’t. For instance, I checked 4 installations, where one had the desired layout (with F8) and the others didn’t:

  1. VPC. XP. Clean install. No prior SQL Server stuff. No F8 or ctrl-N.
  2. My laptop, XP. I have 2000, 2005 and 2008 tools as well as 2000, 2005 and 2008 instances installed. No F8 or ctrl-N.
  3. My desktop machine, Vista. I have 2005 and also 2008 instances. I have had 2005 SSMS which was uninstalled before I installed 2008 SSMS. Here both ctrl-N and F8 work.
  4. VPC. XP. Had 2005, both tools and an instance, which was upgraded to 2008. No F8 or ctrl-N.

I was doing training on 2008 last week and I really needed my shortcut keys (I couldn’t keep stumbling through menus all the time – too slow). So I switched to what I’m familiar with: the 2000 keyboard layout. I recall thinking to myself that perhaps if I now switched back, I would have the desired Standard layout (F8 and Ctrl-N). I forgot all about it until today, reading a post in the MVP group from Craig Beere suggesting exactly this. To confirm, I tried it in both a virtual machine (1 above) and my laptop (2 above), and it did indeed work.

One thing to watch out for: there doesn’t seem to be a way to go back to the Standard layout *without* F8 and Ctrl-N. For instance, when you get F8 etc., you also get a different shortcut for commenting code (or was it uncommenting?). So you might want to think a little before setting the 2000 layout and switching back. I’m sure somebody will in the end find a setting somewhere to control the behavior – and then we’ll know how to switch between the two Standard alternatives…

Make sure you play with data collector on a virtual machine

I’m in a situation where I have configured the new data collector functionality for three instances. And there’s no way to undo the configuration performed by the wizard! It cannot be undone by the wizard, and BOL doesn’t have information on how to do it. In fact, I suspect that in the end you need to use some of the undocumented data collector procedures (like sp_syscollector_delete_jobs) to get rid of the configuration.

I’m not knocking data collector per se – it seems like a great way to get a baseline going etc. But my tip is that while you are playing with it in order to understand it – do it virtually.

Lara has reported this on connect, btw: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=334180

Are inserts quicker to heap or clustered tables?

Is it quicker and/or lower overhead to insert into a heap vs. a clustered table?
I don’t know. So I decided to do a test. Some background information first:

The test was inspired by a sidebar with Gert-Jan Strik in the open newsgroups. Basically I expressed that a heap doesn’t automatically carry lower overhead… just because it is a heap. Now, heaps vs. clustered tables is a huge topic with many aspects. I will not cover anything here except inserts into a heap vs. a table clustered on an ever-increasing key. No other indexes. There will be no fragmentation. I do not cover searches, covering indexes, etc. Only the pure insert aspect. OK? Good!

One might think that a heap has lower overhead because it is a … heap. But hang on for a second and think about what happens when you do an insert:

The heap:
SQL Server needs to find where the row should go. For this it uses one or more IAM pages for the heap, and it cross-references these with one or more PFS pages for the database file(s). IMO, there is potential for noticeable overhead here. And even more: with many users hammering the same table, I can imagine blocking (waits) against the PFS and possibly also the IAM pages.

Clustered table:
Now, this is dead simple. SQL Server navigates the clustered index tree and finds where the row should go. Since this is an ever-increasing index key, each row will go to the end of the table (the last page of the linked list).

The result:
So what is the conclusion? I did several executions of the code at the end of this post, with some variations. Basically there was no or very little difference with only one user, i.e., no contention on the GAM or PFS pages. This was pretty consistent for the three scenarios below:

  1. Insert with subselect. I.e., this inserts lots of rows in the same statement.
  2. Insert in a loop (one insert, and thus one row, per iteration), many rows in the same transaction.
  3. Insert in a loop, one row per transaction.

Now, the difference between 2 and 3 is important.
With many transactions, we incur the overhead of force-log-write-at-commit *for each row*. I.e., much more overhead against the transaction log. And indeed, the timings between 2 and 3 for one of my executions (10000 rows) showed that 2 took on average 650 ms where the same figure for 3 was 5600 ms. That is about 9 times longer! Now, this was more or less expected, but another important aspect is when we have several users. With many users, we might run into blocking on the PFS and IAM pages. Also, with several users it is meaningless to do it all in one transaction, since we would block and essentially single-thread the code anyhow. I.e., the only relevant measure when we run many users is the loop construction where each row is its own transaction (3).

There was indeed a noticeable difference when I executed several inserts in parallel, with each insert in its own transaction (clustered table vs. heap).

Some numbers:
I did 4 repeated tests and calculated the average execution time for inserting 10000 rows per thread. With 6 parallel threads I had 22 seconds for the clustered table and 29 seconds for the heap. With 10 threads I had 31 seconds for the clustered table and 42 seconds for the heap.

I didn’t find a performance difference of more than a couple of percent for batch inserts, when I single-threaded (only one thread pumping inserts), or when I had all inserts in the loop as one transaction.

Now, I would need lots more time to run exhaustive tests, but my interpretation is that with many users doing inserts, there is a noticeable overhead for the heap vs. clustering on an increasing key.

The code:
Note that for parallel executions, I recommend starting the DoTheInserts procedure using SQLCMD, a BAT file and START. As always, read the code carefully (so you understand it) and execute at your own risk.

--Create the database etc.
--Make files large enough so that inserts don't cause autogrow
CREATE DATABASE TestDb ON
(NAME = 'TestDb', FILENAME = 'C:\TestDb.mdf', SIZE = 300MB, FILEGROWTH = 50MB)
LOG ON
(NAME = 'TestDb_log', FILENAME = 'C:\TestDb_log.ldf', SIZE = 200MB, FILEGROWTH = 100MB)
GO
--Full recovery to avoid effect of system-caused log truncation
ALTER DATABASE TestDb SET RECOVERY FULL
USE TestDb

--Execution time log table
CREATE TABLE TimeLogger(
 SomeId int identity
,spid int
,TableStructure varchar(10) CHECK (TableStructure IN ('heap', 'clustered'))
,InsertType varchar(20) CHECK (InsertType IN ('one statement', 'loop'))
,ExecutionTimeMs int
)

CREATE TABLE RowsToInsert([#rows] int)

--Support procedures
IF OBJECT_ID('CreateTables') IS NOT NULL DROP PROC CreateTables
GO
CREATE PROC CreateTables AS
IF OBJECT_ID('HeapLoop') IS NOT NULL DROP TABLE HeapLoop
CREATE TABLE HeapLoop(c1 int identity, c2 int DEFAULT 2, c3 datetime DEFAULT GETDATE(), c4 char(200) DEFAULT 'g')
IF OBJECT_ID('ClusteredLoop') IS NOT NULL DROP TABLE ClusteredLoop
CREATE TABLE ClusteredLoop(c1 int identity, c2 int DEFAULT 2, c3 datetime DEFAULT GETDATE(), c4 char(200) DEFAULT 'g')
CREATE CLUSTERED INDEX x ON ClusteredLoop(c1)
IF OBJECT_ID('HeapOneStatement') IS NOT NULL DROP TABLE HeapOneStatement
CREATE TABLE HeapOneStatement(c1 int identity, c2 int DEFAULT 2, c3 datetime DEFAULT GETDATE(), c4 char(200) DEFAULT 'g')
IF OBJECT_ID('ClusteredOneStatement') IS NOT NULL DROP TABLE ClusteredOneStatement
CREATE TABLE ClusteredOneStatement(c1 int identity, c2 int DEFAULT 2, c3 datetime DEFAULT GETDATE(), c4 char(200) DEFAULT 'g')
CREATE CLUSTERED INDEX x ON ClusteredOneStatement(c1)
GO

IF OBJECT_ID('TruncateTables') IS NOT NULL DROP PROC TruncateTables
GO
CREATE PROC TruncateTables AS
TRUNCATE TABLE HeapLoop
TRUNCATE TABLE ClusteredLoop
TRUNCATE TABLE HeapOneStatement
TRUNCATE TABLE ClusteredOneStatement
GO


IF OBJECT_ID('iHeapLoop') IS NOT NULL DROP PROC iHeapLoop
GO
CREATE PROC iHeapLoop @rows int AS
DECLARE @i int = 1
WHILE @i <= @rows
BEGIN
INSERT INTO HeapLoop (c2) VALUES(2)
SET @i = @i + 1
END
GO

IF OBJECT_ID('iClusteredLoop') IS NOT NULL DROP PROC iClusteredLoop
GO
CREATE PROC iClusteredLoop @rows int AS
DECLARE @i int = 1
WHILE @i <= @rows
BEGIN
INSERT INTO ClusteredLoop (c2) VALUES(2)
SET @i = @i + 1
END
GO

IF OBJECT_ID('iHeapOneStatement') IS NOT NULL DROP PROC iHeapOneStatement
GO
CREATE PROC iHeapOneStatement @rows int AS
INSERT INTO HeapOneStatement (c2)
SELECT TOP(@rows) 2 FROM syscolumns a CROSS JOIN syscolumns b
GO

IF OBJECT_ID('iClusteredOneStatement') IS NOT NULL DROP PROC iClusteredOneStatement
GO
CREATE PROC iClusteredOneStatement @rows int AS
INSERT INTO ClusteredOneStatement (c2)
SELECT TOP(@rows) 2 FROM syscolumns a CROSS JOIN syscolumns b
GO

--Procs to do the inserts
IF OBJECT_ID('DoBefore') IS NOT NULL DROP PROC DoBefore
GO
--DoBefore is reconstructed: assumed to empty the test tables and checkpoint before each measurement
CREATE PROC DoBefore AS
EXEC TruncateTables
CHECKPOINT
GO

IF OBJECT_ID('DoTheInserts') IS NOT NULL DROP PROC DoTheInserts
GO
CREATE PROC DoTheInserts AS
DECLARE @dt datetime, @NumberOfRowsToInsert int
SET @NumberOfRowsToInsert = (SELECT [#rows] FROM RowsToInsert)

EXEC DoBefore --Batch allocation, heap:
SET @dt = GETDATE()
EXEC iHeapOneStatement @rows = @NumberOfRowsToInsert
INSERT INTO TimeLogger (spid, TableStructure, InsertType, ExecutionTimeMs)
VALUES(@@SPID, 'heap', 'one statement', DATEDIFF(ms, @dt, GETDATE()))

EXEC DoBefore --Batch allocation, clustered:
SET @dt = GETDATE()
EXEC iClusteredOneStatement @rows = @NumberOfRowsToInsert
INSERT INTO TimeLogger (spid, TableStructure, InsertType, ExecutionTimeMs)
VALUES(@@SPID, 'clustered', 'one statement', DATEDIFF(ms, @dt, GETDATE()))

EXEC DoBefore --Single allocations, heap:
SET @dt = GETDATE()
EXEC iHeapLoop @rows = @NumberOfRowsToInsert
INSERT INTO TimeLogger (spid, TableStructure, InsertType, ExecutionTimeMs)
VALUES(@@SPID, 'heap', 'loop', DATEDIFF(ms, @dt, GETDATE()))

EXEC DoBefore --Single allocations, clustered:
SET @dt = GETDATE()
EXEC iClusteredLoop @rows = @NumberOfRowsToInsert
INSERT INTO TimeLogger (spid, TableStructure, InsertType, ExecutionTimeMs)
VALUES(@@SPID, 'clustered', 'loop', DATEDIFF(ms, @dt, GETDATE()))
GO

--Run the tests
EXEC CreateTables
INSERT INTO RowsToInsert([#rows]) VALUES(10000) --rows per execution (10000 used in the post)

--<Below can be executed over several connections>
EXEC DoTheInserts
EXEC DoTheInserts
EXEC DoTheInserts
EXEC DoTheInserts
--</Below can be executed over several connections>

--How did we do?
SELECT COUNT(*) AS NumberOfExecutions, TableStructure, InsertType, AVG(ExecutionTimeMs) AS AvgMs
FROM TimeLogger
GROUP BY TableStructure, InsertType
ORDER BY InsertType, TableStructure

--Verify that we have no fragmentation
SELECT OBJECT_NAME(object_id) AS TableName
,avg_fragmentation_in_percent AS frag
,page_count AS [#pages]
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED')
WHERE OBJECT_NAME(object_id) <> 'TimeLogger' AND index_level = 0


Code page backgrounder, courtesy of Erland Sommarskog

While browsing through the programming newsgroup today, I came across a post from Erland Sommarskog – a short backgrounder about code pages and collations. I’ve never seen code pages described so coherently and with so few words, so I asked Erland if I could quote his text in my blog (no, Erland doesn’t blog 🙂 ). So the quoted text below is with Erland’s kind permission.

For those of you who want to know more about Erland or read some of his great deep-dive articles, check out http://www.sommarskog.se/.

“To start with, if we should be picky, there are no ASCII characters >= 128.
There are however lots of other character sets that define this area.

Way back in the 80s, vendors started to explore the area 128-255, and
about each vendor came with its own character set(s). The contribution
from the IBM/Microsoft combo that ran MS-DOS was a number of code
pages, of which 437 was one of their oldest. Later, they realized that
they did not support all languages in Western Europe, and they defined
CP850, which served Western Europe better.

Meanwhile, HP had Roman-8 and Digital had their DEC Multinational Character
Set. Eventually, ISO settled on composing a standard, and they worked
from DEC MCS – or DEC were smart enough to work from the ISO drafts, I don’t
know which. This resulted in ISO-8859, a family of originally eight 8-bit
character sets, which has since grown to 15 sets.

By the time Microsoft divorced from IBM, they abandoned CP437 and
CP850 as the character set for Windows, and went with ISO-8859, at
least for Western Europe. Except that they added some printable
characters in the range 128-159, where Latin-1 has only control characters.
This became code page 1252, and CP1252 is the code page typically
used for 8-bit Windows applications on a computer installed in Western
Europe or the Americas. However, CP437/CP850 still live on in Windows
today; the command-line window uses a so-called OEM character set, which
is one of these.

If you have a Windows program that uses CP1252, and the server collation
is CP437, the client API will convert the data for you, so if you pass
for instance Ö, which has character code 216 in CP1252, the byte that
gets stored in SQL Server will be another one. When you retrieve the data,
it will be converted in the other direction. However, since CP1252
and CP437 do not include the same characters, the conversion may
not round-trip. For instance, Å may not be in CP437, so an Å from
CP1252 would become A, and remain A when you retrieve it.

<TiborComment>Here I removed a section which was only relevant for the newsgroup thread in question</TiborComment>

Finally, all collations have 255 characters for varchar, and at least
65535 characters for nvarchar.”

For those of you who want to dive deep in collations and such topics, check out http://msdn.microsoft.com/en-us/library/bb330962.aspx.

SQL Server 2008 and Visual Studio 2008

You probably already know that SQL Server 2008 RTM’d (yesterday). You need to be careful when installing SQL Server 2008 if you also have Visual Studio 2008 installed. It all has to do with the version of the framework that each product requires. Denis already blogged about it here: http://sqlblog.com/blogs/denis_gobo/archive/2008/08/07/8261.aspx.

I just found out that the SQL Server 2008 release notes have been updated with some extra information about this. Check it out at:


Read the section below the “Before you install” title. It is worth the 3 minutes it takes to read it!

Analogy between SQL Server and operating systems

With SQL Server 2008 released, I was thinking back on earlier versions of SQL Server. And I decided to compare them to the MS operating systems. Not a point-in-time comparison, like “SQL Server version x was released in year a, which was the same year that OS y was released.” I’m thinking more of the feel you have for the product. Why would anyone want to do that? I don’t know – for fun, perhaps? While writing this I realized that the comparisons/analogies work better the older the product is. Perhaps a product needs to be obsolete for us to have the sentimental feeling required for this type of comparison? Anyhow, here goes:

SQL Server 1.x <-> DOS
(I do know it ran on OS/2, but again this is more about how you feel for the product.)
I know, perhaps not all fair, but think about it. We are talking about command-line environments, or at best some full-screen character-based applications (like edit.exe or saf.exe). And installation was floppy-based, where the product fit on a couple of floppies.

SQL Server 4.x <-> OS/2 or Windows pre-95
I can’t decide here.
OS/2 had the merit that it wasn’t a bad OS, but almost no apps were developed for it (think back to versions 1.2 and 1.3, what they were at the time – and what they could have been), and it wasn’t a fun environment to work in. Windows pre-95 had the merit of being a GUI which, sort of, brought multitasking to the desktop – but what about robustness?
Same goes for SQL Server 4.x. It was revolutionary in some sense, like: imagine fitting a real RDBMS on a PC! Now smaller businesses could start using “real” RDBMSs. But OTOH, it was very unpolished. Remember the GUI tools? They were really Windows apps that some conversion tool had converted to OS/2.
So, I think it is a draw between OS/2 and Windows pre-95.

SQL Server 6.x <-> Windows NT 3.x or Windows 9x
Again, I can’t decide.
In one way, SQL Server 6.x was MS’s first “own” release. But OTOH, the Sybase code base was still there. MS mainly did tool stuff, along with some engine stuff (like ANSI SQL compliance). But it wasn’t a re-write of the engine.
This compares to Windows 9x – the DOS heritage was still there, in some sense.
If you compare SQL Server 6.x to Windows NT 3.x you can also see similarities. NT 3.x was the first version of the new revolutionary OS from MS. But it still looked like … old Windows – something you might compare with SQL Server 6.x Enterprise Manager.

SQL Server 7.0 <-> NT 4
I was originally going to put Windows 2000 here, but after thinking a while I decided on NT 4.
7.0 was the first version of the new architecture. A lot happened; the engine was completely re-written. New stuff was introduced (Profiler, DTS, OLAP Server). So, at the engine level, we basically got a more modern look-and-feel.
To some extent NT 4 was similar. You got a new GUI (adopted from Windows 9x). The revolution was that you now had a *stable* OS which you could also run as your desktop OS. I bet many of you (computer nerd) readers preferred NT 4 over Windows 9x at that time. I did. There were some architectural changes in the OS as well, like the device driver model (some stuff was moved to kernel mode – if my memory serves me).

SQL Server 2000 <-> Windows 2000
Seems too easy, but think about it.
SQL Server 2000 was when the new architecture matured. IMO, a great release at that time. OK, some would argue that not much happened between 7.0 and 2000, but maturing and polishing the new architecture is a major thing to me.
Windows 2000 can also be seen as becoming mature – ready to be used by the masses. OK, there was some revolutionary new stuff like AD, but you can’t expect the analogy to fit 100%. 😉

SQL Server 2005 <-> Vista
Hmm, is my analogy breaking down here?
I was originally going to put Windows 2003 here. But that was a bit too much of going chronologically hand-in-hand.
And I think XP is a bit unfair (perhaps XP would be a better fit for SQL Server 2000?).
But 2005 did have lots and lots of changes and new features. And so did Vista. Vista has a rather slow adoption rate, and I have the same feeling for SQL Server 2005. Many people seem to be waiting for Vista+, a perhaps more cleaned-up OS? And some seem to be waiting for SQL Server 2008, even though perhaps not for the same reasons.

SQL Server 2008 <-> Vista +
This was unavoidable, considering how we got here. I won’t dwell on this, since it is too early to say how we will feel about these releases 10 years from now…

Now why on earth did I write this post? Well, I have been doing some 6 full installations and some 12 database engine installations of SQL Server 2008 the last two days – so I’ve had a lot of time on my hands. 🙂


Fed up with hunting physical index details?

I am. I find myself endlessly hunting for index information when working against the various SQL Servers I come in contact with. And, sure, the information is there. You just need to go and get it. This generally means that I start with sp_helpindex. Then some SELECTs from sys.indexes. Then some more against sys.partitions and sys.allocation_units (we want space usage stats as well). And perhaps general usage stats (sys.dm_db_index_usage_stats). (I sometimes might even use the GUI (SSMS) reports and index dialog – but you might already know that I’m not much of a GUI person.)

The good news with all this is that I learn to use these catalog and dynamic management views. Bad news is that it is kind of … boring to do the same thing again and again.
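To illustrate the kind of query this hunting boils down to (a sketch only – this is not the sp_indexinfo source, and the column choices are my assumptions), something like:

```sql
--One row per index: type, row count and allocated space,
--from the catalog views mentioned above
SELECT OBJECT_SCHEMA_NAME(i.object_id) + '.' + OBJECT_NAME(i.object_id) AS TableName
      ,i.name AS IndexName
      ,i.type_desc
      --count rows from the in-row allocation unit only, to avoid double counting
      ,SUM(CASE WHEN au.type_desc = 'IN_ROW_DATA' THEN p.rows ELSE 0 END) AS Rows
      ,SUM(au.total_pages) * 8 AS TotalKB
FROM sys.indexes AS i
JOIN sys.partitions AS p ON p.object_id = i.object_id AND p.index_id = i.index_id
JOIN sys.allocation_units AS au ON au.container_id = p.partition_id
WHERE OBJECTPROPERTY(i.object_id, 'IsMSShipped') = 0
GROUP BY i.object_id, i.name, i.type_desc
ORDER BY TableName, IndexName
```

A proper procedure would add filtering on schema/table name patterns and join in sys.dm_db_index_usage_stats – which is exactly the boilerplate sp_indexinfo is meant to save.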

This is why I wrote sp_indexinfo. You might have your own index information procedures (written yourself or found on the Internet). If not, you are welcome to use this one. I aim to improve it over time, so suggestions are welcome. Possible improvements include:

  1. Make it a function. Functions are nice since we can order the results, aggregate, and basically do whatever we want when we SELECT from the function. But for this I need to find out how to install a user-defined global system function – there’s no supported way to do this. I’m not sure I want to go there…
  2. Return missing index information as well. For this we probably want two result sets, and we should only return missing index information when we target *one* table (no wildcards). If we do this, then a function is out, since a function can only return *one* result set.

If you care to give it a spin, please let me know. I just wrote the procedure, so I haven’t tested it much yet. If you do find bugs, please leave a comment and I will incorporate fixes into the source (let me know if you want to be acknowledged). Any comments are welcome.

You find the proc at: http://karaszi.com/spindexinfo-enhanced-index-information-procedure