Why can’t we have column alias in ORDER BY?

You probably think something like “What! Of course we can!”. And so did I when I read the title for a post about this in a forum. Let us first establish that you can have a column alias in ORDER BY:

[sql]SELECT
p.ProductID AS pid
,p.ListPrice AS pr
FROM Production.Product AS p
ORDER BY pr[/sql]

The problem was that a table name was also in the ORDER BY clause. And this is when it becomes weird. The problem isn’t the column alias, it is the table alias. Basically the desire is to write something like:

SELECT
p.ProductID AS pid
,p.ListPrice AS pr
FROM Production.Product AS p
ORDER BY p.pr

Above was apparently allowed in SQL Server 2005 2000 (I never did it myself, however). (Update: Erland Sommarskog caught my error, this was allowed in 2000, not 2005.) In order to understand why above is backwards, we need to consider the logical query flow. The first part of a query which is executed is the FROM clause (logically, the optimizer can do as it pleases as long as we get the same result). In above example, I renamed the table (for the rest of the query) from “Production.Product” to “p”. That means that we cannot refer to “Product” or “Production.Product” anymore in the query. However, SQL has imperfections and the topic of this blog is an attempt from MS to rectify such earlier imperfections (some of them), i.e., improving the product.

Continuing the logical query flow (WHERE, GROUP BY, HAVING), we towards the end find the SELECT clause. Here is where you can rename the columns (a.k.a. column alias) for the rest of the query. After the SELECT list, it is meaningless to talk about different tables (where the data originally comes from), all we have is the result of the query (so far). This is why, we can in the ORDER BY refer to a column alias (generally); ORDER BY comes logically after the SELECT list.

However, the original problem was that one also wanted to include a table alias, which shouldn’t be allowed. So, why is it allowed, in the first place – consider it makes no sense from a logical standpoint? For that we need to look as somethining weird, which is allowed (both in T-SQL and ANSI SQL) and that can be somewhat useful:

SELECT
ProductID AS pid
,ListPrice AS pr
FROM Production.Product
ORDER BY ModifiedDate

We don’t need to return something to the client app in order to sort over it. Sure, that can be useful, but from a logical perspective, this is spooky. We sort over something which doesn’t exist! So, the imperfections of the SQL language is the reason things starts to break down. Basically, what we do is to sort over something as if we would have included it in the SELECT list. We are saying, in the ORDER BY clause something like

“Hey, dbms! Can you please pretend we included Product.ModifiedDate in the SELECT list so we can sort over it. Bur don’t return it to the client app, I don’t want to see that column.”

Where it really breaks down is when we combine table and column alias. Let me repeat the problematic query:

SELECT
p.ProductID AS pid
,p.ListPrice AS pr
FROM Production.Product AS p
ORDER BY p.pr

The problem is that it is meaningless to specify a table alias here, the column does exist in the SELECT list. The only reason we can refer to something table-something at all, is to let the DBMS pretend we included something additional from the base tables, so we could sort over it. Such as (allowed):

SELECT
ProductID AS pid
,ListPrice AS pr
FROM Production.Product AS p
ORDER BY p.ModifiedDate

One could of course argue that below would be more logical (not allowed):

SELECT
ProductID AS pid
,ListPrice AS pr
FROM Production.Product AS p
ORDER BY Product.ModifiedDate

Arguing that one form of two above is more logical than the other is pretty irrelevant, houever. Both are weird constructs, and the language is what it is. Anyone feel like checking the ANSI SQL standard which of above two (first, second or both) are allowed/preferred? No me, I’m off to grab some breakfast!

 

Leave a Reply

Your email address will not be published.