Now that SQL Server 2019 is released I want to discuss one of the more interesting improvements, at least from a performance perspective. I believe that this alone can result in greatly improved performance and reduced resource usage. For some installations, I should add.
Just a couple of quick notes:
- This post is about scalar UDFs that do SELECT. A forthcoming post will be about scalar functions that don't do SELECT.
- Jump towards the end to see a table with the performance differences. That is why you are here, after all! Then you can come back and read the introduction paragraphs.
For a couple of decades, “we all” have known that scalar functions are bad. Really bad. To the extent that they are verboten in many installations. Why are they so bad? For two reasons:
- The function is called once per row. Imagine if you have a SELECT in there, and the function is called 100,000 times. You have now executed that SELECT 100,000 times.
- The overhead of calling the function is also significant. So, even if it does something very simple, like adding two numbers, the overhead of calling it many times can be huge.
Don’t believe me? Read on.
I don’t want to get too theoretical regarding how this works, digging into execution plans etc. But the big thing here in SQL Server 2019 is that SQL Server can inline those functions for you! I won’t go into the prerequisites for this, just check out the product documentation for those things. You will also see that this requires database compatibility level 150 (2019) and that the database option for this isn’t turned off.
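If you want to check these settings for your own database, you can query the catalog views. This is just a quick sketch using standard SQL Server metadata:

-- Check the compatibility level of the current database
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME()

-- Check the database scoped configuration for scalar UDF inlining
SELECT name, value
FROM sys.database_scoped_configurations
WHERE name = 'TSQL_SCALAR_UDF_INLINING'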
I’m using the AdventureworksDW database, and two tables:
- DimProduct, 606 rows, 6 MB.
- FactResellerSalesXL, approx 11,000,000 rows. I have three versions of this.
- Row-based clustered index. 2.48 GB.
- Same as above but with page compression: 0.68 GB. I don’t use this table for my tests here.
- Same as the first but with a clustered columnstore index: 0.48 GB.
UDF with I/O
Let’s start with a UDF that performs I/O. It is dead simple: it just summarizes the sales amount from the fact table for a certain product. Here’s the first version:
CREATE OR ALTER FUNCTION ProdSalesAmt(@ProductKey int)
RETURNS int
AS
BEGIN
DECLARE @ret int
SET @ret =
(
SELECT SUM(d.UnitPrice * d.OrderQuantity)
FROM dbo.FactResellerSalesXL AS d
WHERE d.ProductKey = @ProductKey
)
IF @ret IS NULL
SET @ret = 0
RETURN @ret
END
GO
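A quick way to sanity-check the function is to call it for a single product (the ProductKey value below is just an arbitrary example). In SQL Server 2019 you can also ask whether the engine considers the function eligible for inlining, via sys.sql_modules:

-- Call the function for one product (310 is just an example key)
SELECT dbo.ProdSalesAmt(310) AS SumSales

-- Check whether SQL Server 2019 considers the function inlineable
SELECT OBJECT_NAME(object_id) AS FunctionName, is_inlineable
FROM sys.sql_modules
WHERE object_id = OBJECT_ID('dbo.ProdSalesAmt')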
You will probably say that the function can be simplified, so here is such a simplified version. The idea is to get rid of all procedural code constructs:
CREATE OR ALTER FUNCTION ProdSalesAmt_Simple(@ProductKey int)
RETURNS int
AS
BEGIN
RETURN (
SELECT ISNULL(SUM(d.UnitPrice * d.OrderQuantity), 0)
FROM dbo.FactResellerSalesXL AS d
WHERE d.ProductKey = @ProductKey
)
END
GO
Some of you know the trick of not using a scalar UDF but an inline table-valued function (TVF) instead. The calling query then has to use CROSS APPLY (or OUTER APPLY), so the optimization isn’t transparent to the client code. Anyhow, here is such a version:
CREATE OR ALTER FUNCTION ProdSalesAmt_TVF(@ProductKey int)
RETURNS table
AS
RETURN (
SELECT ISNULL(SUM(d.UnitPrice * d.OrderQuantity), 0) AS SumSales
FROM dbo.FactResellerSalesXL AS d
WHERE d.ProductKey = @ProductKey
)
GO
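For reference, the TVF can also be called on its own, which is handy when testing it (again, the ProductKey value is just an example):

-- Call the inline TVF stand-alone for a single product
SELECT SumSales
FROM dbo.ProdSalesAmt_TVF(310)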
I’m going to test with different compatibility levels for the database:
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 140
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 150
And I’m also going to test with and without in-lining turned on.
ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = ON
ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = OFF
Automatic inlining requires compatibility level 150 and also that inlining is turned on. I did not see any difference depending on how inlining was turned off (lower compatibility level or the database scoped configuration), so I will just report it as inlining turned on or off.
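As a side note, and not something I used in these tests: inlining can also be controlled at a finer grain, with a per-query hint and a per-function option. Roughly like this:

-- Per query: disable inlining for this statement only
SELECT p.ProductKey, dbo.ProdSalesAmt(p.ProductKey) AS SumSales
FROM DimProduct AS p
OPTION (USE HINT('DISABLE_TSQL_SCALAR_UDF_INLINING'))

-- Per function: add WITH INLINE = OFF (or ON) to the function definition,
-- between RETURNS int and AS in the CREATE FUNCTION statements above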
But what if we didn’t use a function in the first place? Sure, functions are a neat programming construct. But neat programming constructs and performance don’t necessarily go hand in hand, as we have learned the hard way. So I will also test a plain join as well as a correlated subquery.
So here are the queries I will use:
-- Q1 Basic scalar function
SELECT p.ProductKey, p.EnglishProductName, dbo.ProdSalesAmt(p.ProductKey) AS SumSales
FROM DimProduct AS p
ORDER BY SumSales

-- Q2 Simplified scalar function
SELECT p.ProductKey, p.EnglishProductName, dbo.ProdSalesAmt_Simple(p.ProductKey) AS SumSales
FROM DimProduct AS p
ORDER BY SumSales

-- Q3 Inline TVF instead of scalar function
SELECT p.ProductKey, p.EnglishProductName, s.SumSales AS SumSales
FROM DimProduct AS p
OUTER APPLY dbo.ProdSalesAmt_TVF(p.ProductKey) AS s
ORDER BY SumSales

-- Q4 Join instead of calling a function
SELECT p.ProductKey, p.EnglishProductName, SUM(f.UnitPrice * f.OrderQuantity) AS SumSales
FROM DimProduct AS p
LEFT OUTER JOIN FactResellerSalesXL AS f ON f.ProductKey = p.ProductKey
GROUP BY p.ProductKey, p.EnglishProductName
ORDER BY SumSales

-- Q5 Correlated subquery
SELECT p.ProductKey, p.EnglishProductName,
(
SELECT SUM(d.UnitPrice * d.OrderQuantity)
FROM dbo.FactResellerSalesXL AS d
WHERE d.ProductKey = p.ProductKey
) AS SumSales
FROM DimProduct AS p
ORDER BY SumSales

-- Q6 Join using a table with a columnstore index
SELECT p.ProductKey, p.EnglishProductName, SUM(f.UnitPrice * f.OrderQuantity) AS SumSales
FROM DimProduct AS p
LEFT OUTER JOIN FactResellerSalesXL_CCI AS f ON f.ProductKey = p.ProductKey
GROUP BY p.ProductKey, p.EnglishProductName
ORDER BY SumSales
I measured three things: the execution time as shown in SSMS, plus duration and logical reads captured with an Extended Events trace. I will limit duration and logical reads to three significant figures.
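For reference, the Extended Events session was along these lines (session and file names are just placeholders, not my exact script):

-- Capture duration and logical reads per batch
CREATE EVENT SESSION UdfInliningTrace ON SERVER
ADD EVENT sqlserver.sql_batch_completed
(ACTION (sqlserver.sql_text))
ADD TARGET package0.event_file (SET filename = N'UdfInliningTrace')
GO
ALTER EVENT SESSION UdfInliningTrace ON SERVER STATE = START
GO

Here are the results: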
Query | Description | Inlining | Time (mm:ss) | Duration (ms) | Logical reads |
--- | --- | --- | --- | --- | --- |
Q1 | Basic UDF | N | 3:23 | 204,000 | 200,000,000 |
Q1 | Basic UDF | Y | 0:32 | 32,100 | 35,600,000 |
Q2 | Simplified UDF | Y | 0:32 | 32,100 | 35,600,000 |
Q3 | Inline TVF | - | 0:32 | 33,200 | 35,600,000 |
Q4 | Join | - | 0:00 | 692 | 331,000 |
Q5 | Correlated subquery | - | 0:00 | 708 | 331,000 |
Q6 | Join col-store | - | 0:00 | 153 | 4,960 |
Conclusions
Let us focus on the time it takes, the “Time” column above. This is the end-user perspective, and is what matters in the end.
Without inlining, the end-user would have time to grab a cup of coffee and chat with a friend while waiting. This is the pre-2019 behavior, assuming you didn’t do any optimization.
With inlining, the user can swap to the email program and check for new mail while waiting. Prior to 2019, we had to do manual work to get from three and a half minutes down to half a minute: re-writing the function and the SQL that called it. SQL Server 2019 basically does that for us! Note that I didn’t see any difference when simplifying the scalar function to only contain one SELECT.
But we could be smarter than that. For this simple example we could use either a join or a correlated subquery. It might not be that simple in all situations, but evidently there can still be room for query optimizations. Going from half a minute to half a second is a big thing. Now the end-user will barely notice the execution time.
And imagine if the dev/dba created a columnstore index to support the query. Now we basically have an instant result.
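For reference, creating such a columnstore copy of the fact table could look roughly like this (a sketch, not the exact script behind my FactResellerSalesXL_CCI table):

-- Copy the fact table and add a clustered columnstore index
SELECT *
INTO dbo.FactResellerSalesXL_CCI
FROM dbo.FactResellerSalesXL
GO
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactResellerSalesXL
ON dbo.FactResellerSalesXL_CCI
GO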
Also consider the resource consumption. I’m thinking about the logical reads column here. I didn’t measure CPU time for this test, but I would expect it to reflect the execution time and logical reads to some extent.
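If you want to see CPU time (and I/O) for your own runs without setting up a trace, SET STATISTICS is one simple option:

-- Report CPU time, elapsed time and logical reads per statement in the Messages tab
SET STATISTICS TIME ON
SET STATISTICS IO ON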
I will discuss a scalar UDF that doesn’t do SELECT (I/O) in a later blog post.
Disclaimers
You can do many more combinations than what I did. For instance, scalar inlining with a columnstore index, etc. My goal here was not to test all possible permutations. Please give the others a try if you want!
This was only one example. I’m sure that there are examples where inlining doesn’t work that well. And perhaps also cases where the benefits are even greater. My aim here was to give it a try, with an example that was simple enough and something that I have seen in the wild (similar cases, that is).