T-SQL Worst Practices – a compilation from Jacob’s session at Tech-Ed 2011 – Part 02 – Practices that affect performance


This is in continuation to yesterday’s post on T-SQL worst practices that affect the accuracy of the results returned/processed

What have I done? T-SQL Worst Practices

I am attempting to summarize all of the worst T-SQL practices that Jacob demonstrated at Tech-Ed 2011 (India). The summarization has been done after classifying these practices into 3 major buckets:

  1. Practices that can affect accuracy of data returned/processed (Part 01 – read here)
  2. Practices that can adversely impact performance (Part 02 – today’s post)
  3. Practices that can have unpredictable execution side-effects (Part 03 – coming tomorrow!)

While the points covered are exactly the same as covered during the Tech-Ed, I have taken hints from Jacob’s queries and have come up with my own versions using the AdventureWorks2008R2 database supplied as a sample database for SQL 11 (“Denali”) CTP01. In cases where I am directly using Jacob’s scripts, I have mentioned it as such.

Part 02 – Practices affecting performance

Using Incorrect data types in a query/Too much reliance on implicit conversions

When I was a little kid in school, my parents used to tell me that there is a lot more than what meets the eye. A lot of things happen in the background before anything reaches us, and sometimes we forget the importance and underestimate the impact of these background processes. Years later, the movie “Munnabhai M.B.B.S.” came along, and that demonstrated, amongst many others, the same concept – how we tend to ignore the cleaning crew who help maintain one of the most important things in a hospital – hygiene.

Anyway, coming over to SQL Server, one of the most taken-for-granted features is implicit conversion. Implicit conversion is the conversion of one data-type to another without the user knowing that such a conversion is happening. Please note that these conversions happen under some pre-defined rules of  data-type precedence. Data types of a lower precedence are implicitly converted to a higher precedence if a data-type mismatch is found.

With this background in mind, let’s see how this feature can land us in performance trouble.

Assume that we have the following table:

/*-----------------------------------------------------------------------------
  Date			: 23 March 2011
  SQL Version           : SQL Server 2005/2008/Denali
  Author		: Jacob Sebastian
  Email			: jacob@beyondrelational.com
  Twitter		: @jacobsebastian
  Blog			: http://beyondrelational.com/blogs/jacob
  Website		: http://beyondrelational.com

  Summary:
  This script is part of the demo presented by Jacob Sebastian during 
  Tech-ED India 2011 on "TSQL Worst Practices".

  Disclaimer:  
  THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF 
  ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED 
  TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
  PARTICULAR PURPOSE.
-----------------------------------------------------------------------------*/

USE Teched 
GO

IF OBJECT_ID('orders', 'U') IS NOT NULL BEGIN
	DROP TABLE orders 
END
GO

CREATE TABLE orders (OrderID INT IDENTITY, OrderDate DATETIME, Amount MONEY, Refno VARCHAR(20))
INSERT INTO orders (OrderDate, Amount, Refno) 
SELECT TOP 100000
	DATEADD(minute, ABS(a.object_id % 50000 ), CAST('2010-02-01' AS DATETIME)),
	ABS(a.object_id % 10),
	CAST(ABS(a.object_id) AS VARCHAR)
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
INSERT INTO orders (OrderDate, Amount, Refno)  SELECT GETDATE(), 100, '555'

Now, let’s create a non-clustered index on the RefNo column.

CREATE NONCLUSTERED INDEX idx_refno ON orders(refno)
include(amount)

Next, let’s run through the following queries. Both these queries perform the same function, i.e. select a record whose reference number is 555.

SELECT count(*) 
FROM orders 
WHERE Refno = 555

SELECT COUNT(*) 
FROM orders 
WHERE Refno = '555'

These queries do their work right, but one of them is not advisable for production-class usage.

Here’s why: If the actual execution plan of these queries is enabled by pressing Ctrl+M or by going to Query –> Include Actual Execution plan, we see that Query #1 does an index scan instead of an index seek.

image

The reason is that in the first query, Refno, which is a VARCHAR column is being compared with an integer value. Per rules of data-type precedence (discussed earlier), VARCHAR values need to be implicitly converted to INT because INT has a higher precedence. Thus, SQL Server first has to create a temporary column with the INT representation of each of the Refno values, and then compare that with the supplied criteria.

In the second case, we are supplying SQL Server with the data in character format, and hence, no implicit conversion is required. SQL Server can therefore simply do an index seek to fetch the right results.

Now, let’s do it the other way around. Let’s explicitly convert Refno to an INT and then perform similar SELECT operations.

DROP INDEX idx_refno ON Orders

ALTER TABLE Orders ALTER COLUMN Refno INT

CREATE NONCLUSTERED INDEX idx_refno ON orders(refno)
include(amount)


SELECT count(*) 
FROM orders 
WHERE Refno = 555

SELECT COUNT(*) 
FROM orders 
WHERE Refno = '555'

Looking at the execution plan, we are in for a shocker. We would have expected that now, the second query should be the slower one, but it is not!

image

This is because the column being compared is of a higher precedence. Therefore, only the supplied condition parameter needs to be implicitly converted to INT, thus reducing the performance impact to being negligible.

Functions may not always be your friends

SQL Server has a great collection of functions, some of which are really, really useful. Data manipulation is a common operation in most systems, and the functions available at our disposal for date manipulation are very powerful. However, they might land us in a performance bottleneck if not used carefully.

First of all, let’s create a very simple, non-clustered index on the OrderDate column of the Sales.SalesOrderHeader table in the AdventureWorks2008R2 database.

USE [AdventureWorks2008R2]
GO
CREATE NONCLUSTERED INDEX [Idx_MissingIndexOrderDate]
    ON [Sales].[SalesOrderHeader] ([OrderDate])
GO

Next, let us try to find out all the sales done in the month of April, 2008. Now, there are a lot of solutions available, the most popular ones being the ones below. Which one do you think is the performance hog?

--Number of sales done in April, 2008
SELECT COUNT(*)
FROM Sales.SalesOrderHeader sh
WHERE MONTH(sh.OrderDate) = 4 AND YEAR(sh.OrderDate) = 2008

SELECT COUNT(*)
FROM Sales.SalesOrderHeader sh
WHERE sh.OrderDate >= '2008-04-01' AND sh.OrderDate <= '2008-04-30'

SELECT COUNT(*)
FROM Sales.SalesOrderHeader sh
WHERE sh.OrderDate BETWEEN '2008-04-01' AND '2008-04-30'

If you turn on the execution plan, you would notice that Query #1 takes up the most time, i.e. that is the slowest of them all.

image

The reason is the presence of the functions in the WHERE clause. Functions around columns in a WHERE clause cause table/index scans and must therefore be used judiciously.

If you use these functions in the SELECT clause, you will notice that there isn’t much performance delay. Here’s the proof:

--Number of sales done in April, 2008
SELECT MONTH(sh.OrderDate), YEAR(sh.OrderDate),sh.OrderDate
FROM Sales.SalesOrderHeader sh
WHERE MONTH(sh.OrderDate) = 4 AND YEAR(sh.OrderDate) = 2008

SELECT MONTH(sh.OrderDate), YEAR(sh.OrderDate),sh.OrderDate
FROM Sales.SalesOrderHeader sh
WHERE sh.OrderDate BETWEEN '2008-04-01' AND '2008-04-30'

image

Question to the audience: Now, let’s say that instead of just the date value (which we were dealing with in case of SalesOrderHeader.OrderDate), we have a DATETIME value to work with. What would be good solutions, according to for the same problem statement? Jacob touched upon this during the Tech-Ed, but I am eliminating that part from the post series so that it can become a great user exercise.

Summarizing – Part 02:

Finally, summarizing all the points that have been discussed today (Part 02) and during Tech-Ed 2011

  1. Incorrect use of data-types (especially in the WHERE clause) can cause performance issues because they would force TABLE/INDEX scans due to implicit conversion
  2. As far as possible, take the time out during design to ensure that you choose the proper data type. In our example, Refno column should probably never have been a VARCHAR column
  3. Functions around columns in a WHERE clause cause table/index scans and must therefore be used judiciously

Thank-you, Jacob, for this really great session!

You can read my memories of Tech-Ed by going through the following series:

Tech Ed 2011 (India) – Day 01 – Connect, Learn, Explore, Evolve!

Tech Ed 2011 (India) – Day 02 – IE 9 – India launch – Fast is now beautiful!

Tech Ed 2011 (India) – Day 03 – For the DBAs – Contained Databases, Extended Events, Virtualization, Wait Stats, Wait Queues & High Availability

Until tomorrow,

Be courteous. Drive responsibly.

Want to ask me a question? Do so at: http://beyondrelational.com/ask/nakul/default.aspx

Advertisement

Let me know what you think about this post by leaving your feedback here!

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.