T-SQL Worst Practices – a compilation from Jacob’s session at Tech-Ed 2011 – Part 01 – Practices that affect accuracy


Last month, all the techies in India had a great time at the Tech-Ed 2011 (India). Out of the many sessions being conducted in parallel, one of the T-SQL programming sessions, there was one session from the renowned SQL Server MVP – Jacob Sebastian, which I found to be very interesting and different.

What have I done? T-SQL Worst Practices

There are a lot of reading materials and classes that talk about and discuss T-SQL Best Practices. But there are very few who talk about T-SQL worst practices. That’s because it takes experience and a lot of deep understanding and intimacy with the product in hand to come up with such a list. This session was one of them, and that’s precisely why my it grabbed my attention.

In this session, Jacob demonstrated some of the most common T-SQL worst practices that one can make (besides of course not clapping when one likes a session). All of these are not major issues that would jump right out at you, but are things that appear to be absolutely normal and harmless in the development and QA environments, but turn into performance bottlenecks or may end up returning incorrect results when deployed into production.

Today, I attempt to summarize these worst practices into a blog post. We will look at:

  1. Practices that can affect accuracy of data returned/processed (Part 01)
  2. Practices that can adversely impact performance (Part 02)
  3. Practices that can have unpredictable execution side-effects (Part 03)

While the points covered are exactly the same as covered during the Tech-Ed, I have taken hints from Jacob’s queries and have come up with my own versions using the AdventureWorks2008R2 database supplied as a sample database for SQL 11 (“Denali”) CTP01. In cases where I am directly using Jacob’s scripts, I have mentioned it as such.

Part 01 – Practices affecting accuracy of data returned/processed

Using NOT IN is not really the solution

Manager: I need to find all the Sales staff who do not fall within a given set of territory IDs.

You: That’s easy. I will write up a simple T-SQL query to get you the results in a moment.

USE AdventureWorks2008R2
GO
SELECT * FROM Sales.SalesPerson WHERE TerritoryID NOT IN (1,2,3,4,5)

Manager: Is this correct? It does not appear to be so.

USE AdventureWorks2008R2
GO
--17 records
SELECT COUNT(*) FROM Sales.SalesPerson
--8 records
SELECT COUNT(*) FROM Sales.SalesPerson WHERE TerritoryID IN (1,2,3,4,5)
--6 records, instead of the expected 17-8 = 9 records!
SELECT COUNT(*) FROM Sales.SalesPerson WHERE TerritoryID NOT IN (1,2,3,4,5)

From the above calculations, we see that about records are missing from the results set. These records are such which do not have any Territory ID allocated!

You: What went wrong? The query should have worked….

Here’s what happened…

Root Cause: NULL values are special. In the world of electronics, we are all familiar with NULL as the “tri-state” – ON, OFF and a third-state mysterious state. NULL technically means undefined, and hence cannot take part in comparison operations, by default.

When we used the NOT IN operator, we were asking SQL Server to carry out a comparison which it cannot do for records with NULL values and hence, these records were ignored.

Solution 01: You can modify this behaviour of SQL Server by setting ANSI_NULLS to OFF. However, this is not recommended because:

  1. Setting ANSI_NULLS to OFF causes the comparison operators in SQL Server to stop following the ISO standards
  2. ANSI_NULLS will always be ON in a future release of SQL Server (refer Books On Line)

Solution 02: Alternatively, you can also use the NOT EXISTS clause to fetch the correct results:

USE AdventureWorks2008R2
GO
-- Solution #01 - Set ANSI_NULLS to OFF. 
/****************************************
This is not a recommended solution
****************************************/
SET ANSI_NULLS OFF
SELECT * FROM Sales.SalesPerson WHERE TerritoryID NOT IN (1,2,3,4,5)
-- Because setting ANSI_NULLS to OFF is not recommended, revert back
SET ANSI_NULLS ON

-- Solution #02 - Using the NOT EXISTS Clause
SELECT * FROM Sales.SalesPerson sp1 WHERE NOT EXISTS (SELECT *
                                                      FROM Sales.SalesPerson sp2
                                                      WHERE sp1.TerritoryID = sp2.TerritoryID AND sp2.TerritoryID 
                                                      IN (1,2,3,4,5))

COUNT(*) and LEFT v/s RIGHT joins

Assume the following scenario wherein we have a list of Sales Personnel in one table and the orders booked by them in another table. The scripts and data is supplied by Jacob, because such data does not exist in the AdventureWorks2008R2 database.

/*-----------------------------------------------------------------------------
  Date			: 23 March 2011
  SQL Version           : SQL Server 2005/2008/Denali
  Author		: Jacob Sebastian
  Email			: jacob@beyondrelational.com
  Twitter		: @jacobsebastian
  Blog			: http://beyondrelational.com/blogs/jacob
  Website		: http://beyondrelational.com

  Summary:
  This script is part of the demo presented by Jacob Sebastian during 
  Tech-ED India 2011 on "TSQL Worst Practices".

  Disclaimer:  
  THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF 
  ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED 
  TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
  PARTICULAR PURPOSE.
-----------------------------------------------------------------------------*/
USE teched 
GO

IF OBJECT_ID('salesrep','U') IS NOT NULL DROP TABLE salesrep
GO

CREATE TABLE salesrep (id int, name varchar(20))
GO

INSERT INTO salesrep(id, name) SELECT 1, 'Jacob'
INSERT INTO salesrep(id, name) SELECT 2, 'Pinal'
INSERT INTO salesrep(id, name) SELECT 3, 'Mike'
INSERT INTO salesrep(id, name) SELECT 4, 'Steve'

IF OBJECT_ID('orders', 'U') IS NOT NULL DROP TABLE orders 
GO

CREATE TABLE orders (salesrep INT, OrderID INT, Amount MONEY)
INSERT INTO orders (salesrep, OrderID, Amount) select 1, 1, 100
INSERT INTO orders (salesrep, OrderID, Amount) select 1, 2, 100
INSERT INTO orders (salesrep, OrderID, Amount) select 2, 3, 100
INSERT INTO orders (salesrep, OrderID, Amount) select 2, 4, 100
INSERT INTO orders (salesrep, OrderID, Amount) select 2, 5, 100

SELECT * FROM salesrep
SELECT * FROM orders 

Manager: I want a list of all Sales Personnel with the number of orders booked by them.

You: Sure. I will have the results ready for your review in 5 minutes

SELECT s.name,
       COUNT(*) AS OrderCount
FROM salesrep s
LEFT JOIN orders o ON o.salesrep = s.id
GROUP BY s.name  

image

You (thinking): The results don’t look right. What’s going on?

Root Cause: Again, it is the NULL values that bother us. COUNT (*) causes an aggregation of all columns in the result set. Looking at Books On Line (COUNT aggregate function), we can understand that except for COUNT, all aggregate functions ignore NULL values. COUNT is the black sheep in the sense that by default it will count a NULL and all duplicate values as a 1.

Solution(s): There are a minimum of two solutions to this mystery.

  1. The first of the solutions is to use the INNER JOIN instead of the LEFT JOIN. However, this might not be what the manager wanted (remember that the manager had asked for a list of all sales personnel with the number of orders booked by them). If we use INNER JOIN, we will only get a list of sales personnel who have booked at least one order in their name
  2. The second involves using a column from the right-hand side table of the LEFT JOIN. This would be a correct query
-- Solution 01 - Technically correct, but may not be what the manager wanted
SELECT s.name,
       COUNT(*) AS OrderCount
FROM salesrep s
INNER JOIN orders o ON o.salesrep = s.id
GROUP BY s.name   

-- Solution 02 - Technically correct and what the manager wanted
SELECT s.name,
       COUNT(o.orderid) AS OrderCount
FROM salesrep s
LEFT JOIN orders o ON o.salesrep = s.id
GROUP BY s.name  

image

Summarizing – Part 01:

Finally, summarizing all the points that have been discussed today (Part 01) and during Tech-Ed 2011

  1. Setting ANSI_NULLS to OFF is bad! Any code using this dependency should be revisited
  2. Using NOT IN may give you incorrect results. You may use NOT NULL or NOT EXISTS check instead. Be careful about the possibility of NULL values when using NOT IN
  3. COUNT(*) with LEFT JOIN may produce incorrect results
  4. Use COUNT(column) from RIGHT side table instead of COUNT(*)
  5. There are always hidden requirements as in the case of the manager of the COUNT(*) example – pay attention to each and every word!

Thank-you, Jacob, for this really great session!

You can read my memories of Tech-Ed by going through the following series:

Tech Ed 2011 (India) – Day 01 – Connect, Learn, Explore, Evolve!

Tech Ed 2011 (India) – Day 02 – IE 9 – India launch – Fast is now beautiful!

Tech Ed 2011 (India) – Day 03 – For the DBAs – Contained Databases, Extended Events, Virtualization, Wait Stats, Wait Queues & High Availability

Until tomorrow,

Be courteous. Drive responsibly.

6 thoughts on “T-SQL Worst Practices – a compilation from Jacob’s session at Tech-Ed 2011 – Part 01 – Practices that affect accuracy

  1. Hima

    Thanks Nakul for the wonderful post. Actually I planned to attend Jacob’s session and I missed it due to MVP Roundtables . Your post is very clear in explaining to others who missed the event

    Like

    Reply
  2. marc_jellinek@hotmail.com

    For Part 1 ( I need to find all the Sales staff who do not fall within a given set of territory IDs. ), using a subquery is very expensive.

    An alternative solution, which will return accurate results would be:

    SELECT
    COUNT(*)
    FROM
    Sales.SalesPerson
    WHERE
    TerritoryID NOT IN (1,2,3,4,5)
    OR TerritoryID IS NULL

    Like

    Reply

Leave a reply to benkoskysa Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.