Last month, all the techies in India had a great time at the Tech-Ed 2011 (India). Out of the many sessions being conducted in parallel, one of the T-SQL programming sessions, there was one session from the renowned SQL Server MVP – Jacob Sebastian, which I found to be very interesting and different.
What have I done? T-SQL Worst Practices
There are a lot of reading materials and classes that talk about and discuss T-SQL Best Practices. But there are very few who talk about T-SQL worst practices. That’s because it takes experience and a lot of deep understanding and intimacy with the product in hand to come up with such a list. This session was one of them, and that’s precisely why my it grabbed my attention.
In this session, Jacob demonstrated some of the most common T-SQL worst practices that one can make (besides of course not clapping when one likes a session). All of these are not major issues that would jump right out at you, but are things that appear to be absolutely normal and harmless in the development and QA environments, but turn into performance bottlenecks or may end up returning incorrect results when deployed into production.
Today, I attempt to summarize these worst practices into a blog post. We will look at:
- Practices that can affect accuracy of data returned/processed (Part 01)
- Practices that can adversely impact performance (Part 02)
- Practices that can have unpredictable execution side-effects (Part 03)
While the points covered are exactly the same as covered during the Tech-Ed, I have taken hints from Jacob’s queries and have come up with my own versions using the AdventureWorks2008R2 database supplied as a sample database for SQL 11 (“Denali”) CTP01. In cases where I am directly using Jacob’s scripts, I have mentioned it as such.
Part 01 – Practices affecting accuracy of data returned/processed
Using NOT IN is not really the solution
Manager: I need to find all the Sales staff who do not fall within a given set of territory IDs.
You: That’s easy. I will write up a simple T-SQL query to get you the results in a moment.
USE AdventureWorks2008R2 GO SELECT * FROM Sales.SalesPerson WHERE TerritoryID NOT IN (1,2,3,4,5)
Manager: Is this correct? It does not appear to be so.
USE AdventureWorks2008R2 GO --17 records SELECT COUNT(*) FROM Sales.SalesPerson --8 records SELECT COUNT(*) FROM Sales.SalesPerson WHERE TerritoryID IN (1,2,3,4,5) --6 records, instead of the expected 17-8 = 9 records! SELECT COUNT(*) FROM Sales.SalesPerson WHERE TerritoryID NOT IN (1,2,3,4,5)
From the above calculations, we see that about records are missing from the results set. These records are such which do not have any Territory ID allocated!
You: What went wrong? The query should have worked….
Here’s what happened…
Root Cause: NULL values are special. In the world of electronics, we are all familiar with NULL as the “tri-state” – ON, OFF and a third-state mysterious state. NULL technically means undefined, and hence cannot take part in comparison operations, by default.
When we used the NOT IN operator, we were asking SQL Server to carry out a comparison which it cannot do for records with NULL values and hence, these records were ignored.
Solution 01: You can modify this behaviour of SQL Server by setting ANSI_NULLS to OFF. However, this is not recommended because:
- Setting ANSI_NULLS to OFF causes the comparison operators in SQL Server to stop following the ISO standards
- ANSI_NULLS will always be ON in a future release of SQL Server (refer Books On Line)
Solution 02: Alternatively, you can also use the NOT EXISTS clause to fetch the correct results:
USE AdventureWorks2008R2 GO -- Solution #01 - Set ANSI_NULLS to OFF. /**************************************** This is not a recommended solution ****************************************/ SET ANSI_NULLS OFF SELECT * FROM Sales.SalesPerson WHERE TerritoryID NOT IN (1,2,3,4,5) -- Because setting ANSI_NULLS to OFF is not recommended, revert back SET ANSI_NULLS ON -- Solution #02 - Using the NOT EXISTS Clause SELECT * FROM Sales.SalesPerson sp1 WHERE NOT EXISTS (SELECT * FROM Sales.SalesPerson sp2 WHERE sp1.TerritoryID = sp2.TerritoryID AND sp2.TerritoryID IN (1,2,3,4,5))
COUNT(*) and LEFT v/s RIGHT joins
Assume the following scenario wherein we have a list of Sales Personnel in one table and the orders booked by them in another table. The scripts and data is supplied by Jacob, because such data does not exist in the AdventureWorks2008R2 database.
/*----------------------------------------------------------------------------- Date : 23 March 2011 SQL Version : SQL Server 2005/2008/Denali Author : Jacob Sebastian Email : email@example.com Twitter : @jacobsebastian Blog : http://beyondrelational.com/blogs/jacob Website : http://beyondrelational.com Summary: This script is part of the demo presented by Jacob Sebastian during Tech-ED India 2011 on "TSQL Worst Practices". Disclaimer: THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE. -----------------------------------------------------------------------------*/ USE teched GO IF OBJECT_ID('salesrep','U') IS NOT NULL DROP TABLE salesrep GO CREATE TABLE salesrep (id int, name varchar(20)) GO INSERT INTO salesrep(id, name) SELECT 1, 'Jacob' INSERT INTO salesrep(id, name) SELECT 2, 'Pinal' INSERT INTO salesrep(id, name) SELECT 3, 'Mike' INSERT INTO salesrep(id, name) SELECT 4, 'Steve' IF OBJECT_ID('orders', 'U') IS NOT NULL DROP TABLE orders GO CREATE TABLE orders (salesrep INT, OrderID INT, Amount MONEY) INSERT INTO orders (salesrep, OrderID, Amount) select 1, 1, 100 INSERT INTO orders (salesrep, OrderID, Amount) select 1, 2, 100 INSERT INTO orders (salesrep, OrderID, Amount) select 2, 3, 100 INSERT INTO orders (salesrep, OrderID, Amount) select 2, 4, 100 INSERT INTO orders (salesrep, OrderID, Amount) select 2, 5, 100 SELECT * FROM salesrep SELECT * FROM orders
Manager: I want a list of all Sales Personnel with the number of orders booked by them.
You: Sure. I will have the results ready for your review in 5 minutes
SELECT s.name, COUNT(*) AS OrderCount FROM salesrep s LEFT JOIN orders o ON o.salesrep = s.id GROUP BY s.name
You (thinking): The results don’t look right. What’s going on?
Root Cause: Again, it is the NULL values that bother us. COUNT (*) causes an aggregation of all columns in the result set. Looking at Books On Line (COUNT aggregate function), we can understand that except for COUNT, all aggregate functions ignore NULL values. COUNT is the black sheep in the sense that by default it will count a NULL and all duplicate values as a 1.
Solution(s): There are a minimum of two solutions to this mystery.
- The first of the solutions is to use the INNER JOIN instead of the LEFT JOIN. However, this might not be what the manager wanted (remember that the manager had asked for a list of all sales personnel with the number of orders booked by them). If we use INNER JOIN, we will only get a list of sales personnel who have booked at least one order in their name
- The second involves using a column from the right-hand side table of the LEFT JOIN. This would be a correct query
-- Solution 01 - Technically correct, but may not be what the manager wanted SELECT s.name, COUNT(*) AS OrderCount FROM salesrep s INNER JOIN orders o ON o.salesrep = s.id GROUP BY s.name -- Solution 02 - Technically correct and what the manager wanted SELECT s.name, COUNT(o.orderid) AS OrderCount FROM salesrep s LEFT JOIN orders o ON o.salesrep = s.id GROUP BY s.name
Summarizing – Part 01:
Finally, summarizing all the points that have been discussed today (Part 01) and during Tech-Ed 2011
- Setting ANSI_NULLS to OFF is bad! Any code using this dependency should be revisited
- Using NOT IN may give you incorrect results. You may use NOT NULL or NOT EXISTS check instead. Be careful about the possibility of NULL values when using NOT IN
- COUNT(*) with LEFT JOIN may produce incorrect results
- Use COUNT(column) from RIGHT side table instead of COUNT(*)
- There are always hidden requirements as in the case of the manager of the COUNT(*) example – pay attention to each and every word!
Thank-you, Jacob, for this really great session!
You can read my memories of Tech-Ed by going through the following series:
Be courteous. Drive responsibly.