I was recently involved in a performance tuning exercise for a fairly complex stored procedure which processed thousands of records at a time. This post is based on one of the changes that I made to improve the performance (and something that you can do too fairly easily).
Data was being staged into intermediate result sets, which is not a bad strategy if you want to reduce the overall complexity of the execution plan. The problem was in the mechanism of the staging: data was being staged into table variables, which offer the convenience of temporary storage with the manageability of regular variables. However, table variables come with a big performance problem when large data sets are involved.
Table variables can result in an incorrect cardinality estimate in a query, which in turn leads to poor performance.
Before I explain the reason behind this, allow me to present a small demo of the problem.
Table Variables and Large Data Sets
The example below is quite simple – I create a table variable with one column defined as a primary key and then populate some data into the table variable. Finally, I fetch data from the table variable with the actual execution plan turned on.
--Declare the table variable
DECLARE @tEmployeeList TABLE
    (BusinessEntityId INT NOT NULL PRIMARY KEY CLUSTERED);

--Insert some test data
INSERT INTO @tEmployeeList (BusinessEntityId)
SELECT BusinessEntityId
FROM Person.Person
WHERE (BusinessEntityId % 16) = 0;

--Fetch data from the table variable
--Make sure that "Include Actual Execution Plan" (Ctrl + M) is turned on
SELECT *
FROM @tEmployeeList;
GO
Hovering over the Clustered Index Scan operator in the actual execution plan shows us something interesting – there is a huge difference between the “Estimated Number of Rows” and the “Actual Number of Rows”.
Why is this a problem?
One might say – so, what’s the big deal? You are only selecting from the table variable. In this example, this behaviour of table variables does not have any noticeable impact. However, what would happen if this table variable were joined with other tables?
At the time of plan generation, the optimizer would have estimated that it would only receive a single record in the table variable. The overall query plan would have been generated with this assumption in mind.
But at runtime, it received many more (1198 in this example), which indicates an issue with the cardinality estimate. Inaccurate cardinality estimates are one of the prime causes of poor query performance because they result in a sub-optimal plan, which slows down the query.
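To illustrate the risk, here is a hypothetical sketch (not from the original post) that joins the table variable from the demo above back to Person.Person. Because the optimizer assumes the table variable holds a single row, it may size the join for one outer row, even though far more rows flow through at runtime:

```sql
--Hypothetical example: joining the table variable to another table.
--The plan is built on the assumption that @tEmployeeList has 1 row,
--so the optimizer may pick a nested loops join sized for a single
--outer row, even though 1198 rows actually arrive at runtime.
SELECT p.BusinessEntityId,
       p.FirstName,
       p.LastName
FROM @tEmployeeList tel
INNER JOIN Person.Person p
    ON p.BusinessEntityId = tel.BusinessEntityId;
```

With a one-row estimate, such a join strategy can be dramatically more expensive than the hash or merge join the optimizer would have chosen had it known the true row count.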
The problem with cardinality estimates arises because table variables do not have any statistics defined on them, and a change in the number of records therefore does not trigger plan recompiles. In most cases, the query plan is built with the estimate that the table variable has either no rows or a single row.
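As an aside, if rewriting the code to use a temporary table is not feasible, one known mitigation is the OPTION (RECOMPILE) query hint. It forces the statement to be compiled at execution time, when the table variable has already been populated, so the optimizer can use the actual row count. A sketch, reusing the hypothetical join above:

```sql
--OPTION (RECOMPILE) makes the statement compile at execution time,
--when the actual number of rows in @tEmployeeList is known, so the
--cardinality estimate reflects the real row count. The trade-off is
--the cost of recompiling the statement on every execution.
SELECT p.BusinessEntityId,
       p.FirstName,
       p.LastName
FROM @tEmployeeList tel
INNER JOIN Person.Person p
    ON p.BusinessEntityId = tel.BusinessEntityId
OPTION (RECOMPILE);
```

Note that this only corrects the row-count estimate; the table variable still has no column statistics, so a temporary table remains the more robust choice for large data sets.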
This is why table variables must not be used when the data set contains a large number of records (per the TechNet article referenced below, the magic number is 100) or when a cost-based evaluation of a query is required.
The solution is quite simple – use temporary tables instead of table variables!
--Safety Check
IF OBJECT_ID('tempdb..#tEmployeeList','U') IS NOT NULL
    DROP TABLE #tEmployeeList;
GO

--Create the temporary table
CREATE TABLE #tEmployeeList
    (BusinessEntityId INT NOT NULL PRIMARY KEY CLUSTERED);
GO

--Insert some test data
INSERT INTO #tEmployeeList (BusinessEntityId)
SELECT BusinessEntityId
FROM Person.Person
WHERE (BusinessEntityId % 16) = 0;

--Fetch data from the temporary table
--Make sure that "Include Actual Execution Plan" (Ctrl + M) is turned on
SELECT *
FROM #tEmployeeList;
GO

--Cleanup
IF OBJECT_ID('tempdb..#tEmployeeList','U') IS NOT NULL
    DROP TABLE #tEmployeeList;
GO
Notice that the estimated and the actual number of rows are now the same, which indicates that, when used in a complex query, the cardinality estimate would be fairly accurate, resulting in better performance.
- Table data type [TechNet Link]
- Especially refer to the “Limitations and Restrictions” section
Until we meet next time,