I was recently involved in a performance tuning exercise for a fairly complex stored procedure that processed thousands of records at a time. This post is based on one of the changes I made to improve the performance (and one that you can make fairly easily too).
Data was being staged into intermediate result sets, which is not a bad strategy if you want to reduce the overall complexity of the execution plan. The problem was in the mechanism of the staging: the data was staged into table variables, which offer the convenience of temporary storage with the manageability of regular variables. However, table variables come with a big performance problem when large data sets are involved.
Table variables can result in an incorrect cardinality estimate in a query, resulting in poor performance.
Before I explain the reason behind this, allow me to present a small demo of the problem.
Table Variables and Large Data Sets
The example below is quite simple – I create a table variable with one column defined as a primary key and then populate some data into the table variable. Finally, I fetch data from the table variable with the actual execution plan turned on.
--Declare the table variable
DECLARE @tEmployeeList TABLE (BusinessEntityId INT NOT NULL PRIMARY KEY CLUSTERED);

--Insert some test data
INSERT INTO @tEmployeeList (BusinessEntityId)
SELECT BusinessEntityId
FROM Person.Person
WHERE (BusinessEntityId % 16) = 0;

--Fetch data from the table variable
--Make sure that "Show Actual Execution Plan" (Ctrl + M) is turned on
SELECT *
FROM @tEmployeeList;
GO
Hovering over the Clustered Index Scan operator in the actual execution plan shows us something interesting – there is a huge difference between the “Estimated Number of Rows” and the “Actual Number of Rows”.
Why is this a problem?
One might say – so, what’s the big deal? You are only selecting from the table variable. In this example, this behaviour of table variables does not have any impact. However, what would happen if this table variable were used in a join with other tables?
At the time of plan generation, the optimizer would have estimated that it would only receive a single record in the table variable. The overall query plan would have been generated with this assumption in mind.
But at runtime it got many more (1198 in this example), which indicates an issue with the cardinality estimate. Inaccurate cardinality estimates are one of the prime causes of poor query performance because they result in a sub-optimal plan that slows the query down.
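To see how this could play out, here is a small sketch of such a join (the join back to Person.Person is my own addition, purely for illustration). The plan for the join is compiled around the 1-row guess even though roughly 1,200 rows flow through the table variable at runtime.

--Illustration only: join the table variable to another table.
--The optimizer compiles this plan assuming @tEmployeeList returns 1 row,
--so it may favour a nested loop join even though ~1198 rows actually
--flow through it at runtime.
DECLARE @tEmployeeList TABLE (BusinessEntityId INT NOT NULL PRIMARY KEY CLUSTERED);

INSERT INTO @tEmployeeList (BusinessEntityId)
SELECT BusinessEntityId
FROM Person.Person
WHERE (BusinessEntityId % 16) = 0;

--Compare the "Estimated Number of Rows" and "Actual Number of Rows"
--on the operators feeding the join in the actual execution plan
SELECT p.BusinessEntityId,
       p.FirstName,
       p.LastName
FROM @tEmployeeList el
INNER JOIN Person.Person p
    ON p.BusinessEntityId = el.BusinessEntityId;
GO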
The cardinality estimate is off because table variables do not have any statistics defined on them, and a change in the number of records will therefore not trigger a plan recompile. In most cases, the query plan is built with the estimate that the table variable has either no rows or exactly 1 row.
This is why table variables should not be used when the data set contains a large number of records (per the TechNet article referenced below, the magic number is 100) or when a cost-based evaluation of the query is required.
The Solution
The solution is quite simple – use temporary tables instead of table variables!
--Safety Check
IF OBJECT_ID('tempdb..#tEmployeeList','U') IS NOT NULL
    DROP TABLE #tEmployeeList;
GO

--Create the temporary table
CREATE TABLE #tEmployeeList (BusinessEntityId INT NOT NULL PRIMARY KEY CLUSTERED);
GO

--Insert some test data
INSERT INTO #tEmployeeList (BusinessEntityId)
SELECT BusinessEntityId
FROM Person.Person
WHERE (BusinessEntityId % 16) = 0;

--Fetch data from the temporary table
--Make sure that "Show Actual Execution Plan" (Ctrl + M) is turned on
SELECT *
FROM #tEmployeeList;
GO

--Cleanup
IF OBJECT_ID('tempdb..#tEmployeeList','U') IS NOT NULL
    DROP TABLE #tEmployeeList;
GO
Notice that the estimated and the actual number of rows are now the same, which indicates that when the temporary table is used in a complex query, the cardinality estimate will be fairly accurate, resulting in better performance.
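If you want to confirm the underlying difference yourself, the quick check below (my own addition, run before the cleanup step in the script above) lists the statistics objects that SQL Server has created on the temporary table; a table variable has no equivalent entries.

--Optional check: list the statistics on the temporary table
--(run this before the cleanup step above drops the table)
SELECT s.name AS statistics_name,
       s.auto_created
FROM tempdb.sys.stats AS s
WHERE s.object_id = OBJECT_ID('tempdb..#tEmployeeList');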
Further Reading
- Table data type [TechNet Link]
- In particular, refer to the “Limitations and Restrictions” section
Until we meet next time,
Hi sir,
From my personal experience, if you really want to make your code less cumbersome, it is usually better to use a CTE or a derived table; a temp table or a table variable is not a good idea for that. I usually start with a table variable, of course, for less logging and locking, or sometimes where the plan should use a seek per row with a nested loop join (not great with a cold cache, however), where I take advantage of that 1-row estimate ;), and then move to a temp table. A table variable does not have statistics associated with it, but a temp table does.
But there is a statement-level trick, OPTION (RECOMPILE), or the procedure-level WITH RECOMPILE, to make the optimizer use the row count of the table variable.
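For example, a minimal sketch of the statement-level option, reusing the table variable from the post (my example, just to show where the hint goes):

--With OPTION (RECOMPILE), the statement is compiled at execution time,
--so the optimizer can use the table variable's actual row count
--instead of the fixed 1-row guess (column statistics are still unavailable)
DECLARE @tEmployeeList TABLE (BusinessEntityId INT NOT NULL PRIMARY KEY CLUSTERED);

INSERT INTO @tEmployeeList (BusinessEntityId)
SELECT BusinessEntityId
FROM Person.Person
WHERE (BusinessEntityId % 16) = 0;

SELECT p.BusinessEntityId,
       p.FirstName,
       p.LastName
FROM @tEmployeeList el
INNER JOIN Person.Person p
    ON p.BusinessEntityId = el.BusinessEntityId
OPTION (RECOMPILE);
GO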
From SQL Server 2012 onward there is also trace flag 2453, but even with it you still cannot use the column values associated with the table variable, only the row count. If the data within is also a concern, then move to a temp table. There is also a caching mechanism from SQL Server 2005 onwards that applies to temp tables and table variables alike, which is great for reducing latch contention in tempdb. But there is a downside to that as well: once a temp table or table variable is cached, subsequent calls reuse the cached row count and statistics. It is not going to be fixed in future releases either, as Microsoft has closed this item, so the only way around it is to use UPDATE STATISTICS and a recompile with them. Paul White has described this in great detail.
So there is no straightforward rule for choosing between a temp table and a table variable; we need to investigate our requirements and test in our own environment.
Thanks.