Category Archives: #SQLServer

All about Microsoft SQL Server

#0321 – SQL Server – Each GROUP BY expression must contain at least one column that is not an outer reference. – Msg 164


I was attempting to write an ad-hoc query which aggregated some data using the GROUP BY clause for a quick data check when a copy-paste error made me stumble upon the following error:


Msg 164, Level 15, State 1, Line 8
Each GROUP BY expression must contain at least one column that is not an outer reference.


The query that returned the error looked like:

USE AdventureWorks2008R2 ;
GO
DECLARE @accountNumber VARCHAR(30) = '10-4020-000676'

SELECT  YEAR(OrderDate) AS PurchaseYear,
        @accountNumber AS AccountNumber,
        SUM(TotalDue) AS TotalPurchase
FROM    Sales.SalesOrderHeader
WHERE   AccountNumber = @accountNumber
GROUP BY YEAR(OrderDate),
        @accountNumber ;
GO

When I work with the GROUP BY clause, I typically take the columns from the SELECT clause, remove the aggregations and use the resulting list for the GROUP BY clause. In this case, that process left a variable (@accountNumber) in the GROUP BY list. A variable is ultimately an expression, and SQL Server treats it as an outer reference – which is not allowed in a GROUP BY clause. The solution, therefore, is to change the GROUP BY list so that it does not use variables.

USE AdventureWorks2008R2 ;
GO
DECLARE @accountNumber VARCHAR(30) = '10-4020-000676';

SELECT  YEAR(OrderDate) AS PurchaseYear,
        @accountNumber AS AccountNumber,
        SUM(TotalDue) AS TotalPurchase
FROM    Sales.SalesOrderHeader
WHERE   AccountNumber = @accountNumber
GROUP BY YEAR(OrderDate),
        AccountNumber ;  --Notice that the variable has been replaced 
                         --with the corresponding field name
GO

[Edit – 02/16/2014 – 01:20AM IST]:


Based on a couple of comments that I received on this post, I would like to elaborate further on the behaviour that is exhibited by SQL Server.



  • The GROUP BY clause can operate on any expressions except single-row, single-column sub-queries

  • This expression must contain at least one column belonging to the tables referenced in the FROM clause of the statement where GROUP BY is being applied

  • A variable is a single-row, single-column expression (SELECT @accountNumber is perfectly valid), and does not reference any column from the tables used in the statement – this is what makes the variable invalid in the GROUP BY clause
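The points above can be verified with a minimal sketch against the same table – the first GROUP BY is valid because its expression references a column; the commented-out variant fails with Msg 164 because a literal, like a variable, is an outer reference:

```sql
USE AdventureWorks2008R2;
GO
DECLARE @accountNumber VARCHAR(30) = '10-4020-000676';

--Valid: the GROUP BY expression references a column from the FROM clause
SELECT  YEAR(OrderDate) AS PurchaseYear,
        @accountNumber AS AccountNumber,
        SUM(TotalDue) AS TotalPurchase
FROM    Sales.SalesOrderHeader
WHERE   AccountNumber = @accountNumber
GROUP BY YEAR(OrderDate);

--Invalid: a literal (or a variable) contains no column reference,
--so it raises Msg 164
--GROUP BY YEAR(OrderDate), 'SomeLiteral';
GO
```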

Until we meet next time,


Be courteous. Drive responsibly.

#0320 – SQL Server – Dropping multiple constraints in a single statement


A couple of months ago, I read a post from Madhivanan [B|T] which taught me how to drop multiple tables in a single DROP TABLE statement [Link]. Last week I was working on a deployment script when it hit me that I could drop multiple constraints (and columns) at the same time using the ALTER TABLE…DROP statement.

This method works on check, default and unique constraints.

Here’s a simple example:

USE AdventureWorks2012;
GO
ALTER TABLE HumanResources.Employee
    DROP CONSTRAINT CK_Employee_BirthDate,
                    CK_Employee_Gender,
                    CK_Employee_HireDate,
                    CK_Employee_MaritalStatus,
                    CK_Employee_SickLeaveHours,
                    CK_Employee_VacationHours,
                    DF_Employee_CurrentFlag,
                    DF_Employee_ModifiedDate,
                    DF_Employee_rowguid,
                    DF_Employee_SalariedFlag,
                    DF_Employee_SickLeaveHours,
                    DF_Employee_VacationHours;
GO

You can drop multiple columns in the same manner.
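A minimal sketch of the multiple-column variant follows; the table and column names here are hypothetical, for illustration only:

```sql
--Hypothetical table and columns, for illustration only
ALTER TABLE dbo.SomeStagingTable
    DROP COLUMN LegacyFlag,
                LegacyCode,
                LegacyModifiedDate;
GO
```

Note that a column can only be dropped once any constraints referencing it have been dropped, which is one reason to combine the two techniques in a deployment script.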

Further Reading

  • ALTER TABLE [MSDN/BOL Link]
  • Drop multiple tables in a single DROP TABLE statement [Link from Madhivanan [B|T]]
  • Leveraging constraint evaluation sequence in SQL Server [Link]

Until we meet next time,

Be courteous. Drive responsibly.

#0319 – SQL Server – CTE and UNION


I was recently asked a very interesting question at work. Someone wanted to use the same CTE in two queries the results of which were to be combined using the UNION clause. I was approached to see whether it was feasible.

In order to work out the feasibility, I simply carried out a quick PoC which I am sharing today. It is quite possible to use the same CTE in two queries which are combined via UNION.

USE tempdb;
GO
;WITH EmployeeListCTE (BusinessEntityId, IsCurrent)
AS (SELECT el.BusinessEntityId,
           el.IsCurrent
    FROM (VALUES (1, 1),
                 (2, 0),
                 (3, 1),
                 (4, 1),
                 (5, 1)
         ) AS el(BusinessEntityId, IsCurrent)
   )
SELECT elCTE.BusinessEntityId,
       elCTE.IsCurrent
FROM EmployeeListCTE AS elCTE 
WHERE elCTE.IsCurrent = 1
UNION
SELECT elCTE.BusinessEntityId,
       elCTE.IsCurrent
FROM EmployeeListCTE AS elCTE 
WHERE elCTE.IsCurrent = 0;
GO

/* RESULTS
----------------+-----------
BusinessEntityId|IsCurrent
----------------+-----------
1               |1
3               |1
4               |1
5               |1
2               |0
*/

References:

  • Common Table Expressions (CTE) [Link]
  • Interesting enhancements to the VALUES Clause in SQL Server 2008 [Link] (by Madhivanan (B|T))

Until we meet next time,

Be courteous. Drive responsibly.

#0318 – SQL Server – Performance Tuning – Use Temp tables instead of table variables when working with large data sets


I was recently involved in a performance tuning exercise for a fairly complex stored procedure which processed thousands of records at a time. This post is based on one of the changes that I made to improve the performance (and something that you can do too fairly easily).


Data was being staged into intermediate result sets, which is not a bad strategy if you want to reduce the overall complexity of the execution plan. The problem was in the mechanism of the staging. Data was staged into table variables, which offer the convenience of temporary storage with the manageability of regular variables. However, table variables come with a big performance problem when large data sets are involved.



Table variables can result in an incorrect cardinality estimate in a query, resulting in poor performance.


Before I explain the reason behind this, allow me to present a small demo to demonstrate the problem.


Table Variables and large data sets


The example below is quite simple – I create a table variable with one column defined as a primary key and then populate some data into the table variable. Finally, I fetch data from the table variable with the actual execution plan turned on.

--Declare the table variable
DECLARE @tEmployeeList TABLE 
            (BusinessEntityId INT NOT NULL 
                    PRIMARY KEY CLUSTERED
            );

--Insert some test data
INSERT INTO @tEmployeeList (BusinessEntityId)
SELECT BusinessEntityId
FROM Person.Person
WHERE (BusinessEntityId % 16) = 0;

--Fetch data from the temporary table
--Make sure that "Show Actual Execution Plan" (Ctrl + M) is shown
SELECT * FROM @tEmployeeList;
GO

Hovering over the Clustered Index Scan operator in the actual execution plan shows us something interesting – there is a huge difference between the “Estimated Number of Rows” and the “Actual Number of Rows”.


[Screenshot: execution plan tooltip for the Clustered Index Scan, showing a large difference between "Estimated Number of Rows" and "Actual Number of Rows"]


Why is this a problem?


One might say – so, what’s the big deal? You are only selecting from the table variable. In this example, this behaviour of table variables does not have any impact. However, what would happen if this table variable were used to join with other tables?


At the time of plan generation, the optimizer would have estimated that it would only receive a single record in the table variable. The overall query plan would have been generated with this assumption in mind.


But, at runtime, the operator received many more rows (1198 in this example), which indicates an issue with the cardinality estimate. Inaccurate cardinality estimates are one of the prime causes of poor query performance because they result in a sub-optimal plan which slows the query down.


The problem with the cardinality estimate arises because table variables do not have any statistics defined on them, and a change in the number of records therefore does not trigger a plan recompile. In most cases, the query plan is built with the estimate that the table variable has either no rows or a single row.


This is why table variables must not be used when there are a large number of records in the data set (per the TechNet article referenced below, the magic number is 100) or when a cost-based evaluation of a query is required.
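As an aside, when switching to a temporary table is not an option, adding OPTION (RECOMPILE) to the statement is a commonly used mitigation: the statement is recompiled at execution time, when the table variable's actual row count is visible to the optimizer. A minimal sketch, with the caveat that the recompile cost is paid on every execution:

```sql
DECLARE @tEmployeeList TABLE
            (BusinessEntityId INT NOT NULL
                    PRIMARY KEY CLUSTERED
            );

INSERT INTO @tEmployeeList (BusinessEntityId)
SELECT BusinessEntityId
FROM Person.Person
WHERE (BusinessEntityId % 16) = 0;

--The statement-level recompile lets the optimizer see the
--true row count of the table variable
SELECT * FROM @tEmployeeList
OPTION (RECOMPILE);
GO
```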


The Solution


The solution is quite simple – use temporary tables instead of table variables!

--Safety Check
IF OBJECT_ID('tempdb..#tEmployeeList','U') IS NOT NULL
    DROP TABLE #tEmployeeList;
GO

--Create the temporary table
CREATE TABLE #tEmployeeList 
            (BusinessEntityId INT NOT NULL 
                     PRIMARY KEY CLUSTERED
            );
GO

--Insert some test data
INSERT INTO #tEmployeeList (BusinessEntityId)
SELECT BusinessEntityId
FROM Person.Person
WHERE (BusinessEntityId % 16) = 0;

--Fetch data from the temporary table
--Make sure that "Show Actual Execution Plan" (Ctrl + M) is shown
SELECT * FROM #tEmployeeList;
GO

--Cleanup
IF OBJECT_ID('tempdb..#tEmployeeList','U') IS NOT NULL
    DROP TABLE #tEmployeeList;
GO

[Screenshot: execution plan tooltip for the temporary table, showing matching "Estimated Number of Rows" and "Actual Number of Rows"]


Notice that the estimated and the actual number of rows are now the same, which indicates that, when used in a complex query, the cardinality estimate would be fairly accurate, resulting in better performance.


Further Reading




  • Table data type [TechNet Link]

    • Especially refer the “Limitations and Restrictions” section

Until we meet next time,


Be courteous. Drive responsibly.

#0317 – SQL Server – A confession – Why you should not work on multiple tasks when writing a deployment script?


Delivery timelines and working till late in the evening – we have all been through these situations. This post is a true story – a confession about an incident that happened to me a couple of days ago. I was busy preparing a deployment script with a large number of stored procedures when I was distracted by a phone call reasonably late in the day. When I returned to work, I made an error in the script that I was writing. As a result, the late evening turned into a late night at work. As strange as it may seem, when the error was caught, I simply laughed out loud at myself.


Test Scenario


Shown below is a quick demo of the mistake that I made. The script below creates two stored procedures – dbo.proc_Add2Numbers and dbo.proc_Multiply2Numbers.


But there is something wrong. Once you have gone through the script, pause a while and see if you can figure out the error.

USE tempdb;
GO
--Create the test procedures
IF OBJECT_ID('dbo.proc_Add2Numbers','P') IS NOT NULL
DROP PROCEDURE dbo.proc_Add2Numbers;
GO
CREATE PROCEDURE dbo.proc_Add2Numbers
@iA INT,
@iB INT
AS
BEGIN
SET NOCOUNT ON;

SELECT (@iA + @iB) AS SumAB;
END;

IF OBJECT_ID('dbo.proc_Multiply2Numbers','P') IS NOT NULL
DROP PROCEDURE dbo.proc_Multiply2Numbers;
GO
CREATE PROCEDURE dbo.proc_Multiply2Numbers
@iA INT,
@iB INT
AS
BEGIN
SET NOCOUNT ON;

SELECT (@iA * @iB) AS MultiplyAB;
END;
GO


The Result


Let’s run the test that made my head spin that evening. A test similar to the following ran fine the first time:

USE tempdb;
GO
EXEC dbo.proc_Multiply2Numbers @iA = 2, @iB = 5;
GO
EXEC dbo.proc_Add2Numbers @iA = 2, @iB = 5;
GO

[Screenshot: results of the two procedure calls, both returning values]


But, when I tried to run it again with a different set of parameters:

USE tempdb;
GO
EXEC dbo.proc_Multiply2Numbers @iA = 3, @iB = 6;
GO
EXEC dbo.proc_Add2Numbers @iA = 3, @iB = 6;
GO

I landed with the following error:


Msg 2812, Level 16, State 62, Line 1
Could not find stored procedure 'dbo.proc_Multiply2Numbers'.


So, I returned the database to its base state and repeated the process – same error! It was already way beyond my normal work hours, and this error did its part to keep me in the office for an hour more!


The Root Cause


If you have already figured out the error, that’s really great – you will surely have a great day ahead! But, I was not so lucky. After an hour of scratching my head, drinking coffee and looking at the script over and over again, I finally realized my mistake:



A missing batch terminator – “GO”!


If you, like me, were unable to figure it out, look at the script again. Or better still, run the following:

USE tempdb;
GO
SELECT OBJECT_DEFINITION(OBJECT_ID('dbo.proc_Add2Numbers','P'));
GO

/***********/
/* Results */
/***********/
CREATE PROCEDURE dbo.proc_Add2Numbers
@iA INT,
@iB INT
AS
BEGIN
SET NOCOUNT ON;

SELECT (@iA + @iB) AS SumAB;
END;

--IMPORTANT: Notice the missing GO here!

IF OBJECT_ID('dbo.proc_Multiply2Numbers','P') IS NOT NULL
DROP PROCEDURE dbo.proc_Multiply2Numbers;


As you can see from the results above, the batch terminator “GO” was missing between the two stored procedures in the deployment script, because of which the script to check for the existence of dbo.proc_Multiply2Numbers was included in the definition of dbo.proc_Add2Numbers.


When I ran the test for the first time, I executed dbo.proc_Multiply2Numbers first. When dbo.proc_Add2Numbers was executed, it dropped the procedure dbo.proc_Multiply2Numbers, which is why it was unavailable in round #2.
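For completeness, the fix is a single line – terminating the first procedure’s batch before the next existence check begins:

```sql
CREATE PROCEDURE dbo.proc_Add2Numbers
@iA INT,
@iB INT
AS
BEGIN
SET NOCOUNT ON;

SELECT (@iA + @iB) AS SumAB;
END;
GO  --This batch terminator ends the definition of dbo.proc_Add2Numbers

IF OBJECT_ID('dbo.proc_Multiply2Numbers','P') IS NOT NULL
DROP PROCEDURE dbo.proc_Multiply2Numbers;
GO
```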


Lessons Reminded/Learnt


I realized two lessons on that day:



  1. Reminder: A stored procedure definition includes everything from the CREATE PROCEDURE statement to the batch terminator
  2. Lesson Learnt: Do NOT multi-task!

Until we meet next time,


Be courteous. Drive responsibly.