Category Archives: #SQLServer

All about Microsoft SQL Server

#0321 – SQL Server – Each GROUP BY expression must contain at least one column that is not an outer reference. – Msg 164


I was attempting to write an ad-hoc query which aggregated some data using the GROUP BY clause for a quick data check when a copy-paste error made me stumble upon the following error:


Msg 164, Level 15, State 1, Line 8
Each GROUP BY expression must contain at least one column that is not an outer reference.


The query that returned the error looked like:

USE AdventureWorks2008R2 ;
GO
DECLARE @accountNumber VARCHAR(30) = '10-4020-000676'

SELECT  YEAR(OrderDate) AS PurchaseYear,
        @accountNumber AS AccountNumber,
        SUM(TotalDue) AS TotalPurchase
FROM    Sales.SalesOrderHeader
WHERE   AccountNumber = @accountNumber
GROUP BY YEAR(OrderDate),
        @accountNumber ;
GO

When I work with the GROUP BY clause, I typically take the columns from the SELECT clause, remove the aggregations and use the resulting list for the GROUP BY clause. In this case, that process left a variable (@accountNumber) in the GROUP BY list. A variable is ultimately an expression, and SQL Server treats it as an outer reference – which is not allowed in a GROUP BY clause. The solution, therefore, is to change the GROUP BY list so that it does not use variables.

USE AdventureWorks2008R2 ;
GO
DECLARE @accountNumber VARCHAR(30) = '10-4020-000676';

SELECT  YEAR(OrderDate) AS PurchaseYear,
        @accountNumber AS AccountNumber,
        SUM(TotalDue) AS TotalPurchase
FROM    Sales.SalesOrderHeader
WHERE   AccountNumber = @accountNumber
GROUP BY YEAR(OrderDate),
        AccountNumber ;  --Notice that the variable has been replaced 
                         --with the corresponding field name
GO

[Edit – 02/16/2014 – 01:20AM IST]:


Based on a couple of comments that I received on this post, I would like to elaborate further on the behaviour that is exhibited by SQL Server.



  • The GROUP BY clause can operate on any expressions except single-row, single-column sub-queries

  • This expression must contain at least one column belonging to the tables referenced in the FROM clause of the statement where GROUP BY is being applied

  • A variable is a single-row, single-column expression (SELECT @accountNumber is perfectly valid), and does not reference any column from the tables used in the statement – this is what makes the variable invalid in the GROUP BY clause
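The points above can be verified with a minimal sketch against the same table – the first GROUP BY is valid because its expression references a column; the commented-out variant fails with Msg 164 because a literal, like a variable, is an outer reference:

```sql
USE AdventureWorks2008R2;
GO
DECLARE @accountNumber VARCHAR(30) = '10-4020-000676';

--Valid: the GROUP BY expression references a column from the FROM clause
SELECT  YEAR(OrderDate) AS PurchaseYear,
        @accountNumber AS AccountNumber,
        SUM(TotalDue) AS TotalPurchase
FROM    Sales.SalesOrderHeader
WHERE   AccountNumber = @accountNumber
GROUP BY YEAR(OrderDate);

--Invalid: a literal (or a variable) contains no column reference,
--so it raises Msg 164
--GROUP BY YEAR(OrderDate), 'SomeLiteral';
GO
```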

Until we meet next time,


Be courteous. Drive responsibly.

#0320 – SQL Server – Dropping multiple constraints in a single statement


A couple of months ago, I read a post from Madhivanan [B|T] which taught me how to drop multiple tables in a single DROP TABLE statement [Link]. Last week I was working on a deployment script when it hit me that I could drop multiple constraints (and columns) at the same time using the ALTER TABLE…DROP statement.

This method works on check, default and unique constraints.

Here’s a simple example:

USE AdventureWorks2012;
GO
ALTER TABLE HumanResources.Employee
    DROP CONSTRAINT CK_Employee_BirthDate,
                    CK_Employee_Gender,
                    CK_Employee_HireDate,
                    CK_Employee_MaritalStatus,
                    CK_Employee_SickLeaveHours,
                    CK_Employee_VacationHours,
                    DF_Employee_CurrentFlag,
                    DF_Employee_ModifiedDate,
                    DF_Employee_rowguid,
                    DF_Employee_SalariedFlag,
                    DF_Employee_SickLeaveHours,
                    DF_Employee_VacationHours;
GO

You can drop multiple columns in the same manner.
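A minimal sketch of the multiple-column variant follows; the table and column names here are hypothetical, for illustration only:

```sql
--Hypothetical table and columns, for illustration only
ALTER TABLE dbo.SomeStagingTable
    DROP COLUMN LegacyFlag,
                LegacyCode,
                LegacyModifiedDate;
GO
```

Note that a column can only be dropped once any constraints referencing it have been dropped, which is one reason to combine the two techniques in a deployment script.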

Further Reading

  • ALTER TABLE [MSDN/BOL Link]
  • Drop multiple tables in a single DROP TABLE statement [Link from Madhivanan [B|T]]
  • Leveraging constraint evaluation sequence in SQL Server [Link]

Until we meet next time,

Be courteous. Drive responsibly.

#0319 – SQL Server – CTE and UNION


I was recently asked a very interesting question at work. Someone wanted to use the same CTE in two queries the results of which were to be combined using the UNION clause. I was approached to see whether it was feasible.

In order to work out the feasibility, I simply carried out a quick PoC which I am sharing today. It is quite possible to use the same CTE in two queries which are combined via UNION.

USE tempdb;
GO
;WITH EmployeeListCTE (BusinessEntityId, IsCurrent)
AS (SELECT el.BusinessEntityId,
           el.IsCurrent
    FROM (VALUES (1, 1),
                 (2, 0),
                 (3, 1),
                 (4, 1),
                 (5, 1)
         ) AS el(BusinessEntityId, IsCurrent)
   )
SELECT elCTE.BusinessEntityId,
       elCTE.IsCurrent
FROM EmployeeListCTE AS elCTE 
WHERE elCTE.IsCurrent = 1
UNION
SELECT elCTE.BusinessEntityId,
       elCTE.IsCurrent
FROM EmployeeListCTE AS elCTE 
WHERE elCTE.IsCurrent = 0;
GO

/* RESULTS
----------------+-----------
BusinessEntityId|IsCurrent
----------------+-----------
1               |1
3               |1
4               |1
5               |1
2               |0
*/

References:

  • Common Table Expressions (CTE) [Link]
  • Interesting enhancements to the VALUES Clause in SQL Server 2008 [Link] (by Madhivanan (B|T))

Until we meet next time,

Be courteous. Drive responsibly.

#0318 – SQL Server – Performance Tuning – Use Temp tables instead of table variables when working with large data sets


I was recently involved in a performance tuning exercise for a fairly complex stored procedure which processed thousands of records at a time. This post is based on one of the changes that I made to improve the performance (and something that you can do too fairly easily).


Data was being staged into intermediate result sets, which is not a bad strategy if you want to reduce the overall complexity of the execution plan. The problem was in the mechanism of the staging. Data was staged into table variables, which offer the convenience of temporary storage with the manageability of regular variables. However, table variables come with a big performance problem when large data sets are involved.



Table variables can result in an incorrect cardinality estimate in a query, resulting in poor performance.


Before I explain the reason behind this, allow me to present a small demo to demonstrate the problem.


Table Variables and large data sets


The example below is quite simple – I create a table variable with one column defined as a primary key and then populate some data into the table variable. Finally, I fetch data from the table variable with the actual execution plan turned on.

--Declare the table variable
DECLARE @tEmployeeList TABLE 
            (BusinessEntityId INT NOT NULL 
                    PRIMARY KEY CLUSTERED
            );

--Insert some test data
INSERT INTO @tEmployeeList (BusinessEntityId)
SELECT BusinessEntityId
FROM Person.Person
WHERE (BusinessEntityId % 16) = 0;

--Fetch data from the temporary table
--Make sure that "Show Actual Execution Plan" (Ctrl + M) is shown
SELECT * FROM @tEmployeeList;
GO

Hovering over the Clustered Index Scan operator in the actual execution plan shows us something interesting – there is a huge difference between the “Estimated Number of Rows” and the “Actual Number of Rows”.


[Screenshot: execution plan tooltip for the Clustered Index Scan, showing a large difference between "Estimated Number of Rows" and "Actual Number of Rows"]


Why is this a problem?


One might say – so, what’s the big deal? You are only selecting from the table variable. In this example, this behaviour of table variables does not have any impact. However, what would happen if this table variable were used to join with other tables?


At the time of plan generation, the optimizer would have estimated that it would only receive a single record in the table variable. The overall query plan would have been generated with this assumption in mind.


But, at runtime, the operator received many more rows (1198 in this example), which indicates an issue with the cardinality estimate. Inaccurate cardinality estimates are one of the prime causes of poor query performance because they result in a sub-optimal plan which slows the query down.


The problem with the cardinality estimate arises because table variables do not have any statistics defined on them, and a change in the number of records therefore does not trigger a plan recompile. In most cases, the query plan is built with the estimate that the table variable has either no rows or a single row.


This is why table variables must not be used when there are a large number of records in the data set (per the TechNet article referenced below, the magic number is 100) or when a cost-based evaluation of a query is required.
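As an aside, when switching to a temporary table is not an option, adding OPTION (RECOMPILE) to the statement is a commonly used mitigation: the statement is recompiled at execution time, when the table variable's actual row count is visible to the optimizer. A minimal sketch, with the caveat that the recompile cost is paid on every execution:

```sql
DECLARE @tEmployeeList TABLE
            (BusinessEntityId INT NOT NULL
                    PRIMARY KEY CLUSTERED
            );

INSERT INTO @tEmployeeList (BusinessEntityId)
SELECT BusinessEntityId
FROM Person.Person
WHERE (BusinessEntityId % 16) = 0;

--The statement-level recompile lets the optimizer see the
--true row count of the table variable
SELECT * FROM @tEmployeeList
OPTION (RECOMPILE);
GO
```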


The Solution


The solution is quite simple – use temporary tables instead of table variables!

--Safety Check
IF OBJECT_ID('tempdb..#tEmployeeList','U') IS NOT NULL
    DROP TABLE #tEmployeeList;
GO

--Create the temporary table
CREATE TABLE #tEmployeeList 
            (BusinessEntityId INT NOT NULL 
                     PRIMARY KEY CLUSTERED
            );
GO

--Insert some test data
INSERT INTO #tEmployeeList (BusinessEntityId)
SELECT BusinessEntityId
FROM Person.Person
WHERE (BusinessEntityId % 16) = 0;

--Fetch data from the temporary table
--Make sure that "Show Actual Execution Plan" (Ctrl + M) is shown
SELECT * FROM #tEmployeeList;
GO

--Cleanup
IF OBJECT_ID('tempdb..#tEmployeeList','U') IS NOT NULL
    DROP TABLE #tEmployeeList;
GO

[Screenshot: execution plan tooltip for the temporary table, showing matching "Estimated Number of Rows" and "Actual Number of Rows"]


Notice that the estimated and the actual number of rows are now the same, which indicates that, when used in a complex query, the cardinality estimate would be fairly accurate, resulting in better performance.


Further Reading




  • Table data type [TechNet Link]

    • Especially refer the “Limitations and Restrictions” section

Until we meet next time,


Be courteous. Drive responsibly.

#0317 – SQL Server – A confession – Why you should not work on multiple tasks when writing a deployment script?


Delivery timelines and working till late in the evening – we have all been through these situations. This post is a true story – a confession about an incident that happened to me a couple of days ago. I was busy preparing a deployment script with a large number of stored procedures when I was distracted by a phone call reasonably late in the day. When I returned to work, I made an error in the script that I was writing. As a result, the late evening turned into a late night at work. As strange as it may seem, when the error was caught, I simply laughed out loud at myself.


Test Scenario


Shown below is a quick demo of the mistake that I made. The script below creates two stored procedures – dbo.proc_Add2Numbers and dbo.proc_Multiply2Numbers.


But there is something wrong. Once you have gone through the script, pause a while and see if you can figure out the error.

USE tempdb;
GO
--Create the test procedures
IF OBJECT_ID('dbo.proc_Add2Numbers','P') IS NOT NULL
DROP PROCEDURE dbo.proc_Add2Numbers;
GO
CREATE PROCEDURE dbo.proc_Add2Numbers
@iA INT,
@iB INT
AS
BEGIN
SET NOCOUNT ON;

SELECT (@iA + @iB) AS SumAB;
END;

IF OBJECT_ID('dbo.proc_Multiply2Numbers','P') IS NOT NULL
DROP PROCEDURE dbo.proc_Multiply2Numbers;
GO
CREATE PROCEDURE dbo.proc_Multiply2Numbers
@iA INT,
@iB INT
AS
BEGIN
SET NOCOUNT ON;

SELECT (@iA * @iB) AS MultiplyAB;
END;
GO


The Result


Let’s run the test that made my head spin that evening. A test similar to the following ran fine the first time:

USE tempdb;
GO
EXEC dbo.proc_Multiply2Numbers @iA = 2, @iB = 5;
GO
EXEC dbo.proc_Add2Numbers @iA = 2, @iB = 5;
GO

[Screenshot: results of the two procedure calls, both returning values]


But, when I tried to run it again with a different set of parameters:

USE tempdb;
GO
EXEC dbo.proc_Multiply2Numbers @iA = 3, @iB = 6;
GO
EXEC dbo.proc_Add2Numbers @iA = 3, @iB = 6;
GO

I landed with the following error:


Msg 2812, Level 16, State 62, Line 1
Could not find stored procedure 'dbo.proc_Multiply2Numbers'.


So, I returned the database to its base state and repeated the process – same error! It was already way beyond my normal work hours, and this error did its part to keep me in the office for an hour more!


The Root Cause


If you have already figured out the error, that’s really great – you will surely have a great day ahead! But, I was not so lucky. After an hour of scratching my head, drinking coffee and looking at the script over and over again, I finally realized my mistake:



A missing batch terminator – “GO”!


If you, like me, were unable to figure it out, look at the script again. Or better still, run the following:

USE tempdb;
GO
SELECT OBJECT_DEFINITION(OBJECT_ID('dbo.proc_Add2Numbers','P'));
GO

/***********/
/* Results */
/***********/
CREATE PROCEDURE dbo.proc_Add2Numbers
@iA INT,
@iB INT
AS
BEGIN
SET NOCOUNT ON;

SELECT (@iA + @iB) AS SumAB;
END;

--IMPORTANT: Notice the missing GO here!

IF OBJECT_ID('dbo.proc_Multiply2Numbers','P') IS NOT NULL
DROP PROCEDURE dbo.proc_Multiply2Numbers;


As you can see from the results above, the batch terminator “GO” was missing between the two stored procedures in the deployment script, because of which the script to check for the existence of dbo.proc_Multiply2Numbers was included in the definition of dbo.proc_Add2Numbers.


When I ran the test for the first time, I executed dbo.proc_Multiply2Numbers first. When dbo.proc_Add2Numbers was executed, it dropped the procedure dbo.proc_Multiply2Numbers, which is why it was unavailable in round #2.
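For completeness, the fix is a single line – terminating the first procedure’s batch before the next existence check begins:

```sql
CREATE PROCEDURE dbo.proc_Add2Numbers
@iA INT,
@iB INT
AS
BEGIN
SET NOCOUNT ON;

SELECT (@iA + @iB) AS SumAB;
END;
GO  --This batch terminator ends the definition of dbo.proc_Add2Numbers

IF OBJECT_ID('dbo.proc_Multiply2Numbers','P') IS NOT NULL
DROP PROCEDURE dbo.proc_Multiply2Numbers;
GO
```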


Lessons Reminded/Learnt


I realized two lessons on that day:



  1. Reminder: A stored procedure definition includes everything from the CREATE PROCEDURE statement to the batch terminator
  2. Lesson Learnt: Do NOT multi-task!

Until we meet next time,


Be courteous. Drive responsibly.