Category Archives: Imported from BeyondRelational

These posts are imported from my old blog page: http://beyondrelational.com/modules/2/blogs/77/nakuls-blog.aspx

#0319 – SQL Server – CTE and UNION


I was recently asked a very interesting question at work. Someone wanted to use the same CTE in two queries whose results were to be combined using the UNION clause, and I was approached to see whether it was feasible.

In order to work out the feasibility, I carried out a quick proof of concept (PoC), which I am sharing today. It is quite possible to use the same CTE in two queries that are combined via UNION.

USE tempdb;
GO
;WITH EmployeeListCTE (BusinessEntityId, IsCurrent)
AS (SELECT el.BusinessEntityId,
           el.IsCurrent
    FROM (VALUES (1, 1),
                 (2, 0),
                 (3, 1),
                 (4, 1),
                 (5, 1)
         ) AS el(BusinessEntityId, IsCurrent)
   )
SELECT elCTE.BusinessEntityId,
       elCTE.IsCurrent
FROM EmployeeListCTE AS elCTE 
WHERE elCTE.IsCurrent = 1
UNION
SELECT elCTE.BusinessEntityId,
       elCTE.IsCurrent
FROM EmployeeListCTE AS elCTE 
WHERE elCTE.IsCurrent = 0;
GO

/* RESULTS
----------------+-----------
BusinessEntityId|IsCurrent
----------------+-----------
1               |1
3               |1
4               |1
5               |1
2               |0
*/
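Note that a CTE's scope is limited to the single statement that immediately follows it; the UNION above works because both SELECTs are branches of one statement. A second, separate statement cannot reference the CTE. A quick sketch (the CTE name here is made up for illustration):

```sql
USE tempdb;
GO
;WITH NumbersCTE (Number)
AS (SELECT n.Number
    FROM (VALUES (1), (2), (3)) AS n(Number)
   )
SELECT Number FROM NumbersCTE;  --Works: part of the same statement

--This is a new statement, so the CTE is now out of scope
SELECT Number FROM NumbersCTE;  --Fails: Msg 208 - Invalid object name 'NumbersCTE'
GO
```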

References:

  • Common Table Expressions (CTE) [Link]
  • Interesting enhancements to the VALUES Clause in SQL Server 2008 [Link] (by Madhivanan (B|T))

Until we meet next time,

Be courteous. Drive responsibly.

#0318 – SQL Server – Performance Tuning – Use Temp tables instead of table variables when working with large data sets


I was recently involved in a performance tuning exercise for a fairly complex stored procedure which processed thousands of records at a time. This post is based on one of the changes that I made to improve the performance (and something that you can do too fairly easily).


Data was being staged into intermediate result sets, which is not a bad strategy if you want to reduce the overall complexity of the execution plan. The problem was in the mechanism of the staging. Data staging was done into table variables, which offer the convenience of temporary storage with the manageability of regular variables. However, table variables come with a big performance problem when large data sets are involved.



Table variables can cause an incorrect cardinality estimate in a query, resulting in poor performance.


Before I explain the reason behind this, allow me to present a small demo to demonstrate the problem.


Table Variables and large data sets


The example below is quite simple – I create a table variable with one column defined as a primary key and then populate some data into the table variable. Finally, I fetch data from the table variable with the actual execution plan turned on.

--Declare the table variable
DECLARE @tEmployeeList TABLE 
            (BusinessEntityId INT NOT NULL 
                    PRIMARY KEY CLUSTERED
            );

--Insert some test data
INSERT INTO @tEmployeeList (BusinessEntityId)
SELECT BusinessEntityId
FROM Person.Person
WHERE (BusinessEntityId % 16) = 0;

--Fetch data from the table variable
--Make sure that "Show Actual Execution Plan" (Ctrl + M) is shown
SELECT * FROM @tEmployeeList;
GO

Hovering over the Clustered Index Scan operator in the actual execution plan shows us something interesting – there is a huge difference between the “Estimated Number of Rows” and the “Actual Number of Rows”.


[Screenshot: Clustered Index Scan tooltip in the actual execution plan – Estimated Number of Rows = 1, Actual Number of Rows = 1198]


Why is this a problem?


One might say – so, what’s the big deal? You are only selecting from the table variable. In this example, this behaviour of table variables does not have any impact. However, what would happen if this table variable is being used to join with other tables?


At the time of plan generation, the optimizer would have estimated that it would only receive a single record in the table variable. The overall query plan would have been generated with this assumption in mind.


But, at runtime, it got many more (1198 in this example), which indicates an issue with the cardinality estimate. Inaccurate cardinality estimates are one of the prime causes of poor query performance because they result in a sub-optimal plan, which slows down the query.


The problem with the cardinality estimate arises because table variables do not have any statistics defined on them, so a change in the number of records does not trigger a plan recompile. In most cases, the query plan is built with the estimate that the table variable has either no rows or a single row.


This is why table variables must not be used when there are a large number of records in the data set (per the TechNet article referenced below, the magic number is 100) or when a cost-based evaluation of the query is required.
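If rewriting to a temporary table is not immediately possible, one commonly cited mitigation (a sketch, not a substitute for the actual fix) is to add OPTION (RECOMPILE) to the statement that reads the table variable. The statement is then compiled after the table variable has been populated, so the optimizer can see its actual row count (though still no column statistics):

```sql
--Declare and populate the table variable as before
DECLARE @tEmployeeList TABLE
            (BusinessEntityId INT NOT NULL
                    PRIMARY KEY CLUSTERED
            );

INSERT INTO @tEmployeeList (BusinessEntityId)
SELECT BusinessEntityId
FROM Person.Person
WHERE (BusinessEntityId % 16) = 0;

--The RECOMPILE hint lets the optimizer use the actual cardinality
SELECT * FROM @tEmployeeList
OPTION (RECOMPILE);
GO
```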


The Solution


The solution is quite simple – use temporary tables instead of table variables!

--Safety Check
IF OBJECT_ID('tempdb..#tEmployeeList','U') IS NOT NULL
    DROP TABLE #tEmployeeList;
GO

--Create the temporary table
CREATE TABLE #tEmployeeList 
            (BusinessEntityId INT NOT NULL 
                     PRIMARY KEY CLUSTERED
            );
GO

--Insert some test data
INSERT INTO #tEmployeeList (BusinessEntityId)
SELECT BusinessEntityId
FROM Person.Person
WHERE (BusinessEntityId % 16) = 0;

--Fetch data from the temporary table
--Make sure that "Show Actual Execution Plan" (Ctrl + M) is shown
SELECT * FROM #tEmployeeList;
GO

--Cleanup
IF OBJECT_ID('tempdb..#tEmployeeList','U') IS NOT NULL
    DROP TABLE #tEmployeeList;
GO

[Screenshot: actual execution plan for the temporary table – estimated and actual number of rows now match]


Notice that both the estimated and the actual number of rows are now the same, which indicates that when used in a complex query, the cardinality estimate would be fairly accurate, resulting in better performance.


Further Reading




  • Table data type [TechNet Link]

    • Especially refer the “Limitations and Restrictions” section

Until we meet next time,


Be courteous. Drive responsibly.

#0317 – SQL Server – A confession – Why you should not work on multiple tasks when writing a deployment script


Delivery timelines and working till late in the evening – we have all been through these situations. This post is a true story – a confession about an incident that happened to me a couple of days ago. I was busy preparing a deployment script with a large number of stored procedures when I was distracted by a phone call reasonably late in the day. When I returned to work, I made an error in the script I was writing. As a result, the late evening turned into a late night at work. As strange as it may seem, when the error was caught, I simply laughed out loud at myself.


Test Scenario


Shown below is a quick demo of the mistake that I made. The script below creates two stored procedures – dbo.proc_Add2Numbers and dbo.proc_Multiply2Numbers.


But there is something wrong. Once you have gone through the script, pause a while and see if you can figure out the error.

USE tempdb;
GO
--Create the test procedures
IF OBJECT_ID('dbo.proc_Add2Numbers','P') IS NOT NULL
    DROP PROCEDURE dbo.proc_Add2Numbers;
GO
CREATE PROCEDURE dbo.proc_Add2Numbers
    @iA INT,
    @iB INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT (@iA + @iB) AS SumAB;
END;

IF OBJECT_ID('dbo.proc_Multiply2Numbers','P') IS NOT NULL
    DROP PROCEDURE dbo.proc_Multiply2Numbers;
GO
CREATE PROCEDURE dbo.proc_Multiply2Numbers
    @iA INT,
    @iB INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT (@iA * @iB) AS MultiplyAB;
END;
GO


The Result


Let's run the test that made my head spin that evening. What happened was that a test similar to the following ran fine the first time:

USE tempdb;
GO
EXEC dbo.proc_Multiply2Numbers @iA = 2, @iB = 5;
GO
EXEC dbo.proc_Add2Numbers @iA = 2, @iB = 5;
GO

[Screenshot: results of the two procedure calls – MultiplyAB = 10, SumAB = 7]


But, when I tried to run it again with a different set of parameters:

USE tempdb;
GO
EXEC dbo.proc_Multiply2Numbers @iA = 3, @iB = 6;
GO
EXEC dbo.proc_Add2Numbers @iA = 3, @iB = 6;
GO

I landed with the following error:


Msg 2812, Level 16, State 62, Line 1
Could not find stored procedure 'dbo.proc_Multiply2Numbers'.


So, I returned the database to its base state and repeated the process – same error! It was already way beyond my normal work hours, and this error did its part to keep me in the office for another hour!


The Root Cause


If you have already figured out the error, that's really great – you will surely have a great day ahead! But I was not so lucky. After an hour of scratching my head, drinking coffee and looking at the script over and over again, I finally realized my mistake:



A missing batch terminator – “GO”!


If you, like me, were unable to figure it out, look at the script again. Or better still, run the following:

USE tempdb;
GO
SELECT OBJECT_DEFINITION(OBJECT_ID('dbo.proc_Add2Numbers','P'));
GO

/***********/
/* Results */
/***********/
CREATE PROCEDURE dbo.proc_Add2Numbers
    @iA INT,
    @iB INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT (@iA + @iB) AS SumAB;
END;

--IMPORTANT: Notice the missing GO here!

IF OBJECT_ID('dbo.proc_Multiply2Numbers','P') IS NOT NULL
    DROP PROCEDURE dbo.proc_Multiply2Numbers;


As you can see from the script results above, the batch terminator "GO" was missing between the two stored procedures in the deployment script. Because of this, the check for the existence of dbo.proc_Multiply2Numbers was included in the definition of dbo.proc_Add2Numbers.


When I ran the test for the first time, I executed dbo.proc_Multiply2Numbers first. When dbo.proc_Add2Numbers was then executed, it dropped dbo.proc_Multiply2Numbers, which is why that procedure was unavailable in round #2.
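The fix itself is trivial once spotted: terminate every procedure definition with its own "GO" so that the next existence check starts a fresh batch. A sketch of the corrected fragment:

```sql
CREATE PROCEDURE dbo.proc_Add2Numbers
    @iA INT,
    @iB INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT (@iA + @iB) AS SumAB;
END;
GO  --This batch terminator keeps the next IF/DROP out of the procedure body

IF OBJECT_ID('dbo.proc_Multiply2Numbers','P') IS NOT NULL
    DROP PROCEDURE dbo.proc_Multiply2Numbers;
GO
```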


Lessons Reminded/Learnt


I realized two lessons on that day:



  1. Reminder: A stored procedure definition includes everything from the CREATE PROCEDURE statement to the batch terminator
  2. Lesson learnt: Do NOT multi-task!

Until we meet next time,


Be courteous. Drive responsibly.

#0316 – SQL Server – sp_help and multi-part naming of objects – Msg 102 – Incorrect syntax near '.'


I was recently working on exploring a couple of tables in a database that I was troubleshooting for performance purposes. I was using the system stored procedure sp_help and all was fine until I started accessing tables with a schema other than the default – “dbo”.


As soon as I started to access tables with schemas other than “dbo”, I encountered the following error:

USE AdventureWorks2012;
GO
sp_help HumanResources.Employee;
GO

Msg 102, Level 15, State 1, Line 1
Incorrect syntax near '.'.


Initially, I thought it was because I had not enclosed the names in square brackets to mark them as object identifiers, so I tried that – with the same result:

USE AdventureWorks2012;
GO
sp_help [HumanResources].[Employee];
GO

Msg 102, Level 15, State 1, Line 1
Incorrect syntax near '.'.


Then, I realized that sp_help is possibly unable to handle the multi-part naming convention of SQL Server objects, which to me is quite odd. So, I enclosed the entire two-part name inside square brackets and it worked!

USE AdventureWorks2012;
GO
sp_help [HumanResources.Employee];
GO

[Screenshot: sp_help output for HumanResources.Employee]


And so did this:

USE AdventureWorks2012;
GO
sp_help 'HumanResources.Employee';
GO

As I mentioned earlier, I find this behaviour of sp_help strange. I can understand why it accepts the [schemaname.objectname] format, but what I can't understand is why it can't accept [schemaname].[objectname]. While I was able to get through the exploratory process with my newly discovered workaround, I am quite sure that many developers run into this error day in and day out.
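The workaround makes sense once you remember that sp_help takes its input as a single parameter. Calling it with an explicit EXEC and the @objname parameter as a quoted string makes this explicit and avoids the parsing issue entirely:

```sql
USE AdventureWorks2012;
GO
--@objname takes the whole multi-part name as one string
EXEC sp_help @objname = N'HumanResources.Employee';
GO
```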


Have you encountered this behaviour?


Further Reading:



  • An Introduction to Multi-part naming standards for Object name referencing [Link]
  • Scripts to generate and parse multi-part database object names – PARSENAME() function [Link]
  • sp_help [Books On Line Link]

Until we meet next time,


Be courteous. Drive responsibly.

#0315 – SQL Server – Different ways to check for existence of an object (table/SP, etc.)


I was recently reviewing the deployment scripts for a couple of products and I noticed that each team/DBA had their own style – and none of them was incorrect. So, I set about documenting each of those different styles for my own academic interest. I was able to document a total of 6 different ways in which various teams/DBAs and tools check for the existence of a SQL Server object (table, stored procedure, etc.).

These 6 different ways are shown below:

USE AdventureWorks2012;
GO

--1. The ISO compliant way
SELECT *
FROM INFORMATION_SCHEMA.TABLES AS ist
WHERE ist.TABLE_SCHEMA = 'HumanResources'
  AND ist.TABLE_NAME = 'Employee';
GO

--2. Using SQL Server Catalog Views
SELECT *
FROM sys.tables AS st
WHERE st.schema_id = SCHEMA_ID('HumanResources')
  AND st.name = 'Employee'
  AND st.is_ms_shipped = 0; --We are only looking for user objects!
GO

--3. Using SQL Server Catalog Views
SELECT *
FROM sys.objects AS so
WHERE so.type = 'U'
  AND so.schema_id = SCHEMA_ID('HumanResources')
  AND so.name = 'Employee'
  AND so.is_ms_shipped = 0; --We are only looking for user objects!
GO

--4. Using the OBJECT_ID function (The easy way)
--If the OBJECT_ID does not return a NULL value, the object exists
--For the object type value, refer http://technet.microsoft.com/en-us/library/ms190324.aspx
SELECT OBJECT_ID('HumanResources.Employee','U') AS ObjectId;
GO

--5. A hybrid approach
SELECT *
FROM sys.objects AS so
WHERE so.object_id = OBJECT_ID('HumanResources.Employee','U')
  AND so.is_ms_shipped = 0; --We are only looking for user objects!
GO

--6. The SSMS way 
--   (if you have set your scripting options to check for object existence)
SELECT *
FROM sys.objects AS so
WHERE so.object_id = OBJECT_ID('HumanResources.Employee')
  AND so.type IN ('U');
GO

You can see here that we have a wide variety of methods that check for existence of an object – from ISO compliant ways to purely SQL Server specific ways.

My favourite method is method #4, using the OBJECT_ID() function. Which one is your favourite?
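In a deployment script, these existence checks usually drive a conditional action. Method #4, for example, is the pattern behind the safety checks seen earlier in this series – a minimal sketch:

```sql
USE AdventureWorks2012;
GO
--If OBJECT_ID returns NULL, the object does not exist
IF OBJECT_ID('HumanResources.Employee','U') IS NOT NULL
    PRINT 'HumanResources.Employee exists.';
ELSE
    PRINT 'HumanResources.Employee does not exist.';
GO
```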

Do note that OBJECT_ID() will not work for objects that are not schema-scoped, e.g. DDL triggers.
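For such non-schema-scoped objects, the catalog views still work. A database-level DDL trigger, for instance, can be checked via sys.triggers (the trigger name below is hypothetical):

```sql
--DDL triggers are not schema-scoped, so OBJECT_ID cannot find them
SELECT *
FROM sys.triggers AS tr
WHERE tr.parent_class_desc = 'DATABASE'
  AND tr.name = 'MyDatabaseDDLTrigger'; --Hypothetical trigger name
GO
```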

Further Reading

Until we meet next time,

Be courteous. Drive responsibly.