Author Archives: nakulvachhrajani


About nakulvachhrajani

Nakul Vachhrajani is a SQL Server MCTS and TOGAF-certified Technical Architect with Capgemini, with more than 16 years of IT experience. Nakul is an active blogger and has been a guest columnist for SQLAuthority.com and SQLServerCentral.com. Nakul presented a webcast on the “Underappreciated Features of Microsoft SQL Server” at the Microsoft Virtual Tech Days Exclusive Webcast series (May 02-06, 2011) on May 06, 2011. He is also the author of a research paper on database upgrade methodologies, which was published in a nationwide CSI journal. In addition to his passion for SQL Server, Nakul also contributes to academia out of personal interest. He visits various colleges and universities as an external faculty member to judge project activities carried out by the students. Disclaimer: The opinions expressed herein are his own personal opinions and do not represent his employer’s view in any way.

#0322 – SQL Server – Why is it not a good idea to implement NOT NULL check as a CHECK constraint


I continue to be surprised by the various practices that developers use to achieve the same result.


The common understanding with SQL Server DDL is that in most cases, there is only one method to implement a given design element. For example, to ensure that a column does not accept NULL values, the solution that comes to mind is to define it as a NOT NULL column.
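As a quick illustration of that conventional approach, a table with such a column would be sketched as follows (the table name here is hypothetical, used only for illustration):

```sql
--Conventional approach: declare the column as NOT NULL
--(table and column names are hypothetical)
CREATE TABLE dbo.ConventionalNotNull ( Age INT NOT NULL ) ;
GO

--An attempt to insert a NULL now fails at the column level,
--before any constraint logic comes into play
--INSERT INTO dbo.ConventionalNotNull ( Age ) VALUES ( NULL ) ;
```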


Recently, I came across a trainee who proposed to implement the NOT NULL requirement for a given column in a way that’s not conventional – he proposed to use CHECK constraints. I had never heard of anyone using CHECK constraints to enforce a NOT NULL check, but I liked the thought process and hence wrote a quick prototype to demonstrate why this was not a good idea.


The following is how I explained why we don’t use CHECK constraints to enforce NOT NULL values (for the sake of brevity, I am demonstrating with a table that has only one column).

USE tempdb ;
GO

--Safety Check
IF OBJECT_ID('dbo.NotNullCheckConstraint', 'U') IS NOT NULL 
    BEGIN
        DROP TABLE dbo.NotNullCheckConstraint ;
    END
GO

--Table kept short for sake of brevity
CREATE TABLE dbo.NotNullCheckConstraint ( Age INT ) ;
GO

ALTER TABLE dbo.NotNullCheckConstraint
    ADD CONSTRAINT chk_AgeSpecified CHECK ( Age > 0 ) ;
GO

This, however, will not stop the user from entering NULLs. For a NULL value, the predicate (Age > 0) evaluates to UNKNOWN, and a CHECK constraint rejects a row only when its predicate evaluates to FALSE. Let’s try it out:

USE tempdb;
GO
--Insert Test Data to prove that the CHECK constraint does not stop NULL values
INSERT  INTO dbo.NotNullCheckConstraint ( Age )
VALUES  ( NULL ),
        ( NULL ) ;
GO

--Select data from the table
SELECT  Age
FROM    dbo.NotNullCheckConstraint ;
GO
/*
Results:
--------
Age
--------
NULL
NULL
*/

The only real way to enforce NOT NULL values via a CHECK constraint is to add an explicit NULL test to the constraint definition:

--Truncate the test table
TRUNCATE TABLE dbo.NotNullCheckConstraint;
GO

--Drop the old constraint
ALTER TABLE dbo.NotNullCheckConstraint 
    DROP CONSTRAINT chk_AgeSpecified ;
GO

--Create a new constraint
ALTER TABLE dbo.NotNullCheckConstraint
    ADD CONSTRAINT chk_AgeSpecifiedWithNotNullCheck CHECK ( Age IS NOT NULL
                                                            AND Age > 0 ) ;
GO

Let us now try to insert NULL values into the table:

USE tempdb ;
GO
--Insert Test Data to prove that the new CHECK constraint did stop NULL values
INSERT  INTO dbo.NotNullCheckConstraint ( Age )
VALUES  ( NULL ),
        ( NULL ) ;
GO

--Select data from the table
SELECT  Age
FROM    dbo.NotNullCheckConstraint ;
GO
/*
Results:
--------
Age
--------
*/

We encounter the following error message:


Msg 547, Level 16, State 0, Line 2
The INSERT statement conflicted with the CHECK constraint “chk_AgeSpecifiedWithNotNullCheck”. The conflict occurred in database “tempdb”, table “dbo.NotNullCheckConstraint”, column ‘Age’.

The statement has been terminated.


(0 row(s) affected)


In this case, the CHECK constraint did indeed prevent us from inserting NULL values into the table.


But, is this a good idea?



The answer, in my humble opinion, is: No.


And I have two reasons for it:



  1. CHECK constraints are ideally suited to enforce business rules and business logic rather than data integrity


    • They are evaluated after the SQL Server database engine is sure that the input values do not violate any properties or boundaries of the defined column

    • I even wrote an article about the constraint execution sequence on SQLServerCentral.com here – Leveraging Constraint Evaluation Sequence in SQL Server. The examples in the article demonstrate how the NOT NULL constraint is evaluated before CHECK constraints

  2. A CHECK constraint can be disabled by a user with sufficient privileges, thereby allowing the application to insert NULL values; a NOT NULL column definition, on the other hand, would remain in effect

Here’s a quick check for point #2 (disabling a CHECK constraint may allow invalid data to be entered).

USE tempdb;
GO
--Now, disable the constraint
ALTER TABLE dbo.NotNullCheckConstraint
    NOCHECK CONSTRAINT chk_AgeSpecifiedWithNotNullCheck;
GO

--Insert Test Data to prove that the new CHECK constraint did NOT stop NULL values
INSERT  INTO dbo.NotNullCheckConstraint ( Age )
VALUES  ( NULL ),
        ( NULL ) ;
GO

--Select data from the table
SELECT  Age
FROM    dbo.NotNullCheckConstraint ;
GO
/*
Results:
--------
Age
--------
NULL
NULL
*/

/********************************************************************************/
USE tempdb ;
GO
--Now, re-enable the constraint
--Notice no error is reported: re-enabling without WITH CHECK
--does not re-validate the rows already in the table
ALTER TABLE dbo.NotNullCheckConstraint
    CHECK CONSTRAINT chk_AgeSpecifiedWithNotNullCheck;
GO

--Insert Test Data to prove that the new CHECK constraint did stop NULL values
--This will report an error
INSERT  INTO dbo.NotNullCheckConstraint ( Age )
VALUES  ( NULL ),
        ( NULL ) ;
GO

--Select data from the table - results from earlier run are still available
SELECT  Age
FROM    dbo.NotNullCheckConstraint ;
GO
/*
Results:
--------
Age
--------
NULL
NULL
*/

Summary


Roughly defined, patterns are common solutions to common problems. There is a reason why Patterns and Practices were developed, and this is a classic example. A CHECK constraint can be used to enforce a NOT NULL check, but it is not a normal pattern, and the reasons are:



  1. CHECK constraints are evaluated after NOT NULL validation

  2. CHECK constraints may be disabled, allowing the user to enter invalid data into the table
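For completeness, the conventional pattern keeps the two concerns separate: the NOT NULL column definition guards data integrity, while the CHECK constraint carries only the business rule. A minimal sketch (the table and constraint names here are hypothetical):

```sql
--NOT NULL handles the integrity check at the column level;
--the CHECK constraint carries only the business rule
CREATE TABLE dbo.AgeWithBothChecks
    ( Age INT NOT NULL
          CONSTRAINT chk_AgePositive CHECK ( Age > 0 )
    ) ;
GO
```

Even if chk_AgePositive is later disabled via NOCHECK, the NOT NULL definition continues to reject NULL values.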

Further Reading



  • Leveraging Constraint Evaluation Sequence in SQL Server [Link]

  • Using Regular Expressions with CHECK constraints [Link]

  • Defining CHECK constraints on computed columns [Link]

  • Defining “complex” CHECK constraints [Link]

Until we meet next time,


Be courteous. Drive responsibly.

#0321 – SQL Server – Each GROUP BY expression must contain at least one column that is not an outer reference. – Msg 164


I was attempting to write an ad-hoc query which aggregated some data using the GROUP BY clause for a quick data check when a copy-paste error made me stumble upon the following error:


Msg 164, Level 15, State 1, Line 8
Each GROUP BY expression must contain at least one column that is not an outer reference.


The query that returned the error looked like:

USE AdventureWorks2008R2 ;
GO
DECLARE @accountNumber VARCHAR(30) = '10-4020-000676';

SELECT  YEAR(OrderDate) AS PurchaseYear,
        @accountNumber AS AccountNumber,
        SUM(TotalDue) AS TotalPurchase
FROM    Sales.SalesOrderHeader
WHERE   AccountNumber = @accountNumber
GROUP BY YEAR(OrderDate),
        @accountNumber ;
GO

What I typically do when I work with the GROUP BY clause is take the columns from the SELECT clause, remove the aggregations, and use that list for the GROUP BY clause. In that process, I ended up with a variable (@accountNumber) in the GROUP BY list. A variable is ultimately an expression that SQL Server treats as an outer reference, which is not allowed in a GROUP BY clause. The solution, therefore, is to change the GROUP BY list to not use variables.

USE AdventureWorks2008R2 ;
GO
DECLARE @accountNumber VARCHAR(30) = '10-4020-000676';

SELECT  YEAR(OrderDate) AS PurchaseYear,
        @accountNumber AS AccountNumber,
        SUM(TotalDue) AS TotalPurchase
FROM    Sales.SalesOrderHeader
WHERE   AccountNumber = @accountNumber
GROUP BY YEAR(OrderDate),
        AccountNumber ;  --Notice that the variable has been replaced 
                         --with the corresponding field name
GO

[Edit – 02/16/2014 – 01:20AM IST]:


Based on a couple of comments that I received on this post, I would like to elaborate further on the behaviour that is exhibited by SQL Server.



  • The GROUP BY clause can operate on any expressions except single-row, single-column sub-queries

  • This expression must contain at least one column belonging to the tables referenced in the FROM clause of the statement where GROUP BY is being applied

  • A variable is a single-row, single-column expression (SELECT @accountNumber is perfectly valid), and does not reference any column from the tables used in the statement – this is what makes the variable invalid in the GROUP BY clause
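To illustrate the points above, an expression in the GROUP BY clause is acceptable as long as it references at least one column from the tables in the FROM clause. The following sketch (not from the original post; it simply groups by a prefix of the account number) is valid for that reason:

```sql
--Sketch: grouping by an expression is allowed because the
--expression LEFT(AccountNumber, 7) references a table column
SELECT  YEAR(OrderDate) AS PurchaseYear,
        LEFT(AccountNumber, 7) AS AccountPrefix,
        SUM(TotalDue) AS TotalPurchase
FROM    Sales.SalesOrderHeader
GROUP BY YEAR(OrderDate),
        LEFT(AccountNumber, 7) ;
GO
```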

Until we meet next time,


Be courteous. Drive responsibly.

#0320 – SQL Server – Dropping multiple constraints in a single statement


A couple of months ago, I read a post from Madhivanan [B|T] which taught me how to drop multiple tables in a single DROP TABLE statement [Link]. Last week I was working on a deployment script when it hit me that I could drop multiple constraints (and columns) at the same time using the ALTER TABLE…DROP statement.

This method works on check, default and unique constraints.

Here’s a simple example:

USE AdventureWorks2012;
GO
ALTER TABLE HumanResources.Employee
    DROP CONSTRAINT CK_Employee_BirthDate,
                    CK_Employee_Gender,
                    CK_Employee_HireDate,
                    CK_Employee_MaritalStatus,
                    CK_Employee_SickLeaveHours,
                    CK_Employee_VacationHours,
                    DF_Employee_CurrentFlag,
                    DF_Employee_ModifiedDate,
                    DF_Employee_rowguid,
                    DF_Employee_SalariedFlag,
                    DF_Employee_SickLeaveHours,
                    DF_Employee_VacationHours;
GO

You can write a similar script for dropping multiple columns in the same manner.
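A similar sketch for columns would look like the following (the table and column names are hypothetical; ALTER TABLE…DROP COLUMN accepts a comma-separated list in the same way):

```sql
--Hypothetical example: dropping multiple columns in one statement
ALTER TABLE dbo.SomeTable
    DROP COLUMN Column1,
                Column2,
                Column3;
GO
```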

Further Reading

  • ALTER TABLE [MSDN/BOL Link]
  • Drop multiple tables in a single DROP TABLE statement [Link from Madhivanan [B|T]]
  • Leveraging constraint evaluation sequence in SQL Server [Link]

Until we meet next time,

Be courteous. Drive responsibly.

#0319 – SQL Server – CTE and UNION


I was recently asked a very interesting question at work. Someone wanted to use the same CTE in two queries, the results of which were to be combined using the UNION clause. I was approached to see whether it was feasible.

In order to work out the feasibility, I simply carried out a quick PoC which I am sharing today. It is quite possible to use the same CTE in two queries which are combined via UNION.

USE tempdb;
GO
;WITH EmployeeListCTE (BusinessEntityId, IsCurrent)
AS (SELECT el.BusinessEntityId,
           el.IsCurrent
    FROM (VALUES (1, 1),
                 (2, 0),
                 (3, 1),
                 (4, 1),
                 (5, 1)
         ) AS el(BusinessEntityId, IsCurrent)
   )
SELECT elCTE.BusinessEntityId,
       elCTE.IsCurrent
FROM EmployeeListCTE AS elCTE 
WHERE elCTE.IsCurrent = 1
UNION
SELECT elCTE.BusinessEntityId,
       elCTE.IsCurrent
FROM EmployeeListCTE AS elCTE 
WHERE elCTE.IsCurrent = 0;
GO

/* RESULTS
----------------+-----------
BusinessEntityId|IsCurrent
----------------+-----------
1               |1
3               |1
4               |1
5               |1
2               |0
*/

References:

  • Common Table Expressions (CTE) [Link]
  • Interesting enhancements to the VALUES Clause in SQL Server 2008 [Link] (by Madhivanan (B|T))

Until we meet next time,

Be courteous. Drive responsibly.

#0318 – SQL Server – Performance Tuning – Use Temp tables instead of table variables when working with large data sets


I was recently involved in a performance tuning exercise for a fairly complex stored procedure which processed thousands of records at a time. This post is based on one of the changes that I made to improve the performance (and something that you can do too fairly easily).


Data was being staged into intermediate result sets, which is not a bad strategy if you want to reduce the overall complexity of the execution plan. The problem was in the mechanism of the staging. Data staging was done into table variables, which offer the convenience of temporary storage with the manageability of regular variables. However, table variables come with a big performance problem when large data sets are involved.



Table variables can result in an incorrect cardinality estimate for a query, which in turn leads to poor performance.


Before I explain the reason behind this, allow me to present a small demo to demonstrate the problem.


Table Variables and large data sets


The example below is quite simple – I create a table variable with one column defined as a primary key and then populate some data into the table variable. Finally, I fetch data from the table variable with the actual execution plan turned on.

--Declare the table variable
DECLARE @tEmployeeList TABLE 
            (BusinessEntityId INT NOT NULL 
                    PRIMARY KEY CLUSTERED
            );

--Insert some test data
INSERT INTO @tEmployeeList (BusinessEntityId)
SELECT BusinessEntityId
FROM Person.Person
WHERE (BusinessEntityId % 16) = 0;

--Fetch data from the temporary table
--Make sure that "Show Actual Execution Plan" (Ctrl + M) is shown
SELECT * FROM @tEmployeeList;
GO

Hovering over the Clustered Index Scan operator in the actual execution plan shows us something interesting – there is a huge difference between the “Estimated Number of Rows” and the “Actual Number of Rows”.


[Screenshot: actual execution plan showing a large difference between the “Estimated Number of Rows” and the “Actual Number of Rows”]


Why is this a problem?


One might say – so, what’s the big deal? You are only selecting from the table variable. In this example, this behaviour of table variables does not have any impact. However, what would happen if this table variable were joined with other tables?


At the time of plan generation, the optimizer would have estimated that it would only receive a single record in the table variable. The overall query plan would have been generated with this assumption in mind.


But, at runtime, it got many more rows (1198 in this example), which indicates an issue with the cardinality estimate. Inaccurate cardinality estimates are one of the prime causes of poor query performance because they result in a sub-optimal plan, which slows down the query.


The problem with the cardinality estimate arises because table variables do not have any statistics defined on them, and a change in the number of records therefore does not trigger a plan recompile. In most cases, the query plan is built with the estimate that the table variable has either no rows or 1 row.


This is why table variables must not be used when the data set contains a large number of records (per the TechNet article referenced below, the magic number is 100) or when a cost-based evaluation of a query is required.
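To see why the misestimate matters in practice, consider a sketch (run in the same batch as the DECLARE above, since a table variable is scoped to its batch) that joins the table variable to another table; the join strategy is costed based on the 1-row estimate even though 1000+ rows are actually present:

```sql
--Sketch: run in the same batch as the DECLARE @tEmployeeList above.
--The optimizer costs this join assuming ~1 row from @tEmployeeList,
--which can lead it to pick a plan that performs poorly at runtime.
SELECT  p.FirstName,
        p.LastName
FROM    @tEmployeeList AS el
        INNER JOIN Person.Person AS p
            ON p.BusinessEntityID = el.BusinessEntityId;
```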


The Solution


The solution is quite simple – use temporary tables instead of table variables!

--Safety Check
IF OBJECT_ID('tempdb..#tEmployeeList','U') IS NOT NULL
    DROP TABLE #tEmployeeList;
GO

--Create the temporary table
CREATE TABLE #tEmployeeList 
            (BusinessEntityId INT NOT NULL 
                     PRIMARY KEY CLUSTERED
            );
GO

--Insert some test data
INSERT INTO #tEmployeeList (BusinessEntityId)
SELECT BusinessEntityId
FROM Person.Person
WHERE (BusinessEntityId % 16) = 0;

--Fetch data from the temporary table
--Make sure that "Show Actual Execution Plan" (Ctrl + M) is shown
SELECT * FROM #tEmployeeList;
GO

--Cleanup
IF OBJECT_ID('tempdb..#tEmployeeList','U') IS NOT NULL
    DROP TABLE #tEmployeeList;
GO

[Screenshot: actual execution plan showing the “Estimated Number of Rows” matching the “Actual Number of Rows”]


Notice that both the estimated and the actual number of rows are the same, which indicates that when used in a complex query, the cardinality estimate would be fairly accurate, resulting in better performance.


Further Reading




  • Table data type [TechNet Link]

    • In particular, refer to the “Limitations and Restrictions” section

Until we meet next time,


Be courteous. Drive responsibly.