Category Archives: Imported from BeyondRelational

These posts are imported from my old blog page: http://beyondrelational.com/modules/2/blogs/77/nakuls-blog.aspx

OUTPUT clause – Underappreciated features of Microsoft SQL Server


This is part of the series of posts that I am running under the title "Under Appreciated Features of SQL Server". The series has been inspired by Andy Warren’s editorial on SQLServerCentral.com.

The UPDATE Statement

Today, I will approach the OUTPUT clause in a slightly different manner. We will simply forget about it for a moment, and talk instead about something much simpler and well-known – the UPDATE statement.

But the UPDATE statement is not that simple – it is actually two operations rolled into one: a DELETE followed by an INSERT, wrapped into a single statement.

Does that sound familiar? I am sure you have heard about the two special tables that SQL Server provides – the DELETED and INSERTED tables (Reference: http://msdn.microsoft.com/en-us/library/aa214435(SQL.80).aspx). These tables are used within triggers to:

  • Extend referential integrity between tables
  • Insert or update data in base tables underlying a view
  • Check for errors and take action based on the error
  • Find the difference between the state of a table before and after a data modification and take action(s) based on that difference

Here’s a quick example:

USE AdventureWorks2008R2
GO
--Step 01: Create two tables - the test table, and an Audit table
CREATE TABLE MyTABLE (MyId INT, MyCity VARCHAR(20))
CREATE TABLE MyAudit (MyChangedId INT, OldCityValue VARCHAR(20), NewCityValue VARCHAR(20), ModificationDate DATETIME)
GO

--Step 02: Insert some test data into these tables
INSERT INTO MyTABLE VALUES (1,'Gandhinagar'),
                           (2,'Ahmedabad'),
                           (3,'Mumbai'),
                           (4,'Delhi'),
                           (5,'Bangalore')
GO

--Step 03: Create a test trigger to demonstrate the most typical use of DELETED & INSERTED tables
CREATE TRIGGER trig_MyTable
ON MyTABLE
FOR UPDATE
AS
BEGIN
    INSERT INTO MyAudit (MyChangedId, OldCityValue, NewCityValue, ModificationDate)
                SELECT [DELETED].[MyId],
                       [DELETED].[MyCity],
                       [INSERTED].[MyCity],
                       GETUTCDATE()
                FROM [DELETED]
                INNER JOIN [INSERTED] ON [DELETED].[MyId] = [INSERTED].[MyId]
END
GO

--Step 04: Perform a test update
SET STATISTICS IO, TIME ON

UPDATE mt
SET mt.MyCity = 'New Delhi'
FROM MyTABLE mt
WHERE mt.MyId = 4

SET STATISTICS IO, TIME OFF

--Step 05: Take a quick look at the data
SELECT MyId, MyCity FROM MyTABLE
SELECT MyChangedId, OldCityValue, NewCityValue, ModificationDate FROM MyAudit

--Step 06 - Perform some cleanup
DROP TABLE MyTABLE
DROP TABLE MyAudit

Upon running these scripts, you will notice that this is quite a reasonable method of maintaining a custom log of critical transactions/queries.


The OUTPUT Clause

However, sometimes you may not want to use triggers (whether out of personal preference, or because triggers can be disabled); or perhaps this is a one-off debugging exercise, or simply an effort to learn the internals of SQL Server. In such cases, the OUTPUT clause (introduced in SQL Server 2005) can be very useful.

USE AdventureWorks2008R2
GO
--Step 01: Create two tables - the test table, and an Audit table
CREATE TABLE MyTABLE (MyId INT, MyCity VARCHAR(20))
CREATE TABLE MyAudit (MyChangedId INT, OldCityValue VARCHAR(20), NewCityValue VARCHAR(20), ModificationDate DATETIME)
GO

--Step 02: Insert some test data into these tables
INSERT INTO MyTABLE VALUES (1,'Gandhinagar'),
                           (2,'Ahmedabad'),
                           (3,'Mumbai'),
                           (4,'Delhi'),
                           (5,'Bangalore')
GO

--Step 03: Perform a test update
SET STATISTICS IO, TIME ON

UPDATE mt
SET mt.MyCity = 'New Delhi'
    OUTPUT [deleted].[MyId], [deleted].[MyCity], [inserted].[MyCity], GETUTCDATE()
    INTO MyAudit (MyChangedId, OldCityValue, NewCityValue, ModificationDate)
FROM MyTABLE mt
WHERE mt.MyId = 4

SET STATISTICS IO, TIME OFF

--Step 04: Take a quick look at the data
SELECT MyId, MyCity FROM MyTABLE
SELECT MyChangedId, OldCityValue, NewCityValue, ModificationDate FROM MyAudit

--Step 05 - Perform some cleanup
DROP TABLE MyTABLE
DROP TABLE MyAudit

What’s the benefit?

Drastically reduced and simplified code, and yet we get the same results. At the end of the day, it’s all about the benefits of a particular approach. So, do we have any measurable improvement? We had the I/O and TIME statistics ON during the UPDATE statements – let’s see what story they tell us.

Here’s the output from the conventional trigger-based approach.

Table ‘MyTABLE’. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

(1 row(s) affected)

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 8 ms.

Table ‘MyAudit’. Scan count 0, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

(1 row(s) affected)

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 120 ms.

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 131 ms.

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 0 ms.

And here’s the same from the OUTPUT clause:

SQL Server parse and compile time:

   CPU time = 10 ms, elapsed time = 15 ms.

Table ‘MyAudit’. Scan count 0, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table ‘MyTABLE’. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

(1 row(s) affected)

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 58 ms.

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:

   CPU time = 0 ms,  elapsed time = 0 ms.

While there is no difference in the I/O (1 scan, 2 logical reads), the execution time has definitely improved – a total of (8 + 120 + 131) ms = 259 ms with the trigger-based approach vs. (15 + 58) ms = 73 ms with the OUTPUT clause.

Finally, the OUTPUT clause can be used with all three of the major T-SQL data modification statements – UPDATE, INSERT and DELETE (and, from SQL Server 2008 onwards, with MERGE as well)!
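As a quick sketch (reusing the MyTABLE table from the examples above), here is what OUTPUT looks like on the other two statements – inserted exposes the new rows, deleted exposes the removed rows:

```sql
--OUTPUT with INSERT: the inserted table holds the rows as they were inserted
INSERT INTO MyTABLE (MyId, MyCity)
OUTPUT inserted.MyId, inserted.MyCity
VALUES (6, 'Chennai')

--OUTPUT with DELETE: the deleted table holds the rows as they were before removal
DELETE FROM MyTABLE
OUTPUT deleted.MyId, deleted.MyCity
WHERE MyId = 6
```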

Some finer points

The OUTPUT clause does not support:

  1. Subqueries or user-defined functions that perform user or system data access, or are assumed to perform such access. User-defined functions are assumed to perform data access if they are not schema-bound
  2. A remote table, view, or common table expression as its target
  3. A target with a FOREIGN KEY constraint, or referenced by a FOREIGN KEY constraint
  4. Triggers on the target
  5. A target participating in replication

Besides the above, be careful with the behaviour of @@ROWCOUNT when an OUTPUT…INTO is nested inside another DML statement (so-called "composable DML") – it returns only the number of rows affected by the outer INSERT statement.
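To make that nested scenario concrete, here is a hedged sketch of composable DML (available from SQL Server 2008 onwards), reusing the MyTABLE/MyAudit tables from the earlier examples – the OUTPUT of an inner UPDATE feeds an outer INSERT, and @@ROWCOUNT afterwards reflects only the outer INSERT:

```sql
--The inner UPDATE's OUTPUT rows become the row source for the outer INSERT
INSERT INTO MyAudit (MyChangedId, OldCityValue, NewCityValue, ModificationDate)
SELECT c.MyChangedId, c.OldCityValue, c.NewCityValue, c.ModificationDate
FROM (UPDATE MyTABLE
      SET MyCity = 'New Delhi'
      OUTPUT deleted.MyId, deleted.MyCity, inserted.MyCity, GETUTCDATE()
      WHERE MyId = 4) AS c (MyChangedId, OldCityValue, NewCityValue, ModificationDate)

--Reflects the rows affected by the outer INSERT only
SELECT @@ROWCOUNT AS RowsInsertedByOuterStatement
```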

Lots more information can be obtained by visiting the Books-On-Line page at: http://msdn.microsoft.com/en-us/library/ms177564(v=SQL.110).aspx

In Conclusion – a small challenge

Now that you are powered with the knowledge of the OUTPUT clause, can you answer the following Question of the Day at SQLServerCentral?

http://www.sqlservercentral.com/questions/T-SQL/72013/

Do leave a note about how you did, and share your thoughts on it.

Until we meet next time,

Be courteous. Drive responsibly.

Common Table Expressions (CTE) – Underappreciated Features of Microsoft SQL Server


This is part of the series of posts that I am running under the title "Under Appreciated Features of SQL Server". The series has been inspired by Andy Warren’s editorial on SQLServerCentral.com.

Did you ever wonder if you could create a temporary result set and use it within a single SELECT, INSERT, UPDATE or DELETE statement? I bet you have, and before SQL Server 2005 came along, I am sure you used sub-queries, cursors or temporary tables to achieve the desired results.

Starting with SQL Server 2005, we have a much better and more efficient method of achieving this – the common table expression, or CTE. Much has been written about CTEs and their benefits, and hence I will mostly present a summary of the great articles and posts on the topic that I have come across to date:

  1. An introduction to CTEs – http://www.simple-talk.com/content/article.aspx?article=260
  2. The things that you can do with a CTE – http://msdn.microsoft.com/en-us/library/ms175972.aspx
  3. A few data points on CTE – http://msdn.microsoft.com/en-us/magazine/cc163346.aspx#S3
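As a quick refresher on the syntax, here is a minimal sketch – a simple CTE (assuming an AdventureWorks-style Sales.SalesOrderHeader table), followed by a recursive CTE that generates the numbers 1 through 5:

```sql
--A simple CTE: name a derived result set and reference it in the outer query
WITH HighValueOrders (OrderID, TotalDue)
AS
(
    SELECT SalesOrderID, TotalDue
    FROM Sales.SalesOrderHeader
    WHERE TotalDue > 100000
)
SELECT OrderID, TotalDue FROM HighValueOrders;

--A recursive CTE: an anchor member UNION ALL'd with a recursive member
WITH Numbers (N)
AS
(
    SELECT 1          --anchor member
    UNION ALL
    SELECT N + 1      --recursive member; stops when the WHERE filter returns no rows
    FROM Numbers
    WHERE N < 5
)
SELECT N FROM Numbers;
```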

Something interesting – CTE in a CREATE VIEW statement

Yes, CTEs can be used in a CREATE VIEW statement as is demonstrated by the queries below:

USE AdventureWorks2008R2;
GO
CREATE VIEW vSalesCTE
AS
    -- Define the CTE expression name and column list.
    WITH Sales_CTE (SalesPersonID, SalesOrderID, SalesYear)
    AS
    -- Define the CTE query.
    (
        SELECT SalesPersonID, SalesOrderID, YEAR(OrderDate) AS SalesYear
        FROM Sales.SalesOrderHeader
        WHERE SalesPersonID IS NOT NULL
    )
    SELECT Sales_CTE.SalesPersonID, Sales_CTE.SalesOrderID, Sales_CTE.SalesYear
    FROM Sales_CTE;
GO

--Use the VIEW containing a CTE
SELECT vcte.SalesPersonID, COUNT(vcte.SalesOrderID) AS TotalSales, vcte.SalesYear
FROM vSalesCTE vcte
GROUP BY vcte.SalesYear, vcte.SalesPersonID
ORDER BY vcte.SalesPersonID, vcte.SalesYear

Don’t overestimate CTEs – don’t use them multiple times within the same query

As is the case with everything, CTEs in excess can also cause more harm than good. A CTE is essentially an “expandable” block of code – its definition is expanded into the query wherever it is referenced – and hence, if you reference the same CTE more than once in a query, you can run into performance issues, because the underlying data set may be evaluated once for each reference. For cases where you need to use the same data set again and again, use temporary tables or table variables instead.
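A hedged sketch of that alternative – materialize the expensive result once into a temporary table, then reference the temporary table as many times as needed (the query below reuses the Sales.SalesOrderHeader example from above):

```sql
--Materialize the expensive result set exactly once...
SELECT SalesPersonID, SalesOrderID, YEAR(OrderDate) AS SalesYear
INTO #SalesWork
FROM Sales.SalesOrderHeader
WHERE SalesPersonID IS NOT NULL;

--...then reference #SalesWork any number of times without re-evaluating the source query
SELECT SalesYear, COUNT(SalesOrderID) AS TotalSales
FROM #SalesWork
GROUP BY SalesYear;

SELECT SalesPersonID, COUNT(SalesOrderID) AS TotalSales
FROM #SalesWork
GROUP BY SalesPersonID;

DROP TABLE #SalesWork;
```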

I hope that CTEs have opened up a whole new programming paradigm for you. If you have not started experimenting with CTEs, I recommend that you start doing so – it will be fun.

Until we meet next time,

Be courteous. Drive responsibly.

Windowing functions – Underappreciated features of Microsoft SQL Server


This is part of the series of posts that I am running under the title “Under Appreciated Features of SQL Server“. The series has been inspired by Andy Warren’s editorial on SQLServerCentral.com.

Introduction

Normally, we all work with subsets of rows that meet some user-defined criteria (e.g. records corresponding to transactions done on a given date). Any such subset of rows is called a “window”. Please note that the keyword here is user-defined – a window corresponds to the rows returned as a result of a particular user query.

Any function that operates on each row of this user-defined set of rows, or “window”, is called a “windowing function”. Windowing functions are part of the ISO SQL standard, and Microsoft SQL Server (2005 and up) provides us with ranking and aggregate windowing functions. Today we will study a few of these powerful functions.

SQL Server Books-On-Line is a great learning resource for these, and hence I have basically compiled the information in Books-On-Line as an introduction to the windowing functions.

OVER clause

The OVER clause determines the partitioning and ordering of a rowset before the associated window function is applied. This is similar in spirit to the GROUP BY and ORDER BY clauses, but is evaluated during the SELECT operation itself – so that the output of these operations can be fed to the ranking functions we will see in a minute. The ORDER BY clause has also been enhanced to work within the OVER clause, which allows us to create ordered partitions.

The advantage that the OVER clause and the windowing functions have over the conventional GROUP BY and ORDER BY implementations is that the partitions and ordering for each function is independent. To do this conventionally, we would have required the use of sub-queries – the OVER clause is far more efficient than that. In the following example, I take the script provided in Books On Line and attempt to reproduce just one aggregate operation (SUM) using the conventional methods.

--Enable the actual execution plan to see the performance benefits of the OVER clause
--Refer (http://beyondrelational.com/blogs/nakul/archive/2011/03/10/some-underappreciated-productivity-features-of-ssms.aspx)
--or use Ctrl+M

--Clear out the procedure cache
DBCC FREEPROCCACHE
GO

USE AdventureWorks2008;
GO
SELECT SalesOrderID, ProductID, OrderQty
    ,SUM(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Total'
    ,AVG(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Avg'
    ,COUNT(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Count'
    ,MIN(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Min'
    ,MAX(OrderQty) OVER(PARTITION BY SalesOrderID) AS 'Max'
FROM Sales.SalesOrderDetail 
WHERE SalesOrderID IN(43659,43664);
GO

--Clear out the procedure cache
DBCC FREEPROCCACHE
GO

USE AdventureWorks2008;
GO
SELECT sd.SalesOrderID, sd.ProductID, sd.OrderQty,
       (SELECT SUM(sd1.OrderQty) FROM Sales.SalesOrderDetail sd1 WHERE sd1.SalesOrderID = sd.SalesOrderID) AS 'Total'
FROM Sales.SalesOrderDetail sd
WHERE sd.SalesOrderID IN (43659, 43664)
GROUP BY sd.SalesOrderID, sd.ProductID, sd.OrderQty

Let’s see how the actual execution plans come along. (To display the actual execution plan, you may refer to my previous post, or use Ctrl+M before executing the query.)

[Screenshot: actual execution plans for the two queries above]

The screenshot says it all! Performing all five aggregation operations – SUM, AVG, COUNT, MIN and MAX – using the OVER clause is far more efficient than performing a single SUM operation using the conventional sub-query method.

Finally, please note that the OVER clause cannot be used with the CHECKSUM aggregate function.

Ranking functions

As the name suggests, ranking functions return a ranking value for each row in the particular window/partition. The ranking may or may not be unique, depending upon the ranking function used. Please note that all ranking functions are non-deterministic; you may want to keep this in mind for performance considerations.

Four ranking functions are available in Microsoft SQL Server, and they are as under:

  • RANK – Returns the rank of each row within the partition of a result set. If two rows share the same rank, a gap appears in the sequence; e.g. if two rows share rank #1, the next rank issued is #3. Unique ranking? No
  • DENSE_RANK – Returns the rank of rows within the partition of a result set, without any gaps in the ranking; e.g. if two rows share rank #1, the next rank issued is #2. Unique ranking? No
  • NTILE – Distributes the rows in an ordered partition into a specified number of groups, numbered starting at one. For each row, NTILE returns the number of the group to which the row belongs. Unique ranking? No
  • ROW_NUMBER – Returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition. Unique ranking? Yes

To demonstrate these functions in action, I have slightly modified the example provided to us in BOL, as under:

USE AdventureWorks2008;
GO

--Ordering the result by the Postal Code
SELECT p.FirstName, p.LastName
    
    --Unique row-number
    ,ROW_NUMBER() OVER (ORDER BY a.PostalCode) AS 'Row Number'
    
    --Provide a rank, such that the largest Postal Code gets the highest rank 
    --Break in the sequence expected if more than one record share the same rank
    ,RANK() OVER (ORDER BY a.PostalCode) AS 'Rank'
    
    --Provide a rank, such that the largest Postal Code gets the highest rank
    --No break in the sequence
    ,DENSE_RANK() OVER (ORDER BY a.PostalCode) AS 'Dense Rank'

    --Break the result set into 4 groups.
    --Because the result set for AdventureWorks contains 13 records, the first
    --group has 4 records and the remaining three groups have 3 each. The value
    --seen in the 'Quartile' column is the number of the group the record belongs to
    ,NTILE(4) OVER (ORDER BY a.PostalCode) AS 'Quartile'

    ,s.SalesYTD, a.PostalCode
FROM Sales.SalesPerson s 
    INNER JOIN Person.Person p 
        ON s.BusinessEntityID = p.BusinessEntityID
    INNER JOIN Person.Address a 
        ON a.AddressID = p.BusinessEntityID
WHERE TerritoryID IS NOT NULL 
    AND SalesYTD <> 0;

 

The results of executing the above script are as under:

[Screenshot: result set showing the Row Number, Rank, Dense Rank and Quartile columns]

  • The ordering is by PostalCode, which determines the RANK. Hence, we have only two sets of rank values – each beginning at the corresponding row number (i.e. 1 and 6)
  • For DENSE_RANK(), on the other hand, we continue to have two sets of values, but they no longer follow the row numbers
  • Finally, notice that for NTILE, SQL Server attempted to divide 13 records into 4 groups, which resulted in the 1st group being larger than the other 3
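The gap behaviour of RANK versus DENSE_RANK can also be illustrated on a tiny, self-contained data set (a throwaway table variable, independent of AdventureWorks):

```sql
--Two rows tie on the value 10; watch how each function numbers the third row
DECLARE @Scores TABLE (Val INT);
INSERT INTO @Scores VALUES (10), (10), (20);

SELECT Val,
       ROW_NUMBER() OVER (ORDER BY Val) AS [Row Number],  --1, 2, 3 (always unique)
       RANK()       OVER (ORDER BY Val) AS [Rank],        --1, 1, 3 (gap after the tie)
       DENSE_RANK() OVER (ORDER BY Val) AS [Dense Rank]   --1, 1, 2 (no gap)
FROM @Scores;
```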

Reference

The following are good references for understanding windowing functions:

  1. OVER clause – http://msdn.microsoft.com/en-us/library/ms189461.aspx
  2. RANKING functions – http://msdn.microsoft.com/en-us/library/ms189798.aspx
  3. Aggregate functions – http://msdn.microsoft.com/en-us/library/ms173454.aspx

In Conclusion

Windowing functions are so efficient and easy to use that, from the day I was introduced to them, I have used them wherever possible. I suggest you give them a try too. Do post your feedback, and also mention any challenges you face when working with windowing functions.

Until we meet next time,

Be courteous. Drive responsibly.

T-SQL Quiz 2011 is ON!


This month, we will exercise some gray cells. The T-SQL Quiz 2011 is now in progress, and I hope that by now, all of the T-SQL fans amongst you would have visited it.

For those who came in late: the T-SQL Quiz is an annual quiz on T-SQL organized on this very web-site! I have been following and participating in it for the last two years, and below is my experience so far:

  • T-SQL Quiz is a great way to learn 30 new things in 30 days – a crash course in the comfort of your home, without any investment
  • Some of the best Quizmasters contribute to the quiz – they have tons of industry experience, and are active community contributors
  • You can have as many contributions as you wish
  • T-SQL Quiz is always open for viewing – even after the submissions close. This can serve as great future reference when similar situations come up at work
  • Anyone can participate!
  • Most importantly, it’s fun!

So, how can I start?

Well, you start by visiting the T-SQL 2011 Quiz URL – http://beyondrelational.com/quiz/SQLServer/TSQL/2011/default.aspx – and then clicking on each day of the calendar to look at, and answer the question for that particular day.

But, I am online only once a week…

That’s not a problem. Each question is open for a period of 30 days, so you have plenty of time to research, learn about and then answer the question.

The rewards

This is the most interesting part – rewards! The first prize is an Apple iPad sponsored by Redgate Software!

So, what are you waiting for? Give T-SQL Quiz 2011 a try this very weekend. I am sure you will enjoy the experience. If you have any questions, head over to the FAQs section at http://beyondrelational.com/quiz/faq.aspx

Until we meet next time,

Be courteous. Drive responsibly.

Some underappreciated productivity features of SSMS


As we look forward to Project Juneau, I thought of quickly writing about some features of its predecessor – SQL Server Management Studio (well known as SSMS). SSMS has power-packed features available at a single click of the mouse – features which we may not all be using in our daily work.

None of what I am about to mention is new, it’s just that they are some great productivity enhancing methods of doing the same thing. I hope that you like these tips, and adopt them in your day-to-day encounters with SSMS.

Trace Query in SQL Server Profiler

When working on a performance tuning exercise, SQL Server Profiler is one of the most frequently used tools. Assume that we have a query all ready for execution in SSMS and need to launch a new instance of Profiler. There’s no need to navigate out of the Query Editor window – the context menu has a direct link to launch Profiler! What’s better, launching Profiler via this option logs in to the server automatically and starts off your default trace!

Once the profiler is launched and the default trace started off (automatically!), all we need to do is run the query after coming back to SSMS.

[Screenshots: the Query Editor context menu, and Profiler running the default trace]

Query Options

If we want to run a SQL query under several different execution settings, but do not really want to type in the SET options with their proper syntax, the context menu on the query editor comes to our rescue.

The screenshots below demonstrate how a change made through the Query Options window simply takes effect:

[Screenshots: the Query Options window, and the query output reflecting the changed settings]

Because the query options set this way are local to the particular session, opening a new SSMS query window and running the query there will confirm that the changes are not persisted.
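For reference, the settings exposed by the Query Options dialog correspond to ordinary T-SQL SET options, which you could also apply by hand; a minimal sketch (the specific options chosen here are illustrative):

```sql
--Session-local execution settings; these affect only the current connection
SET NOCOUNT ON;          --suppress the 'n row(s) affected' messages
SET ARITHABORT ON;       --terminate the query on arithmetic overflow or divide-by-zero
SET STATISTICS TIME ON;  --report parse/compile and execution times

SELECT TOP (5) name FROM sys.objects;

--The settings vanish when the connection closes; reset them explicitly if needed sooner
SET STATISTICS TIME OFF;
SET NOCOUNT OFF;
```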

Other “cool” features

In addition to the two features I wrote about today, you can also do the following directly from the Query Editor window itself:

  1. Change the active connection
  2. Display Estimated and Actual execution plans
  3. Launch the Database engine tuning advisor
  4. Include Client statistics – (great introduction to client statistics is from Mr. Pinal Dave’s (blog) post – http://blog.sqlauthority.com/2009/10/01/sql-server-sql-server-management-studio-and-client-statistics/)

If you want to know more about SQL 11 (“Denali”)

If you want to know more about SQL 11 (“Denali”), a good place to start will be the “What’s new” listing available on this very web-site! You can reach this list right here – http://beyondrelational.com/whatisnew/sqlserver/denali.aspx. Simply click on the feature you want to explore and if one of us from the community has written about it, chances are, it is linked here.

Until we meet next time,

Be courteous. Drive responsibly.