This is in continuation to my series on the Underappreciated features of Microsoft SQL Server. The series has been inspired from Andy Warren’s editorial on SQLServerCentral.com of the same name.
After a number of posts on my visit to Tech-Ed 2011 (India), we are now back on track and will continue from where we left off. We were discussing the T-SQL related underappreciated features and the next one in line was the PIVOT operator. Because PIVOT has been around since SQL Server 2005, and is simplifies a very important functionality, I was quite surprised to see this on the list of underappreciated features. Hence, I will start off with an introduction, provide and explain an example, and then point out the reader to a couple of reference resources that I used when I was learning this T-SQL enhancement.
Cross-tabs are a very important piece of any OLTP application. These systems typically have large amounts of data – all in great detail, but arranged “row-wise”. This arrangement is very much required for a high-performance OLTP system because it generally involves more incoming transactions and less of read operations. However, not every “row-wise” detail can be put up on a report – it makes great technical sense, but not business sense. For the data to have some business value, it needs to have the ability to be aggregated effectively when requested. What this means is that we need to convert one table-valued expression into another table.
The problem
Let’s take an example. In a sales setup, it is important to measure the number of orders placed by certain employees. In the AdventureWorks2008R2 sample database, this information is stored such that all unique values of interest (EmployeeId and VendorId) are in individual columns.
USE AdventureWorks2008R2 GO SELECT poh.PurchaseOrderID, poh.EmployeeID, poh.VendorID FROM Purchasing.PurchaseOrderHeader poh
What we need is that the unique values from one column in the expression (EmployeeID) are converted or transformed into multiple columns in the output, and an aggregation is performed on the remaining columns in the output. The unique values in the EmployeeID column themselves need to become fields in the final result set.
EmployeeID as Columns | |
VendorID as Rows | Aggregations – COUNT of PurchaseOrderIDs for the vendor in the row key for the employee in the column key |
The Conventional Solution
Conventionally (before SQL 2005), we would have ended up doing something like:
USE AdventureWorks2008R2 GO SELECT poh.VendorID, (SELECT COUNT(poh1.PurchaseOrderID) FROM Purchasing.PurchaseOrderHeader poh1 WHERE poh.VendorID = poh1.VendorID AND poh1.EmployeeID = 250) AS Employee1, (SELECT COUNT(poh1.PurchaseOrderID) FROM Purchasing.PurchaseOrderHeader poh1 WHERE poh.VendorID = poh1.VendorID AND poh1.EmployeeID = 251) AS Employee2, (SELECT COUNT(poh1.PurchaseOrderID) FROM Purchasing.PurchaseOrderHeader poh1 WHERE poh.VendorID = poh1.VendorID AND poh1.EmployeeID = 256) AS Employee3, (SELECT COUNT(poh1.PurchaseOrderID) FROM Purchasing.PurchaseOrderHeader poh1 WHERE poh.VendorID = poh1.VendorID AND poh1.EmployeeID = 257) AS Employee4, (SELECT COUNT(poh1.PurchaseOrderID) FROM Purchasing.PurchaseOrderHeader poh1 WHERE poh.VendorID = poh1.VendorID AND poh1.EmployeeID = 260) AS Employee5 FROM Purchasing.PurchaseOrderHeader poh WHERE poh.EmployeeID IN ( 250, 251, 256, 257, 260 ) GROUP BY poh.VendorID
Such a query would have given us the result as shown in the screen-shot below, which is what we require.
When we look at the execution plan of this query, we find that it is anything but efficient – in fact, it’s terrible! Due to space constraints, I am only showing the core of the execution plan (i.e. where most computations are concentrated).
Depending upon the situation, you may also want to use a complex series of the SELECT…CASE statements.
The solution – PIVOT & UNPIVOT operators
Come SQL 2005, we had a exciting T-SQL enhancements introduced by Microsoft. Some of them are the introduction of the PIVOT and UNPIVOT operators.
The following is the T-SQL query using PIVOT to carry out the same computation demonstrated above:
USE AdventureWorks2008R2; GO SELECT VendorID, [250] AS Emp1, [251] AS Emp2, [256] AS Emp3, [257] AS Emp4, [260] AS Emp5 FROM (SELECT PurchaseOrderID, EmployeeID, VendorID FROM Purchasing.PurchaseOrderHeader) p PIVOT (COUNT (PurchaseOrderID) FOR EmployeeID IN ( [250], [251], [256], [257], [260] ) ) AS pvt ORDER BY pvt.VendorID;
In this example, the results of the following sub-query are PIVOT’ed on the EmployeeID column.
USE AdventureWorks2008R2; GO SELECT PurchaseOrderID, EmployeeID, VendorID FROM Purchasing.PurchaseOrderHeader
Here is a walk-through of the above query:
- The PurchaseOrderID column serves as the grouping column along with the EmployeeID
- This set is then aggregated to produce an output as under:
Some points to consider:
- When PIVOT and UNPIVOT are used against databases that are upgraded to SQL Server 2005 or later, the compatibility level of the database must be set to 90 or higher
- When aggregate functions are used with PIVOT, the presence of any null values in the value column are not considered when computing an aggregation
To demonstrate UNPIVOT, let’s store the output of the PIVOT operation above into a temporary table variable and then execute the UNPIVOT operation on it. By doing so, there is a very important point I want to demonstrate.
USE AdventureWorks2008R2; GO --Declare a temporary table variable to hold the output of the PIVOT DECLARE @pvt TABLE (VendorID INT, Emp1 INT, Emp2 INT, Emp3 INT, Emp4 INT, Emp5 INT) INSERT INTO @pvt SELECT VendorID, [250] AS Emp1, [251] AS Emp2, [256] AS Emp3, [257] AS Emp4, [260] AS Emp5 FROM (SELECT PurchaseOrderID, EmployeeID, VendorID FROM Purchasing.PurchaseOrderHeader) p PIVOT (COUNT (PurchaseOrderID) FOR EmployeeID IN ( [250], [251], [256], [257], [260] ) ) AS pvt ORDER BY pvt.VendorID --UNPIVOT the table. SELECT VendorID, Employee, Orders FROM (SELECT VendorID, Emp1, Emp2, Emp3, Emp4, Emp5 FROM @pvt) p UNPIVOT (Orders FOR Employee IN (Emp1, Emp2, Emp3, Emp4, Emp5) )AS unpvt; GO
The following is the output:
At this point, I would like you to understand the following major differences between PIVOT and UNPIVOT:
- UNPIVOT is not the exact reverse of PIVOT. This is because during a PIVOT operation, we aggregate data, which causes loss of data-granularity by merging multiple rows into a single row
- Also, because PIVOT ignores NULL values during the aggregations, they would no longer be present in the output of the UNPIVOT operation
Comparing PIVOT with the conventional query
Any feature is only useful if it has it’s own benefits. Obviously, PIVOT comes with the benefit of being more read-able and maintainable, but is it any better in terms of performance over the older, more conventional query?
So, I put both queries together, turned on the execution plan and executed them. The below was what the execution plan showed me. Do I need to say anything more?
Some great reference resources
I hope that you can now appreciate the benefits of using newer features of T-SQL as they come along. While there is an age-old saying that “If it isn’t broken, don’t fix it”, but in this case, we are not fixing anything – we are doing value addition – which is always welcome. As we say good-bye today, I will leave you with some great reference resources that helped me understand PIVOT:
- Getting started with PIVOT Queries in SQL Server 2005/2008 by Jacob Sebastian
- Another PIVOT Query example by Jacob Sebastian
- Posts on PIVOT & UNPIVOT tables by Pinal Dave
- Dynamic PIVOT in SQL Server 2005 by Madhivanan
- Dynamic cross-tab with multiple PIVOT columns by Madhivanan
Until we meet next time,
Be courteous. Drive responsibly.