The T-SQL language constantly undergoes modifications. Every release bring in something new, something to help develop an easier solution to a common business problem. With Microsoft SQL Server 2012, enhancements have been incorporated to the Windows functions to allow an easier solution to what is commonly referred to as the “Running Total Problem”.
The whole concept of running totals is that as the transactions continue to be added, the system maintains the total of a given sequence. This allows the system to fetch this summary without having to maintain any record of the individual transactions itself.
Here’s an example based on the transactions for one of the SalesOrders in the AdventureWorks2012 sample database.
SalesOrderId | UnitPrice | OrderQty | Running Total | Running Total Formula |
43659 | 2024.994 | 1 | 2024.994 | (Unit Price * Order Quantity) |
43659 | 2024.994 | 3 | 8099.976 | Total from previous line + (Unit Price * Order Quantity) |
43659 | 2024.994 | 1 | 10124.97 | (same as above) |
43659 | 2039.994 | 1 | 12164.964 | (same as above) |
43659 | 2039.994 | 1 | 14204.958 | (same as above) |
43659 | 2039.994 | 2 | 18284.946 | (same as above) |
43659 | 2039.994 | 1 | 20324.94 | (same as above) |
43659 | 28.8404 | 3 | 20411.4612 | (same as above) |
43659 | 28.8404 | 1 | 20440.3016 | (same as above) |
43659 | 5.7 | 6 | 20474.5016 | (same as above) |
43659 | 5.1865 | 2 | 20484.8746 | (same as above) |
43659 | 20.1865 | 4 | 20565.6206 | (same as above) |
Essentially, the running total at each row is the sum of the running total value from the previous row and the expression to be added from the current row (in this case, Unit Price * Order Quantity). That’s where the problems start in SQL Server.
The running total is calculated in a sequence and unfortunately, SQL Server does not guarantee the order of records in a result set which makes it difficult to compute the running total in a SET based operation. The solution to this problem has therefore been a CURSOR operation where each record is independently processed in a loop.
However, the T-SQL enhancements made in SQL Server 2012 have a possible solution to this problem. The the aggregate function SUM() has been enhanced to operate as a windowing function which now supports the ORDER BY clause in the OVER clause. This gives the ability for SUM to be performed in the intended sequence of the transactions. Here’s a query to demonstrate this:
USE AdventureWorks2012; GO SELECT SalesOrderID, SalesOrderDetailID, UnitPrice, OrderQty, LineTotal, SUM(UnitPrice * OrderQty) OVER (PARTITION BY SalesOrderId ORDER BY SalesOrderDetailID ) AS RunningTotal FROM Sales.SalesOrderDetail WHERE SalesOrderID = 43659; GO
NOTE: Attempting to run this query against a SQL Server 2008R2 or lower environment will result in an error related to missing support of the ORDER BY clause. They support the use of OVER and PARTITION BY with the SUM() function, but do not support ORDER BY.
The result set available from the query is shown below.
As can be seen from the result set, Microsoft SQL Server auto-calculated the running total for the given Sales Order in a given sequence. These results also match with the table provided above.
Finally, the purpose of a running total is that a system/user should not have to maintain a record of all the transactions. Hence, the maximum value of the running total should match the Sub-total stored on the SalesOrder header record.
USE AdventureWorks2012; GO SELECT SalesOrderID, SubTotal, TaxAmt, Freight, TotalDue FROM Sales.SalesOrderHeader WHERE SalesOrderID = 43659; GO
Summary
The ORDER BY clause can be used when using SUM() over a window defined by the OVER clause to generate a running total of values in a given sequence.
It is really amazing that the solution to a problem that has challenged many brains over the years can be this simple. Kudos to the #SQLServer product development team at #Microsoft. Thanks, guys!
Further Reading
- Windowing functions – Underappreciated features of Microsoft SQL Server [Link]
- Aggregate Function – SUM [Link]
- Over Clause [Link]
- Running Totals, Wikipedia [Link]
Until we meet next time,
One thing to be careful of here – your example uses the default for the ROWS/RANGE clause, which is RANGE BETWEEN UNBOUNDED PRECEEDING AND CURRENT ROW. Range works by the value of the column, so if you have a duplicate value then those are added together (based upon the ORDER BY column(s) within a partition).
A quick example:
DECLARE @test TABLE (RowID INTEGER IDENTITY, TotalValue INTEGER);
INSERT INTO @test (TotalValue) VALUES (1), (2), (3), (3), (4);
SELECT RowID, TotalValue,
SUM(TotalValue) OVER (ORDER BY TotalValue) AS [Range],
SUM(TotalValue) OVER (ORDER BY TotalValue ROWS UNBOUNDED PRECEDING) AS [Rows]
FROM @test;
Results:
RowID TotalValue Range Rows
———– ———– ———– ———–
1 1 1 1
2 2 3 3
3 3 9 6
4 3 9 9
5 4 13 13
You can see how the two values of “3” were handled differently. For a true “Running Total”, ensure that you specify the ROWS clause and don’t just take the default.
LikeLike
Thank-you for the feedback, Wayne! Yes, the duplicate values will have an effect in the default clause that my formula uses. The UNBOUNDED PRECEDING clause must be mentioned if the column being aggregated may have duplicate values which need to be eliminated.
LikeLike
Nakul,
No, you need to not use the default ROWS/RANGE clause of RANGE. To accomplish that, you need to specify ROWS.
UNBOUNDED PRECEDING is the starting window declaration. In my example, I used UNBOUNDED PRECEDING, and accepted the default ending window declaration of CURRENT ROW. When specifing ROWS (or RANGE), you need to specify the window boundaries, so in my example I ended up using “ROWS UNBOUNDED PRECEDING”. I could have also used “ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW”.
You might want to look over the “OVER Clause” BOL at [http://technet.microsoft.com/en-us/library/ms189461.aspx][1]
[1]: http://technet.microsoft.com/en-us/library/ms189461.aspx
LikeLike
Nakul, Wayne was pointing out that ROWS must be explicitly stated since you are performing a running totals scenario. Without specifying ROWS, then the default is RANGE BETWEEN UNBOUNDED PRECEEDING AND CURRENT ROW (e.g. it is Range and not ROW).
LikeLike