We were recently working on an ad-hoc reporting request that involved customizations to an existing report (more specifically, customizations to an existing query). The query was to be embedded into a custom Excel reporting solution.
One of the requirements that we had was to ensure that by default the results appear in a given order. It’s a fairly common request, but what made this one stand out is the fact that we were asked to perform the sort on a column that was not part of the SELECT statement.
Every result that is processed by Microsoft SQL Server has two elements – a set of records and a set of columns, common to all data represented by the set of records.
The normal understanding of Microsoft SQL Server’s ORDER BY clause is that it works on the “selected” set of records. This is often misinterpreted to mean that it works on the columns selected, i.e. one can only use ORDER BY on columns which are selected in the SELECT clause.
So, what’s the truth and what’s going on behind the scenes.
The following is a simple query that fetches details about some Employees from the HumanResources schema of the AdventureWorks2012 sample database. The only catch is that it is performing an ORDER BY operation on the DepartmentId column, which is not included in the SELECT clause.
USE AdventureWorks2012; GO --Notice that DepartmentId used in the ORDER BY is not fetched --in the results returned by the SELECT clause SELECT hredh.BusinessEntityID, hre.JobTitle, --hredh.DepartmentID, hrd.Name, hredh.StartDate, hredh.EndDate, hredh.ShiftID, hrs.Name FROM HumanResources.Shift AS hrs INNER JOIN HumanResources.EmployeeDepartmentHistory AS hredh ON hredh.ShiftID = hrs.ShiftID INNER JOIN HumanResources.Employee AS hre ON hredh.BusinessEntityID = hre.BusinessEntityID INNER JOIN HumanResources.Department AS hrd ON hredh.DepartmentID = hrd.DepartmentID ORDER BY hredh.DepartmentID; GO
Attempting to execute this query does not return any error. The results do not appear to have been returned in any specific order (un-commenting the DepartmentId from the SELECT clause and running the query again will clear out the confusion).
Studying the execution plan
So, the question comes – what’s going on in the background that helps Microsoft SQL Server perform an ORDER BY on a column (or a set of columns) which are not even selected?
A quick look at the execution plan reveals the truth – DepartmentId is actually fetched from the underlying tables, and is filtered from the final result once the ORDER BY/Sort operation is complete.
This is identical to the execution plan produced if we keep the ORDER BY in the SELECT clause.
This proves that when an ORDER BY operation is performed, the columns required to perform the SORT need to be fetched by the database engine. This is irrespective of whether the query needs a particular column to be returned via the SELECT clause.
While the database engine will fetch the column as part of data retrieval, there is no restriction that mandates the presence of a column in the SELECT if a sort/ORDER BY is being done on that column in the query.
A Word of caution
While it is legal to have an implementation as highlighted above, be careful when combining this with the DISTINCT and TOP clauses – because the column(s) on which the ordering/sorting is done are not available, it can result into query results which are difficult to understand.
Until we meet next time,