#0340 – SQL Server – Myths – ORDER BY can only work on columns fetched in the SELECT clause


We were recently working on an ad-hoc reporting request that involved customizations to an existing report (more specifically, customizations to an existing query). The query was to be embedded into a custom Excel reporting solution.

One of the requirements that we had was to ensure that by default the results appear in a given order. It’s a fairly common request, but what made this one stand out is the fact that we were asked to perform the sort on a column that was not part of the SELECT statement.

The myth

Every result that is processed by Microsoft SQL Server has two elements – a set of records and a set of columns, common to all data represented by the set of records.

The normal understanding of Microsoft SQL Server’s ORDER BY clause is that it works on the “selected” set of records. This is often misinterpreted to mean that it works on the columns selected, i.e. one can only use ORDER BY on columns which are selected in the SELECT clause.

So, what’s the truth and what’s going on behind the scenes.

The test

The following is a simple query that fetches details about some Employees from the HumanResources schema of the AdventureWorks2012 sample database. The only catch is that it is performing an ORDER BY operation on the DepartmentId column, which is not included in the SELECT clause.

USE AdventureWorks2012;
GO
--Notice that DepartmentId used in the ORDER BY is not fetched
--in the results returned by the SELECT clause
SELECT hredh.BusinessEntityID,
hre.JobTitle,
--hredh.DepartmentID,
hrd.Name,
hredh.StartDate,
hredh.EndDate,
hredh.ShiftID,
hrs.Name
FROM HumanResources.Shift AS hrs
INNER JOIN HumanResources.EmployeeDepartmentHistory AS hredh ON hredh.ShiftID = hrs.ShiftID
INNER JOIN HumanResources.Employee AS hre ON hredh.BusinessEntityID = hre.BusinessEntityID
INNER JOIN HumanResources.Department AS hrd ON hredh.DepartmentID = hrd.DepartmentID
ORDER BY hredh.DepartmentID;
GO

Attempting to execute this query does not return any error. The results do not appear to have been returned in any specific order (un-commenting the DepartmentId from the SELECT clause and running the query again will clear out the confusion).

image

Studying the execution plan

So, the question comes – what’s going on in the background that helps Microsoft SQL Server perform an ORDER BY on a column (or a set of columns) which are not even selected?

A quick look at the execution plan reveals the truth – DepartmentId is actually fetched from the underlying tables, and is filtered from the final result once the ORDER BY/Sort operation is complete.

image

image

This is identical to the execution plan produced if we keep the ORDER BY in the SELECT clause.

Conclusion

This proves that when an ORDER BY operation is performed, the columns required to perform the SORT need to be fetched by the database engine. This is irrespective of whether the query needs a particular column to be returned via the SELECT clause.

While the database engine will fetch the column as part of data retrieval, there is no restriction that mandates the presence of a column in the SELECT if a sort/ORDER BY is being done on that column in the query.

A Word of caution

While it is legal to have an implementation as highlighted above, be careful when combining this with the DISTINCT and TOP clauses – because the column(s) on which the ordering/sorting is done are not available, it can result into query results which are difficult to understand.

Until we meet next time,

Be courteous. Drive responsibly.

Advertisements

6 thoughts on “#0340 – SQL Server – Myths – ORDER BY can only work on columns fetched in the SELECT clause

    1. nakulvachhrajani Post author

      Agreed, such situations may warrant a review of the indexes. However, in situations where the usage is not that frequent or when the schema is not controlled by us, adding an index may not be feasible/possible.

      Like

      Reply
  1. Joe Ceko

    The ANSI/ISO Standards put the ORDER BY in a cursor, and not part of a query like T-SQL. The result table has to exists first to be sorted in that tiered of the system.

    It is also a good idea to be able to display the sorting columns for the end users. ,

    Like

    Reply

Let me know what you think about this post by leaving your feedback here!

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s