Knowing the internals of how a system works has always fascinated me. Although a bit theoretical, knowing about query evaluation is a very exciting topic and I attempt to learn something new about it every now and then.
One of my seniors had a mantra which I remember till date – when evaluating a logical “AND” condition, the costliest condition should always be the last one to be evaluated. Today, I will try to demonstrate why this is a good practice.
The evaluation of the logical AND is actually quite simple. A set of checks evaluate to FALSE even if one of the checks returns FALSE. This simple rule means that the AND evaluation must be a “short-circuit” operation, i.e. as soon as a condition is encountered which determines the end result, all checks beyond that point can be ignored.
The test
The test below demonstrates this concept. What I have in the script is a simple query that returns results based on the evaluation of two conditions – one of which is bound to raise an exception.
USE AdventureWorks2008R2; GO BEGIN TRY SELECT * FROM HumanResources.Employee WHERE (1 = 0) --This one is false making the entire AND condition FALSE AND (1/0 = 0); --This one results in an exception PRINT 'Execution was successful.'; END TRY BEGIN CATCH PRINT 'All conditions were evaluated resulting in an exception!' + CHAR(10) + 'Error Message : ' + ERROR_MESSAGE() + CHAR(10) + 'Error Number : ' + CAST(ERROR_NUMBER() AS VARCHAR(10)); END CATCH GO /********************* RESULTS *********************/ /* (0 row(s) affected) Execution was successful. */
In this first query, the first condition itself evaluates to FALSE. Hence, there is no reason for the evaluation of the logical AND to continue. The query therefore returns no results. Had both conditions been evaluated, we would have received an exception.
USE AdventureWorks2008R2; GO BEGIN TRY SELECT * FROM HumanResources.Employee WHERE (1 = 1) --This one is TRUE, --because of which the next condition will be evaluated AND (1/0 = 0); --This one results in an exception PRINT 'Execution was successful.'; END TRY BEGIN CATCH PRINT 'All conditions were evaluated resulting in an exception!' + CHAR(10) + 'Error Message : ' + ERROR_MESSAGE() + CHAR(10) + 'Error Number : ' + CAST(ERROR_NUMBER() AS VARCHAR(10)); END CATCH GO /********************* RESULTS *********************/ /* (0 row(s) affected) All conditions were evaluated resulting in an exception! Error Message : Divide by zero error encountered. Error Number : 8134 */
Because the first condition retuned a TRUE in the query above, it returned an exception because the evaluation of the logical AND operation had to evaluate both conditions.
Conclusion
This simple test proves that the logical AND operation is a “short-circuit” operation. Therefore, the costliest condition (maybe a sub-query) in a set to be evaluated with a logical AND must be placed towards the end of the sequence to prevent un-necessary evaluations. Only if the preceding conditions are TRUE, will the more complex conditions be evaluated under this logic.
Until we meet next time,
When I am swapping conditions in 1st query, still it is give same result.
SELECT *
FROM HumanResources.Employee
WHERE (1/0 = 0)
AND (1 = 0);
PRINT ‘Execution was successful.’;
LikeLike
@harsh: Check the execution plans (estimated plan will also be fine):
USE AdventureWorks2008R2;
GO
SELECT *
FROM HumanResources.Employee
WHERE (1 = 0) AND (1/0 = 0);
You will find that SQL Server does a constant scan – it does not even go to the table (this can be seen from the Actual Execution plan also).
However, in the following case, SQL Server attempts to evaluate the entire query:
USE AdventureWorks2008R2;
GO
SELECT *
FROM HumanResources.Employee
WHERE (1 = 1) AND (1/0 = 0);
Considering this, let’s attempt to run a batch of 2 queries – one where the short-circuit should prevent evaluation of a complex sub-query, and other in which it would be executed:
USE AdventureWorks2008R2;
GO
SELECT *
FROM HumanResources.Employee
WHERE (1 = 0) AND ((SELECT EmailPromotion
FROM Person.Person
WHERE Person.BusinessEntityID = Employee.BusinessEntityID) = 1);
GO
SELECT *
FROM HumanResources.Employee
WHERE (1 = 1) AND ((SELECT EmailPromotion
FROM Person.Person
WHERE Person.BusinessEntityID = Employee.BusinessEntityID) = 1);
GO
You can try executing these queries and validate the execution plan & performance to understand the importance of the placement.
LikeLike
It is not guaranteed that placement of conditions affects result and/or performance.
1)
SELECT *
FROM HumanResources.Employee
WHERE (1 = 0) AND (1/0 = 0);
SELECT *
FROM HumanResources.Employee
WHERE (1/0 = 0) AND (1 = 0);
Here, Result is same and execution plan is same.
2)
SELECT *
FROM HumanResources.Employee
WHERE (1 = 0) AND ((SELECT EmailPromotion
FROM Person.Person
WHERE Person.BusinessEntityID = Employee.BusinessEntityID) = 1);
SELECT *
FROM HumanResources.Employee
WHERE ((SELECT EmailPromotion
FROM Person.Person
WHERE Person.BusinessEntityID = Employee.BusinessEntityID) = 1)
AND (1 = 0)
Again, result is same and execution plan also. Even, 1=0 condition is written second, SQL Server performs constant scan first and don’t scan table.
3)
SELECT *
FROM HumanResources.Employee
WHERE (BusinessEntityID = 0) AND ((SELECT EmailPromotion
FROM Person.Person
WHERE Person.BusinessEntityID = Employee.BusinessEntityID) = 1);
SELECT *
FROM HumanResources.Employee
WHERE ((SELECT EmailPromotion
FROM Person.Person
WHERE Person.BusinessEntityID = Employee.BusinessEntityID) = 1)
AND (BusinessEntityID = 0)
(Note: I have not checked this queries with Actual AdentureWorks DB.I have assumed no records will have BusinessEntityID =0 )
Here, even if first criteria (BusinessEntityID =0) evaluate to FALSE , still it perform 2nd criteria. And also, this time its not constant scan.
4)
SELECT * FROM TableName WHERE ISNUMERIC(ColumnName)=1 AND CAST(ColumnName AS DECIMAL)>0
SELECT * FROM TableName WHERE CAST(ColumnName AS DECIMAL)>0 AND ISNUMERIC(ColumnName)=1
Here, it actually affects result.
If column has any non-numeric data, then 2nd query will throw error.
LikeLike
@harsh: I guess we are both saying the same thing, but in different ways.
Because the logical AND is a short-circuit operator, in most cases one can gain some benefit by writing a query such that the database engine can easily determine whether to continue the evaluation of all the expressions in the AND or not. This is why I propose that the most complex query, which may not be the most decisive may be placed towards the end.
Of course, these recommendations depend upon the nature of data in a system and the design of the quries (for example, if all conditions are simple comparisions, one may not gain at all. But, if sub-queries are involved, a gain may be seen if the preceeding conditions tend to evaluate to “false” often).
LikeLike