Indexes are not new, but Microsoft SQL Server 2008 introduced a new feature called “Filtered Indexes”. A filtered index is an optimized non-clustered index (for obvious reasons, you cannot have a filtered clustered index), especially suited to cover queries that select from a well-defined subset of data. What this means is that if a particular subset of data is frequently fetched from a table (about 80% of the time), then such tables are ideal candidates to have filtered indexes on the specified range of data.
A Brief demo
It looks quite simple, but the major change is that the CREATE INDEX statement allows allows for the use of a filter predicate to index a portion of rows in the table.
If a column contains mostly NULL values, and the queries in the application only fetch the non NULL values, such columns are great candidates to have filtered indexes on. In the AdventureWorks database, bill of materials are generally important for products whose deliveries are already completed. Hence, the application queries to Production.BillOfMaterials are great candidates for use of a filtered index.
Here’s a brief demo on how filtered indexes can be created, and how they can be effective.
First up, we will see how we would conventionally address the problem of efficiently fetching ComponentID and StartDate from the Production.BillOfMaterials table. Use Ctrl+M (or Query->Include Actual Execution Plan) to view the actual Execution Plan of the query batch.
--The below example is a direct take from BOL, with a few minor changes USE AdventureWorks2008; GO --Create a Regular Index on the Production.BillOfMaterials table IF EXISTS (SELECT name FROM sys.indexes WHERE name = N'RIBillOfMaterialsWithEndDate' AND object_id = OBJECT_ID(N'Production.BillOfMaterials')) DROP INDEX RIBillOfMaterialsWithEndDate ON Production.BillOfMaterials; GO CREATE NONCLUSTERED INDEX "RIBillOfMaterialsWithEndDate" ON Production.BillOfMaterials (ComponentID, StartDate, EndDate) GO -- Clear the Procedure cache DBCC FREEPROCCACHE GO --This select will NOT make use of the filtered index SELECT ComponentID, StartDate FROM Production.BillOfMaterials WHERE EndDate IS NOT NULL
Looking at the execution plan, you can see that SQL Server utilizes the newly created non-clustered index.
Notice that we had to include the column – EndDate as part of the columns on the index. However, if the application is only interested in records whose EndDate IS NOT NULL, a better idea would be to create a Filtered non-clustered index.
--The below example is a direct take from BOL, with a few minor changes USE AdventureWorks2008; GO --For a clean test, begin by dropping the "regular index" IF EXISTS (SELECT name FROM sys.indexes WHERE name = N'RIBillOfMaterialsWithEndDate' AND object_id = OBJECT_ID(N'Production.BillOfMaterials')) DROP INDEX RIBillOfMaterialsWithEndDate ON Production.BillOfMaterials; GO --Create a Filtered Index on the Production.BillOfMaterials table IF EXISTS (SELECT name FROM sys.indexes WHERE name = N'FIBillOfMaterialsWithEndDate' AND object_id = OBJECT_ID(N'Production.BillOfMaterials')) DROP INDEX FIBillOfMaterialsWithEndDate ON Production.BillOfMaterials; GO CREATE NONCLUSTERED INDEX "FIBillOfMaterialsWithEndDate" ON Production.BillOfMaterials (ComponentID, StartDate) WHERE EndDate IS NOT NULL; GO -- Clear the Procedure cache DBCC FREEPROCCACHE GO --This select will make use of the filtered index SELECT ComponentID, StartDate FROM Production.BillOfMaterials WHERE EndDate IS NOT NULL GO
Let’s look at the execution plan to see if SQL Server uses the filtered index or not:
While highlighting the usage of the two indexes – the regular and filtered ones, a few key differences come out:
Reduced index storage size
For performance reasons, and in order to make the SELECT an index-only operation, we had to put the EndDate on the index itself. This means that the overall row size on the index is higher in case of the regular index as compared to the filtered index. As proof of this, observe that the “Estimated Row Size” on the regular index is 27B, whereas that on the filtered index is just 19B.
While it may not look like a big difference, it is a big one when you are looking at deployment at a shared data center or the cloud – where one is charged per the amount of space used.
If you have multiple such queries, that work on pre-defined filters, it may be a better idea to replace a full-table non-clustered index with multiple filtered indexes. Of course, this depends upon the nature of the queries that the application uses, and upon the nature of the underlying data.
Reduced index maintenance costs
Coming up as a direct result of the above discussion, a filtered index is also cheaper to maintain. Indexes are maintained only when the data within the index is affected by a Data Manipulation Language (DML) operation. Because the data stored in the index is limited, it may be possible that it is not affected by many DML operations that the user might execute on the table. On the other hand, if a filtered index contains only the frequently affected data, the smaller size of the index reduces the cost of updating the index statistics.
Improved Query Performance
Enhancements are not easily accepted unless they give some immediate benefits. A well-designed filtered index improves query performance because it is smaller in size when compared to the full non-clustered index, which also contributes to filtered statistics, which are easier and much more accurate to maintain.
As proof of this, observe the following parameters from the screen-shots above:
- Estimated I/O cost
- Estimated CPU cost
- Estimated operator cost
- Estimated subtree cost
In each of these parameters, the filtered index wins hands-on. For further confirmation, we will force the query to use each of these indexes in order:
DBCC FREEPROCCACHE GO --This select will make use of the regular index SELECT ComponentID, StartDate FROM Production.BillOfMaterials WITH (INDEX (RIBillOfMaterialsWithEndDate)) WHERE EndDate IS NOT NULL GO DBCC FREEPROCCACHE GO --This select will make use of the filtered index SELECT ComponentID, StartDate FROM Production.BillOfMaterials WITH (INDEX (FIBillOfMaterialsWithEndDate)) WHERE EndDate IS NOT NULL GO
Here’s the proof!
A Word of Caution
Upon looking at BOL, I encountered the following words of caution – “A filtered index can be used as table hint, but will cause the query optimizer to generate error 8622 if it does not cover all of the rows that the query selects.” This is a very important design consideration when thinking about filtered indexes.
Let’s attempt to simulate a situation wherein we attempt to fetch more rows than what are covered by the filtered index.
SELECT ComponentID, StartDate, EndDate FROM Production.BillOfMaterials WITH (INDEX (FIBillOfMaterialsWithEndDate))
The following is the output:
Msg 8622, Level 16, State 1, Line 1
Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN.
Filtered Indexes v/s Views
Views offer a much wider range of functionality than filtered indexes. Views naturally allow for computed columns, joins, multiple tables and complex predicate logic, which filtered indexes cannot utilize.
This is exactly why we cannot create filtered indexes on views. However, this does not mean that filtered indexes cannot be used when defined on a table in a view. If the query optimizer determines that the results of the query will be correct, it will go ahead and use the filtered index.
More examples on this, and other filtered index design considerations can be found at: http://msdn.microsoft.com/en-us/library/cc280372.aspx
Some food for thought
Because we have filtered indexes, it is logical that we should have filtered statistics as well. If you have written about or are planning to write about filtered statistics, do link it up to this post. I will be sure to post my feedback 🙂
Until we meet next time,
Be courteous. Drive responsibly.