Filtered Indexes – Underappreciated features of Microsoft SQL Server


This is part of the series of posts that I am running under the title "Under Appreciated Features of SQL Server". The series has been inspired by Andy Warren’s editorial on SQLServerCentral.com.

(For those new to indexes, please do not proceed any further. Please go through the “Introduction to Indexes” series by Gail Shaw. You can access the series from here.)

Indexes are not new, but Microsoft SQL Server 2008 introduced a new feature called “Filtered Indexes”. A filtered index is an optimized non-clustered index (for obvious reasons, you cannot have a filtered clustered index), especially suited to cover queries that select from a well-defined subset of data. What this means is that if a particular subset of data is frequently fetched from a table (about 80% of the time), then such tables are ideal candidates to have filtered indexes on the specified range of data.

A Brief demo

It looks quite simple, but the major change is that the CREATE INDEX statement allows allows for the use of a filter predicate to index a portion of rows in the table.

If a column contains mostly NULL values, and the queries in the application only fetch the non NULL values, such columns are great candidates to have filtered indexes on. In the AdventureWorks database, bill of materials are generally important for products whose deliveries are already completed. Hence, the application queries to Production.BillOfMaterials are great candidates for use of a filtered index.

Here’s a brief demo on how filtered indexes can be created, and how they can be effective.

First up, we will see how we would conventionally address the problem of efficiently fetching ComponentID and StartDate from the Production.BillOfMaterials table. Use Ctrl+M (or Query->Include Actual Execution Plan) to view the actual Execution Plan of the query batch.

--The below example is a direct take from BOL, with a few minor changes
USE AdventureWorks2008;
GO

--Create a Regular Index on the Production.BillOfMaterials table
IF EXISTS (SELECT name FROM sys.indexes
    WHERE name = N'RIBillOfMaterialsWithEndDate' 
    AND object_id = OBJECT_ID(N'Production.BillOfMaterials'))
DROP INDEX RIBillOfMaterialsWithEndDate
    ON Production.BillOfMaterials;
GO

CREATE NONCLUSTERED INDEX "RIBillOfMaterialsWithEndDate"
    ON Production.BillOfMaterials (ComponentID, StartDate, EndDate)
GO

-- Clear the Procedure cache
DBCC FREEPROCCACHE
GO
--This select will NOT make use of the filtered index
SELECT ComponentID, StartDate FROM Production.BillOfMaterials WHERE EndDate IS NOT NULL

Looking at the execution plan, you can see that SQL Server utilizes the newly created non-clustered index.

image

Notice that we had to include the column – EndDate as part of the columns on the index. However, if the application is only interested in records whose EndDate IS NOT NULL, a better idea would be to create a Filtered non-clustered index.

--The below example is a direct take from BOL, with a few minor changes
USE AdventureWorks2008;
GO

--For a clean test, begin by dropping the "regular index"
IF EXISTS (SELECT name FROM sys.indexes
    WHERE name = N'RIBillOfMaterialsWithEndDate' 
    AND object_id = OBJECT_ID(N'Production.BillOfMaterials'))
DROP INDEX RIBillOfMaterialsWithEndDate
    ON Production.BillOfMaterials;
GO

--Create a Filtered Index on the Production.BillOfMaterials table
IF EXISTS (SELECT name FROM sys.indexes
    WHERE name = N'FIBillOfMaterialsWithEndDate' 
    AND object_id = OBJECT_ID(N'Production.BillOfMaterials'))
DROP INDEX FIBillOfMaterialsWithEndDate
    ON Production.BillOfMaterials;
GO

CREATE NONCLUSTERED INDEX "FIBillOfMaterialsWithEndDate"
    ON Production.BillOfMaterials (ComponentID, StartDate)
    WHERE EndDate IS NOT NULL;
GO

-- Clear the Procedure cache
DBCC FREEPROCCACHE
GO
--This select will make use of the filtered index
SELECT ComponentID, StartDate FROM Production.BillOfMaterials WHERE EndDate IS NOT NULL
GO

Let’s look at the execution plan to see if SQL Server uses the filtered index or not:

image

The Advantages

While highlighting the usage of the two indexes – the regular and filtered ones, a few key differences come out:

Reduced index storage size

For performance reasons, and in order to make the SELECT an index-only operation, we had to put the EndDate on the index itself. This means that the overall row size on the index is higher in case of the regular index as compared to the filtered index. As proof of this, observe that the “Estimated Row Size” on the regular index is 27B, whereas that on the filtered index is just 19B.

While it may not look like a big difference, it is a big one when you are looking at deployment at a shared data center or the cloud – where one is charged per the amount of space used.

If you have multiple such queries, that work on pre-defined filters, it may be a better idea to replace a full-table non-clustered index with multiple filtered indexes. Of course, this depends upon the nature of the queries that the application uses, and upon the nature of the underlying data.

Reduced index maintenance costs

Coming up as a direct result of the above discussion, a filtered index is also cheaper to maintain. Indexes are maintained only when the data within the index is affected by a Data Manipulation Language (DML) operation. Because the data stored in the index is limited, it may be possible that it is not affected by many DML operations that the user might execute on the table. On the other hand, if a filtered index contains only the frequently affected data, the smaller size of the index reduces the cost of updating the index statistics.

Improved Query Performance

Enhancements are not easily accepted unless they give some immediate benefits. A well-designed filtered index improves query performance because it is smaller in size when compared to the full non-clustered index, which also contributes to filtered statistics, which are easier and much more accurate to maintain.

As proof of this, observe the following parameters from the screen-shots above:

  1. Estimated I/O cost
  2. Estimated CPU cost
  3. Estimated operator cost
  4. Estimated subtree cost

In each of these parameters, the filtered index wins hands-on. For further confirmation, we will force the query to use each of these indexes in order:

DBCC FREEPROCCACHE
GO
--This select will make use of the regular index
SELECT ComponentID, StartDate FROM Production.BillOfMaterials WITH (INDEX (RIBillOfMaterialsWithEndDate)) WHERE EndDate IS NOT NULL
GO
DBCC FREEPROCCACHE
GO
--This select will make use of the filtered index
SELECT ComponentID, StartDate FROM Production.BillOfMaterials WITH (INDEX (FIBillOfMaterialsWithEndDate)) WHERE EndDate IS NOT NULL
GO

Here’s the proof!

image

A Word of Caution

Upon looking at BOL, I encountered the following words of caution – “A filtered index can be used as table hint, but will cause the query optimizer to generate error 8622 if it does not cover all of the rows that the query selects.” This is a very important design consideration when thinking about filtered indexes.

Let’s attempt to simulate a situation wherein we attempt to fetch more rows than what are covered by the filtered index.

SELECT ComponentID, StartDate, EndDate
FROM Production.BillOfMaterials
WITH (INDEX (FIBillOfMaterialsWithEndDate))

The following is the output:

Msg 8622, Level 16, State 1, Line 1

Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN.

Filtered Indexes v/s Views

Views offer a much wider range of functionality than filtered indexes. Views naturally allow for computed columns, joins, multiple tables and complex predicate logic, which filtered indexes cannot utilize.

This is exactly why we cannot create filtered indexes on views. However, this does not mean that filtered indexes cannot be used when defined on a table in a view. If the query optimizer determines that the results of the query will be correct, it will go ahead and use the filtered index.

More examples on this, and other filtered index design considerations can be found at: http://msdn.microsoft.com/en-us/library/cc280372.aspx

Some food for thought

Because we have filtered indexes, it is logical that we should have filtered statistics as well. If you have written about or are planning to write about filtered statistics, do link it up to this post. I will be sure to post my feedback 🙂

Until we meet next time,

Be courteous. Drive responsibly.

Advertisements

Let me know what you think about this post by leaving your feedback here!

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s