As a part database developer myself, I am sure that what I am about to document today will be useful to almost everyone who has ever written a stored procedure for with indexed tables for SSRS reporting or massive batch processing. What I will be trying today is to find an answer to the dilemma – Should indexes be created before or after the insert to the table?
The moment of index creation impacts table statistics, and therefore should impact performance. The point that got me thinking towards this test is the following line from Books On Line (http://msdn.microsoft.com/en-us/library/ms190397.aspx), Section: “Determining When to Create Statistics”, which quite clearly mentions that: “The query optimizer creates statistics for indexes on tables or views when the index is created.”
The approach that I will be taking today is:
- Create two identical tables in the TempDB
- On one table, indexes will be created before the insert of test data is done
- On the other, I will first insert some data, and then create indexes
- Later, I will fire simple select queries against both, while monitoring the performance via monitoring statistics usage by:
- setting of SET STATISTICS IO ON and,
- Using the SQL Profiler
WARNING: The scripts provided in this post are provided “as-is” and without warranty for understanding purposes only. Using the DBCC scripts provided in the post on your QA or production environments can have serious side effects.
Creating the test environment
In order to create the test environment, I will first create a test table, create the necessary indexes and finally add some test data to it.
USE tempdb
GO
/*****************************************************************************************************************
STEP 01: Create a table, complete with a primary key (to create a clustered index), and a non-clustered index
*****************************************************************************************************************/
USE tempdb
GO
CREATE TABLE DontUpdateStatistics
( StatId INT IDENTITY(1,1),
RandomId INT DEFAULT (RAND()*1000),
StatName VARCHAR(10) DEFAULT 'DEFAULT',
CONSTRAINT PK_DontUpdateStatistics_StatId PRIMARY KEY CLUSTERED
(
StatId
)
)
GO
CREATE NONCLUSTERED INDEX NC_DontUpdateStatistics_RandomId ON DontUpdateStatistics(RandomId)
GO
/*****************************************************************************************************************
STEP 03: Insert some test data into the table
*****************************************************************************************************************/
USE tempdb
GO
INSERT INTO DontUpdateStatistics DEFAULT VALUES
GO 100000
Next, I will create another table. However indexes will be created after test data is inserted.
USE tempdb
GO
/*****************************************************************************************************************
STEP 04: Create a table - this time, do not create any indexes
*****************************************************************************************************************/
CREATE TABLE UpdateStatistics
( StatId INT IDENTITY(1,1),
RandomId INT DEFAULT (RAND()*1000),
StatName VARCHAR(10) DEFAULT 'DEFAULT'
)
GO
/*****************************************************************************************************************
STEP 05: Insert some test data into the table
*****************************************************************************************************************/
USE tempdb
GO
INSERT INTO UpdateStatistics DEFAULT VALUES
GO 100000
/*****************************************************************************************************************
STEP 06: Now create the primary key clustered index and the non-clustered index
*****************************************************************************************************************/
ALTER TABLE UpdateStatistics
ADD CONSTRAINT PK_UpdateStatistics_StatId PRIMARY KEY CLUSTERED
(
StatId
)
GO
CREATE NONCLUSTERED INDEX NC_UpdateStatistics_RandomId ON UpdateStatistics(RandomId)
GO
Run the Test
The first thing I will do is to check if statistics for both tables are updated or not.
Confirm statistics updates
/*****************************************************************************************************************
STEP 07: See if the statisics on the table in tempDB were updated
Script source: http://blog.sqlauthority.com/2010/01/25/sql-server-find-statistics-update-date-update-statistics/
: by Pinal Dave
*****************************************************************************************************************/
USE tempdb
GO
SELECT name AS index_name,
STATS_DATE(OBJECT_ID, index_id) AS StatsUpdated
FROM sys.indexes
WHERE OBJECT_ID = OBJECT_ID('DontUpdateStatistics') OR
OBJECT_ID = OBJECT_ID('UpdateStatistics')
GO
As can be seen, the statistics for Table #1 (“DontUpdateStatistics”) did not update:

The Test Queries
Next, allow me to present a set of SELECT queries that would fetch data from the tables of interest.
USE tempdb
GO
/*****************************************************************************************************************
STEP 08 A: Clear out the Buffers and the procedure cache.
: Make sure to run a CHECKPOINT!
Reference source:
http://beyondrelational.com/blogs/nakul/archive/2011/10/03/dbcc-dropcleanbuffers-needs-to-be-preceded-by-a-checkpoint-to-remove-all-buffers.aspx
*****************************************************************************************************************/
CHECKPOINT
GO
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
/*****************************************************************************************************************
STEP 08 B: Turn STATISTICS IO ON, and select some data from the DontUpdateStatistics table
*****************************************************************************************************************/
SET STATISTICS IO ON
SELECT * FROM DontUpdateStatistics WHERE StatId BETWEEN 10 AND 1000
AND RandomId > 300
GO
SET STATISTICS IO OFF
/*****************************************************************************************************************
STEP 08 C: Clear out the Buffers and the procedure cache.
: Make sure to run a CHECKPOINT!
Reference source:
http://beyondrelational.com/blogs/nakul/archive/2011/10/03/dbcc-dropcleanbuffers-needs-to-be-preceded-by-a-checkpoint-to-remove-all-buffers.aspx
*****************************************************************************************************************/
CHECKPOINT
GO
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
GO
/*****************************************************************************************************************
STEP 08 D: Turn STATISTICS IO ON, and select some data from the UpdateStatistics table
*****************************************************************************************************************/
SET STATISTICS IO ON
SELECT * FROM UpdateStatistics WHERE StatId BETWEEN 10 AND 1000
AND RandomId > 300
GO
SET STATISTICS IO OFF
Configure the SQL Profiler
To use the SQL Server Profiler in order to help me back my case, I connected to my SQL Server instance, and selected only one event:
Event Group: Errors & Warnings
Event Class: Missing Column Statistics

The Result
With the SQL Profiler running, I turned on the Actual Execution Plan in SSMS (by pressing Ctrl+M) and then executed the test query. The result was as shown below:
| The Execution Plan |
|
| Statistics Output |
|
| Profiler Output |
|
Confirm statistics updates
/*****************************************************************************************************************
STEP 09: See if the statisics on the table in tempDB were updated
Script source: http://blog.sqlauthority.com/2010/01/25/sql-server-find-statistics-update-date-update-statistics/
: by Pinal Dave
*****************************************************************************************************************/
USE tempdb
GO
SELECT name AS index_name,
STATS_DATE(OBJECT_ID, index_id) AS StatsUpdated
FROM sys.indexes
WHERE OBJECT_ID = OBJECT_ID('DontUpdateStatistics') OR
OBJECT_ID = OBJECT_ID('UpdateStatistics')
GO
As can be seen, the statistics for Table #1 (“DontUpdateStatistics”) are now updated. The statistics for Table #2 (“UpdateStatistics”) are unchanged:

Post-Mortem
First of all, I would like to draw attention to a line mentioned in the Profiler’s description of the “Missing Column statistics” event class. The line states – “You will not see this event produced unless the option to auto-compute statistics is turned off”. For the TempDB, auto-compute statistics is ON, so that’s not the root cause to this issue.

If this is not the issue, then what is? Summarizing the results, I can derive the following conclusions:
- Both queries appear to have executed with equal performance impact – with a 50% split across them according to the execution plan
- However, looking at the statistics IO output, I can see that:
- Read-Ahead Reads for table #1 (“DontUpdateStatistics”) = 16
- Read-Ahead Reads for table #2 (“UpdateStatistics”) = 4
- The query against Table #1 (“DontUpdateStatistics”) is requires 4 times more reads when compared to the query against query #2
- The SQL Profiler warns me that column statistics are not available, and therefore, a sub-optimal plan can be chosen by the optimizer
Revisiting Books On Line, (http://msdn.microsoft.com/en-us/library/ms190397.aspx), the section: “Determining When to Create Statistics”, clearly mentions that: The query optimizer creates statistics for indexes on tables or views when the index is created.
More precisely, statistics, if missing, are created when a select is performed on the columns for which missing statistics were found.
In our case, we created the indexes on table #2 (“UpdateStatistics”) after data was inserted. Because creating indexes involves selecting data from the table, a statistics update is performed. For table #1 (“DontUpdateStatistics”) though, the indexes were created before inserting the data. When the query #1 was executed, the statistics are generated, and then the optimization happens – which makes the query a little bit slower as it spends more time on the read-ahead reads.
If we look up on Books On Line for STATISTICS IO (http://msdn.microsoft.com/en-us/library/ms184361.aspx), we can see that the read-ahead read count gives us the number of pages placed into the cache. In our case, all pages for table #1 have to be placed in the cache for the statistics update.
Conclusion
It’s been a rather long post, with a simple conclusion. It is important to think about when index creation should be performed because using the proper sequence is important from the performance and IO perspective.
Until we meet next time,
Be courteous. Drive responsibly.