Category Archives: #SQLServer

All about Microsoft SQL Server

#0142 – SQL Server – Profiler – Part 09 – Introducing the SQLDiag utility to capture SQL and Performance traces


In one of the earlier parts of this tutorial series (Part 06), I demonstrated how to correlate the SQL Profiler trace with the Windows Performance Logs. I used two utilities for the demonstration – the SQL Server Profiler and the Windows Reliability & Performance Monitor. In Part 07 and Part 08 of the tutorial series, I demonstrated how to generate and run a server-side trace.

Today, I will demonstrate how to generate a server-side SQL Trace combined with the Windows performance logs using a single diagnostic utility – SQLDiag. The SQLDiag utility is a general-purpose diagnostics collection utility that can be run as a console application or as a service. It can capture the following kinds of diagnostic information:

  1. Windows performance logs
  2. Windows Event logs
  3. SQL Server Profiler traces
  4. SQL Server blocking information
  5. SQL Server configuration information

SQLDiag is a command-line utility, and therefore choosing the correct switches is important when working with SQLDiag.

Permissions

Under its default mode, because SQLDiag collects SQL Trace information, it needs to be run under a user context that has sysadmin access to the SQL Server instance being monitored. Also, because it collects Windows performance and event log information, the user must have Administrator access to the server being monitored.

SQLDiag can also be run in a “Generic” mode, using the /G command-line argument. Under the /G switch, SQLDiag does not enforce SQL Server connectivity constraints or verify whether the user can capture the required Windows logs. These checks are deferred to SQL Server and the Windows operating system respectively.

Please note that by default, SQLDiag connects to the SQL Server instance using Windows authentication.

Location

By default, the SQLDiag utility is located at the following paths:

  • SQL Server 2008: ..\Microsoft SQL Server\100\Tools\Binn\SQLDiag.exe
  • SQL Server 2012: ..\Microsoft SQL Server\110\Tools\Binn\SQLDiag.exe

Configuring the SQLDiag

SQLDiag collects SQL Trace and Windows logs by using a configuration file as input. This configuration file is an XML file, named SQLDiag.xml by default. The XML is not present by default in a clean installation of SQL Server. To generate it for the first time, simply run SQLDiag.exe from the command prompt under a user who has sysadmin and local administrative privileges. As soon as execution starts, press Ctrl+C to stop it. SQLDiag.xml will have been generated in the folder from which SQLDiag was launched.


Along with SQLDiag.xml, an associated XSD is also generated, which helps validate changes to the XML file when using an XML editor such as SSMS.

SQLDiag.xml is created with a default configuration. In most cases, this file needs to be edited so that the user can configure the SQL trace and the Windows performance log capture.

Manually Editing the SQLDiag.xml file

To edit SQLDiag.xml, I always recommend using an XML editor (the fastest option being SSMS). Allow me to walk you through the XML structure.

Configuring the machine and instance names

The first thing that needs to be done is to configure the file to point to the correct machine and associated SQL Server instance names. Under the “Machines” node, one can find the “Machine” and the “Instances” nodes. This indicates that SQLDiag can be used to capture:

  • Windows performance data from more than one machine at the same time
  • SQL Trace data from more than one SQL Server instance on each of these machines at the same time

By default only one “Machine” node exists, pointing to the local server. Similarly, only one “Instance” node exists, pointing to the default local instance of SQL Server, the version for which is indicated by the “ssver” attribute.

Word of caution: Because the SQLDiag has the potential to capture a large number of data points, please keep an eye out for IO, memory & CPU pressure. An incorrectly configured SQLDiag file can do more harm than good.


Observe that under the instance node, one can change not only the SQL Server instance to be monitored, but also whether or not to use Windows authentication, the SQL Server version of the instance, and the user name in case SQL authentication is being used.

Configuring the SQL Trace options

Now that I have specified the machine name and the instance name to be monitored, it is time to configure the SQL Trace options. This is similar to the configuration I did for server-side tracing (Part 07) – the only difference is that for standard server-side tracing we used T-SQL, whereas here we use XML. I had to specify the event class, the event itself and the columns to be captured. In the T-SQL approach, remembering the enumerated values for each event class and column name was a headache. Here, things are a little better – the file already lists all available event classes and events, and I only need to choose whether or not to capture them. All columns available to an event are captured, because they can be filtered later by the user.

Under the “Instances” node, expand the “Collectors” node. This shows the following:

  1. SqldiagCollector – indicates whether SQLDiag should collect diagnostic information based on the configuration shown below
  2. BlockingCollector – indicates whether SQLDiag should collect blocking-related information
  3. ProfilerCollector – indicates whether SQLDiag should collect Profiler trace information


<Collectors>
    <SqldiagCollector enabled="true" startup="false" shutdown="true" />
    <BlockingCollector enabled="false" pollinginterval="5" maxfilesize="350" filecount="1"/>
    <ProfilerCollector enabled="false" template="_GeneralPerformance90.xml" pollinginterval="5" maxfilesize="350" filecount="1">
        <Events>
            <!--Event Classes here-->
            <EventType name="Deprecation" enabled="false">
                <Event id="125" name="Deprecation Announcement" enabled="false" description="Occurs when you use a feature that will be removed from a future version of SQL Server." />
                <Event id="126" name="Deprecation Final Support" enabled="false" description="Occurs when you use a feature that will be removed from the next major release of SQL Server." />
            </EventType>
            <!--Event Classes here-->
        </Events>
    </ProfilerCollector>
</Collectors>


To enable or disable a particular collector (e.g. ProfilerCollector) and/or a particular event class/event, simply change enabled="false" to enabled="true". Do ensure that if you want to trace a Profiler event, the ProfilerCollector, the associated event class and the associated event all have enabled="true" set.

Configuring the Event Log collector

SQLDiag can also monitor the Windows event logs and capture data from them. To configure event log monitoring and collection, expand the “Machine Collectors” node.


<EventlogCollector enabled="false" startup="false" shutdown="true" >
  <Eventlogs>
    <EventlogType name="Application" enabled="true"/>
    <EventlogType name="Security" enabled="true"/>
    <EventlogType name="System" enabled="true"/>
  </Eventlogs>
</EventlogCollector>


As mentioned above, changing enabled="true" to enabled="false" stops the collection of the corresponding event log.

Configuring the PerfMon Collector

Expanding the “PerfmonCollector” node gives us the “PerfmonCounters” node, which contains a listing of every available performance counter. The “enabled” attribute controls whether or not a particular counter is monitored.

Do note that this is the same list of counters that would have been available had I launched PerfMon externally (as I did in Part 06 of the tutorial).


Here’s a snippet:

<PerfmonCollector enabled="false" pollinginterval="5" maxfilesize="256">
    <PerfmonCounters>
        <!--PerfmonObject nodes here-->
        <PerfmonObject name="MSSQL$%s:Buffer Partition(*)" enabled="true">
            <PerfmonCounter name="*" enabled="true" />
            <PerfmonCounter name="Free pages" enabled="true" />
            <PerfmonCounter name="Free list requests/sec" enabled="true" />
            <PerfmonCounter name="Free list empty/sec" enabled="true" />
        </PerfmonObject>
        <!--PerfmonObject nodes here-->
    </PerfmonCounters>
</PerfmonCollector>


Once the file has been edited, we are ready to execute the SQLDiag utility; however, we will see how to run SQLDiag in the next part of the tutorial.

Editing the SQLDiag.xml file through a GUI

Manually editing the SQLDiag.xml is a very tedious and error-prone task. The SQL Diag Configuration Tool (SDCT) from Codeplex is an intuitive, easy to use tool that will read existing sqldiag.xml configuration files, allow you to select your desired options and save those options to a new file.

The only limitation of this tool is that, because SQL Server 2012 had not been released at the time of writing this blog, the tool does not support the SQLDiag file for SQL Server 2012. (Because CodePlex is an open-source community, those interested in open-source development can take this up as an action item.)

Welcome Screen

Simply choose the SQL Server version that the SQLDiag.xml needs to work with.


Authentication page

Choose the machine name and SQL Server instance name to work with.

Machine Wide Diagnostics

Choose whether or not Event Log information is to be captured

Configure instance specific settings

Profiler Options

Configure the SQL trace. Notice that the system provides 3 templates and allows you to configure:

1. Maximum file size

2. Polling Interval

3. Maximum number of files required

Perfmon Options

Configure the Perfmon counters that are required to be captured. Again, we have 1 pre-defined template to help us get started.

Enable Producers
Enable Analyzers

After configuring all options using this simple GUI based tool, simply click on “Save” to save a fully-configured copy of the SqlDiag.xml file.

In the next part of the tutorial:

The next part of the tutorial is the final part in the “Profiler” series. We will see how to execute the SQLDiag utility and how to consume its output with the SQL Profiler. In addition, I will also provide some more tools that improve upon the functionality and flexibility provided by SQLDiag.


Until we meet next time,

Be courteous. Drive responsibly.


#0141 – SQL Server – Profiler – Part 08 – Run a SQL Trace via T-SQL code (Part B)


In my previous post in this tutorial (here), I introduced the system stored procedures that would allow us to create a SQL Trace, define events and data columns to capture and start the trace using nothing but pure T-SQL code. Today, we will see the following:

  1. How to use the graphical Profiler to generate a T-SQL script for a SQL trace
  2. Customizing this trace script and running a SQL trace
  3. Open the captured trace using the graphical Profiler
  4. Explore the option to schedule traces

The exercises in this chapter are very interesting and essential to getting a firm grasp of SQL Traces.

How to use the graphical Profiler to generate a T-SQL script for a SQL trace

As I was writing my previous post, I realized that manually identifying and writing down the integer representations of the events and data columns to capture is a labour intensive activity. I generally do not write the T-SQL scripts myself. I leverage the graphical Profiler to generate the script for me. Here’s how I generate the script using the Profiler:

  1. Connect to any instance of SQL Server and identify the events and data columns to be captured (because we will only be using it to generate the T-SQL script for the trace, it does not matter which server we connect to)
  2. Identify any filters that need to be applied
  3. Start and quickly stop the trace
  4. Go to File –> Export –> Script Trace Definition and choose the version for which the script is to be generated
  5. Save the script to a location of your choice

Here is a snapshot of the T-SQL script definition that was generated (actual code has not been provided intentionally):

Generated T-SQL script definition

As you can see, we can find usages of the following system stored procedures within the script:

  1. sp_trace_create
  2. sp_trace_setevent
  3. sp_trace_setfilter
  4. sp_trace_setstatus

Customizing this trace script and running a SQL trace

The generated T-SQL trace script file is not ready to be consumed directly. Here are a few things that we need to change before we can go ahead and start tracing:

  1. Provide a path to store the trace files. As mentioned in the generated script, the text “InsertFileNameHere” needs to be replaced by the actual path and file name (without the .trc extension, which SQL Server will automatically add for us)
  2. Check the value of @maxfilesize
    • For this demo, I will reduce the max. file size to 1MB for each file (using such a low value in production might result in performance issues)
  3. Add (if required) the @stoptime parameter (by default, this is set to NULL, which causes the trace to continue until it is manually stopped or until the server shuts down)

The changes that I have made to my trace are therefore:

declare @stoptime datetime
set @maxfilesize = 1
set @stoptime = '2012-03-07 19:40:00.000'

-- Please replace the text InsertFileNameHere, with an appropriate
-- filename prefixed by a path, e.g., c:\MyFolder\MyTrace. The .trc extension
-- will be appended to the filename automatically. If you are writing from
-- remote server to local drive, please use UNC path and make sure server has
-- write access to your network share

exec @rc = sp_trace_create @TraceID output, 2, N'C:\BRTrace\BRT', @maxfilesize, @stoptime

To start the trace, we simply need to run the entire script by hitting the “Execute” button on the Query Editor toolbar in SSMS. To generate a workload for the trace to capture, I would simply run the script I used in one of my previous posts; a hedged example of such a workload is sketched below.
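As an illustration (an assumption on my part, not the exact workload script from the earlier post): if your trace captures deprecation events, as in the sp_trace_setevent example from Part 07, then running any deprecated command should generate activity for the trace to record.

USE AdventureWorks2008R2
GO
-- DBCC SHOWCONTIG is a deprecated command, so executing it should raise a
-- Deprecation Announcement (event 125) that a deprecation-focused trace can capture
DBCC SHOWCONTIG ('Production.Product') WITH FAST
GO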

Open the captured trace using the graphical Profiler

Once the trace completes at the specified time, you can see that the trace files are now available for analysis at the mentioned path. These files are regular .trc files and can be opened using the SQL Profiler for offline analysis, thereby leveraging the power of graphical tools for analysis and pure T-SQL for data capture with minimum overhead.

Trace File Rollover
Profiler moving automatically through the rollover files
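If you prefer to stay in T-SQL for the analysis as well, the rollover files can be read with the built-in fn_trace_gettable function. Here is a minimal sketch, assuming the C:\BRTrace\BRT path used above:

-- Read the trace file (and its rollover files) into a tabular result set
SELECT TextData, SPID, StartTime
FROM fn_trace_gettable(N'C:\BRTrace\BRT.trc', default)
ORDER BY StartTime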

Explore the option to schedule traces

Traces are generally used for troubleshooting performance issues on a SQL Server instance. Performance issues may crop up as part of a nightly job or an ETL (Extract-Transform-Load) operation during a BI process, in which case it may be desirable to schedule traces for analysis the next day.

We have two options for scheduling traces:

  • Specify a stop-time (refer above on how to specify a stop-time)
  • Schedule traces using the SQL Server Agent, which would simply involve:
    • Starting the trace from a SQL Server Agent job
    • Specifying a stop time in the T-SQL trace definition and passing it to the system stored procedure sp_trace_create
    • Using the sp_trace_setstatus system stored procedure to stop the trace manually (a minimal sketch follows this list)
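For reference, here is a minimal sketch of the manual stop. It assumes the trace writes to the C:\BRTrace\BRT path used earlier in this post; the sys.traces catalog view is used to look up the trace ID, and statuses 0 and 2 stop and close the trace respectively.

DECLARE @TraceID INT

-- Find the trace from its target file (adjust the path to your own trace)
SELECT @TraceID = id
FROM sys.traces
WHERE path LIKE N'C:\BRTrace\BRT%'

EXEC sp_trace_setstatus @TraceID, 0   -- 0 = stop the trace
EXEC sp_trace_setstatus @TraceID, 2   -- 2 = close the trace and delete its definition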

Exercises:

  1. Convert your most used Profiler trace template into a T-SQL script definition by hand
  2. Convert your most used Profiler trace template into a T-SQL script definition using the Profiler. Check for differences between the one by hand and the one generated by the Profiler. Why are they different?
  3. Customize the T-SQL script definition generated by either of the approaches to have:
    • Max. file size of 7MB each
    • Enable rollover and shutting down of the server in case of an error
    • Limit to 5 roll-over files
  4. Run the customized template and capture the trace for a test workload
  5. Analyze the workload using the Profiler
  6. Schedule a trace using the SQL Server Agent and the T-SQL script definition files


Until we meet next time,

Be courteous. Drive responsibly.


#0140 – SQL Server – Profiler – Part 07 – Run a SQL Trace via T-SQL code (Part A)


Those of you who have been following my posts will recollect that I ran a 6-part series of SQL Profiler tutorials some time ago. Based on the feedback received, I am now presenting a 4-part extension to the series. In this part, I will describe how a SQL Trace can be captured in an automated fashion, from the server side instead of from the client side.

It is quite logical that using the graphical interface for the SQL Profiler will generate a performance overhead on the SQL Server. The primary reasons for this behaviour are:

  • All events and trace data need to be brought over to the tracing client application – this introduces an overhead on the network
  • Graphically rendering this data introduces a performance overhead on the client as it makes room for new trace data flowing in from the network

The ideal solution is to keep the trace as concise and precise as possible, and to run it on the SQL Server itself.

T-SQL Support for SQL Traces

SQL Server comes with 5 system stored procedures that allow us to monitor performance and activity using T-SQL code on the server itself, eliminating the need for graphical tools (Profiler) during data capture and thereby reducing the overall overhead. We will be looking at the 4 most commonly used system stored procedures that allow us to write a SQL Trace using T-SQL.

Creating the trace – sp_trace_create

The first step is, of course, to create a SQL Trace. Once the trace is created, we can configure it, add events to monitor and columns to capture, and finally run the trace. The system stored procedure to create a new SQL Trace is “sp_trace_create”. It accepts the following parameters:

  1. @trace_id
    • This is the only OUTPUT parameter, which contains the trace_id assigned to the newly created SQL Trace
  2. @options
    • Allows the user to control whether the system should roll over to a new target file when the given maximum size is reached, and whether to shut down SQL Server if a trace file cannot be written to
  3. @tracefile
    • Specifies the local or UNC path of the file to which the trace output should be written
  4. @maxfilesize
    • Allows the user to set the maximum size to which a trace file will grow before a roll-over or an overwrite happens
  5. @stoptime
    • Specifies the date and time the trace collection will be stopped
  6. @filecount
    • Allows the user to control the maximum number of rollover files, after which an overwrite should happen

An example call to the stored procedure sp_trace_create is shown below:

DECLARE @rc INT
DECLARE @TraceID INT
DECLARE @maxfilesize BIGINT
SET @maxfilesize = 5

--Create the SQL Trace
EXEC @rc = sp_trace_create @TraceID output, 0, N'InsertFileNameHere', @maxfilesize, NULL
SELECT @rc, @TraceID

Words of caution:

  1. Avoid using trace target file names that contain underscores (“_”), because the rollover mechanism versions files using the underscore (“_”)
    • Using underscores (“_”) in the file name will cause the rollover mechanism to fail
  2. This stored procedure only defines a trace. It does not start, stop or modify an existing trace (the definition can be verified as shown after this list)
  3. The parameters of this stored procedure are strongly typed. The stored procedure will error out if called with an incorrect combination of input parameter data types
  4. If NULL is specified for @stoptime, the trace runs until it is manually stopped or until the server shuts down
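As a quick sanity check (my own addition, not part of the generated script), the sys.traces catalog view can be queried to confirm that the trace was defined with the intended options:

-- Inspect the newly defined trace; assumes @TraceID is still in scope from sp_trace_create
SELECT id,
       path,        -- target file the trace writes to
       max_size,    -- maximum file size in MB
       max_files,   -- rollover file count, when rollover is enabled
       status,      -- 0 = stopped, 1 = running
       stop_time
FROM sys.traces
WHERE id = @TraceID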

Configuring the trace – sp_trace_setevent

Now that we have the trace defined, it is time for us to configure the trace by setting the events that need to be monitored and the columns that need to be captured. As is the case with the graphical tool (Profiler), the events/columns configurations can be done only on traces that are currently stopped.

The stored procedure sp_trace_setevent expects the following 4 parameters:

  1. @trace_id
    • The @trace_id returned by sp_trace_create
  2. @eventid
    • The integer representation of the particular event to capture
  3. @columnid
    • The integer representation of the particular data column to capture
  4. @on
    • Bit flag indicating whether to capture this event/column or not (corresponds to the check-box on the graphical tool (Profiler))

What this means is that if we are capturing 10 data columns for 1 event, we would need to write 10 sp_trace_setevent statements. Here’s an example:

-- Set the events
declare @on bit
set @on = 1
exec sp_trace_setevent @TraceID, 125, 1, @on
exec sp_trace_setevent @TraceID, 125, 3, @on
exec sp_trace_setevent @TraceID, 125, 8, @on
exec sp_trace_setevent @TraceID, 125, 12, @on
exec sp_trace_setevent @TraceID, 125, 14, @on
exec sp_trace_setevent @TraceID, 125, 22, @on
exec sp_trace_setevent @TraceID, 125, 26, @on
exec sp_trace_setevent @TraceID, 125, 34, @on
exec sp_trace_setevent @TraceID, 125, 35, @on

Per Books Online (http://msdn.microsoft.com/en-us/library/ms186265(SQL.110).aspx), the above statements mean that we are capturing:

  • Event ID = 125, i.e. “Deprecation Announcement”
  • Column ID = 1, i.e. “TextData”
  • Column ID = 3, i.e. “DatabaseID”
  • Column ID = 8, i.e. “HostName”
  • Column ID = 12, i.e. “SPID”
  • and so on (a lookup sketch follows this list)
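Rather than cross-referencing Books Online for every ID, the names behind these integers can also be looked up on the server itself via the sys.trace_events and sys.trace_columns catalog views. A minimal sketch:

-- Friendly names for the event and column IDs used above
SELECT trace_event_id, name AS event_name
FROM sys.trace_events
WHERE trace_event_id = 125

SELECT trace_column_id, name AS column_name
FROM sys.trace_columns
WHERE trace_column_id IN (1, 3, 8, 12, 14, 22, 26, 34, 35)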

Setting the filters – sp_trace_setfilter

Now that we have the events & columns for the trace defined, we may want to filter the trace to capture only the events generated by, let’s say, a specific host machine or against a specific database. To do so, we can use the sp_trace_setfilter system stored procedure. The stored procedure accepts the following parameters:

  1. @traceid
    • The @trace_id returned by sp_trace_create
  2. @columnid
    • The ColumnId for the column that needs to be filtered
    • Use a NULL here to clear all filters for the specified trace
  3. @logical_operator
    • Specifies whether the “AND” or “OR” logical operations are applied with respect to other filters on the same trace
  4. @comparison_operator
    • Specifies an enumerated integer value that indicates the type of comparison to make
  5. @value
    • Specifies the value on which to filter

Words of caution:

  1. The data type of @value parameter must match the data type of the column to be filtered
  2. To apply two filters on the same column (e.g. two filters on the application name column along with one filter on the user name column), the filters on that column must be applied consecutively, i.e. in sequence

This is demonstrated very clearly by the following example, which has been borrowed from Books Online:

--Notice that filters for column Id #10 ("Application Name") are applied together
sp_trace_setfilter  1, 10, 0, 6, N'SQLT%'
sp_trace_setfilter  1, 10, 0, 6, N'MS%'
sp_trace_setfilter  1, 11, 0, 0, N'joe'

Setting the trace status – sp_trace_setstatus

This one is perhaps the simplest of all the system stored procedures for working with a SQL Trace. Now that a new trace has been created and the events, columns and filters have been defined, it is time to start the trace. The parameters are:

  1. @traceid
    • The @trace_id returned by sp_trace_create
  2. @status
    • Start (1), stop (0) or close (2) the data capture

An example would be:

-- Set the trace status to start
exec sp_trace_setstatus @TraceID, 1

Following these 4 steps essentially completes defining, modifying and starting a SQL Trace. In the next part of the tutorial, we will see how the graphical tool (Profiler) helps us to generate the T-SQL code for a given SQL trace and how to run the T-SQL based SQL Trace.
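To tie the four procedures together, here is a minimal, hedged end-to-end sketch. The target path, the single event/column choices and the database ID filter are illustrative assumptions only, not a recommended production trace:

-- Minimal server-side trace: define, configure, filter and start
DECLARE @rc INT, @TraceID INT, @on BIT, @maxfilesize BIGINT, @dbid INT
SET @on = 1
SET @maxfilesize = 5
SET @dbid = 5                      -- hypothetical DatabaseID to filter on

-- 1. Define the trace (option 2 = enable file rollover; the target folder must exist)
EXEC @rc = sp_trace_create @TraceID output, 2, N'C:\Traces\MinimalTrace', @maxfilesize, NULL
SELECT @rc AS ReturnCode, @TraceID AS TraceID

-- 2. Capture TextData (1) and SPID (12) for event 125 (Deprecation Announcement)
EXEC sp_trace_setevent @TraceID, 125, 1, @on
EXEC sp_trace_setevent @TraceID, 125, 12, @on

-- 3. Filter to a single database (column 3 = DatabaseID, AND, comparison 0 = equals)
EXEC sp_trace_setfilter @TraceID, 3, 0, 0, @dbid

-- 4. Start the trace
EXEC sp_trace_setstatus @TraceID, 1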


Until we meet next time,

Be courteous. Drive responsibly.


#0139 – SQL Server – Productivity feature – customize keyboard shortcuts


Recently, I was working with a developer colleague of mine to troubleshoot an issue he had with one of his queries. A stored procedure he wrote was consuming data from another procedure and was not functioning as expected. I suspected the issue to be a missing change in the related stored procedure.

The easiest way to verify this is obviously to script out the object and verify for the existence of the required code. Scripting out a programmability object (a stored procedure, function or a view) can be done in any one of the following ways:

  1. Script the object using the Object Explorer
  2. Query the catalog view – sys.sql_modules
  3. Use the sp_helptext system stored procedure

The recommended way is, of course, to use the Object Explorer to script out the object. However, the fastest method for most practical purposes is to use the system stored procedure sp_helptext.
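For completeness, option 2 above (the catalog view) works too. A minimal sketch, using the same sample procedure that appears in the sp_helptext example below:

USE AdventureWorks2008R2
GO
-- Fetch the module definition from the catalog view
SELECT definition
FROM sys.sql_modules
WHERE object_id = OBJECT_ID(N'dbo.uspGetBillOfMaterials')
GO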

sp_helptext takes at least one parameter – the object name, with a normal usage similar to the one shown below:

USE AdventureWorks2008R2
GO
--Normal implementation
sp_helptext uspGetBillOfMaterials
GO

Scripted object using sp_helptext

However, there is another way, which is even faster than typing in sp_helptext – and which my colleague was surprised to learn about. That method is to customize the keyboard settings in SSMS so that one of the various supported, customizable key combinations executes the system stored procedure sp_helptext.

Customizing the SSMS Keyboard

In order to customize the keyboard settings for your SSMS instance, here are the simple steps that need to be followed:

  1. Navigate out to Tools –> Options
  2. Within the Options window, expand the “Environment” node and navigate to the “Keyboard” node
  3. Navigate to the “Query shortcuts” node by expanding the “Keyboard” node
  4. Notice that various key combinations and assigned stored procedures to be executed are listed in the “Query shortcuts” node
  5. Against the preferred key combination, enter the stored procedure that you would like to execute

Keyboard customization in SSMS

As you can see from the screenshot above, the following are the default key assignments that come with SSMS:

  • Alt + F1 = sp_help
  • Ctrl + 1 = sp_who
  • Ctrl + 2 = sp_lock

In my case, I assigned sp_helptext to the key combination Ctrl+F1.

Please restart SSMS after making changes to the keyboard configuration.

Using the customization

Once the key combination is assigned, there is no longer any need to type in sp_helptext. Simply selecting the object name and pressing the defined key combination (in my case, Ctrl+F1) executes the stored procedure and returns the results.

Using the keyboard customization in SSMS

I hope you liked today’s tip!


Until we meet next time,

Be courteous. Drive responsibly.


#0138 – SQL Server – Performance – Best Practice – Create Index Before or After a data insert – I/O, Fragmentation, CPU and Elapsed Time


In November 2011, I wrote a post titled SQL Server – Performance – Best Practice – Create Index Before or After a data insert?. The post dealt with answering the question – which method was better – whether to create an index before or after a bulk-insert? The post was essentially targeted to code that generated temporary tables for SSRS reporting or bulk processing.

Erik Eckhardt provided some great feedback on the post. Essentially, his feedback was:

  1. He wanted to see a Profiler trace comparison between the two approaches
  2. A fragmentation study was in order – which method produced the most efficient indexes?

I had intended to carry out this study a long time ago (during the Christmas weekend), but it’s better late than never. So, here are the results of the study.

Creating the test data

The approach that I will be taking today is:

  1. Create two identical tables in TempDB
    • On one table, indexes will be created before the test data is inserted
    • On the other, I will first insert the data, and then create the indexes

WARNING: The scripts provided in this post are provided “as-is”, without warranty, and for learning purposes only. Running them on your QA or production environments can have serious side effects

The script that I will be using today is the same as the one in my earlier post; it is reproduced below for reference. First up is the query that creates the indexes before inserting the data.

USE tempdb
GO

SET NOCOUNT ON

/*****************************************************************************************************************
   STEP 01: Create a table, complete with a primary key (to create a clustered index), and a non-clustered index
*****************************************************************************************************************/
USE tempdb
GO
CREATE TABLE DontUpdateStatistics
( StatId INT IDENTITY(1,1),
  RandomId INT DEFAULT (RAND()*1000),
  StatName VARCHAR(10) DEFAULT 'DEFAULT',
  CONSTRAINT PK_DontUpdateStatistics_StatId PRIMARY KEY CLUSTERED 
  (
    StatId
  )
)
GO

CREATE NONCLUSTERED INDEX NC_DontUpdateStatistics_RandomId ON DontUpdateStatistics(RandomId)
GO

/*****************************************************************************************************************
   STEP 02: Insert some test data into the table
*****************************************************************************************************************/
USE tempdb
GO
INSERT INTO DontUpdateStatistics DEFAULT VALUES
GO 100000

Next up is the script that I recommend – wherein the indexes are created after the data is inserted.

USE tempdb
GO

SET NOCOUNT ON

/*****************************************************************************************************************
   STEP 04: Create a table - this time, do not create any indexes
*****************************************************************************************************************/
CREATE TABLE UpdateStatistics
( StatId INT IDENTITY(1,1),
  RandomId INT DEFAULT (RAND()*1000),
  StatName VARCHAR(10) DEFAULT 'DEFAULT'
)
GO

/*****************************************************************************************************************
   STEP 05: Insert some test data into the table
*****************************************************************************************************************/
USE tempdb
GO
INSERT INTO UpdateStatistics DEFAULT VALUES
GO 100000

/*****************************************************************************************************************
   STEP 06: Now create the primary key clustered index and the non-clustered index
*****************************************************************************************************************/
ALTER TABLE UpdateStatistics
    ADD CONSTRAINT PK_UpdateStatistics_StatId PRIMARY KEY CLUSTERED 
    (
        StatId
    )
GO
CREATE NONCLUSTERED INDEX NC_UpdateStatistics_RandomId ON UpdateStatistics(RandomId)
GO

I will start by answering the 2nd question – which method produced the more efficient index from a fragmentation perspective?

Fragmentation study

Please note that in both the queries above, the clustered index is created on the column – StatId, which is an identity column. The inserts are therefore in the order we would want them to be in both cases, causing a “write-once” operation as Erik put it.

Once the data had been inserted into the tables, I ran the following query, which uses the sys.dm_db_index_physical_stats dynamic management function to look at the physical index statistics. NOTE: I will not use DBCC SHOWCONTIG because it is marked for deprecation (BOL reference here).

USE tempdb
GO
SELECT 'Table: DontUpdateStatistics' AS TableName,
       index_id AS IndexId,
       index_type_desc AS IndexType,
       index_depth AS IndexDepth,
       avg_fragment_size_in_pages AS 'AverageFragmentation(Pages)',
       avg_fragmentation_in_percent AS 'AverageFragmentation(Percentage)',
       fragment_count AS Fragments,
       page_count AS 'PageCount',
       partition_number AS PartitionNumber,
       alloc_unit_type_desc AS AllocationUnitType,
       index_level AS IndexLevel
FROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('DontUpdateStatistics'),-1,0,NULL)

SELECT 'Table: UpdateStatistics'AS TableName,
       index_id AS IndexId,
       index_type_desc AS IndexType,
       index_depth AS IndexDepth,
       avg_fragment_size_in_pages AS 'AverageFragmentation(Pages)',
       avg_fragmentation_in_percent AS 'AverageFragmentation(Percentage)',
       fragment_count AS Fragments,
       page_count AS 'PageCount',
       partition_number AS PartitionNumber,
       alloc_unit_type_desc AS AllocationUnitType,
       index_level AS IndexLevel
FROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('UpdateStatistics'),-1,0,NULL)

I looked at the following values:

  • avg_fragmentation_in_percent – The percent of logical fragmentation (out-of-order pages in the index)
  • fragment_count – The number of fragments (physically consecutive leaf pages) in the index
  • avg_fragment_size_in_pages – Average number of pages in one fragment in an index

Essentially, the avg_fragment_size_in_pages and avg_fragmentation_in_percent values are inversely proportional to each other. Therefore, we are looking for a higher value of avg_fragment_size_in_pages and a lower value of avg_fragmentation_in_percent.

Below is the output that I received when the above queries were executed against my test instance.

Fragmentation Results

In case you are not able to see the screenshot above, here’s the same data in tabular format:

TableName | IndexId | IndexType | IndexDepth | AverageFragmentation(Pages) | AverageFragmentation(Percentage) | Fragments | PageCount | PartitionNumber | AllocationUnitType | IndexLevel
Table: DontUpdateStatistics | 1 | CLUSTERED INDEX | 2 | 11.5666666666667 | 1.44092219020173 | 30 | 347 | 1 | IN_ROW_DATA | 0
Table: DontUpdateStatistics | 2 | NONCLUSTERED INDEX | 2 | 1 | 99.6108949416342 | 257 | 257 | 1 | IN_ROW_DATA | 0
Table: UpdateStatistics | 1 | CLUSTERED INDEX | 2 | 173.5 | 0 | 2 | 347 | 1 | IN_ROW_DATA | 0
Table: UpdateStatistics | 2 | NONCLUSTERED INDEX | 2 | 58 | 0.574712643678161 | 3 | 174 | 1 | IN_ROW_DATA | 0

Results

Contrary to expectations, the results above clearly indicate that we have higher fragmentation when indexes are created before data is inserted into the tables. It does not matter when these values (i.e. the results of the DMF) are fetched – before or after a SELECT is executed against the table – the index fragmentation remains the same and is not affected by when the statistics on the table are updated.

CPU Utilization and Elapsed Time study

I would address this part of the study in 2 parts:

  1. In my original post (here), we already saw that when running the SELECT statements against the table, the Profiler was able to detect that the table: DontUpdateStatistics – did not have the expected statistics
    • We have also seen that when STATISTICS IO was turned ON, the SQL Server clearly showed a performance difference between the two SELECT statements
  2. In this post, I will try to see whether the performance gain in the second SELECT has been redistributed during the index creation or data load phases

To see the overall CPU time and elapsed time statistics, I used the SET STATISTICS TIME ON option wrapped around the index (both clustered & non-clustered) creation and data insert statements (a sketch of this wrapper appears after the table). The results, based on the test data of 100,000 records, are summarized below:

Measure | Table | Data Insert (ms) | Index (clustered & non-clustered) creation (ms) | Total (ms)
CPU Time | Table: DontUpdateStatistics | 4687 | 0 | 4687
CPU Time | Table: UpdateStatistics | 4533 | 400 | 4933
Elapsed Time | Table: DontUpdateStatistics | 24264 | 7 | 24271
Elapsed Time | Table: UpdateStatistics | 22364 | 23872 | 46236
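For reference, here is a minimal sketch of the wrapper, shown around the insert-first variant (an illustration of the approach rather than the exact harness I ran):

USE tempdb
GO
SET STATISTICS TIME ON   -- CPU time and elapsed time are reported in the Messages tab
GO

-- Data insert phase
INSERT INTO UpdateStatistics DEFAULT VALUES
GO 100000

-- Index creation phase
ALTER TABLE UpdateStatistics
    ADD CONSTRAINT PK_UpdateStatistics_StatId PRIMARY KEY CLUSTERED (StatId)
GO
CREATE NONCLUSTERED INDEX NC_UpdateStatistics_RandomId ON UpdateStatistics(RandomId)
GO

SET STATISTICS TIME OFF
GO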

Summary

It is now time to summarize the findings.

Data can be loaded to a table in multiple ways. We have been comparing the pros & cons of 2 such methods:

  1. Create necessary supporting indexes first, insert data afterwards
  2. Insert data first, create necessary supporting indexes afterwards

My original post compared these approaches in terms of the number of read-ahead reads (pages placed into the cache) required before SQL Server can serve a request to fetch data from the tables, depending upon when and whether statistics were updated for the tables. In this post, we studied the level of index fragmentation and the time statistics.

The following chart is a summary of my findings based on which approach “wins” (a win is represented by “W” and a loss by “L”):

Approach | Most up-to-date statistics | Read-ahead reads required to service a query | Index fragmentation level | Time (CPU + elapsed time) statistics
Index before, insert after | L | L | L | W
Insert before, index after | W | W | W | L
Covered in | Previous post | Previous post | This post | This post

In short, if execution time is of utmost importance for a given scenario, approach #1 is preferred, wherein indexes are created before inserting the data (which is the most common approach). However, if fragmentation, up-to-date statistics and a reduced number of read-ahead reads are of importance (as with read-only data), approach #2 is preferable, wherein data is inserted before creating the indexes.

I trust you found the study conducted above useful & interesting. If you have any comments, or would like to know more about it, please do feel free to contact me via the associated discussion or twitter. As always, I really appreciate the feedback that you, the community, provides.


Until we meet next time,

Be courteous. Drive responsibly.