#0138 – SQL Server – Performance – Best Practice – Create Index Before or After a data insert – I/O, Fragmentation, CPU and Elapsed Time


In November 2011, I wrote a post titled SQL Server – Performance – Best Practice – Create Index Before or After a data insert?. The post dealt with answering the question – which method is better: creating an index before or after a bulk insert? The post was essentially targeted at code that generates temporary tables for SSRS reporting or bulk processing.

Erik Eckhardt provided some great feedback on the post. Essentially, his feedback was:

  1. He wanted to see a Profiler trace comparison between the two approaches
  2. A fragmentation study was in order – which method produced the most efficient indexes?

I had intended to carry out this study a long time ago (during the Christmas weekend), but it’s better late than never. So, here are the results of the study.

Creating the test data

The approach that I will be taking today is:

  1. Create two identical tables in the TempDB
    • On one table, indexes will be created before the insert of test data is done
    • On the other, I will first insert some data, and then create indexes

WARNING: The scripts provided in this post are provided “as-is” and without warranty, for understanding purposes only. Using the DBCC scripts provided in the post on your QA or production environments can have serious side effects.

The script that I will be using today is the same as in my earlier post; I have reproduced it below for your reference. First up is the query that creates the indexes before inserting the data.

USE tempdb
GO

SET NOCOUNT ON

/*****************************************************************************************************************
   STEP 01: Create a table, complete with a primary key (to create a clustered index), and a non-clustered index
*****************************************************************************************************************/
USE tempdb
GO
CREATE TABLE DontUpdateStatistics
( StatId INT IDENTITY(1,1),
  RandomId INT DEFAULT (RAND()*1000),
  StatName VARCHAR(10) DEFAULT 'DEFAULT',
  CONSTRAINT PK_DontUpdateStatistics_StatId PRIMARY KEY CLUSTERED 
  (
    StatId
  )
)
GO

CREATE NONCLUSTERED INDEX NC_DontUpdateStatistics_RandomId ON DontUpdateStatistics(RandomId)
GO

/*****************************************************************************************************************
   STEP 02: Insert some test data into the table
*****************************************************************************************************************/
USE tempdb
GO
INSERT INTO DontUpdateStatistics DEFAULT VALUES
GO 100000

Next up is the script that I recommend – wherein the indexes are created after the data is inserted.

USE tempdb
GO

SET NOCOUNT ON

/*****************************************************************************************************************
   STEP 04: Create a table - this time, do not create any indexes
*****************************************************************************************************************/
CREATE TABLE UpdateStatistics
( StatId INT IDENTITY(1,1),
  RandomId INT DEFAULT (RAND()*1000),
  StatName VARCHAR(10) DEFAULT 'DEFAULT'
)
GO

/*****************************************************************************************************************
   STEP 05: Insert some test data into the table
*****************************************************************************************************************/
USE tempdb
GO
INSERT INTO UpdateStatistics DEFAULT VALUES
GO 100000

/*****************************************************************************************************************
   STEP 06: Now create the primary key clustered index and the non-clustered index
*****************************************************************************************************************/
ALTER TABLE UpdateStatistics
    ADD CONSTRAINT PK_UpdateStatistics_StatId PRIMARY KEY CLUSTERED 
    (
        StatId
    )
GO
CREATE NONCLUSTERED INDEX NC_UpdateStatistics_RandomId ON UpdateStatistics(RandomId)
GO

I will now start with the 2nd question – which method produced an index that was more efficient from a fragmentation perspective?

Fragmentation study

Please note that in both the queries above, the clustered index is created on the StatId column, which is an identity column. The inserts are therefore in the order we would want them to be in both cases, resulting in a “write-once” operation, as Erik put it.

Once the data has been inserted into the temporary tables, I ran the following query, which uses the sys.dm_db_index_physical_stats dynamic management function (DMF), to take a look at the index physical statistics. NOTE: I will not use DBCC SHOWCONTIG because it is marked for deprecation (BOL reference here).

USE tempdb
GO
SELECT 'Table: DontUpdateStatistics' AS TableName,
       index_id AS IndexId,
       index_type_desc AS IndexType,
       index_depth AS IndexDepth,
       avg_fragment_size_in_pages AS 'AverageFragmentSize(Pages)',
       avg_fragmentation_in_percent AS 'AverageFragmentation(Percentage)',
       fragment_count AS Fragments,
       page_count AS 'PageCount',
       partition_number AS PartitionNumber,
       alloc_unit_type_desc AS AllocationUnitType,
       index_level AS IndexLevel
FROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('DontUpdateStatistics'),-1,0,NULL)

SELECT 'Table: UpdateStatistics' AS TableName,
       index_id AS IndexId,
       index_type_desc AS IndexType,
       index_depth AS IndexDepth,
       avg_fragment_size_in_pages AS 'AverageFragmentSize(Pages)',
       avg_fragmentation_in_percent AS 'AverageFragmentation(Percentage)',
       fragment_count AS Fragments,
       page_count AS 'PageCount',
       partition_number AS PartitionNumber,
       alloc_unit_type_desc AS AllocationUnitType,
       index_level AS IndexLevel
FROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('UpdateStatistics'),-1,0,NULL)

I looked at the following values:

  • avg_fragmentation_in_percent – The percent of logical fragmentation (out-of-order pages in the index)
  • fragment_count – The number of fragments (physically consecutive leaf pages) in the index
  • avg_fragment_size_in_pages – Average number of pages in one fragment in an index

Essentially, the avg_fragment_size_in_pages and avg_fragmentation_in_percent values are inversely related. Therefore, we are looking for a higher value of avg_fragment_size_in_pages and a lower value of avg_fragmentation_in_percent.

Below is the output that I received when the above queries were executed against my test instance.

Fragmentation Results

In case you are not able to see the screenshot above, here’s the same data in tabular format:

| TableName | IndexId | IndexType | IndexDepth | AverageFragmentSize(Pages) | AverageFragmentation(Percentage) | Fragments | PageCount | PartitionNumber | AllocationUnitType | IndexLevel |
|---|---|---|---|---|---|---|---|---|---|---|
| Table: DontUpdateStatistics | 1 | CLUSTERED INDEX | 2 | 11.5666666666667 | 1.44092219020173 | 30 | 347 | 1 | IN_ROW_DATA | 0 |
| Table: DontUpdateStatistics | 2 | NONCLUSTERED INDEX | 2 | 1 | 99.6108949416342 | 257 | 257 | 1 | IN_ROW_DATA | 0 |
| Table: UpdateStatistics | 1 | CLUSTERED INDEX | 2 | 173.5 | 0 | 2 | 347 | 1 | IN_ROW_DATA | 0 |
| Table: UpdateStatistics | 2 | NONCLUSTERED INDEX | 2 | 58 | 0.574712643678161 | 3 | 174 | 1 | IN_ROW_DATA | 0 |

Results

Contrary to expectations, the results above clearly indicate that we have higher fragmentation when indexes are created before data is inserted into the tables. As a sanity check, note that avg_fragment_size_in_pages is simply page_count divided by fragment_count; for the UpdateStatistics clustered index, 347 / 2 = 173.5, exactly as reported. Also, it does not matter when these DMF results are fetched – before or after a SELECT is executed against the table – the index fragmentation remains the same and is not affected by when the statistics on the table are updated.
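If you would like to verify that last point yourself, here is a minimal sketch (assuming the tables created by the scripts above still exist in tempdb): force a statistics update and re-run the DMF query – the fragmentation numbers come back unchanged.

USE tempdb
GO
-- Force a statistics update on the table and its indexes
UPDATE STATISTICS UpdateStatistics;
GO
-- Re-check fragmentation; the values match those returned before the statistics update
SELECT index_id,
       avg_fragmentation_in_percent,
       fragment_count,
       avg_fragment_size_in_pages
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('UpdateStatistics'), -1, 0, NULL);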

CPU Utilization and Elapsed Time study

I will address this part of the study in 2 parts:

  1. In my original post (here), we already saw that when running the SELECT statements against the tables, the Profiler was able to detect that the DontUpdateStatistics table did not have the expected statistics
    • We also saw that when STATISTICS IO was turned ON, SQL Server clearly showed a performance difference between the two SELECT statements
  2. In this post, I will try to see whether the performance gained on the second SELECT was simply redistributed to the index creation or data load phases

To see the overall CPU time and elapsed time statistics, I used the SET STATISTICS TIME ON option wrapped around the index (both clustered & non-clustered) creation and data insert statements. A sketch of the wrapping for the index-creation phase is shown below (the data-insert phase was wrapped the same way); the results for the test data of 100,000 records are summarized in the table that follows.
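USE tempdb
GO
SET STATISTICS TIME ON;
GO
-- Phase being measured: creating the indexes on the pre-populated table (STEP 06 above)
ALTER TABLE UpdateStatistics
    ADD CONSTRAINT PK_UpdateStatistics_StatId PRIMARY KEY CLUSTERED (StatId);
GO
CREATE NONCLUSTERED INDEX NC_UpdateStatistics_RandomId ON UpdateStatistics(RandomId);
GO
SET STATISTICS TIME OFF;
GO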

| Measure | Table | Data Insert (ms) | Index (clustered & non-clustered) creation (ms) | Totals (ms) |
|---|---|---|---|---|
| CPU Time | Table: DontUpdateStatistics | 4687 | 0 | 4687 |
| CPU Time | Table: UpdateStatistics | 4533 | 400 | 4933 |
| Elapsed Time | Table: DontUpdateStatistics | 24264 | 7 | 24271 |
| Elapsed Time | Table: UpdateStatistics | 22364 | 23872 | 46236 |

Summary

It is now time to summarize the findings.

Data can be loaded into a table in multiple ways. We have been comparing the pros & cons of 2 such methods:

  1. Create necessary supporting indexes first, insert data afterwards
  2. Insert data first, create necessary supporting indexes afterwards

My original post compared these approaches in terms of the number of read-ahead reads (pages placed into the cache) required before SQL Server can serve a request to fetch data from the tables, depending upon when and whether statistics were updated. In this post, we studied the level of index fragmentation and the time statistics.

The following chart summarizes my findings based on which approach “wins” (a win is represented by “W”, a loss by “L”):

| Approach | Most up-to-date statistics | Read-ahead reads required to service a query | Index fragmentation level | Time (CPU + Elapsed time) statistics |
|---|---|---|---|---|
| Index before, insert after | L | L | L | W |
| Insert before, index after | W | W | W | L |
| Further reading | Previous post | Previous post | This post | This post |

In short, if execution time is of utmost importance for a given scenario, approach #1 – creating the indexes before inserting the data – is preferred (and is the most common approach). However, if fragmentation, up-to-date statistics and a reduced number of read-ahead reads are of importance (as in the case of read-only data), approach #2 – inserting the data before creating the indexes – is preferable.

I trust you found the study conducted above useful & interesting. If you have any comments, or would like to know more, please feel free to contact me via the associated discussion or Twitter. As always, I really appreciate the feedback that you, the community, provide.


Until we meet next time,

Be courteous. Drive responsibly.

#0137 – SQL Server – Review – Pluralsight course – SQL Server Questions & Answers – by Pinal Dave & Vinod Kumar


I recently had the opportunity to view some of the newest courses from Pluralsight, an organization that offers online training courses for developers. One of the courses that I viewed was the SQL Server Questions & Answers (http://www.pluralsight-training.net/microsoft/Courses/TableOfContents?courseName=sql-server-qa) by Pinal Dave (blog) and Vinod Kumar (blog), both of whom have been a source of inspiration to the entire SQL Server community and enthusiasts like myself.

Today, I take the opportunity to write a review on the course.

Book or Online course? Are they the same?

Pinal & Vinod recently authored a book “SQL Server Interview Questions and Answers”, which I reviewed in my post here. My review was: “With real world questions and practical answers, this is one book you simply cannot afford to miss.”

So, the question now is – both the book and the online course have similar names. Are they the same? My answer: No.

While the online course is based on highly filtered content from the book, it is much more than the book. Pinal Dave takes the place of the garden-variety SQL Server developer & administrator whose mind is full of myths surrounding SQL Server. Vinod Kumar sheds light by demonstrating the problems caused by these common misconceptions, and also demonstrates the right way to do things.

The online course is more of a “myth-buster”, and is definitely not the same as the Interview Questions book.

Overall feel of the course

When I first looked at the course index, I thought – there’s no way such a laundry list of concepts can be covered in just over 120 minutes (2 hours). But the simplicity with which Pinal & Vinod explained the most complicated concepts left me hungry for more. I took the entire course in one uninterrupted session, and it’s quite possible that almost everyone who views the course will have a similar experience.

The course is all about eliminating some of the most common misconceptions in the developer community around SQL Server. Here are some of the best:

  • Are stored procedures pre-compiled?
  • What is the difference between temporary tables & table variables?
  • Does the order of columns in the WHERE clause matter?
  • What is the real difference between DELETE & TRUNCATE?

In addition, the course also enlightens us on some of the basics – how long should a transaction be? Are locks maintained throughout the duration of the transaction?

At one point in the course, Vinod says – "Dynamic SQL is really not bad, it’s the way in which we prepare & execute SQL that matters." – It just can’t get any better than this!

Some surprises….

Pinal & Vinod never cease to surprise me. The final part of the session is not about questions and answers, but about how the various Tools & utilities within SQL Server can help us make working with SQL Server a breeze. While I have written about them in the past and use them in my day-to-day work, I was surprised to note a couple of things that Vinod had up his sleeve.

  1. Activity Monitor
  2. Object explorer details
  3. Filtering within Object Explorer details
  4. Template Explorer
    • Vinod demonstrates some really cool tricks here (and my post does not cover them)
  5. Query Editor & Query Editor results tab (I won’t share any links here – watch the course!)

My Recommendation

Every SQL Server developer must take the course. If you are into DB design, this course has elements that you would definitely enjoy, especially the part which demystifies DENY v/s GRANT operations.

Do not miss this course – it makes for a highly productive work environment! So, this weekend, register with Pluralsight and take the course.

Until we meet next time,

Be courteous. Drive responsibly.


#0136 – SQL Server – SSMS – Slow load times – Improving launch performance in 2 simple steps


As you know, I recently completed a 10-part tutorial series on SQL Server Management Studio (SSMS). As I finished writing the tutorial, I remembered that one of the issues most SQL Server developers encounter is the slow load time of SSMS. Sometimes, SSMS is just not fast enough for a quick session. In this post, I will share 2 simple tweaks I use to improve the launch performance of SSMS.

Simplify the SSMS interface

Almost every dedicated pane in SSMS that allows you to directly interact with your SQL Server instance establishes a connection to the SQL Server. The Object Explorer (which normally opens by default when you launch SSMS) and the Object Explorer Details window open 2 separate connections to the SQL Server instance, in addition to the connection opened by the Query Editor window. Read my post: http://beyondrelational.com/blogs/nakul/archive/2011/01/17/the-multiple-connections-of-ssms.aspx – for details on this fact.

Each new connection means an increase in the resources consumed and additional time taken to establish the connection. Therefore, the first rule of optimizing SSMS launch times has to be: keep the SSMS interface simple and close any unused panes/monitors open within SSMS.
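If you are curious, you can see these connections for yourself. Here is a minimal sketch using the sys.dm_exec_sessions DMV (run it from any query window; the filter relies on the program name that SSMS reports for its sessions):

-- Count the connections that SSMS has opened against this instance
SELECT login_name,
       program_name,
       COUNT(*) AS session_count
FROM sys.dm_exec_sessions
WHERE program_name LIKE N'Microsoft SQL Server Management Studio%'
GROUP BY login_name, program_name;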

Controlling the startup windows

To configure which windows SSMS should open at startup, simply navigate to Tools –> Options within an SSMS session. In the “Options” window, go to the “Startup” options under the “Environment” node. (SQL Server 2005, 2008 and 2008 R2 users would go to the “General” page instead.)

You may want to select either “Open new query window” or “Open empty environment” for some of the fastest SSMS launch times.

SSMS Startup Options

Adjust the Visual Experience based on the client performance (SQL 2012/”Denali” and up)

Starting with SQL Server 2012 (code-named “Denali”), the “General” page of the Options window has a section that helps you tweak the visual experience of SSMS based on the client performance. By default, SSMS automatically adjusts itself based on the client performance, and I have found that if I leave it as-is, things work just fine for me.

SSMS Options - Adjusting Visual Experience features 

Automate the launching of SSMS and connecting to your preferred SQL Server instance

Many a time, improving load times is not just about launching a particular application faster. For me, the launch time of any application is the time the application takes to load plus the time it takes before I can actively use it. In the case of SQL Server Management Studio, the launch time is therefore the sum of:

  1. Launching the SSMS
  2. Selecting the SQL Server instance
  3. Entering connection information
  4. Clicking OK to the connection dialog

Wouldn’t it be much faster and simpler for a developer to have all these steps rolled up into one?

Using the SSMS command line options

SSMS allows us to specify a few command line options that eliminate the need to select the required SQL Server instance and enter the connection information every time. Most developers need to connect to only a handful of SQL Server instances, and these instances rarely change for the duration of the project they are actively working on – using customized shortcuts for these instances therefore makes sense.

The parameters that I generally use are:

| Parameter | Meaning |
|---|---|
| -S servername | The SQL Server instance name to which SSMS should establish a connection |
| -d databasename | The database within the specified SQL Server instance to which SSMS should switch after establishing the connection |
| -E | Connect using Windows authentication. If your server uses SQL Server authentication, use the [-U username] and [-P password] options instead |
| -nosplash | Do not display the SSMS splash screen during startup |

In addition, you may also use the parameters [scriptfile], [projectfile], [solutionfile] to specify a script, project or solution to open by default.
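For example, a shortcut that also opens a script file on launch might look like the following (the script path shown here is purely illustrative):

"C:\Program Files\Microsoft SQL Server\110\Tools\Binn\ManagementStudio\Ssms.exe" "C:\Scripts\DailyChecks.sql" -S W2K8DENALI\SQL11 -E -nosplash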

I therefore changed my SSMS shortcut to the following. The shortcut connects to the AdventureWorks2008R2 database on my test instance using Windows authentication.

"C:\Program Files\Microsoft SQL Server\110\Tools\Binn\ManagementStudio\Ssms.exe" -S W2K8DENALI\SQL11 -d AdventureWorks2008R2 -E -nosplash

SSMS Shortcut

As can be seen from the screenshot below, SSMS launched with Windows authentication and connected directly to the AdventureWorks2008R2 database by default.

SSMS - Fast Load


Do you use any methods to speed up the SSMS launch times? If yes, do share your method in the discussion section below.

Until we meet next time,

Be courteous. Drive responsibly.


#0135 – SQL Server – SSMS – Tutorial – 10 – Troubleshooting (L200)


All systems are susceptible to failure. No software is perfect, and it is therefore natural that SSMS, too, will fail at times. Today, I have put together some of the most common failures that I have experienced with SSMS, along with their known workarounds.

The reason I have kept this part of the tutorial at L200 is that the workarounds and tips in this post should be used with caution.

General troubleshooting

The Object Explorer is not visible/hidden

The most common scenario in which my Object Explorer goes missing is when I switch back to my laptop’s standard resolution after presenting on a projector or a larger screen.

Obviously, the first step in troubleshooting a hidden object explorer is to make sure the Object Explorer is indeed enabled by toggling it using the F8 key or by using the View menu.

If this does not resolve your issue, the problem starts becoming a bit ugly. Save all your work within SSMS, and restart it. In most cases, restarting SSMS restores the Object Explorer.

An important criterion for any recovery mechanism is to cause as little collateral damage as possible. Restarting allows the system to preserve user data while repairing the point of failure, which is why restarting is the standard first response. However, in some cases, it becomes impossible to recover the system completely. Over my many years of experience with SQL Server, I have faced such a situation with SSMS only once. If the components of SSMS simply fail to become visible on screen, the last option left is to give the system a “shock” by removing all customizations (repositioning, resizing, etc.) applied. This can be done by using the Window –> Reset Window Layout option.


Using “Reset Window Layout” clears out the customizations, restoring the default window positions and sizes and taking SSMS back to its default state.

Possibly large number of connections when only one SSMS instance is connected to the SQL Server

When working with SSMS, you might find that while only one instance of SSMS is connected to the SQL Server, the number of connections opened against the server is disproportionately high. The reason is that SSMS trades additional connections for features – the Object Explorer, Object Explorer Details and other such “heavy-weight” windows each open their own dedicated connection to the SQL Server instance.

You can read about my little experiment regarding the multiple connections of SSMS here: http://beyondrelational.com/blogs/nakul/archive/2011/01/17/the-multiple-connections-of-ssms.aspx

Avoid the urge to use all features at the same time on a production server, or during troubleshooting – not only will it confuse you, but it might also slow down the SQL Server (if the server is under pressure of some sort).

Finally, if the SQL Server has a database in single-user mode, it will be available to only one of the multiple sessions that SSMS starts – and there is no way to know which connection will get in first. Hence, it is always a good practice to have nothing but the query editor open when working with single-user databases or single-connection modes like the DAC (Dedicated Admin Connection).
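On SQL Server 2012 and later, a quick sketch like the following can reveal which session got into a single-user database first (note: the database_id column of sys.dm_exec_sessions is available starting with SQL Server 2012, and the database name here is illustrative):

-- Which session currently occupies the single-user database?
SELECT session_id, login_name, program_name
FROM sys.dm_exec_sessions
WHERE database_id = DB_ID(N'MySingleUserDb');  -- database_id requires SQL Server 2012+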

So, configure your environment wisely, and the power of SSMS will be with you.

Clearing the SSMS History

For those of you who work with multiple SQL Servers and multiple logins, the “Connect to Server” window seems to fill up very quickly. Sometimes, these servers might not even exist any more (e.g., if you frequently refresh your virtual R&D/test environments).

For those using the yet-to-be-released SQL Server 2012 (code-named “Denali”), you can select the value and hit the Delete key, just as you would delete an entry from any other drop-down. However, if you are using SSMS for SQL Server 2005 (in which case you should upgrade) or SQL Server 2008/2008 R2, you can refer to my post on clearing SSMS history here: http://beyondrelational.com/blogs/nakul/archive/2011/02/17/clearing-ssms-history.aspx.

Advanced Troubleshooting – Troubleshooting the T-SQL Debugger

In 2011, I presented a session on the “Underappreciated Features of Microsoft SQL Server” during the Exclusive SQL Server Webcast series on Security & Scalability. One of the items that I demoed in the VTD was the T-SQL Debugger. The T-SQL Debugger has existed since at least SQL Server 2000, and with each version we can see a lot of changes being made to it. While preparing my environment for the T-SQL Debugger, I encountered multiple issues related to firewall security and the sequence in which the Visual Studio versions were installed. You can find my posts on each of these errors here:

By the way, you can find a recording of my VTD session here:

Concluding the series

This concludes my series of tutorials on SSMS. Last year, a lot of engineers who had never worked with SQL Server came up to me looking for some guidance on the SSMS interface. My goal therefore was to focus on features that would make their lives easier and make them feel comfortable within the confines of the SSMS IDE. I hope that these tutorials will help all the budding SQL Server enthusiasts.

If these tutorials helped you, or if you have any suggestions, do let me know. Your feedback is most welcome.

Until we meet next time,

Be courteous. Drive responsibly.


#0134 – SQL Server – SSMS – Tutorial – 09 – Integrating with peripheral services & other SQL Server tools (L200)


As mentioned in Part 01 of this tutorial series, one of the key responsibilities of SSMS was to replace the erstwhile SQL Server Enterprise Manager – the interface used to manage & maintain the SQL Server instance configuration. In addition to the tools and utilities available within SSMS, there are other peripheral tools & utilities that work in tandem with SQL Server to help you develop T-SQL code and manage & maintain the SQL Server instance.

SQL Server Agent

The SQL Server Agent needs no introduction. It is a Windows service (separate from the SQL Server service) that executes scheduled administrative tasks, called jobs. Being a process external to SQL Server, the Agent also performs the very important functions of monitoring SQL Server (thereby generating the necessary logs) and processing SQL Server alerts.

To know more about the SQL Server Agent, please visit Books On Line at: http://msdn.microsoft.com/en-us/library/ms189089.aspx
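To give you a flavour of what the Agent automates, here is a minimal sketch of creating a nightly job through the msdb stored procedures (the job name, schedule and command are illustrative):

USE msdb;
GO
-- Create the job, add a T-SQL step, schedule it for 2 AM daily and target the local server
EXEC dbo.sp_add_job @job_name = N'NightlyStatsUpdate';
EXEC dbo.sp_add_jobstep @job_name = N'NightlyStatsUpdate',
     @step_name = N'Update statistics',
     @subsystem = N'TSQL',
     @command = N'EXEC sp_updatestats;',
     @database_name = N'AdventureWorks2008R2';
EXEC dbo.sp_add_jobschedule @job_name = N'NightlyStatsUpdate',
     @name = N'Nightly at 2 AM',
     @freq_type = 4,            -- daily
     @freq_interval = 1,        -- every 1 day
     @active_start_time = 020000;
EXEC dbo.sp_add_jobserver @job_name = N'NightlyStatsUpdate';  -- defaults to the local server
GO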

SQL Server Profiler

There would hardly be a soul on the planet who has worked with SQL Server and not used the Profiler – undoubtedly the 2nd most frequently used utility. The SQL Server Profiler integrates with SSMS, providing easy access.

One way of launching the Profiler from SSMS is by using the Tools menu. However, there is no difference between launching the Profiler externally or via the Tools menu.


Normally, when the Profiler is launched externally, one needs to log in to the SQL Server instance and then choose the required trace. When launched from the query editor in SSMS, these two steps are bypassed, thereby improving general developer productivity.

For example, let’s say you have the following query open in SSMS and would like to trace it using the SQL Server Profiler.

USE AdventureWorks2008R2;
GO
SELECT VendorID, 
       [250] AS Emp1, 
       [251] AS Emp2, 
       [256] AS Emp3, 
       [257] AS Emp4, 
       [260] AS Emp5
FROM (SELECT PurchaseOrderID, EmployeeID, VendorID
      FROM Purchasing.PurchaseOrderHeader) p
PIVOT (COUNT (PurchaseOrderID)
       FOR EmployeeID IN ( [250], [251], [256], [257], [260] )
      ) AS pvt
ORDER BY pvt.VendorID;

In the Query Editor, right-click and choose “Trace query in SQL Server Profiler”, or use the key combination Ctrl+Alt+P. You will find that the SQL Server Profiler launches with the default trace already started; all that remains is to run/execute the query in the query editor. The SQL Server Profiler will then trace the execution of the query.


SQL Server Database Engine Tuning Advisor

The Database Engine Tuning Advisor (DTA) helps you select and create an optimal set of indexes, indexed views, and partitions for a given workload under evaluation. Provided that the hardware, the SQL Server instance and its configuration, and the database itself mimic the production environment, the DTA is fairly accurate in its suggestions. However, as is the case with all SQL Server utilities and recommendations, please validate whether the suggestions would be helpful in your particular scenario before rolling them out into production use.

The DTA can be launched from the SSMS via the Tools menu (refer screenshot above). The other option is to right click in the query editor and choose “Analyze query in Database Engine Tuning Advisor”.


After adjusting the tuning options in the “Tuning options” tab, click on the “Start Analysis” button to start the analysis.


At the end of the tuning session, the recommendations are made available in the “Recommendations” tab. More details are available in the “Reports” tab.


Practice Exercises…

As practice, identify the most common workloads from your production application that you plan to optimize.

  • Run a Profiler trace against this workload. Are you able to identify the potential bottlenecks from the trace?
  • Next, run the DTA against the same workload. What recommendations does the DTA make? Using the Reports tab, decide whether you would accept these recommendations or not


Until we meet next time,

Be courteous. Drive responsibly.
