BCP & large BULK operations made faster – switch between FULL & BULK-LOGGED recovery models


Recently, a friend told me that they were experiencing performance issues in production when importing data from 3rd party interfaces at the end of the day. The entire import was done via BCP and at a time when no users were logged into the system. Also, since it was a disconnected import, the entire import could be repeated again in case of any errors or issues.

After some discussion and research on MSDN, we came across the following page: Considerations for Switching from the Full or Bulk-Logged Recovery Model. Sure enough, the issues were resolved by a simple process change and the addition of fewer than 10 lines of T-SQL code at his end.

The Secret of faster Bulk operations

At this point, allow me to mention that a detailed discussion of recovery models within SQL Server is out of scope for this post. Please refer to Books Online for more details on each recovery model.

What the MSDN page suggests is that while the FULL recovery model is ideal for normal usage, the BULK_LOGGED recovery model might help speed up large bulk imports. So, the sequence that BOL recommends is:

  1. Under the FULL recovery model, take a log backup (in addition to the normal scheduled backup)
  2. Switch to the BULK_LOGGED recovery model
  3. Undertake the bulk operation
  4. Switch back to the FULL recovery model
  5. Take a log backup (in addition to the normal scheduled backup)
  6. Database available for normal operations
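The sequence above takes only a few lines of T-SQL. A minimal sketch (the database name and backup paths here are placeholders, not from the original post):

```sql
-- 1. Under FULL recovery, take a log backup before the bulk window
BACKUP LOG SalesDB TO DISK = N'D:\Backups\SalesDB_PreBulk.trn';
GO

-- 2. Switch to BULK_LOGGED for the duration of the import
ALTER DATABASE SalesDB SET RECOVERY BULK_LOGGED;
GO

-- 3. Run the bulk operation here (BCP, BULK INSERT, SELECT...INTO, INSERT...SELECT)

-- 4. Switch back to the FULL recovery model
ALTER DATABASE SalesDB SET RECOVERY FULL;
GO

-- 5. Take a log backup to re-establish point-in-time recovery
BACKUP LOG SalesDB TO DISK = N'D:\Backups\SalesDB_PostBulk.trn';
GO
```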

Using Bulk Logged Recovery model to speed up bulk operations

While under the BULK_LOGGED recovery model, the scheduled database, log and differential backups can continue to occur.

The secret lies in the fact that under the BULK_LOGGED recovery model, some operations are minimally logged, whereas the same operations are fully logged under the FULL recovery model. The fact that these operations are minimally logged means that the bulk operation would complete faster under the BULK_LOGGED model than under the FULL recovery model.

The most commonly used operations that can benefit from the BULK_LOGGED recovery model are bulk import operations (BCP, SELECT…INTO, BULK INSERT and INSERT…SELECT).
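As an illustration, a minimally-logged import via BULK INSERT might look like this (a sketch; the table name, file path and options are placeholders):

```sql
BULK INSERT dbo.DailyFeed
FROM N'D:\Imports\daily_feed.dat'
WITH
(
    FIELDTERMINATOR = '|',
    ROWTERMINATOR   = '\n',
    TABLOCK,             -- a table lock is one of the prerequisites for minimal logging
    BATCHSIZE = 50000    -- commit in batches to keep the active log manageable
);
GO
```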

IMPORTANT:

While following these steps it is strongly recommended that:

  1. No users be active in the system
  2. No modification be made that cannot be recovered without a log or full database backup. Simply re-executing the bulk operation should be sufficient to restore the database from the last backup if necessary

What this would not benefit

There are many myths & legends that surround SQL Server. One of them is that switching to the BULK_LOGGED recovery model would speed up log shipping because the transaction log backups would be smaller. This myth has been busted here – http://blogs.msdn.com/b/sqlserverfaq/archive/2011/01/07/using-bulk-logged-recovery-model-for-bulk-operations-will-reduce-the-size-of-transaction-log-backups-myths-and-truths.aspx

I hope that the above has helped you in setting up your bulk import processes.

Until we meet next time,

Be courteous. Drive responsibly.

SQL Server Configuration Manager – Hide your SQL Server Instance


Data is one of the most critical and valuable assets of any industry, and its protection is every individual’s responsibility. Data lives within Microsoft SQL Server, and therefore it is the duty of DBAs to ensure that their SQL Server is safe from unwanted detection and intrusion.

Today, I will demonstrate a method by which you can isolate your SQL Server instance from unwanted detection. I believe that this is a best practice and must be followed for all production systems.

SQL Server Configuration Manager

Per Books-On-Line,

  • SQL Server Configuration Manager is a tool to manage the services associated with SQL Server, to configure the network protocols used by SQL Server, and to manage the network connectivity configuration from SQL Server client computers
  • It is basically a Microsoft Management Console (MMC) snap-in and for SQL 2008 utilizes the SQLServerManager10.msc snap-in component
  • System administrators: please note that the configuration manager uses WMI to monitor and change SQL Server settings

Managing Protocol access

After all, client connectivity to access, manipulate and store data is what SQL Server is all about. SQL Server Configuration Manager allows you to configure server and client network protocols, and connectivity options.

The SQL Server Configuration Manager allows you to control how applications on the network access your database: which protocol is to be used, whether remote connections are allowed, and the IP addresses & ports that SQL Server listens on for incoming requests. However, there is one key aspect that can be controlled when configuring protocols – the discovery of a SQL Server instance on the network.

How do applications “discover” SQL Server?

Applications on a network discover a SQL Server instance by sending out what is called an “enumeration request”. If a SQL Server instance is visible on the network, the SQL Server Browser service responds to this enumeration request, allowing the instances on that server to be listed amongst the available instances to connect to.

Hiding a SQL Server instance

In a production environment, you may want to hide the production instances of SQL Server to avoid detection. Essentially, all that needs to be done is to instruct the SQL Server Browser not to respond to an enumeration request for that particular SQL Server instance. Here are the brief steps on how you can go about achieving this objective:

  1. Launch the SQL Server Configuration Manager
  2. Expand the node for “SQL Server Network Configuration”
  3. Browse to “Protocols for <<SQL Server instance name>>”
  4. Right-click and go to “Properties”
  [Screenshot: the “Properties” option on the protocols node]
  5. On the properties window for “Protocols for <<SQL Server instance name>>”, set “Hide Instance” to “Yes” to hide this instance from detection over the network
  [Screenshot: the “Hide Instance” setting in the protocol properties]

As you can see, in just 5 steps you can hide your SQL Server instance. Users and applications can still connect to this instance normally, but anyone attempting to enumerate the network for a list of available SQL Servers would not find this instance on their list.

Other “How-to” topics

You can do a lot with the SQL Server Configuration Manager. You can find a list of common operations, and how to go about them on the Books-On-Line page here: http://msdn.microsoft.com/en-us/library/ms188707.aspx

By the way…

I have often seen developers use their trusted Windows Services applet to restart the SQL Server service. This was acceptable until SQL Server 2008 came about; starting with SQL Server 2008, this practice should not be followed.

This is because the SQL Server Configuration Manager performs additional configuration that the Windows Services applet does not: for example, when the service account is changed, it grants the required permissions and re-encrypts the service master key that SQL Server uses for its operations. The SQL Server Configuration Manager is therefore the right place to go if you need to manage or restart a SQL Server service.

Until we meet next time,

Be courteous. Drive responsibly.

Msg 1701 – Creating or altering table failed….This exceeds the maximum allowable table row size of 8060 bytes.


In a perfect world, any table, once designed, would never undergo any modifications for the entire life of the product. However, nothing’s perfect and, as they say, the only thing that’s constant is change. Requirements change, and so do table designs. Yet there is one little aspect of SQL Server that most developers and novice administrators do not quite understand correctly.

Assume that we have the following setup:

CREATE DATABASE foo
GO

USE foo
GO
CREATE TABLE MyTbl (MyName CHAR(4030))
GO

INSERT INTO MyTbl
SELECT REPLICATE('a',4030)
GO

Let us assume that we need to modify the structure of our table:

ALTER TABLE MyTbl ALTER COLUMN MyName CHAR(4031)
GO

While the ALTER TABLE statement should work just fine, it does not. What we encounter is:

Msg 1701, Level 16, State 1, Line 2

Creating or altering table ‘MyTbl’ failed because the minimum row size would be 8068, including 7 bytes of internal overhead. This exceeds the maximum allowable table row size of 8060 bytes.

So, what went wrong? We just wanted to increase one byte!

The post-mortem

Most of us have a tendency not to read error messages: whenever a message appears, we just click “OK”. Similarly, if SQL Server throws anything in red, we simply hit F5 again, just in case SQL Server was not in the right spirits the first time around.

However, in this case, a close look at the message is the key to understanding what’s going on. Let’s start with the final sentence: This exceeds the maximum allowable table row size of 8060 bytes.

What is this number, 8060 bytes? It is the maximum amount of data that a single data page can hold – not the raw page size, which is 8192 bytes.

Per Books On Line (Understanding Pages & Extents), here’s how the numbers work out:

  Data page length:       8192 bytes
  Page header:              96 bytes
  Internal overhead:        36 bytes
  Data space available:   8192 - 96 - 36 = 8060 bytes

Once we have established this, let’s study what we are trying to do:

  Column     Data type length (bytes)   Bytes per character   Row size (bytes)
  Current    4030                       1                     4030
  New        4031                       1                     4031
  TOTAL (in-transit)                                          8061
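You can verify the actual record sizes on disk with sys.dm_db_index_physical_stats in DETAILED mode (a sketch against the demo table above):

```sql
USE foo
GO

SELECT  ps.alloc_unit_type_desc,
        ps.min_record_size_in_bytes,
        ps.max_record_size_in_bytes
FROM    sys.dm_db_index_physical_stats
        (DB_ID(N'foo'), OBJECT_ID(N'dbo.MyTbl'), NULL, NULL, 'DETAILED') AS ps;
GO
```

For the single CHAR(4030) row inserted earlier, the reported record size is the 4030 data bytes plus the row’s internal overhead.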

So, we are trying to take a row (because we only have one column, the column size is the row size) of 4030 bytes, and increase it to 4031 bytes, which should be okay because it ultimately remains under the 8060 byte limit. But that’s where looks are deceiving.

The ALTER Table statement attempts to expand the column “in-place”, i.e. it copies the data over to a new column with the new length specifications on the same row, making the row total now 8061 bytes, which is more than the 8060 maximum.

Now that we know we have a row that’s larger than the maximum number of bytes a page can hold, we move down the Books On Line article to a section on large-row support, which mentions: “Rows cannot span pages, however portions of the row may be moved off the row’s page so that the row can actually be very large.”

Rows cannot span pages – everything boils down to this simple statement. Because our request causes the row to exceed 8060 bytes “in transit”, the ALTER TABLE fails.

You can get more information on Rows exceeding the 8K limit here: http://msdn.microsoft.com/en-us/library/ms186981.aspx

Workarounds

There are a couple of workarounds to this problem. However, all are very crude mechanisms with their own pros & cons and must not be used unless absolutely necessary.

  1. The most obvious one – if possible, reduce the data within a row so that we do not exceed the allowed limits
  2. Create a new table with the new specifications, copy all the data over, transforming it on the way. Finally, drop the old table
  3. Export the existing data to another table (or tables) and truncate the existing table. Make the necessary changes and import the data back into the original table
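Workaround 2 can be sketched as follows, continuing the demo above (MyTbl_New is a placeholder name):

```sql
-- Create a replacement table with the new column definition
CREATE TABLE MyTbl_New (MyName CHAR(4031));
GO

-- Copy the data over; each new row is written fresh, so no single row
-- ever has to hold both the old and the new column at the same time
INSERT INTO MyTbl_New (MyName)
SELECT MyName FROM MyTbl;
GO

-- Drop the old table and give the new one the original name
DROP TABLE MyTbl;
GO
EXEC sp_rename 'MyTbl_New', 'MyTbl';
GO
```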

I hope that you found this behaviour interesting and also enjoyed learning about the internal workings of Microsoft SQL Server as much as I did.

Until we meet next time,

Be courteous. Drive responsibly.

Query Plan Re-use: Write your T-SQL queries with the proper case for plan re-use and better query performance


I had an interesting time at the office recently. We were undertaking a performance tuning exercise for the canned reports of an OLTP that were migrated from Crystal Reports to SSRS, and had to come up with some ground rules for all database developers to follow when re-engineering queries.

One of the basic things to be ensured during a performance tuning exercise is that the queries must be written in such a way that once an execution plan is generated, SQL Server reuses a given execution plan for subsequent executions of the same query.

The above statement seems very simple, but there is a lot of between-the-lines reading that needs to be done before the true meaning of the statement can be understood.

Some basics – Execution & Query Plans

When a query is submitted to the SQL Server database engine for the first time, the query optimizer processes the query (query processing is a huge topic, and out of scope for this discussion; for more information, refer to the BOL page on query processing here) and generates up to two execution plans:

  1. For serial execution
  2. For parallel execution

Each Execution plan (BOL Reference here) consists of:

  1. Query Plan
    • This is the bulk of the query plan, and does not contain the Execution context
  2. Execution Context
    • The execution context is unique for each user running the plan because it contains the parameter values for each execution per user

The prime concern is the re-use of the query plan portion of the Execution Plan. This is because generation of execution plans takes time & resources – both of which are scarce.

The importance of using the proper case when writing T-SQL Queries

The query plan is internally stored as a text value – a string, which is what we see if we execute a query with one of the SET SHOWPLAN options.

When a query is submitted, it is parsed, and one of the first things that happens afterwards is that Microsoft SQL Server attempts to find a suitable match for it in the plan cache. This is done via a string comparison that is simple, but crucial to understand: most developers fail to realize that this comparison is case-sensitive.

Let’s understand this via a simple demo.

Demo

We will begin by clearing out the procedure cache.

DBCC FREEPROCCACHE
GO

Next, let us run a simple T-SQL query (twice) to fetch a list of all the employees in an organization:

USE AdventureWorks2008R2
GO

SELECT * FROM HumanResources.Employee
GO 2

Now, let us use sys.dm_exec_cached_plans, sys.dm_exec_query_plan and sys.dm_exec_sql_text to look at the procedure cache:

--Run this in a separate window

USE AdventureWorks2008R2
GO

SELECT cp.refcounts, cp.usecounts, cp.size_in_bytes, cp.cacheobjtype, cp.objtype, st.text, qp.query_plan, cp.plan_handle 
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan (cp.plan_handle) qp
CROSS APPLY sys.dm_exec_sql_text (cp.plan_handle) st
GO

You can see from the usecounts column that Microsoft SQL Server used the query plan it generated twice.

[Screenshot: plan cache output showing usecounts = 2]

Now, assume that different developers working on the same application used some other variants of the same query:

USE AdventureWorks2008R2
GO

SELECT * FROM HUMANRESOURCES.EMPLOYEE
GO

SELECT * FROM HumanResources.Employee
GO

Let us take a look at that plan usage again:

--Run this in a separate window

USE AdventureWorks2008R2
GO

SELECT cp.refcounts, cp.usecounts, cp.size_in_bytes, cp.cacheobjtype, cp.objtype, st.text, qp.query_plan, cp.plan_handle 
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan (cp.plan_handle) qp
CROSS APPLY sys.dm_exec_sql_text (cp.plan_handle) st
GO

You can clearly see that SQL Server used a different version of the query plan for each of the two query variants.

[Screenshot: plan cache output showing a separate plan for each query variant]

Moral of the story

When writing queries, please avoid:

  1. Changes to letter case in the text
  2. Changes to white space
  3. Changes to literal values
  4. Changes to text inside comments
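Changing literal values is the variant that is easiest to engineer away: parameterize the statement so that the query text stays byte-for-byte identical across executions. A sketch using sp_executesql (the parameter name and value are illustrative):

```sql
-- The statement text never changes, no matter which
-- BusinessEntityID is requested, so the cached plan is reused
EXEC sp_executesql
    N'SELECT * FROM HumanResources.Employee WHERE BusinessEntityID = @id',
    N'@id INT',
    @id = 42;
GO
```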

Most important: if you are using Microsoft SQL Server 2008 or above, always use IntelliSense to be safe!

Until we meet next time,

Be courteous. Drive responsibly.

Happy 64th Independence Day! & A summary of my favourite posts


“Long years ago we made a tryst with destiny, and now the time comes when we shall redeem our pledge, not wholly or in full measure, but very substantially. At the stroke of the midnight hour, when the world sleeps, India will awake to life and freedom.” ~Pandit Jawaharlal Nehru

These words were spoken by India’s first Prime Minister as part of a speech delivered in the Constituent Assembly, New Delhi, on August 14, 1947, on the eve of the attainment of Independence. Today is India’s 64th Independence Day, and a time to remember the thousands of individuals who laid down their lives in the struggle for independence.

It is a holiday here in India, and if you are working today, I hope that you get to go home on time (if not early) and that you have an otherwise uneventful Monday. I am also on a long-pending holiday. It’s been non-stop work for the last 1.5 years, and I was on the verge of pulling my hair out!

I will, therefore, present a selection from the posts that I have written to date. These may not have the highest read counts, but they are somehow near & dear to my heart. Some have even come about as a result of a query that you, the kind and respected reader, a friend, or a colleague might have asked.

Posts:

  1. CAST v/s CONVERT – A Performance Comparison
  2. CAST v/s CONVERT – Is there a difference as far as SQL Server is concerned? Which is better?
  3. Measuring the number of rows in a table – Are there any alternatives to COUNT(*)?
  4. The multiple connections of SSMS
  5. SQL Server Myth- Log files are removed when a database is made READ_ONLY
  6. “GO” as Batch Separator – Customize batch separator in SSMS & SQLCMD – What would you use instead of GO?
  7. The Ghost of the Y2K
  8. UNIX timestamp and the Year 2038 problem
  9. Questions from you!
  10. Other contributions:

Series:

  1. Underappreciated features of Microsoft SQL Server
  2. VTD Recorded session & PPT – Learn Underappreciated Features of Microsoft SQL Server
  3. Tech-Ed India 2011 – Part 01, Part 02, Part 03
  4. T-SQL Worst Practices – a compilation from Jacob’s session at Tech-Ed 2011 – Part 01, Part 02, Part 03
  5. T-SQL Debugging

On this day, I urge you all to Be courteous, Drive responsibly.