BCP & Bulk Inserts – Underappreciated features of Microsoft SQL Server


This is in continuation to my series on the Underappreciated features of Microsoft SQL Server. The series has been inspired from Andy Warren’s editorial on SQLServerCentral.com of the same name.

For any system that interfaces with other systems, bulk data transfer is a reality and has an importance that cannot be underestimated. Generally, the scenario is that data from multiple data sources (flat files being the most common) need to be imported into your SQL Server, or data from your SQL Server needs to be exported to these flat files.

Because Microsoft SQL server is naturally tuned to enterprise needs, it comes out of the box with a facility to “bulk insert” data into the SQL Server. Today, I will try to introduce you to the bcp command line utility and the BULK INSERT in SQL Server.

The bcp command line utility

The bcp command line utility copies data from a database in the Microsoft SQL Server to or from a data file – in a user-specified format (nothing can be more important than the last part, trust me).

BCP is a huge utility (huge in terms of the flexibility and functionality it provides), and you can read all about it on Books On Line at http://msdn.microsoft.com/en-us/library/ms162802.aspx. BCP has been written using the ODBC bulk copy API per the BOL page – http://msdn.microsoft.com/en-us/library/aa196743(SQL.80).aspx.

However, put quite simply, you can open up the command prompt on your machine running Microsoft SQL Server and run the following (use the name of your SQL Server instance for the –S parameter):

bcp "select * from AdventureWorks2008R2.HumanResources.Employee" queryout "E:Employee.txt" -c"" -S VPCW2K8DENALI -T -k

And you will see that the output file now has the all rows of the HumanResources.Employee table.

image

Open this Excel and choose to have default (tab) delimiters:

image

Madhivanan (blog) provides a great stored procedure to work with BCP. Do not forget to read about it from his post here (it’s one of my favourites).

BULK INSERT

BULK INSERT operations allow us to import data into a Microsoft SQL Server. This feature is a one-way traffic lane. The BULK INSERT operation is similar to the very familiar SELECT * INTO tableName FROM tableName2 query.

Pinal Dave (blog) demonstrates in his post (http://blog.sqlauthority.com/2008/02/06/sql-server-import-csv-file-into-sql-server-using-bulk-insert-load-comma-delimited-file-into-sql-server/), how to BULK INSERT the contents of a CSV file into a Microsoft SQL Server database.

The Books On Line page for BULK INSERT is http://msdn.microsoft.com/en-us/library/ms188365.aspx.From this page, we can summarize that:

  1. INSERT and ADMINISTER BULK OPERATIONS permissions are required (unlike bcp). Also, ALTER TABLE may be required
  2. The BULK INSERT statement can be executed within a user-defined transaction
  3. The import may fail if executed on data of an invalid data type – this is change from previous versions of SQL Server. Also, this means that strings showing scientific notations are now considered invalid
  4. BULK INSERTs may cause table level locks – a smaller batch size should be used if this is found to be the case

BULK INSERT does not support import of selective columns. Madhivanan explains how this can be done in his post here – http://beyondrelational.com/blogs/madhivanan/archive/2010/03/17/bulk-insert-to-table-with-specific-columns.aspx. He also summarizes some of the best practices in his post http://beyondrelational.com/blogs/madhivanan/archive/2010/12/22/best-practices-when-using-bulk-insert.aspx.

CAUTION

  • By default, bulk copy operations do not fire triggers! You must specify the FIRE_TRIGGERS option for bulk copy operations to fire off INSERT and INSTEAD OF triggers. However, note that triggers are fired once for each batch in the operation. The INSERTED table passed to these triggers will have all the affected rows contained in the batch
  • Specifying FIRE_TRIGGERS also causes bulk copy operations to be fully logged (thus, your transaction log may grow faster than expected) as opposed to being bulk-logged
  • Result sets generating out of the bulk insert are not returned to the client

In Conclusion

Do you use bcp or BULK INSERT? If yes, do let me know – I would be interested in understanding the performance and/or other issues that you may have had to overcome.

This post ends the T-SQL enhancements section in the list of Underappreciated Features of SQL Server. From my next post, I will address some of the Administration Enhancements. If you would like me to write about something, do feel free to let me know.

Until we meet next time,

Be courteous. Drive responsibly.

Advertisement

5 thoughts on “BCP & Bulk Inserts – Underappreciated features of Microsoft SQL Server

  1. Nakul Vachhrajani

    If this is just a one-time operation that you need to do (copying data from server #1 to #2), you can try using the OPENROWSET queries ([http://msdn.microsoft.com/en-us/library/ms190312.aspx][1]). I use that myself in production to transfer data when it does not make sense to link two servers at geographically remote locations just for that one operation. Please note though that this is an adhoc distributed query, which is an advanced setting that needs to be enabled by the database administrator.

    [1]: http://msdn.microsoft.com/en-us/library/ms190312.aspx

    Like

    Reply
  2. davecason

    I benchmarked a bunch of these SQL Server ETL methods and bcp and bulk insert came up near at the bottom on performance. SSIS was, by far, the fastest way to move data into a SQL Server, and it was relatively easy to make parallel transactions to “fill the bucket” if you have lots of horse power. Even the “import wizard” was faster than bcp for bringing data in, and by far. What have you experienced in this respect?

    Like

    Reply
  3. Nakul Vachhrajani

    Absolutely – I agree with you, “davecason”. SSIS, if feasible, is one of the best options. The “import wizard” internally uses SSIS, which is why you saw the performance difference.
    But, if someone does not want to invest in Integration Services, or if there are legacy systems around, BCP and BULK INSERT are always handy. The other reason why BCP and BULK INSERT need to be around is if one wanted to integrate an external source into a T-SQL batch, SSIS cannot fit in.

    Like

    Reply
  4. tebbaerty

    the greatest issue that bcp overcome is performance. The most commonly used case is when 2 sql server can’t be linked ( by a link server ) and you need to copy data fast and easy.

    How can you do that, if bcp is out of the question, without developing, or using another 3rd party tool ?

    Like

    Reply
  5. Pingback: SSMS – Query Result configuration – Generate comma separated output without BCP or query changes | SQLTwins by Nakul Vachhrajani

Let me know what you think about this post by leaving your feedback here!

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.