BCP & Bulk Inserts – Underappreciated features of Microsoft SQL Server


This is in continuation to my series on the Underappreciated features of Microsoft SQL Server. The series has been inspired from Andy Warren’s editorial on SQLServerCentral.com of the same name.

For any system that interfaces with other systems, bulk data transfer is a reality whose importance cannot be overstated. Generally, the scenario is that data from multiple data sources (flat files being the most common) needs to be imported into your SQL Server, or data from your SQL Server needs to be exported to these flat files.

Because Microsoft SQL Server is naturally tuned to enterprise needs, it comes out of the box with a facility to “bulk insert” data into the server. Today, I will try to introduce you to the bcp command line utility and the BULK INSERT statement in SQL Server.

The bcp command line utility

The bcp command line utility copies data between an instance of Microsoft SQL Server and a data file – in a user-specified format (nothing is more important than that last part, trust me).

BCP is a huge utility (huge in terms of the flexibility and functionality it provides), and you can read all about it on Books On Line at http://msdn.microsoft.com/en-us/library/ms162802.aspx. BCP has been written using the ODBC bulk copy API per the BOL page – http://msdn.microsoft.com/en-us/library/aa196743(SQL.80).aspx.

However, put quite simply, you can open up the command prompt on your machine running Microsoft SQL Server and run the following (use the name of your SQL Server instance for the –S parameter):

bcp "select * from AdventureWorks2008R2.HumanResources.Employee" queryout "E:\Employee.txt" -c -S VPCW2K8DENALI -T -k

And you will see that the output file now has all the rows of the HumanResources.Employee table.

[Screenshot: contents of the exported Employee.txt file]

Open this file in Excel and choose the default (tab) delimiter:

[Screenshot: the exported file opened in Excel]
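The same utility can also load data in the opposite direction with the `in` keyword, and can generate a reusable format file describing the table layout. Here is a minimal sketch – the staging table name and file paths are hypothetical placeholders; substitute your own instance name for -S:

```shell
rem Hypothetical example: import the exported file into a staging table
rem (table name and file path are placeholders)
bcp AdventureWorks2008R2.dbo.EmployeeStaging in "E:\Employee.txt" -c -S VPCW2K8DENALI -T -k

rem Generate an XML format file describing the table layout (-x requires -f);
rem the format file can then be reused for repeatable imports with -f
bcp AdventureWorks2008R2.HumanResources.Employee format nul -c -x -f "E:\Employee.xml" -S VPCW2K8DENALI -T
```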

Madhivanan (blog) provides a great stored procedure to work with BCP. Do not forget to read about it from his post here (it’s one of my favourites).

BULK INSERT

BULK INSERT operations allow us to import data into a Microsoft SQL Server. This feature is a one-way traffic lane – import only. The BULK INSERT statement is similar in spirit to the very familiar INSERT INTO tableName SELECT … FROM tableName2 query, except that the source is a data file rather than another table.

Pinal Dave (blog) demonstrates in his post (http://blog.sqlauthority.com/2008/02/06/sql-server-import-csv-file-into-sql-server-using-bulk-insert-load-comma-delimited-file-into-sql-server/), how to BULK INSERT the contents of a CSV file into a Microsoft SQL Server database.
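As a quick illustration of the idea in Pinal Dave’s post, here is a minimal sketch of a CSV import – the staging table and file path are hypothetical placeholders:

```sql
-- Hypothetical example: import a comma-delimited file into a staging table
BULK INSERT dbo.EmployeeStaging
FROM 'E:\Employee.csv'
WITH (
    FIELDTERMINATOR = ',',   -- column delimiter in the file
    ROWTERMINATOR   = '\n',  -- row delimiter
    FIRSTROW        = 2      -- skip a header row, if the file has one
);
```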

The Books On Line page for BULK INSERT is http://msdn.microsoft.com/en-us/library/ms188365.aspx. From this page, we can summarize that:

  1. INSERT and ADMINISTER BULK OPERATIONS permissions are required (unlike bcp); ALTER TABLE permission may also be required
  2. The BULK INSERT statement can be executed within a user-defined transaction
  3. The import may fail if executed on data of an invalid data type – this is a change from previous versions of SQL Server. It also means that strings representing numbers in scientific notation are now considered invalid for numeric columns
  4. BULK INSERT may cause table-level locks – a smaller batch size should be used if this is found to be the case

BULK INSERT does not support import of selective columns. Madhivanan explains how this can be done in his post here – http://beyondrelational.com/blogs/madhivanan/archive/2010/03/17/bulk-insert-to-table-with-specific-columns.aspx. He also summarizes some of the best practices in his post http://beyondrelational.com/blogs/madhivanan/archive/2010/12/22/best-practices-when-using-bulk-insert.aspx.

CAUTION

  • By default, bulk copy operations do not fire triggers! You must specify the FIRE_TRIGGERS option for bulk copy operations to fire INSERT and INSTEAD OF triggers. Note, however, that triggers are fired once for each batch in the operation; the INSERTED table passed to these triggers contains all the rows affected in that batch
  • Specifying FIRE_TRIGGERS also causes bulk copy operations to be fully logged (thus, your transaction log may grow faster than expected) as opposed to being bulk-logged
  • Result sets generated by the bulk insert operation are not returned to the client
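The cautions above can be sketched in a single statement – again, the staging table and file path are hypothetical placeholders:

```sql
-- Fire triggers explicitly, and limit the batch size to reduce the
-- chance of table-level locking (names below are placeholders)
BULK INSERT dbo.EmployeeStaging
FROM 'E:\Employee.txt'
WITH (
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR   = '\n',
    FIRE_TRIGGERS,           -- run INSERT/INSTEAD OF triggers once per batch
    BATCHSIZE       = 10000  -- commit every 10,000 rows
);
```

Remember that with FIRE_TRIGGERS, the operation becomes fully logged, so watch your transaction log.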

In Conclusion

Do you use bcp or BULK INSERT? If yes, do let me know – I would be interested in understanding the performance and/or other issues that you may have had to overcome.

This post ends the T-SQL enhancements section in the list of Underappreciated Features of SQL Server. From my next post, I will address some of the Administration Enhancements. If you would like me to write about something, do feel free to let me know.

Until we meet next time,

Be courteous. Drive responsibly.

Exclusive SQL Server Webcast Series on Security & Scalability: REGISTER TODAY



I invite you all to be a part of the webcast series on SQL Server 2008 R2.

This webcast series will take you through the features of SQL Server 2008 R2 and help you get a better control over the software solution environment. I will be one of the speakers and will be presenting a session on the “Underappreciated Features of Microsoft SQL Server” on May 06, 2011!


So, what are you waiting for? Register Now at
http://virtualtechdays.com/SQLServer2008R2/


Empowering New Class of Business Users!

Security and scalability are always the most important aspects of any enterprise solution. This webcast series will analyse various solutions that create an environment where performance is a key player along with security and scalability.

We invite you to log on and be a part of the webcasts on SQL Server 2008 R2. This webcast series will help every developer and administrator get better control over their environment and create commendable solutions.

Schedule (all sessions 2:30 pm – 3:45 pm):

2nd May, 2011 – Managing and Optimizing Resources for SQL Server
This session will cover the various aspects of how to manage resources as well as balance their utilization to ensure the smooth running of the server.
Click here to join the session
Speaker: Balmukund Lakhani – Balmukund is currently working as a Technical Lead in the SQL Support team with Microsoft India GTSC. In his 5+ years with Microsoft, he was also a part of the Premier Field Engineering team for 18 months.

3rd May, 2011 – Optimizing and Tuning Full Text Search for SQL Server
This session will cover all the key aspects of Full Text Search that make for scalable, secure and optimized full-text search.
Click here to join the session
Speaker: Sudarshan Narasimhan – Sudarshan is currently working as a Technical Lead for the SQL Server support team, where he is responsible for a group of Support Engineers and serves as their escalation point for all technical issues.

4th May, 2011 – Understanding Performance Bottlenecks using Performance Dashboard
This session will demonstrate how to troubleshoot a SQL Server performance issue using out-of-the-box features without having to collect diagnostic data for post-mortem analysis.
Click here to join the session
Speaker: Amit Banerjee – Amit is a part of the SQL Server Escalation Services team at Microsoft. His day-to-day work involves fixing/troubleshooting complex issues related to SQL Server over a varied range of environments.

5th May, 2011 – Cool Tools to have for SQL Server DBA
This session will cover a quick review of PSSDiag, SQL Nexus, SQL Server Backup Simulator, Profiler, Process Explorer, PerfMon and WinDirStat.
Click here to join the session
Speaker: Pradeep Adiga – Pradeep is a Subject Matter Expert (SME) for MS SQL Server with one of the biggest MNCs in the Information Technology field, where he works with highly critical OLTP systems and an over-50 TB data warehouse.

6th May, 2011 – Learn Underappreciated Features of SQL Server to Improve Productivity
This session will cover SQL Server Management Studio utilities which would improve developer productivity and ease management/administration tasks.
Click here to join the session
Speaker: Nakul Vachhrajani – Nakul is a Technical Lead and systems development professional with Patni Computer Systems Limited, with a total IT experience of more than 6 years. He has a comprehensive grasp of database administration, development and implementation with MS SQL Server and C, C++, Visual C++/C#.
Please note:

Audio issues: Please follow these steps to resolve any audio issues you face during the session:

1. Click on the Voice and Video button in the Live Meeting console and un-mute the speaker icon from there. If the speaker icon is greyed out, please close the session and join the meeting again. If you see Join Audio under the Voice & Video tab, click on it to connect to the audio.

2. If the speaker icon on the top menu bar is disabled, exit the current session and re-join. Please ensure that you have Windows Media Player 9 or above and Windows Live Meeting 2007 installed, and that you are not trying to view the session on Internet Explorer.

Please also ensure that you have adequate bandwidth and that the computer’s audio controls are not set to ‘mute’.

Conference Centre: If you are still unable to troubleshoot the audio problem, you have the option to connect to the conference centre and hear the session broadcast.

Dial 1-203-4808000 or 1-866-500-6738 (Toll-Free through Skype) and enter 8635208 as the participant passcode (this passcode will be the same for all the SQL Server 2008 R2 webcast sessions) to connect to the webcast session.

 


 

 

CROSS APPLY – Underappreciated features of Microsoft SQL Server


This is in continuation to my series on the Underappreciated features of Microsoft SQL Server. The series has been inspired from Andy Warren’s editorial on SQLServerCentral.com of the same name.

Today, we will look at a great T-SQL enhancement introduced in SQL Server 2005 – the APPLY operator. Per Books On Line, “the APPLY operator allows you to invoke a table-valued function for each row returned by an outer table expression of a query.”

Instead of joining two tables, when APPLY is used, we join the outer table with the output of a table-valued function – the function is evaluated once for each row of the outer table.

The two forms of APPLY

Just as we have multiple forms of the JOIN operator, we also have two forms of the APPLY operator – CROSS APPLY and OUTER APPLY. The difference is quite simple:

  1. CROSS APPLY returns only those rows from the outer table for which the table-valued function produces a result set
  2. OUTER APPLY returns all rows from the outer table – irrespective of whether or not they produce a result set. For rows that do not, NULL values appear in the columns coming from the table-valued function
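The difference can be sketched with a correlated TOP (1) sub-query as the “table-valued” input, using AdventureWorks2008R2 names. With OUTER APPLY, vendors without any purchase orders still appear (with NULLs); change it to CROSS APPLY and those vendors disappear:

```sql
-- Latest purchase order (if any) per vendor
SELECT v.BusinessEntityID, v.Name, po.PurchaseOrderID, po.OrderDate
FROM Purchasing.Vendor v
OUTER APPLY (SELECT TOP (1) poh.PurchaseOrderID, poh.OrderDate
             FROM Purchasing.PurchaseOrderHeader poh
             WHERE poh.VendorID = v.BusinessEntityID
             ORDER BY poh.OrderDate DESC) po;
```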

Examples

I believe the most common use of APPLY outside of a business application is in performance tuning and database administration. One of the things that DBAs are always monitoring is the answer to the question – “Which queries are currently running against a particular SQL Server?”. A simple query for this is:

SELECT * 
FROM sys.dm_exec_requests ser
CROSS APPLY sys.dm_exec_sql_text(ser.sql_handle)

As you can see, the sql_handle is taken from the DMV – sys.dm_exec_requests – and then passed to the function sys.dm_exec_sql_text. Because we do not want NULL values, we used CROSS APPLY. As an exercise, try using OUTER APPLY and see what happens.

For more examples, I would redirect the reader to Books On Line at: http://technet.microsoft.com/en-us/library/ms175156.aspx. The examples and explanations are excellent and very easy to understand.

The big difference – CROSS APPLY v/s CROSS JOIN

So, one might say that if the output of the table-valued function were an actual table, CROSS APPLY could be replaced by a CROSS JOIN. However, that is not entirely true. A CROSS JOIN produces a Cartesian product: if the outer table has m rows and the inner table n, the output has (m x n) rows. CROSS APPLY, on the other hand, correlates the function with each outer row, and so behaves more like an INNER JOIN.

Some things to keep in mind

Finally, let me draw your attention to a few things that you should keep in mind before using CROSS APPLY:

  • It’s quite obvious that to use the APPLY operator, the compatibility level of the database must be at least 90
  • Performance impact – the table-valued function is executed once per outer row, which typically means at least one scan per execution. Hence, please keep an eye on performance before jumping in and using CROSS APPLY in everything – moderation is always good

Until we meet next time,

Be courteous. Drive responsibly.

PIVOT & Dynamic Cross-tabs – Underappreciated features of Microsoft SQL Server


This is in continuation to my series on the Underappreciated features of Microsoft SQL Server. The series has been inspired from Andy Warren’s editorial on SQLServerCentral.com of the same name.

After a number of posts on my visit to Tech-Ed 2011 (India), we are now back on track and will continue from where we left off. We were discussing the T-SQL related underappreciated features, and the next one in line was the PIVOT operator. Because PIVOT has been around since SQL Server 2005 and simplifies a very important piece of functionality, I was quite surprised to see it on the list of underappreciated features. Hence, I will start off with an introduction, provide and explain an example, and then point the reader to a couple of reference resources that I used when I was learning this T-SQL enhancement.

Cross-tabs are a very important piece of any OLTP application. These systems typically have large amounts of data – all in great detail, but arranged “row-wise”. This arrangement is very much required for a high-performance OLTP system because it generally involves more incoming transactions and fewer read operations. However, not every “row-wise” detail can be put up on a report – it makes great technical sense, but not business sense. For the data to have some business value, it needs to be aggregated effectively when requested. What this means is that we need to convert one table-valued expression into another table.

The problem

Let’s take an example. In a sales setup, it is important to measure the number of orders placed by certain employees. In the AdventureWorks2008R2 sample database, this information is stored such that all unique values of interest (EmployeeID and VendorID) are in individual columns.

USE AdventureWorks2008R2
GO
SELECT poh.PurchaseOrderID, poh.EmployeeID, poh.VendorID
FROM Purchasing.PurchaseOrderHeader poh

[Screenshot: results of the query above]

What we need is for the unique values from one column in the expression (EmployeeID) to be transformed into multiple columns in the output, with an aggregation performed on the remaining columns. The unique values in the EmployeeID column themselves need to become fields in the final result set:

  • EmployeeID values as columns
  • VendorID values as rows
  • Aggregation – COUNT of PurchaseOrderIDs for the vendor in the row key and the employee in the column key

The Conventional Solution

Conventionally (before SQL 2005), we would have ended up doing something like:

USE AdventureWorks2008R2
GO
SELECT poh.VendorID,
       (SELECT COUNT(poh1.PurchaseOrderID) 
        FROM Purchasing.PurchaseOrderHeader poh1 
        WHERE poh.VendorID = poh1.VendorID AND poh1.EmployeeID = 250) AS Employee1,
       (SELECT COUNT(poh1.PurchaseOrderID) 
        FROM Purchasing.PurchaseOrderHeader poh1 
        WHERE poh.VendorID = poh1.VendorID AND poh1.EmployeeID = 251) AS Employee2,
       (SELECT COUNT(poh1.PurchaseOrderID) 
        FROM Purchasing.PurchaseOrderHeader poh1 
        WHERE poh.VendorID = poh1.VendorID AND poh1.EmployeeID = 256) AS Employee3,
       (SELECT COUNT(poh1.PurchaseOrderID) 
        FROM Purchasing.PurchaseOrderHeader poh1 
        WHERE poh.VendorID = poh1.VendorID AND poh1.EmployeeID = 257) AS Employee4,
       (SELECT COUNT(poh1.PurchaseOrderID) 
        FROM Purchasing.PurchaseOrderHeader poh1 
        WHERE poh.VendorID = poh1.VendorID AND poh1.EmployeeID = 260) AS Employee5
FROM Purchasing.PurchaseOrderHeader poh
WHERE poh.EmployeeID IN ( 250, 251, 256, 257, 260 )
GROUP BY poh.VendorID

Such a query would have given us the result as shown in the screen-shot below, which is what we require.

[Screenshot: results of the conventional query]

When we look at the execution plan of this query, we find that it is anything but efficient – in fact, it’s terrible! Due to space constraints, I am only showing the core of the execution plan (i.e. where most computations are concentrated).

[Screenshot: the core of the execution plan for the conventional query]

Depending upon the situation, you may also want to use a complex series of the SELECT…CASE statements.
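For completeness, here is a sketch of that SELECT…CASE alternative for the same computation – it produces the same cross-tab as the correlated sub-query version above, with a single pass over the table:

```sql
-- SQL 2000-style cross-tab: COUNT ignores the NULLs produced by
-- the CASE expressions, so each column counts one employee's orders
SELECT poh.VendorID,
       COUNT(CASE WHEN poh.EmployeeID = 250 THEN poh.PurchaseOrderID END) AS Employee1,
       COUNT(CASE WHEN poh.EmployeeID = 251 THEN poh.PurchaseOrderID END) AS Employee2,
       COUNT(CASE WHEN poh.EmployeeID = 256 THEN poh.PurchaseOrderID END) AS Employee3,
       COUNT(CASE WHEN poh.EmployeeID = 257 THEN poh.PurchaseOrderID END) AS Employee4,
       COUNT(CASE WHEN poh.EmployeeID = 260 THEN poh.PurchaseOrderID END) AS Employee5
FROM Purchasing.PurchaseOrderHeader poh
WHERE poh.EmployeeID IN (250, 251, 256, 257, 260)
GROUP BY poh.VendorID;
```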

The solution – PIVOT & UNPIVOT operators

Come SQL Server 2005, a number of exciting T-SQL enhancements were introduced by Microsoft. Among them are the PIVOT and UNPIVOT operators.

The following is the T-SQL query using PIVOT to carry out the same computation demonstrated above:

USE AdventureWorks2008R2;
GO
SELECT VendorID, 
       [250] AS Emp1, 
       [251] AS Emp2, 
       [256] AS Emp3, 
       [257] AS Emp4, 
       [260] AS Emp5
FROM (SELECT PurchaseOrderID, EmployeeID, VendorID
      FROM Purchasing.PurchaseOrderHeader) p
PIVOT (COUNT (PurchaseOrderID)
       FOR EmployeeID IN ( [250], [251], [256], [257], [260] )
      ) AS pvt
ORDER BY pvt.VendorID;

In this example, the results of the following sub-query are PIVOT’ed on the EmployeeID column.

USE AdventureWorks2008R2;
GO
SELECT PurchaseOrderID, EmployeeID, VendorID
FROM Purchasing.PurchaseOrderHeader

Here is a walk-through of the above query:

  1. The VendorID column serves as the grouping column; the unique values in the EmployeeID column become the new columns, and PurchaseOrderID is the value column that gets aggregated (here, with COUNT)
  2. This set is then aggregated to produce an output as under:

[Screenshot: output of the PIVOT query]

Some points to consider:

  1. When PIVOT and UNPIVOT are used against databases that are upgraded to SQL Server 2005 or later, the compatibility level of the database must be set to 90 or higher
  2. When aggregate functions are used with PIVOT, null values in the value column are not considered when computing the aggregation

To demonstrate UNPIVOT, let’s store the output of the PIVOT operation above in a table variable and then execute the UNPIVOT operation on it. Doing so lets me demonstrate a very important point.

USE AdventureWorks2008R2;
GO
--Declare a temporary table variable to hold the output of the PIVOT
DECLARE @pvt TABLE (VendorID INT, Emp1 INT, Emp2 INT, Emp3 INT, Emp4 INT, Emp5 INT)

INSERT INTO @pvt
SELECT VendorID, 
       [250] AS Emp1, 
       [251] AS Emp2, 
       [256] AS Emp3, 
       [257] AS Emp4, 
       [260] AS Emp5
FROM (SELECT PurchaseOrderID, EmployeeID, VendorID
      FROM Purchasing.PurchaseOrderHeader) p
PIVOT (COUNT (PurchaseOrderID)
       FOR EmployeeID IN ( [250], [251], [256], [257], [260] )
      ) AS pvt
ORDER BY pvt.VendorID

--UNPIVOT the table.
SELECT VendorID, Employee, Orders
FROM 
   (SELECT VendorID, Emp1, Emp2, Emp3, Emp4, Emp5
   FROM @pvt) p
UNPIVOT
   (Orders FOR Employee IN 
      (Emp1, Emp2, Emp3, Emp4, Emp5)
)AS unpvt;
GO

The following is the output:

[Screenshot: output of the UNPIVOT query]

At this point, I would like you to understand the following major differences between PIVOT and UNPIVOT:

  1. UNPIVOT is not the exact reverse of PIVOT. This is because during a PIVOT operation, we aggregate data, which causes loss of data-granularity by merging multiple rows into a single row
  2. Also, because PIVOT ignores NULL values during the aggregations, they would no longer be present in the output of the UNPIVOT operation

Comparing PIVOT with the conventional query

Any feature is only useful if it has its own benefits. Obviously, PIVOT comes with the benefit of being more readable and maintainable, but is it any better in terms of performance than the older, more conventional query?

So, I put both queries together, turned on the execution plan and executed them. The execution plan below is what I saw. Do I need to say anything more?

[Screenshot: execution plan comparison of the PIVOT and conventional queries]

Some great reference resources

I hope that you can now appreciate the benefits of using newer features of T-SQL as they come along. There is an age-old saying that “If it isn’t broken, don’t fix it” – but in this case, we are not fixing anything; we are adding value, which is always welcome. As we say good-bye today, I will leave you with some great reference resources that helped me understand PIVOT:

  1. Getting started with PIVOT Queries in SQL Server 2005/2008 by Jacob Sebastian
  2. Another PIVOT Query example by Jacob Sebastian
  3. Posts on PIVOT & UNPIVOT tables by Pinal Dave
  4. Dynamic PIVOT in SQL Server 2005 by Madhivanan
  5. Dynamic cross-tab with multiple PIVOT columns by Madhivanan
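In the spirit of the dynamic cross-tab posts above, here is a hedged sketch of the general approach: build the IN (…) column list from the data itself, then execute the PIVOT via sp_executesql. (This is my own illustrative sketch, not code from the linked posts.)

```sql
-- Dynamic cross-tab: derive the pivot column list from the data
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

-- Build a comma-separated, bracketed list of the distinct EmployeeIDs
SELECT @cols = STUFF((SELECT DISTINCT ',' + QUOTENAME(CAST(EmployeeID AS VARCHAR(10)))
                      FROM Purchasing.PurchaseOrderHeader
                      FOR XML PATH('')), 1, 1, '');

SET @sql = N'SELECT VendorID, ' + @cols + N'
FROM (SELECT PurchaseOrderID, EmployeeID, VendorID
      FROM Purchasing.PurchaseOrderHeader) p
PIVOT (COUNT(PurchaseOrderID) FOR EmployeeID IN (' + @cols + N')) AS pvt
ORDER BY pvt.VendorID;';

EXEC sys.sp_executesql @sql;
```

The usual caveats of dynamic SQL apply – QUOTENAME guards the generated identifiers, but review any such code for injection risks before production use.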

Until we meet next time,

Be courteous. Drive responsibly.

AdventureWorks documentation – Data Dictionary


Today is a special day. It is Holy Thursday (or Maundy Thursday) – the day before Good Friday and the long Easter week-end, which is the Christian celebration of the Resurrection. It is the day when the Lord’s Supper or Holy Communion/Eucharist was instituted by Jesus Christ. This was made immortal by Leonardo Da Vinci in his famous fresco – “The Last Supper”.

Anyway, it’s still a working day in most parts of the world, and speaking of work reminds me that all of us have used the AdventureWorks family of sample databases at one time or the other. Ever since SQL Server 2005 came out, the AdventureWorks database has been my go-to place for studying SQL Server, doing a quick test, or demonstrating my thoughts to my managers or my team.

While the AdventureWorks family of databases is easily available from CodePlex, there is no well-known place to go to look up descriptions of the schemas and tables. Recently, while researching one of my posts, I stumbled upon the entire data dictionary for the AdventureWorks family of databases.

A data dictionary is a centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format. Data dictionaries are commonly used to circulate copies of the database schema to vendors and technology partners, and once a product is released, this may be made available to the end customers depending upon the need (eg. to allow for customization or study).

You can get the data dictionary for AdventureWorks sample databases at: http://technet.microsoft.com/en-us/library/ms124438(SQL.100).aspx
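Incidentally, much of this information also lives inside the database itself, as MS_Description extended properties. Here is a sketch of a query that builds a mini data dictionary from the catalog views (run it in the AdventureWorks database of your choice):

```sql
-- Column-level descriptions from extended properties (class 1 = object/column)
SELECT s.name   AS SchemaName,
       t.name   AS TableName,
       c.name   AS ColumnName,
       ep.value AS Description
FROM sys.tables t
JOIN sys.schemas s ON s.schema_id = t.schema_id
JOIN sys.columns c ON c.object_id = t.object_id
LEFT JOIN sys.extended_properties ep
       ON ep.class    = 1
      AND ep.major_id = c.object_id
      AND ep.minor_id = c.column_id
      AND ep.name     = 'MS_Description'
ORDER BY s.name, t.name, c.column_id;
```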

For those who would like to download the AdventureWorks database for SQL 11 (“Denali”), head over to: http://msftdbprodsamples.codeplex.com/releases/view/55330

I trust that the AdventureWorks data dictionary will help you a lot in finding your way through the database. Next week, we will be resuming the Underappreciated features of Microsoft SQL Server series.

Have a Happy Easter!

Until we meet next time,

Be courteous. Drive responsibly.

Want to ask me a question? Do so at: http://beyondrelational.com/ask/nakul/default.aspx