SQL Server Integration Services (SSIS) are typically called upon when integrating systems exchange data from one source to a given destination. The reason I use the term “source” and “destination” instead of a “database” because either of the two can be something other than a database (a flat file, some web-service, a script task, etc). This is possible because SSIS is more like any other .net framework based programming language (C# or VB.net).
OLE DB destination
Because one would commonly have either a Microsoft Access or a Microsoft SQL Server on at least one side of the integration, the most common source & destinations used in a SSIS-based data solution are the OLE DB Source and the OLE DB Destination. The OLE DB destination allows you to load data to a table, a view or even a SQL command (e.g. the results of a statement execution).
In order to load data as quickly into the destination as possible, the OLE DB destination allows us to use a “Fast Load” mode. The “Fast Load” option allows the data team to configure various options that affect the speed of the data load:
- Keep Identity
- Keep NULLs
- Table Lock
- Check Constraints
- Rows per Batch
- Maximum Insert Commit Size
We will look at each option in detail over the next couple of weeks.
Keep NULLs option
The Keep NULLs option is normally something that most accidental SSIS developers do not pay much attention to. It comes unchecked by default and it left unchecked. However, the state of this checkbox can have a significant impact on the completeness and quality of data being inserted into the destination database.
To clarify, allow me to explain the functionality of this checkbox:
- Checked – If a column in the source data has NULL values, keep them as-is
- Unchecked – If a column in the source data has NULL values, try to replace them with the default values as defined by the destination DB
The state of this checkbox typically does not make much of a difference because in most cases, the domain and business rules in both the systems involved would be similar. Thus, the if a column in one system allows a NULL value, other systems in the same domain would also allow a NULL (e.g. in most enrollment forms, the last name would generally be mandatory but the first name is not). However, legacy systems (which have been around since decades) would have accumulated a lot data that does not conform to newer domain practices, causing issues during migration. This is when the “Keep Nulls” checkbox comes into action.
In the case I am going to present today, I have a set of Product Names and their corresponding Manufacturers. In a few of these cases, I don’t know the manufacturer and have therefore kept it blank.
USE tempdb; GO --Test Data SELECT [ProductList].[ProductName], [ProductList].[ManufacturerName] FROM (VALUES ('Windows' , 'Microsoft'), ('SQL Server' , NULL ), ('VisualStudio','Microsoft'), ('MySQL' , 'Oracle' ), ('PeopleSoft' , 'Oracle' ) ) AS [ProductList] ([ProductName], [ManufacturerName]); GO
For the sake of this demo, I have used this query as my source in the test SSIS package. Below is a screenshot of my data flow task.
I directly take this dataset as input to the OLE DB destination. The OLE DB destination is configured to a test table ([dbo].[KeepNullsInOLEDB]) with the following table definition.
USE [tempdb]; GO --Safety Check IF OBJECT_ID('dbo.KeepNullsInOLEDB','U') IS NOT NULL BEGIN DROP TABLE dbo.KeepNullsInOLEDB; END GO --Create table CREATE TABLE dbo.KeepNullsInOLEDB ([ProductName] VARCHAR(255) NULL, [ManufacturerName] VARCHAR(255) NULL CONSTRAINT df_KeepNullsInOLEDB_ManufacturerName DEFAULT ('Microsoft') ); GO
After executing the package, I query the [dbo].[KeepNullsInOLEDB] table in the destination database, and compare with the source data.
As can be seen from the screenshot, the [ManufacturerName] for “SQL Server” is not NULL. It is instead set to “Microsoft” which is the default value as set in the default constraint on the destination table.
The data inserted in the destination table changes if the switch is kept checked in the OLE DB destination.
If the “Keep nulls” checkbox is checked, the default constraint on the target table is not used – thereby maintaining the same data as the source.
Depending upon the business requirements, it may be critical to migrate data from a source to a destination “as-is”, without the application of default constraints. In such situations, the “Keep nulls” switch on the OLE DB destination (“Fast Load” mode) needs to be checked.
If the “Keep nulls” switch is unchecked, the default constraints from the target table definition come into effect.
In my future posts, I will take a look at the other switches on the OLE DB Fast Load mode.
Until we meet next time,
Be courteous. Drive responsibly.