Tag Archives: Series

Articles that are too big to fit into one part.

#0421 – SQL Server – SSIS – Wild card search to find if a file exists in a folder or directory

I recently answered a question on a forum which I believe will be useful to many of the readers in the audience.

SSIS packages are widely used for data import from and export to files. One of the main tasks in this situation would be to check if files with certain kinds of names exist in a particular folder or directory, i.e. basically perform a wild-card search in a directory.

The solution

This can be achieved by using the EnumerateFiles() method of the System.IO.Directory class in the SSIS Script task. Here’s the sample package:

In a folder, I have a set of files, some with similar names (which we will search from the SSIS package).

Files existing in the directory to be searched

The SSIS package has two (2) variables:

Variable Name	Configuration on the Script Task	Description
SearchPath	ReadOnly, Input	This is the path to be searched
SearchPattern	ReadOnly, Input	Pattern to be searched
FileExists	ReadWrite, Output	A boolean indicating downstream processes whether files were found or not

Table 1: Variables on the SSIS package

Screenshot showing the variables and their configuration on the script task

The script is a quite simple implementation as below:

public void Main()
{
    // SQLTwins: SSIS: Blog #0421

    string searchPath = Dts.Variables["User::SearchPath"].Value.ToString();
    string searchPattern = Dts.Variables["User::SearchPattern"].Value.ToString();

    System.Collections.Generic.List<string> searchResults = System.IO.Directory.EnumerateFileSystemEntries(searchPath, searchPattern).ToList();

    if (searchResults.Any())
    {
        Dts.Variables["User::FileExists"].Value = true;
    }

    Dts.TaskResult = (int)ScriptResults.Success;
}

Ensure that you have the following directive in the “namespaces” section of your script:

using System.Linq;

Here’s the script in action:

Screenshot showing the script in debug mode indicating that files were found matching the pattern.

As you can see, the script can help perform a wild-card search in a given folder or directory.

#0416 – SQL Server – Msg 8101 – Use column lists when working with IDENTITY columns

1 Reply

I have often written about IDENTITY columns on my blog. Identity columns, most commonly used to implement auto-increment keys, have been around for more than a decade now. Yet, I often see teams run into interesting use cases especially in cases where data is being migrated from one system to another.

Today’s post is based on one such incident that came to my attention.

The team was trying to migrate data from one table to another as part of an exercise to change the database structure for more efficiency. When moving the data from one table to another, they were using the option (SET IDENTITY_INSERT ON) in order to explicitly insert values into the Identity column. However, they were running into an error.

Msg 8101, Level 16, State 1, Line 24
An explicit value for the identity column in table 'dbo.tIdentity' can only be specified when a column list is used and IDENTITY_INSERT is ON.

Here is a simulation of what they were doing:

USE tempdb;
GO
SET NOCOUNT ON;
--Prepare the environment
--   Create a table, and add some test data into it
--Safety Check
IF OBJECT_ID('tIdentity','U') IS NOT NULL
    DROP TABLE dbo.tIdentity;
GO
--Create a table
CREATE TABLE dbo.tIdentity (IdentityId INT IDENTITY(1,1),
                            IdentityValue VARCHAR(10)
                           );
GO
--Turn Explicit IDENTITY_INSERT ON and insert duplicate IDENTITY values
SET IDENTITY_INSERT dbo.tIdentity ON;
GO
--NOTICE: No column list has been supplied in the INSERT
INSERT INTO dbo.tIdentity
VALUES (1, 'One'),
       (2, 'Two');
GO

--RESULTS
--Msg 8101, Level 16, State 1, Line 24
--An explicit value for the identity column in table 'dbo.tIdentity' can only be pecified when a column list is used and IDENTITY_INSERT is ON.

The Solution

Let’s re-read the error. It clearly gives an indication of what the issue is – if we need to insert an explicit value into Identity columns, we need to explicitly use column lists in our insert statements, as shown below.

USE tempdb;
GO
SET NOCOUNT ON;
--Prepare the environment
--Create a table, and add some test data into it
--Safety Check
IF OBJECT_ID('tIdentity','U') IS NOT NULL
    DROP TABLE dbo.tIdentity;
GO

--Create a table
CREATE TABLE dbo.tIdentity (IdentityId INT IDENTITY(1,1),
                            IdentityValue VARCHAR(10)
                           );
GO

--Turn Explicit IDENTITY_INSERT ON and insert duplicate IDENTITY values
SET IDENTITY_INSERT dbo.tIdentity ON;
GO

--NOTE: Column list has been supplied in the INSERT,
--      so, no errors will be encountered    
INSERT INTO dbo.tIdentity ([IdentityId], [IdentityValue])
VALUES (1, 'One'),
       (2, 'Two');
GO

--Confirm that data has been inserted
SELECT IdentityId,
       IdentityValue
FROM dbo.tIdentity;
GO

--Now that data has been inserted, turn OFF IDENTITY_INSERT
SET IDENTITY_INSERT dbo.tIdentity OFF;
GO

-----------------------------------------------------------------
--RESULTS
----------
--IdentityId  IdentityValue
--1           One
--2           Two
 -----------------------------------------------------------------

Hope you will find this helpful.

Untill we meet next time,

Be courteous. Drive responsibly.

#0400 – SQL Server – SSIS – Using the SQL Server Destination

Summary

The SQL Server Destination is recommended instead of the OLE DB destination if the SSIS package is to be executed on the same machine/server where the target Microsoft SQL Server instance is located. Below are the finer points about the SQL Server destination:

The SSIS package must be executed on the same server where the Microsoft SQL Server instance is located
The Shared Memory protocol for data exchange is enabled for the instance from the SQL Server Configuration Manager
- Warning: This may need local security policy updates if User Access Control (UAC) is configured
SQL Server destination
- Only works with OLE DB connection managers (ODBC is not supported)
- Supports only one input
- Does not support an error output
- Performs bulk insert of data
- Allows leveraging of fast load options of the OLE DB connection

#0399 – SQL Server – SSIS – Debugging – Exploring the Data Viewer

1 Reply

I often get questions regarding debugging of SSIS packages with the most common scenario being a need to “see” the data flowing down the data pipeline. SQL Server Integration Services (SSIS) packages are explored and edited visually within the SQL Server Data Tools (SSDT) and hence, the mechanism to troubleshoot an SSIS package when developing it also has to be visual.

The Data Viewer allows a developer to pause the data flow in a data flow task and look at the data as it is on that point in the pipeline. Today, I will introduce you to the Data Viewer.

Sample SSIS package to demonstrate Data Viewer on the Data Flow Pipeline

The data flow is quite simply taking all the records from the [HumanResources].[Employee] table of the sample [AdventureWorks2014] database and putting into an object variable via the Recordset destination (I am using the Recordset destination for the demo here for the sake of simplicity).

Using the Data Viewer

If I want to study the data flowing through the data pipeline, all I need to do is right-click on the data flow path and choose “Enable Data Viewer” (a data flow path being the connection between a source and a transformation, between two transformations or a transformation and a destination).

Enabling the Data Viewer on a Data Flow Pipeline

Magnifying glass indicates Data Viewer is enabled

As can be seen from the screen grab above, a magnifying glass icon on the data flow now appears indicating that the data flow is configured for viewing.

If I execute the SSIS package at this point, I see that the data flow pauses just before it starts writing to the destination and opens a new grid window. The window resembles a normal dataset viewer (if you are familiar with developing C# or ASP.NET applications in Visual Studio, you would feel right at home!) which contains the data flowing through the data pipeline. You can even copy this grid to Excel or any other file for further research (Tip!).

Data Viewer showing data flowing through the data path

Once you have studied the data flowing through, you can choose to either stop the transaction (by stopping execution of the package) or allowing the package to execute through by clicking on the “green” arrow on the data viewer.

Allowing the data flow to continue down the pipeline by allowing the package to resume execution

Filtering columns displayed on the Data Viewer

Sometimes, we may not want to sift through all the columns in the data pipeline. If we know that a particular column is causing some problems, we may just want to monitor that column. One can explicitly choose which columns should be displayed on the data viewer by going into the data flow properties.

Right-click on the data flow path and choose “Edit”
Go to “Data Viewer”
Use the arrow buttons to selectively choose which columns to display in the data viewer

Selectively choosing columns visible on the Data Viewer

Summary

The Data Viewer can be used for design time troubleshooting of an SSIS package, allowing developers to pause the data flow and monitor the data flowing down the data pipeline.

You can have multiple data viewers in your data flow task so that you can monitor each part of the data flow
You can control the columns seen in the data viewer so that you can focus on the fields of interest
You can copy this data for further research/maintaining a record
If you have transformation components that use 64-bit components, you may need to turn off the “Run64BitRuntime” under the Debug options (Debug -> Solution Properties -> Configuration Properties -> Debugging)

#0398 – SQL Server – SSIS – Performing a DISTINCT using the SORT transformation

1 Reply

Developing SSIS packages is quite easy – it’s mostly drag and drop and some minor configuration. Choosing the right transformations and components and making them work optimally is a task that is heavily dependent upon the use case at hand.

Normally, when working with relational data sources, I am averse to using the SORT transformation within a data flow, because it is a blocking transformation. Blocking transformations not only consume client memory, they also act as roadblocks in what should be an uninterrupted data pipeline. The performance tip (!) therefore is to prefer that sorting, if required, is done as part of the data source query itself so that the data pipeline gets presorted data.

However, presorting of data may not be possible in all cases. For example, if you are receiving data from a web-service, XML data source or any other 3^rd party custom component. A SORT transformation becomes essential in such scenarios, but one of the areas where I have seen most people stumble upon is that a custom query is later used to remove unnecessary records from the set. The custom query is not required in most cases, and I will be demonstrating how to use configure the SORT transformation to also perform a DISTINCT.

Demo

Assume that:

You just need a list of unique patrons who visited a local library this year in order of their registration Id for sending out some promotional mails
The promotions need to be sent irrespective of the visit date and number of visits within the year
The input (log of library visits) is coming from a web-service of some sort over which you as an integration team member have no control over
Following is a sample of the test data that comes over from the web-service

NOTE: For the purposes of this demo, I will use a T-SQL SELECT to generate my test data, but I would request the reader to assume that the data is coming from a non-relational source.

SELECT LibraryVisitLog.PatronId,
       LibraryVisitLog.PatronName,
       LibraryVisitLog.VisitDate
FROM (VALUES (1201,'John Doe',  '2017-01-05'),
             (1180,'John Smith','2017-01-05'),
             (1201,'John Doe',  '2017-01-06'),
             (1201,'John Doe',  '2017-01-07'),
             (1201,'John Doe',  '2017-01-08'),
             (1201,'John Doe',  '2017-01-09'),
             (1201,'John Doe',  '2017-01-10'),
             (1280,'Jane Doe',  '2017-01-10'),
             (1201,'John Doe',  '2017-01-11'),
             (1201,'John Doe',  '2017-01-12'),
             (1210,'Jack Smith','2017-01-12'),
             (1201,'John Doe',  '2017-01-13'),
             (1201,'John Doe',  '2017-01-14'),
             (1180,'John Smith','2017-01-14')
     ) AS LibraryVisitLog (PatronId, PatronName, VisitDate)

Sample Test Data

We only want the information about the following 4 (four) unique patrons ordered by their Id down the pipeline.

Patron Id	Patron Name
1180	John Smith
1201	John Doe
1210	Jack Smith
1280	Jane Doe

Configuring the SSIS package to achieve this is quite simple. As shown in the image below, I have a SORT transformation between my data source and target.

Overview of the sample SSIS package using the SORT operator

Overview of the demo SSIS package

The SORT transformation is configured to:

Ignore the VisitDate (as established in the requirements above)
SORT based on the PatronId
Pass the PatronName through the pipleline

Configuring the SORT operator to perform a DISTINCT on the output rows

In addition, notice that I have checked the checkbox “Remove rows with duplicate sort values” in the bottom left corner. This switch controls whether the transformation copies duplicate rows on the transformation output or creates a single, unique entry for all duplicates.

To actually see the data flowing through the pipeline, I have enabled DataViewers on each stage of the data flow. You can see the actual data flowing through the pipeline in the screenshots provided below:

Demonstrating the ability of a SORT transformation to perform a DISTINCT on the data

Data Viewer showing input data

Data Viewer showing sorted and Distinct data

Summary

With respect to the SORT operator and achieving a balance between the performance and limitations of the data sources used by an SSIS package:

It is a best practice to pre-sort the data at the data source before sending it across a data flow pipeline
In case pre-sorting is not possible, but sorting is required, one can use the SORT transformation
Leverage the SORT transform to filter out unique records from the sorted output – custom logic is not required to do so

SQLTwins by Nakul Vachhrajani

SQL Server tips and experiences dedicated to my twin daughters.

Tag Archives: Series

#0421 – SQL Server – SSIS – Wild card search to find if a file exists in a folder or directory

The solution

Further Reading:

#0416 – SQL Server – Msg 8101 – Use column lists when working with IDENTITY columns

The Solution

#0400 – SQL Server – SSIS – Using the SQL Server Destination

Summary

Further Reading

#0399 – SQL Server – SSIS – Debugging – Exploring the Data Viewer

Using the Data Viewer

Filtering columns displayed on the Data Viewer

Summary

Further Reading

#0398 – SQL Server – SSIS – Performing a DISTINCT using the SORT transformation

Demo

Summary

Further Reading