Tag Archives: Series

Articles that are too big to fit into one part.

#0395 – SQL Server – SSIS – Adding date and time to a file name after processing


As we have been discussing over the last couple of weeks, moving data from a “source” to a “destination” is the crux of SSIS. The data source can be literally anything – a SQL Server  database, a web-service, a script, an XML or even a conventional delimited file.

Picking up files from a folder and processing them is one of the most common use-case that I have seen. In order to avoid processing the same file over and over again, the most common requirement is to append the file name with a date-time value. In this post, I will provide an expression I use to achieve this goal.

Appending a Date/Time and renaming a file in SSIS

The easiest way to append a date/time value to a file name and renaming the file in SSIS is via SSIS expressions.

Put every simply – an expression is a formula, a combination of variables, literals and operators that work together and evaluate to a single value.

We can use the following expression to yield a date value:

SUBSTRING(@[User::FileName], 1, FINDSTRING(@[User::FileName],".",1) -1 ) + (DT_WSTR,4)YEAR(GETDATE()) + (DT_WSTR,2)MONTH(GETDATE()) + (DT_WSTR,2)DAY(GETDATE())

This can then be appended to the file name and a simple File System Task can rename the file for us on the file system. Let’s see this in action with an example.

Demo

Assume a scenario where I have a set of files in a folder, and I need to do some processing on them. After processing, I need to update the file names with the date. For the sake of brevity of this example, I will not be performing any other operation on the files other than renaming them.

My folder containing the input files looks like this:

Folder with input files which need to be renamed once processing is complete

Folder with input files which need to be renamed once processing is complete

Now, I create an SSIS package that contains a simple For Each file iterator and a File System task.

For Each Loop configuration

The For Each Loop (configured as a file iterator) uses an expression populated by a variable to get the source directory information.

The file name and extension is fetched into a variable which will be used during the processing and subsequent renaming.

0395-sqltwins-overall-package

Overall package configuration showing the For Each File Iterator

0395-sqltwins-foreachloop-collection-configuration

For Each Iterator – Collection configuration using a variabel for the source directory

0395-sqltwins-foreachloop-fetching-file-name

Fetching the individual file name & extension into a variable

Now, I create a new user variable “OutputFileName” and use the expression below to generate the output file name. The expression has essentially 3 distinct parts:

  1. Fetch the file name (without the extension)
  2. Append the date to this string
  3. Fetch the file extension and append to the modified/new file name
@[User::SourceFolder] + 
SUBSTRING(@[User::FileName], 1, FINDSTRING(@[User::FileName],".",1) -1 ) + (DT_WSTR,4)YEAR(GETDATE()) + (DT_WSTR,2)MONTH(GETDATE()) + (DT_WSTR,2)DAY(GETDATE()) + 
SUBSTRING(@[User::FileName], FINDSTRING(@[User::FileName],".",1), LEN(@[User::FileName]))
Output File Path expression

Output File Path expression

This variable is now used in the configuration of the file system task which is responsible for renaming the file.

File System Task configuration

The file system task is an extremely flexible task component in SSIS. It can operate on the file system not only by using file connections but also on the basis of variables! For our problem, we will leverage this flexibility of the File System task.

As can be seen from the screenshot below, my File System task has been configured as follows:

  • Source File Path is a variable
  • Destination File Path is a variable
  • File Operation type = “Rename File”

The “Rename File” operation renames the file specified by the old file path and renames it to the name specified by the new file path.

File System Task configuration using “Rename File” mode where input & output file names are supplied via variables

Once the package is executed, we can see that the files in the source folder are now updated as expected.

Source Folder with the File Names updated as expected

Source Folder with the File Names updated as expected

Further Reading

Until we meet next time,

Be courteous. Drive responsibly.

Advertisements
High-level usage of the File Ordering script task

#0394 – SQL Server – SSIS – ForEach Loop – Controlling processing order


One of the SSIS related questions that I see often in the forums is around the ability to control the sequence in which a For Each Loop processes the files from a directory. When performing data movement as part of an ETL or an integration, it may be required to work asynchronously. The “sender” of the data may place the files in a folder for the data processing to pickup. These files generally contain ordered data that needs to be loaded in a specific sequence.

The problem is that the SSIS For Each Loop container does not have the provision that allows a user to specify the order in which the files are to be processed. Today, I present one of the workarounds that I came up with (there may be others, and I would be very interested to know what do you prefer in these scenarios).

Environment Prep

To demonstrate the ordered loading of files, I created a set of simple files  and a table to which I will be adding the data read from these files.

USE [tempdb];
GO

IF OBJECT_ID('dbo.ImportedData','U') IS NOT NULL
BEGIN
    DROP TABLE dbo.ImportedData;
END
GO


CREATE TABLE dbo.ImportedData 
   (RecordId           INT          NOT NULL IDENTITY(1,1),
    ProcessedFileName  NVARCHAR(255) NOT NULL,
    RecordValue        NVARCHAR(100)     NULL,
    ProcessedDate      DATETIME     NOT NULL,
    CONSTRAINT pk_ImportedData PRIMARY KEY CLUSTERED (RecordId, ProcessedDate)
   );
GO

To demonstrate the fix, I will not be processing the files in order of their file names, but I will instead be processing them in order of  the date on which they were last saved. The screenshot below shows the order in which files will be processed.

Folder showing files to process in the required non-alphabetical order

Folder showing files to process in the required non-alphabetical order

Core Logic – Identify the Order of the Files to Process

The For Each container is just a mechanism to loop through a set (of objects/files/records). It does not have the provision to control the order in which the set is to be iterated through.

Hence, the workaround that I came up with was to supply a pre-ordered set to the For Each container. I therefore used a script task to order the filenames in a recordset object which I will pass to the For Each container.

As shown below, I pass the path to the folder where the files are located, and accept the sorted list into an object.

High-level usage of the File Ordering script task

High-level usage of the File Ordering script task

The script used within the script task which is also the heart of this solution is provided below.

public void Main()
{
    // TODO: Add your code here
    DirectoryInfo di = new DirectoryInfo((string)Dts.Variables["User::FolderPath"].Value);
    List<string> files = new List<string>();

    foreach (var fi in di.EnumerateFiles().OrderBy(t=>t.LastWriteTime))
    {
        files.Add(fi.FullName);
    }

    Dts.Variables["User::FileListing"].Value = files;

    Dts.TaskResult = (int)ScriptResults.Success;
}

Process the File

The  next stage is simply to process the file.

  • The sorted set of files is iterated upon by the For Each container
  • I have a script task that simply prints out a message box with the file being processed currently
  • The data flow task simply reads from the file and writes to a conventional OLE DB destination

Screenshots showing these bits of the configuration are provided below.

0394-sqltwins-ssis-foreachcontainer

For Each Loop Iteration Configuration

0394-sqltwins-ssis-foreachcontainer2

For Each Loop File Name configuration

0394-sqltwins-ssis-dfd_fileconnection

Setting the file connection based on the File Name set by For Each Iterator

If I execute the SSIS package and then query the destination table, we see that the files were indeed ordered in the expected order.

USE [tempdb];
GO
SELECT * 
FROM dbo.ImportedData 
WHERE RecordValue LIKE '%1%';
GO
0394-sqltwins-ssis-output

Output of script to confirm that data was loaded in the required sequence

Further Reading

Until we meet next time,

Be courteous. Drive responsibly.