One of the SSIS related questions that I see often in the forums is around the ability to control the sequence in which a For Each Loop processes the files from a directory. When performing data movement as part of an ETL or an integration, it may be required to work asynchronously. The “sender” of the data may place the files in a folder for the data processing to pickup. These files generally contain ordered data that needs to be loaded in a specific sequence.
The problem is that the SSIS For Each Loop container does not have the provision that allows a user to specify the order in which the files are to be processed. Today, I present one of the workarounds that I came up with (there may be others, and I would be very interested to know what do you prefer in these scenarios).
To demonstrate the ordered loading of files, I created a set of simple files and a table to which I will be adding the data read from these files.
IF OBJECT_ID('dbo.ImportedData','U') IS NOT NULL
DROP TABLE dbo.ImportedData;
CREATE TABLE dbo.ImportedData
(RecordId INT NOT NULL IDENTITY(1,1),
ProcessedFileName NVARCHAR(255) NOT NULL,
RecordValue NVARCHAR(100) NULL,
ProcessedDate DATETIME NOT NULL,
CONSTRAINT pk_ImportedData PRIMARY KEY CLUSTERED (RecordId, ProcessedDate)
To demonstrate the fix, I will not be processing the files in order of their file names, but I will instead be processing them in order of the date on which they were last saved. The screenshot below shows the order in which files will be processed.
Folder showing files to process in the required non-alphabetical order
Core Logic – Identify the Order of the Files to Process
The For Each container is just a mechanism to loop through a set (of objects/files/records). It does not have the provision to control the order in which the set is to be iterated through.
Hence, the workaround that I came up with was to supply a pre-ordered set to the For Each container. I therefore used a script task to order the filenames in a recordset object which I will pass to the For Each container.
As shown below, I pass the path to the folder where the files are located, and accept the sorted list into an object.
High-level usage of the File Ordering script task
The script used within the script task which is also the heart of this solution is provided below.
public void Main()
// TODO: Add your code here
DirectoryInfo di = new DirectoryInfo((string)Dts.Variables["User::FolderPath"].Value);
List<string> files = new List<string>();
foreach (var fi in di.EnumerateFiles().OrderBy(t=>t.LastWriteTime))
Dts.Variables["User::FileListing"].Value = files;
Dts.TaskResult = (int)ScriptResults.Success;
Process the File
The next stage is simply to process the file.
- The sorted set of files is iterated upon by the For Each container
- I have a script task that simply prints out a message box with the file being processed currently
- The data flow task simply reads from the file and writes to a conventional OLE DB destination
Screenshots showing these bits of the configuration are provided below.
For Each Loop Iteration Configuration
For Each Loop File Name configuration
Setting the file connection based on the File Name set by For Each Iterator
If I execute the SSIS package and then query the destination table, we see that the files were indeed ordered in the expected order.
WHERE RecordValue LIKE '%1%';
Output of script to confirm that data was loaded in the required sequence
Until we meet next time,
Be courteous. Drive responsibly.