#0374 – SQL Server – Removing string terminators (“\0”) using REPLACE during string concatenation


Recently, I was called upon to troubleshoot a strange behaviour demonstrated by a data conditioning script that involved string concatenations. The script ran fine without any errors, but the script did not appear to be concatenating string.

The entire script for this post is shared towards the end of the post. Because the script involves creating specific data which would give away the root cause, I will not be presenting the snippets beforehand.

Assume that we have a simple table with two columns, “FirstName” and “LastName”:

Simple table with test data for string concatenation demo

Simple table with test data for string concatenation demo

The data conditioning script involved populating the “FullName” column in the table with a simple combination of the First and the Last names.

Output of String Concatenation Script demonstrating the problem

Output of String Concatenation Script demonstrating the problem

The Problem: If we look at the output carefully, there is a problem with the FullNames for rows # 2 and 4. Although the LastName is present, only the FirstName is seen in the concatenation result.

The Theory:

After about an hour of troubleshooting, we decided to check out the length of the strings in the table, and that’s when we hit gold. Although we could “see” only a couple of characters, the length was turning out to be a higher than what we expected.

As can be seen from the screenshot below, although the First Name “John” has a length of 4, we get 5 in the length. Similarly, although the FullName shows up as “John”, we get a length of 9.

Screenshot showing the length of the strings in the table

Screenshot showing the length of the strings in the table

This is the moment when the light bulb went off and we realized what was going on. The strings were inserted by a legacy application based on C/C++ code. In such legacy applications, we need to explicitly handle termination of strings by adding the string termination character. That would very well account for the presence of an additional character in the FirstName.

In order to explain the mystery behind the length of the FullName, let’s walk through the concatenation of a record.

  • FirstName = John, 4 characters
  • LastName = Doe, 3 characters
  • Expected length of FullName = First Name + a space + Last Name = 4 + 1 + 3 = 8 characters
  • Actual length of FullName = 9 characters

The difference can be accurately explained when we believe that the concatenation did actually happen – only thing is that we are unable to see the LastName part of the string because the system encounters the string termination character, causing it to stop displaying more characters from the string.

In order to confirm our theory, we replaced the string termination character with a hyphen (-), using the REPLACE function.

UPDATE pt
SET pt.FullName = REPLACE(pt.FullName COLLATE Latin1_General_BIN, CHAR(0) COLLATE Latin1_General_BIN, '-')
FROM @personTable AS pt;

In order for this to work in all environments, we changed the collation to binary when performing the replace (because ultimately, string terminators are just a set of bits when performing binary manipulation).

Finding and Replacing the String Terminator in a string.

Finding and Replacing the String Terminator in a string.

As can be seen from the screenshot above, the REPLACE was successful, and we were able to see the entire string.

In Conclusion

  • When working with data created by legacy code, it is useful to understand how the code works. In this case, we realized that the string terminator was causing a problem and were able to overcome it – but it could have led to hours of troubleshooting (an option of re-creating data manually was also put on the table)
  • SQL Server, and T-SQL can be trusted when it comes to data manipulation. Almost always it’s the system or the human element that is missing something critical

Until we meet next time,
Be courteous. Drive responsibly.

Script for this post:

USE tempdb;
GO
--Creating the sample table
DECLARE @personTable TABLE (FirstName VARCHAR(50),
                            LastName  VARCHAR(50),
                            FullName  VARCHAR(100)
                           );

--Insert some test data
INSERT INTO @personTable (FirstName, LastName)
VALUES ('Nakul','Vachhrajani'),
       ('John' + CHAR(0),'Doe'),
       ('Jack','Smith'),
       ('FirstName' + CHAR(0),'LastName');

--Check out the data
SELECT pt.FirstName,
       pt.LastName
FROM @personTable AS pt;

--Perform the string concatenation
UPDATE pt
SET pt.FullName = pt.FirstName + ' ' + pt.LastName
FROM @personTable AS pt;


--Check out the data
SELECT pt.FirstName,
       pt.LastName,
       pt.FullName
FROM @personTable AS pt;

--Checking the length of the data
SELECT pt.FirstName,
       LEN(pt.FirstName) AS FirstNameLength,
       pt.LastName,
       LEN(pt.LastName) AS LastNameLength,
       pt.FullName,
       LEN(pt.FullName) AS FullNameLength
FROM @personTable AS pt;

--Confirming presence of string termination characters
SELECT pt.FirstName,
       LEN(pt.FirstName) AS FirstNameLength,
       CHARINDEX(CHAR(0),pt.FirstName,1) AS LocationOfStringTerminator,
       pt.LastName,
       LEN(pt.LastName) AS LastNameLength,
       pt.FullName,
       LEN(pt.FullName) AS FullNameLength,
       CHARINDEX(CHAR(0),pt.FullName,1) AS LocationOfStringTerminatorInFullName
FROM @personTable AS pt;

--Replace the string termination character with a hyphen
UPDATE pt
SET pt.FullName = REPLACE(pt.FullName COLLATE Latin1_General_BIN, CHAR(0) COLLATE Latin1_General_BIN, '-')
FROM @personTable AS pt;

--Confirming that string termination characters are no longer present
SELECT pt.FirstName,
       LEN(pt.FirstName) AS FirstNameLength,
       CHARINDEX(CHAR(0),pt.FirstName,1) AS LocationOfStringTerminator,
       pt.LastName,
       LEN(pt.LastName) AS LastNameLength,
       pt.FullName,
       LEN(pt.FullName) AS FullNameLength,
       CHARINDEX(CHAR(0),pt.FullName,1) AS LocationOfStringTerminatorInFullName
FROM @personTable AS pt;
GO
Advertisements

2 thoughts on “#0374 – SQL Server – Removing string terminators (“\0”) using REPLACE during string concatenation

  1. vikramkmahapatra

    you could also use
    1)RESULT TO TEXT option to see whether it is concatenating string internally
    2)export result of concatenation to excel/csv to see whether it is actually concatenating.

    Like

    Reply
  2. Pingback: #0375 – SQL Server -Collation conflicts may occur when working with string functions (REPLACE, SUBSTRING, etc) | SQLTwins by Nakul Vachhrajani

Let me know what you think about this post by leaving your feedback here!

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s