#0253-SQL Server-HASHBYTES-String or binary data would be truncated: Msg 8152

Most enterprise products require that they be able to audit data modifications being made within the application. An important aspect of auditing is to be able to identify changes made outside of the application too, i.e. updates directly made on the data using a direct connection to the SQL Server instance. I recently authored a series of articles on SQLServerCentral.com around data change and tamper detection mechanisms available in Microsoft SQL Server. These are:

An in-depth look at change detection in SQL Server – Part 01 [Link]
An in-depth look at change detection in SQL Server – Part 02 [Link]
HASHBYTES: Is CHECKSUM really required? [Link]

Soon after the publication of these articles, a coincidence at work prompted me to write this post.

One of our application modules used HASHBYTES for detecting data change. We were receiving random reports that the quality assurance engineers were receiving auditing errors. As we tracked the issue down through the SQL Server Profiler, we noticed that the underlying error was a “simple” string truncation error:

Msg 8152, Level 16, State 10, Line 5
String or binary data would be truncated.

Root Cause

The following line from the Books-On-Line help for HASHBYTES [Link] caught our attention:

Allowed input values are limited to 8000 bytes.

As soon as we read it, things became very clear to us. Taking a quick look at the schema showed us that the underlying field being checked for changes was an NVARCHAR(MAX) field and the users were attempting to enter data greater than 4000 characters (i.e. 8000 bytes) causing the issue. As long as the data remained less than 4000 characters, no error was encountered.

A demo

The script below demonstrates the expected behaviour of HASHBYTES given that the input string is less than or equal to the 8000 byte limit. For the sake of brevity, I am using the REPLICATE function to create the string.

USE tempdb;
GO
--Declare an NVARCHAR((MAX) variable
DECLARE @tHashInputs NVARCHAR(MAX);
--Create a 4000 character string
SELECT @tHashInputs = CAST(REPLICATE('a',4000) AS NVARCHAR(MAX));

--Attempt to generate the HASHBYTEs value for this string
SELECT HASHBYTES('MD5',ISNULL(@tHashInputs,''));
GO

Now, let me alter the script such that the string being evaluated using HASHBYTES is greater than the 8000 byte limit, i.e. is 4001 characters for the NVARCHAR data-type:

USE tempdb;
GO
--Declare an NVARCHAR((MAX) variable
DECLARE @tHashInputs NVARCHAR(MAX);
--Create a 4000 character string
SELECT @tHashInputs = CAST(REPLICATE('a',4001) AS NVARCHAR(MAX));

--Attempt to generate the HASHBYTEs value for this string
SELECT HASHBYTES('MD5',ISNULL(@tHashInputs,''));
GO

The following error is encountered upon executing the batch:

Msg 8152, Level 16, State 10, Line 5

String or binary data would be truncated.

As can be seen from the example above, HASHBYTES will fail for data beyond the documented 8000 character limit. Hence, areas like free-form comments may not be a suitable candidate for usage of HASHBYTES.

Solution

The only known workarounds that we have currently are:

Split the strings into multiple parts before generating the HASHBYTES value
Opt for any other change detection algorithm

Have you ever encountered such a situation before? If yes, what solution did you apply to overcome it?

References:

HASHBYTES – Books On Line page [Link]
An in-depth look at change detection in SQL Server – Part 01 [Link]
An in-depth look at change detection in SQL Server – Part 02 [Link]
HASHBYTES: Is CHECKSUM really required? [Link]

Until we meet next time,

Be courteous. Drive responsibly.

5 thoughts on “#0253-SQL Server-HASHBYTES-String or binary data would be truncated: Msg 8152”

riks April 1, 2013 at 3:55 AM

Hi , I do not have idea about HASHBYTES but if we use VARCHAR datatype instead of NVARCHAR it would allow

USE tempdb;
GO
–Declare an VARCHAR((MAX) variable
DECLARE @tHashInputs VARCHAR(MAX);
–Create a 4000 character string
SELECT @tHashInputs = CAST(REPLICATE(‘a’,4001) AS VARCHAR(MAX));

–Attempt to generate the HASHBYTEs value for this string
SELECT HASHBYTES(‘MD5’,ISNULL(@tHashInputs,”));
GO

but for HASHBYTES if it is eecessary NVARCHAR than split string into multiple parts is better solution.

LikeLike

Reply ↓

Nakul Vachhrajani April 1, 2013 at 11:55 AM

@riks: Thank-you for your comments. The prime difference between VARCHAR and NVARCHAR is that NVARCHAR allows us to store Unicode data. Since our application needs to store Unicode data, NVARCHAR was the data-type of choice.

In case the application can use VARCHAR, then you are very correct. HASHBYTES will be able to go upto 8000 characters without encountering an error. The reason is that NVARCHAR occupies 2 bytes as opposed to VARCHAR which occupies only 1 byte of storage space.

LikeLike

Reply ↓

riks April 1, 2013 at 11:15 PM

@Nakul : Thanks for reply. Is application data type specific? can you give some example that this particular app based on unicode data type and other is based on non unicode data type.

LikeLike

Reply ↓

Pingback: #0362 – SQL Server – Change Detection in Microsoft SQL Server – Limitations of T-SQL: BINARY_CHECKSUM and CHECKSUM | SQLTwins by Nakul Vachhrajani

Pingback: #0362 – SQL Server – Change Detection in Microsoft SQL Server – Limitations of T-SQL: BINARY_CHECKSUM and CHECKSUM - SQL Server - SQL Server - Toad World

Let me know what you think about this post by leaving your feedback here! Cancel reply

riks April 1, 2013 at 3:55 AM

Hi , I do not have idea about HASHBYTES but if we use VARCHAR datatype instead of NVARCHAR it would allow

USE tempdb;
GO
–Declare an VARCHAR((MAX) variable
DECLARE @tHashInputs VARCHAR(MAX);
–Create a 4000 character string
SELECT @tHashInputs = CAST(REPLICATE(‘a’,4001) AS VARCHAR(MAX));

–Attempt to generate the HASHBYTEs value for this string
SELECT HASHBYTES(‘MD5’,ISNULL(@tHashInputs,”));
GO

but for HASHBYTES if it is eecessary NVARCHAR than split string into multiple parts is better solution.

LikeLike

Reply ↓
Nakul Vachhrajani April 1, 2013 at 11:55 AM

@riks: Thank-you for your comments. The prime difference between VARCHAR and NVARCHAR is that NVARCHAR allows us to store Unicode data. Since our application needs to store Unicode data, NVARCHAR was the data-type of choice.

In case the application can use VARCHAR, then you are very correct. HASHBYTES will be able to go upto 8000 characters without encountering an error. The reason is that NVARCHAR occupies 2 bytes as opposed to VARCHAR which occupies only 1 byte of storage space.

LikeLike

Reply ↓
riks April 1, 2013 at 11:15 PM

@Nakul : Thanks for reply. Is application data type specific? can you give some example that this particular app based on unicode data type and other is based on non unicode data type.

LikeLike

Reply ↓
Pingback: #0362 – SQL Server – Change Detection in Microsoft SQL Server – Limitations of T-SQL: BINARY_CHECKSUM and CHECKSUM | SQLTwins by Nakul Vachhrajani
Pingback: #0362 – SQL Server – Change Detection in Microsoft SQL Server – Limitations of T-SQL: BINARY_CHECKSUM and CHECKSUM - SQL Server - SQL Server - Toad World

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Root Cause

A demo

Solution

References:

Spread the word around! Share via:

Related

5 thoughts on “#0253-SQL Server-HASHBYTES-String or binary data would be truncated: Msg 8152”

Let me know what you think about this post by leaving your feedback here! Cancel reply