#0253-SQL Server-HASHBYTES-String or binary data would be truncated: Msg 8152


Most enterprise products require that they be able to audit data modifications being made within the application. An important aspect of auditing is to be able to identify changes made outside of the application too, i.e. updates directly made on the data using a direct connection to the SQL Server instance. I recently authored a series of articles on SQLServerCentral.com around data change and tamper detection mechanisms available in Microsoft SQL Server. These are:

  • An in-depth look at change detection in SQL Server – Part 01 [Link]
  • An in-depth look at change detection in SQL Server – Part 02 [Link]
  • HASHBYTES: Is CHECKSUM really required? [Link]

Soon after the publication of these articles, a coincidence at work prompted me to write this post.

One of our application modules used HASHBYTES for detecting data change. We were receiving random reports that the quality assurance engineers were receiving auditing errors. As we tracked the issue down through the SQL Server Profiler, we noticed that the underlying error was a “simple” string truncation error:

Msg 8152, Level 16, State 10, Line 5
String or binary data would be truncated.

Root Cause

The following line from the Books-On-Line help for HASHBYTES [Link] caught our attention:

Allowed input values are limited to 8000 bytes.

As soon as we read it, things became very clear to us. Taking a quick look at the schema showed us that the underlying field being checked for changes was an NVARCHAR(MAX) field and the users were attempting to enter data greater than 4000 characters (i.e. 8000 bytes) causing the issue. As long as the data remained less than 4000 characters, no error was encountered.

A demo

The script below demonstrates the expected behaviour of HASHBYTES given that the input string is less than or equal to the 8000 byte limit. For the sake of brevity, I am using the REPLICATE function to create the string.

USE tempdb;
GO
--Declare an NVARCHAR((MAX) variable
DECLARE @tHashInputs NVARCHAR(MAX);
--Create a 4000 character string
SELECT @tHashInputs = CAST(REPLICATE('a',4000) AS NVARCHAR(MAX));

--Attempt to generate the HASHBYTEs value for this string
SELECT HASHBYTES('MD5',ISNULL(@tHashInputs,''));
GO

image

Now, let me alter the script such that the string being evaluated using HASHBYTES is greater than the 8000 byte limit, i.e. is 4001 characters for the NVARCHAR data-type:

USE tempdb;
GO
--Declare an NVARCHAR((MAX) variable
DECLARE @tHashInputs NVARCHAR(MAX);
--Create a 4000 character string
SELECT @tHashInputs = CAST(REPLICATE('a',4001) AS NVARCHAR(MAX));

--Attempt to generate the HASHBYTEs value for this string
SELECT HASHBYTES('MD5',ISNULL(@tHashInputs,''));
GO 

The following error is encountered upon executing the batch:

Msg 8152, Level 16, State 10, Line 5

String or binary data would be truncated.

As can be seen from the example above, HASHBYTES will fail for data beyond the documented 8000 character limit. Hence, areas like free-form comments may not be a suitable candidate for usage of HASHBYTES.

Solution

The only known workarounds that we have currently are:

  • Split the strings into multiple parts before generating the HASHBYTES value
  • Opt for any other change detection algorithm

Have you ever encountered such a situation before? If yes, what solution did you apply to overcome it?

References:

  • HASHBYTES – Books On Line page [Link]
  • An in-depth look at change detection in SQL Server – Part 01 [Link]
  • An in-depth look at change detection in SQL Server – Part 02 [Link]
  • HASHBYTES: Is CHECKSUM really required? [Link]

Until we meet next time,

Be courteous. Drive responsibly.

Advertisements

5 thoughts on “#0253-SQL Server-HASHBYTES-String or binary data would be truncated: Msg 8152

  1. riks

    Hi , I do not have idea about HASHBYTES but if we use VARCHAR datatype instead of NVARCHAR it would allow

    USE tempdb;
    GO
    –Declare an VARCHAR((MAX) variable
    DECLARE @tHashInputs VARCHAR(MAX);
    –Create a 4000 character string
    SELECT @tHashInputs = CAST(REPLICATE(‘a’,4001) AS VARCHAR(MAX));

    –Attempt to generate the HASHBYTEs value for this string
    SELECT HASHBYTES(‘MD5’,ISNULL(@tHashInputs,”));
    GO

    but for HASHBYTES if it is eecessary NVARCHAR than split string into multiple parts is better solution.

    Like

    Reply
  2. Nakul Vachhrajani

    @riks: Thank-you for your comments. The prime difference between VARCHAR and NVARCHAR is that NVARCHAR allows us to store Unicode data. Since our application needs to store Unicode data, NVARCHAR was the data-type of choice.

    In case the application can use VARCHAR, then you are very correct. HASHBYTES will be able to go upto 8000 characters without encountering an error. The reason is that NVARCHAR occupies 2 bytes as opposed to VARCHAR which occupies only 1 byte of storage space.

    Like

    Reply
  3. riks

    @Nakul : Thanks for reply. Is application data type specific? can you give some example that this particular app based on unicode data type and other is based on non unicode data type.

    Like

    Reply
  4. Pingback: #0362 – SQL Server – Change Detection in Microsoft SQL Server – Limitations of T-SQL: BINARY_CHECKSUM and CHECKSUM | SQLTwins by Nakul Vachhrajani

  5. Pingback: #0362 – SQL Server – Change Detection in Microsoft SQL Server – Limitations of T-SQL: BINARY_CHECKSUM and CHECKSUM - SQL Server - SQL Server - Toad World

Let me know what you think about this post by leaving your feedback here!

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s