Most enterprise products require that they be able to audit data modifications being made within the application. An important aspect of auditing is to be able to identify changes made outside of the application too, i.e. updates directly made on the data using a direct connection to the SQL Server instance. I recently authored a series of articles on SQLServerCentral.com around data change and tamper detection mechanisms available in Microsoft SQL Server. These are:
- An in-depth look at change detection in SQL Server – Part 01 [Link]
- An in-depth look at change detection in SQL Server – Part 02 [Link]
- HASHBYTES: Is CHECKSUM really required? [Link]
Soon after the publication of these articles, a coincidence at work prompted me to write this post.
One of our application modules used HASHBYTES for detecting data change. We were receiving random reports that the quality assurance engineers were receiving auditing errors. As we tracked the issue down through the SQL Server Profiler, we noticed that the underlying error was a “simple” string truncation error:
Msg 8152, Level 16, State 10, Line 5
String or binary data would be truncated.
Root Cause
The following line from the Books-On-Line help for HASHBYTES [Link] caught our attention:
Allowed input values are limited to 8000 bytes.
As soon as we read it, things became very clear to us. Taking a quick look at the schema showed us that the underlying field being checked for changes was an NVARCHAR(MAX) field and the users were attempting to enter data greater than 4000 characters (i.e. 8000 bytes) causing the issue. As long as the data remained less than 4000 characters, no error was encountered.
A demo
The script below demonstrates the expected behaviour of HASHBYTES given that the input string is less than or equal to the 8000 byte limit. For the sake of brevity, I am using the REPLICATE function to create the string.
USE tempdb;
GO
--Declare an NVARCHAR((MAX) variable
DECLARE @tHashInputs NVARCHAR(MAX);
--Create a 4000 character string
SELECT @tHashInputs = CAST(REPLICATE('a',4000) AS NVARCHAR(MAX));
--Attempt to generate the HASHBYTEs value for this string
SELECT HASHBYTES('MD5',ISNULL(@tHashInputs,''));
GO
Now, let me alter the script such that the string being evaluated using HASHBYTES is greater than the 8000 byte limit, i.e. is 4001 characters for the NVARCHAR data-type:
USE tempdb;
GO
--Declare an NVARCHAR((MAX) variable
DECLARE @tHashInputs NVARCHAR(MAX);
--Create a 4000 character string
SELECT @tHashInputs = CAST(REPLICATE('a',4001) AS NVARCHAR(MAX));
--Attempt to generate the HASHBYTEs value for this string
SELECT HASHBYTES('MD5',ISNULL(@tHashInputs,''));
GO
The following error is encountered upon executing the batch:
Msg 8152, Level 16, State 10, Line 5
String or binary data would be truncated.
As can be seen from the example above, HASHBYTES will fail for data beyond the documented 8000 character limit. Hence, areas like free-form comments may not be a suitable candidate for usage of HASHBYTES.
Solution
The only known workarounds that we have currently are:
- Split the strings into multiple parts before generating the HASHBYTES value
- Opt for any other change detection algorithm
Have you ever encountered such a situation before? If yes, what solution did you apply to overcome it?
References:
- HASHBYTES – Books On Line page [Link]
- An in-depth look at change detection in SQL Server – Part 01 [Link]
- An in-depth look at change detection in SQL Server – Part 02 [Link]
- HASHBYTES: Is CHECKSUM really required? [Link]
Until we meet next time,
