Tag Archives: Tips

General Microsoft SQL Server tips

#0342 – SQL Server – LEN v/s DATALENGTH – A feature comparison


There are few typical questions that developers ask when developing applications or when troubleshooting an issue:

  • What is the length of this string?
  • How many bytes does this string take for storage?
  • How many bytes are occupied in storing this BLOB data?

The answer to all these questions lie in 2 T-SQL functions: The DATALENGTH() and the LEN(), which are often used interchangeably by developers who may not have fully understood the subtle differences between them.

Today, I present before you, a feature comparison of these two functions.

The LEN() function

Purpose: Returns the number of characters of the specified string expression, excluding trailing blanks.

Usage example: The following example demonstrates how to use the LEN() function:

SELECT LEN('Four');
GO
--Result: 4

Behaviour for Unicode Strings: Unicode strings store a wider range of characters and hence the same number of characters occupy more space. The LEN() function reports the number of characters, and not the space occupied within a UNICODE string.

SELECT LEN(N'Four');
GO
--Result: 4

Effect of white spaces: The LEN() function has a very interesting behaviour when the string expression being evaluated is padded with white spaces. The function trims out white spaces after a string, but does consider white spaces padded before the string.

SELECT LEN(' Four');
GO
--Result: 8
SELECT LEN('Four ');
GO
--Result: 4

 

Behaviour with non-string based data-types: Because the DATALENGTH() function returns the number of bytes consumed, it can work with any expression and return the number of bytes consumed. Here is an example of how DATALENGTH() reports the number of bytes accurately for non-character based data-types:

DECLARE @integerData INT = 4;
SELECT LEN(@integerData);
GO
--Result: 1 (the number of characters)
DECLARE @dataTimeValue DATETIME = GETDATE();
SELECT LEN(@dataTimeValue);
GO
--Result: 19 (the number of characters)

The DATALENGTH() function

Purpose: Check the length of any expression.

Usage example: The following example demonstrates how to use the DATALENGTH() function. From a usage perspective, there is no difference between DATALENGTH() and LEN()

SELECT DATALENGTH('Four');
GO
--Result: 4

Behaviour for Unicode Strings: Unicode strings store a wider range of characters and hence the same number of characters occupy more space. Unlike the LEN() function, the DATALENGTH() function returns the actual number of bytes consumed for the storage of the Unicode value.

SELECT DATALENGTH(N'Four');
GO
--Result: 8

Effect of white spaces: Because the DATALENGTH() function returns the number of bytes consumed, and not the number of characters in an expression, white spaces in a string are reported as-is, without any trimming. Here’s an example:

SELECT DATALENGTH(' Four');
GO
--Result: 8
SELECT DATALENGTH('Four ');
GO
--Result: 8

Behaviour with non-string based data-types: Because the DATALENGTH() function returns the number of bytes consumed, it can work with any expression and return the number of bytes consumed. Here is an example of how DATALENGTH() reports the number of bytes accurately for non-character based data-types:

DECLARE @integerData INT = 4;
SELECT DATALENGTH(@integerData);
GO
--Result: 4 (the number of bytes consumed for storage)
DECLARE @dataTimeValue DATETIME = GETDATE();
SELECT DATALENGTH(@dataTimeValue);
GO
--Result: 8 (the number of bytes consumed for storage)

Summary

The feature comparison between LEN() and DATALENGTH() can be made in the following table:

LEN() DATALENGTH()
Purpose Returns the number of characters of the specified string expression, excluding trailing blanks. Returns the number of bytes used to represent any expression.

Effect of Unicode Number of characters in the expression Number of bytes used for storage
Effect of white spaces Prefixed spaces: Considered as part of length

Suffixed spaces: Trimmed

Both prefixed & suffixed spaces are considered
Behaviour with non-string data-types Supported, but reports the number of characters as if it were a character string Reports the number of bytes used for storage

Further Reading

  • LEN function [MSDN Link]
  • DATALENGTH function [MSDN Link]

Until we meet next time,

Be courteous. Drive responsibly.

#0341 – SQL Server – Random “String or binary data would be truncated” errors during bulk data loads


Interfacing multiple systems is always a great source of learning. In the various integrations that I have been a part of over the years, one of the many things that I have learnt is that no two systems have the same world view of the data that they store.

To take a very simple example, assume that we are interfacing a home-grown warehousing system with a 3rd party front desk PoS system. One of the first problems that most people will run into is that the supported length for product name is different – the warehousing system might be accepting a product name of let’s say 60 characters whereas the PoS may have a limit of 50 characters.

Integrations need to take care of this sort of a mismatch, but it’s not always easy. Sometimes, the interface specifications are vague or in case of home-grown legacy systems, possibly non-existent. In most enterprise integrations, a pre-production run is often done with a backup of the existing production data. When doing a data load from one system to another, it is quite possible that we run into situations wherein the tests run fine, but a production run fails with the following error:

Msg 8152, Level 16, State 14, Line 17
String or binary data would be truncated.

As the error suggests, it is what can very simply be termed as a buffer overflow – we are attempting to insert a larger string into a smaller container (table column). Because the data was the same, the question that comes up is:

Why was the “String or binary data would be truncated” error random and not reproducible at will?

Connection Settings – ANSI_WARNINGS

The answer to the puzzle lies in the connection settings for the SET option – ANSI_WARNINGS.

If the ANSI_WARNINGS is set to OFF, attempting to insert a longer string into a smaller column automatically truncates the string. The following test confirms the behaviour:

USE tempdb;
GO

--Create Test table
IF OBJECT_ID('dbo.AnsiWarningsTest','U') IS NOT NULL
DROP TABLE dbo.AnsiWarningsTest;
GO

CREATE TABLE dbo.AnsiWarningsTest (FourtyCharacterTest VARCHAR(40));
GO

--Default value, resetting for safety
SET ANSI_WARNINGS OFF;

--Check current value of ANSI_WARNINGS
IF (@@OPTIONS & 8) = 8
PRINT 'ANSI_WARNINGS are ON';
ELSE
PRINT 'ANSI_WARNINGS are OFF';

--Attempt to insert data longer than available column value
--This should work fine
INSERT INTO dbo.AnsiWarningsTest (FourtyCharacterTest)
SELECT REPLICATE('a',100);
GO

SELECT AnsiWarningsTest.FourtyCharacterTest
FROM dbo.AnsiWarningsTest;
GO

As can be seen from the screenshot below, the only 40 characters were successfully inserted into the table – dbo.AnsiWarningsTest. The string truncation was handled automatically by Microsoft SQL Server.

image

Now, we turn ANSI_WARNINGS OFF and repeat the same test.

--Set ANSI_WARNINGS to ON
SET ANSI_WARNINGS ON;

--Check current value of ANSI_WARNINGS
IF (@@OPTIONS & 8) = 8
PRINT 'ANSI_WARNINGS are ON';
ELSE
PRINT 'ANSI_WARNINGS are OFF';

--Attempt to insert data longer than available column value
--This SHOULD result into an exception
INSERT INTO dbo.AnsiWarningsTest (FourtyCharacterTest)
SELECT REPLICATE('b',100);
GO

Attempting to execute this immediately results into the following error in the “Messages” tab of the SSMS.

ANSI_WARNINGS are ON
Msg 8152, Level 16, State 14, Line 12
String or binary data would be truncated.

Conclusion

When working with bulk data imports, it is important to know the differences in the storage design of both systems. Once known, mismatches that may result into potential string termination errors should be explicitly handled by the SUSBTRING clause.

Further Reading

  • ANSI_WARNINGS [MSDN Link]
  • SUBSTRING [MSDN Link]

Until we meet next time,

Be courteous. Drive responsibly.

#0276 – SQL Server – IDENTITY columns – Use IDENTITY() Function to change the Identity specification in a SELECT…INTO statement


We know that copying a table into another using the SELECT…INTO statement copies over the IDENTITY property also. However, it is possible that one might want to use an IDENTITY specification different from the source table in this process.


This can be achieved using the IDENTITY() function, which allows us to create a new column for the destination table when used with the SELECT…INTO clause. The function accepts upto 3 parameters – the data-type, the Identity seed and the Identity increment values for the new column.


The process described in this post can also be used to define an IDENTITY column in the destination table when the source table does not have one.


IDENTITY() – Demo


The script below demonstrates the usage of the IDENTITY column. I do not have an IDENTITY column on the source table. During the SELECT…INTO process, I wanted to create the new table with an IDENTITY specification in-place. To do so, I used the IDENTITY() function.

USE tempdb;
GO

SET NOCOUNT ON;

--1. Prepare the environment
--   Create a table, and add some test data into it

--Safety Check
IF OBJECT_ID('tIdentity','U') IS NOT NULL
    DROP TABLE dbo.tIdentity;
GO

--Create a table, notice that there are NO Identity columns on this table
CREATE TABLE dbo.tIdentity ( IdentityValue VARCHAR(10) );
GO

--Add some test data
INSERT INTO dbo.tIdentity (IdentityValue)
VALUES ('One'),
       ('Two'),
       ('Three'),
       ('Four'),
       ('Five');
GO

--2. Use SELECT..INTO to bulk insert data to a new table
SELECT IDENTITY(INT,100,1) AS DestinationId,
       SourceTable.IdentityValue AS DestinationValue
INTO dbo.DestinationTable
FROM dbo.tIdentity AS SourceTable ;

--2a. Fetch the value inserted into the destination table
SELECT DestinationTable.DestinationId,
       DestinationTable.DestinationValue
FROM dbo.DestinationTable;

--3. Check the properties of the new column - is it an IDENTITY column?
SELECT sc.name AS ColumnName,
       OBJECT_NAME(sc.object_id) AS TableName,
       sic.seed_value AS IdentitySeed,
       sic.increment_value AS IdentityIncrement,
       sic.is_nullable AS IsNullable,
       sic.last_value AS LastIdentityValueUsed
FROM sys.columns AS sc
INNER JOIN sys.identity_columns AS sic ON sc.object_id = sic.object_id
                                      AND sc.column_id = sic.column_id
WHERE ( sc.object_id = OBJECT_ID(N'dbo.DestinationTable',N'U') OR
        sc.object_id = OBJECT_ID(N'dbo.tIdentity',N'U')
      )
  AND sc.is_identity = 1;

--4. Cleanup
IF OBJECT_ID('dbo.tIdentity','U') IS NOT NULL
    DROP TABLE dbo.tIdentity;
GO

IF OBJECT_ID('dbo.DestinationTable','U') IS NOT NULL
    DROP TABLE dbo.DestinationTable;
GO

/**********************************************
               RESULTS
**********************************************/
/*
DestinationId DestinationValue
------------- ----------------
100           One
101           Two
102           Three
103           Four
104           Five

ColumnName     TableName         IdentitySeed  IdentityIncrement  IsNullable LastIdentityValueUsed
-------------- ----------------- ------------- ------------------ ---------- ----------------------
DestinationId  DestinationTable  100           1                  0          104
*/

Using IDENTITY() without SELECT…INTO – Msg 177


IDENTITY() does appear to be a very useful function, but please be aware that it cannot be used outside the SELECT…INTO clause. This can be confirmed by a simple modification to the SELECT query used in the example above:

SELECT IDENTITY(INT,100,1) AS DestinationId,
       SourceTable.IdentityValue AS DestinationValue
FROM dbo.tIdentity AS SourceTable ;

Executing this statement (assuming that the dependent tables and data is in-place) returns the following error:


Msg 177, Level 15, State 1, Line 16
The IDENTITY function can only be used when the SELECT statement has an INTO clause.


Conclusion


As can be seen from the scripts above, the IDENTITY() function is very useful to define domain/product-specific IDENTITY specification when importing data from one table to another.


Until we meet next time,


Be courteous. Drive responsibly.

#0275 – SQL Server – IDENTITY columns – Myths – IDENTITY columns do not propagate via SELECT…INTO statements


As I was writing the series on myths around IDENTITY columns, I started to wonder whether copying a table into another using the SELECT…INTO statement copies over the IDENTITY property also.


The simplest way to answer this question was to run a simple test, which is shown below:



  1. Create a table with an IDENTITY column defined, and insert some test data
  2. Use SELECT…INTO to select data from the source table and pump it into another table

    • For clarity, I will be using different column names between the source and the destination columns

  3. Use the catalog views – sys.columns and sys.identity_columns to confirm whether the new table was created with the IDENTITY column in place or not

USE tempdb;
GO

SET NOCOUNT ON;

--1. Prepare the environment
--   Create a table, and add some test data into it

--Safety Check
IF OBJECT_ID('tIdentity','U') IS NOT NULL
    DROP TABLE dbo.tIdentity;
GO

--Create a table
CREATE TABLE dbo.tIdentity (IdentityId INT IDENTITY(1,1),
                            IdentityValue VARCHAR(10)
                           );
GO

--Add some test data
INSERT INTO dbo.tIdentity (IdentityValue)
VALUES ('One'),
       ('Two'),
       ('Three');
GO

--2. Use SELECT..INTO to bulk insert data to a new table
SELECT SourceTable.IdentityId AS DestinationId,
       SourceTable.IdentityValue AS DestinationValue
INTO dbo.DestinationTable
FROM dbo.tIdentity AS SourceTable

--3. Check the properties of the new column - is it an IDENTITY column?
SELECT sc.name AS ColumnName,
       OBJECT_NAME(sc.object_id) AS TableName,
       sic.seed_value AS IdentitySeed,
       sic.increment_value AS IdentityIncrement,
       sic.is_nullable AS IsNullable,
       sic.last_value AS LastIdentityValueUsed
FROM sys.columns AS sc
INNER JOIN sys.identity_columns AS sic ON sc.object_id = sic.object_id
                                      AND sc.column_id = sic.column_id
WHERE ( sc.object_id = OBJECT_ID(N'dbo.DestinationTable',N'U') OR
        sc.object_id = OBJECT_ID(N'dbo.tIdentity',N'U')
      )
  AND sc.is_identity = 1;

--4. Cleanup
IF OBJECT_ID('dbo.tIdentity','U') IS NOT NULL
    DROP TABLE dbo.tIdentity;
GO

IF OBJECT_ID('dbo.DestinationTable','U') IS NOT NULL
    DROP TABLE dbo.DestinationTable;
GO

/**********************************************
               RESULTS
**********************************************/
/*
ColumnName     TableName         IdentitySeed IdentityIncrement IsNullable LastIdentityValueUsed
-------------- ----------------- ------------ ----------------- ---------- ---------------------
IdentityId     tIdentity              1              1               0             3
DestinationId  DestinationTable       1              1               0             3
*/

As can be seen from the experiment, the IDENTITY property propagates from one table to another via the SELECT…INTO clause.


Until we meet next time,


Be courteous. Drive responsibly.

#0274 – SQL Server – IDENTITY columns – Myths – IDENTITY columns cannot have constraints defined on them


In the first part of this series on IDENTITY columns, I had mentioned that IDENTITY columns must be NOT NULL and must not have a DEFAULT constraint defined on them. Often, this is interpreted in haste as IDENTITY columns cannot have any constraints defined on them.


This is a myth. Today’s post will attempt to clear out this confusion.


Defining a DEFAULT constraint on an IDENTITY column – Msg 1754


The script below is quite simple – it attempts to create a table with an IDENTITY column and then goes on to use the ALTER TABLE statement to add a default constraint to the IDENTITY column.

USE tempdb;
GO

SET NOCOUNT ON;

--1. Prepare the environment
--   Create a table, and add some test data into it

--Safety Check
IF OBJECT_ID('tIdentity','U') IS NOT NULL
    DROP TABLE dbo.tIdentity;
GO

--Create a table, this time WITHOUT a Clustered Key
CREATE TABLE dbo.tIdentity (RecordId INT IDENTITY (1,1),
                            IdentityValue VARCHAR(20)
                           );
GO

--2. Add a default constraint
ALTER TABLE dbo.tIdentity
    ADD CONSTRAINT df_DefaultRecordId DEFAULT(0) FOR RecordId;
GO

--3. Cleanup
IF OBJECT_ID('tIdentity','U') IS NOT NULL
    DROP TABLE dbo.tIdentity;
GO

Executing this script returns an error as shown below:


Msg 1754, Level 16, State 0, Line 3
Defaults cannot be created on columns with an IDENTITY attribute. Table ‘tIdentity’, column ‘RecordId’.
Msg 1750, Level 16, State 0, Line 3
Could not create constraint. See previous errors.


This example settles one part of the puzzle – IDENTITY columns cannot have DEFAULT constraints on them.


Defining other constraints (CHECK/PRIMARY KEY) on an IDENTITY column


The script below creates a table with an IDENTITY column and defines a primary key constraint on the IDENTITY column. Later, the script uses the ALTER TABLE statement to add a CHECK constraint on the IDENTITY value.


Because the DDLs execute successfully, we will insert some test data to test that the constraints are indeed in effect.

USE tempdb ;
GO

SET NOCOUNT ON ;

--1. Prepare the environment
--   Create a table, and add some test data into it

--Safety Check
IF OBJECT_ID('tIdentity', 'U') IS NOT NULL 
    DROP TABLE dbo.tIdentity ;
GO

--Create a table, with a primary key constraint on the IDENTITY column
CREATE TABLE dbo.tIdentity
    (
      RecordId INT IDENTITY(1, 1),
      IdentityValue VARCHAR(20),
      CONSTRAINT pk_tIdentityRecordId PRIMARY KEY CLUSTERED ( RecordId )
    ) ;
GO

--2. Alter the table to add a CHECK constraint
--   Now, we have two constraints on the IDENTITY column:
--   a. A Primary Key constraint
--   b. A Check constraint
ALTER TABLE dbo.tIdentity
ADD CONSTRAINT chk_MaxRecordId CHECK ( RecordId <= 5 ) ;
GO

--2. Add some test data
--   Since we are adding 5 records, the statement will succeeed
INSERT  INTO dbo.tIdentity ( IdentityValue )
VALUES  ( 'One' ),
        ( 'Two' ),
        ( 'Three' ),
        ( 'Four' ),
        ( 'Five' ) ;
GO

--2b.One may also want to check the value of IDENT_CURRENT
PRINT 'CurrentIdentityValue (Before Constraint violation): '
    + CAST(IDENT_CURRENT('dbo.tIdentity') AS VARCHAR(10)) ;
GO

--3. Add some more test data
--   This will fail, because the identity value will now exceed the values 
--   allowed by the CHECK constraint
INSERT  INTO dbo.tIdentity ( IdentityValue )
VALUES  ( 'Six' ) ;
GO

--3b.One may also want to check the value of IDENT_CURRENT
PRINT 'CurrentIdentityValue (After Constraint violation): '
    + CAST(IDENT_CURRENT('dbo.tIdentity') AS VARCHAR(10)) ;
GO

--4. Cleanup
IF OBJECT_ID('tIdentity', 'U') IS NOT NULL 
    DROP TABLE dbo.tIdentity ;
GO

The resulting messages are shown below:


CurrentIdentityValue (Before Constraint violation): 5
Msg 547, Level 16, State 0, Line 5
The INSERT statement conflicted with the CHECK constraint “chk_MaxRecordId”. The conflict occurred in database “tempdb”, table “dbo.tIdentity”, column ‘RecordId’.
The statement has been terminated.

CurrentIdentityValue (After Constraint violation): 6


As can be seen from the resulting messages shown above, the CHECK constraint on the IDENTITY column was in effect resulting in the constraint violation error.


Observation!


Observe that the current value of the IDENTITY column (obtained via IDENT_CURRENT) has incremented even though the value was not inserted. This is because of the failed insert, which can leave holes in the IDENTITY sequence.


Conclusion


As we can see from the examples above, IDENTITY columns cannot have a DEFAULT constraint defined on them – other constraints can still be defined on an IDENTITY column.


Until we meet next time,


Be courteous. Drive responsibly.