Tag Archives: Myths

Time to bust some myths around Microsoft SQL Server.

#0267 – SQL Server – IDENTITY columns – Myths – IDENTITY columns cannot have holes or “gaps”


Whenever a pattern evolves into a frequently used practice, it does not take long for myths to start getting associated with it. IDENTITY columns have become such a common pattern that often I hear about these myths from novice developers. Using this series, I hope to bust those myths once and for all.


Busting the Myth – IDENTITY columns cannot have holes


Whenever a break in a series is encountered, it is referred to as a hole. For example, if we have a series of numbers from 1 to 5 such that the number 2 is missing (1, 3, 4, 5), then we have a hole.


I often hear from developers that “Holes cannot be generated in the values of a column defined as the IDENTITY column” is a prime reason why one uses IDENTITY columns. This is in-fact incorrect as I will be proving by this post.


The Test


To provide that holes can be generated in an IDENTITY column, I will use the script shown below. The script:



  1. Prepares the environment by creating a table and adding some test data into it

  2. Check the current identity value of the table using IDENT_CURRENT()

  3. Opens a transaction

  4. Performs an insert into a table which has the IDENTITY property defined

  5. Later, the transaction is rolled back, and the current identity value of the table is queried using IDENT_CURRENT()

  6. To confirm, a record is inserted into the table and the values selected – clearly indicating the presence of the hole

USE tempdb;
GO

SET NOCOUNT ON;

--1. Prepare the environment
--   Create a table, and add some test data into it

--Safety Check
IF OBJECT_ID('tIdentity','U') IS NOT NULL
    DROP TABLE dbo.tIdentity;
GO

--Create a table
CREATE TABLE dbo.tIdentity (IdentityId INT IDENTITY(1,1),
                            IdentityValue VARCHAR(10)
                           );
GO

--Add some test data
INSERT INTO dbo.tIdentity (IdentityValue)
VALUES ('One'),
       ('Two'),
       ('Three');
GO

--2. Check the current identity value of the table using IDENT_CURRENT()
SELECT IDENT_CURRENT('dbo.tIdentity') AS IdentityValueBeforeTest;
GO

--3. Perform the test
--3a. Open a transaction
BEGIN TRANSACTION IdentityHolesCheck
    --3b. Perform an insert
    INSERT INTO dbo.tIdentity (IdentityValue)
    VALUES ('Four');
--3c. Rollback the transaction
ROLLBACK TRANSACTION IdentityHolesCheck;
GO

--4. Check the current identity value of the table using IDENT_CURRENT()
SELECT IDENT_CURRENT('dbo.tIdentity') AS IdentityValueAfterTest;
GO

--5. Confirmatory test
INSERT INTO dbo.tIdentity (IdentityValue)
VALUES ('Five');
GO

--6. Select the values from the table
SELECT ti.IdentityId, ti.IdentityValue
FROM dbo.tIdentity AS ti;
GO

--Cleanup
IF OBJECT_ID('tIdentity','U') IS NOT NULL
    DROP TABLE dbo.tIdentity;
GO

/**********************************************
               RESULTS
**********************************************/
/*
IdentityValueBeforeTest
-----------------------
3

IdentityValueAfterTest
----------------------
4

IdentityId  IdentityValue
----------- -------------
1           One
2           Two
3           Three
5           Five
*/

While this was a simulation, in a practical world this situation is similar to a failed insert. Even if the INSERT did not go through, the IDNETITY value was incremented leaving the hole behind.


Conclusion


The test shown above clearly proves that it is possible to have holes in an IDENTITY column. Any code that expects an uninterrupted series of values may break if this has not been taken into consideration.


Until we meet next time,


Be courteous. Drive responsibly.

#0266 – SQL Server – IDENTITY Columns – @@IDENTITY, SCOPE_IDENTITY, IDENT_CURRENT and $IDENTITY


IDENTITY columns are perhaps the least understood and yet most used features of Microsoft SQL Server.


In this post, I will explore some system functions and keywords that we as developers and administrators can use to get information about IDENTITY columns.


Whenever a record is inserted to a table using an IDENTITY column via an INSERT, SELECT INTO or bulk copy command, the identity values on the affected tables are incremented. Microsoft SQL Server gives us three system functions that can provide the last used identity value – depending upon the scope and session.


IDENT_CURRENT()


Given a table name, this system function returns the last identity value used for the given table or view. Since this function works on a particular table or view, this value is independent of the session or the scope.


SCOPE_IDENTITY()


This system function provides the last used IDENTITY value within the same scope.


Statements inside the same stored procedure, trigger or batch comprise of the same scope – those within triggers fired as a result of a statement inside a stored procedure aren’t.


If no statement has been executed which affects an IDENTITY value, this function returns NULL.


@@IDENTITY


The @@IDENTITY system function contains the last value that was generated by the statement – irrespective of the scope.


If no statement has been executed which affects an IDENTITY value, this function returns NULL. However, please note that this function reports the identity value last generated by the statement – irrespective of the scope. This means that if a statement causes execution of a trigger which generates identity values in other tables, calling the @@IDENTITY function immediately after this statement will return the last IDENTITY value generated by the trigger.


Let’s look at each of these via an example:

USE AdventureWorks2008R2;
GO
SET NOCOUNT ON ;

SELECT  IDENT_CURRENT('Purchasing.PurchaseOrderDetail') AS PODetailIdentity,
        IDENT_CURRENT('Production.TransactionHistory') AS TransactionlIdentity,
        SCOPE_IDENTITY() AS ScopeIdentityValue,
        @@IDENTITY AS LastUsedIdentity;

    --Perform the INSERT into the table: Purchasing.PurchaseOrderDetail
    --The insert trigger on this table: Purchasing.iPurchaseOrderDetail, causes a record
    --to be inserted into Production.TransactionHistory, thereby incrementing it's IDENTITY column - TransactionID
    INSERT  INTO Purchasing.PurchaseOrderDetail
            (
              PurchaseOrderID,
              DueDate,
              OrderQty,
              ProductID,
              UnitPrice,
             --LineTotal,
              ReceivedQty,
              RejectedQty,
             --StockedQty,
              ModifiedDate
            )
            SELECT  1 AS PurchaseOrderId,
                    '2005-05-31' AS DueDate,
                    1 AS OrderQty,
                    4 AS ProductId,
                    57.02 AS UnitPrice,
                    --258.06 AS LineTotal,
                    1 AS ReceivedQty,
                    0 AS RejectedQty,
                    --1 AS StockedQty,
                    '2013-05-09' AS ModifedDate ;
    GO

SELECT  IDENT_CURRENT('Purchasing.PurchaseOrderDetail') AS PODetailIdentity,
        IDENT_CURRENT('Production.TransactionHistory') AS TransactionlIdentity,
        SCOPE_IDENTITY() AS ScopeIdentityValue,
        @@IDENTITY AS LastUsedIdentity;

/****RESULTS****/
/*
        PODetailIdentity    TransactionlIdentity    ScopeIdentityValue    LastUsedIdentity
Before: 8847                213450                  NULL                  NULL  
After : 8848                213451                  8848                  213451
*/

Inserting a new record in the table: Purchasing.PurchaseOrderDetail causes the insert trigger on this table: Purchasing.iPurchaseOrderDetail to be executed. This trigger maintains a history of the PurchaseOrder Detail transactions in the table – Production.TransactionHistory, thereby incrementing it’s IDENTITY column – TransactionID.


As can be seen in the output:



  • IDENT_CURRENT() provides us the last used identity value for the given table

  • SCOPE_IDENTITY() provides us the last used identity value within the scope of the batch

  • @@IDENTITY provides us the last used identity value irrespective of the scope

$IDENTITY


The $IDENTITY keyword allows us to refer to an identity column in a table without using the corresponding column name. Here’s an example:

USE AdventureWorks2008R2 ;
GO
SELECT  $IDENTITY
FROM    Purchasing.PurchaseOrderDetail ;
GO

The query above will return the value of the [PurchaseOrderDetailId] in the table [Purchasing].[PurchaseOrderDetail] since it’s marked as an identity column.


Other functions – IDENT_INCR() and IDENT_SEED()


If one needs to look at the value of the seed and increment values used in the identity column definition, the IDENT_INCR() and IDENT_SEED() functions can be used.

USE AdventureWorks2008R2;
GO
SELECT IDENT_SEED('Production.TransactionHistory') AS IdentityIncrement,
       IDENT_INCR('Production.TransactionHistory') AS IdentitySeed;
GO
/* RESULTS */
/*
IdentityIncrement   IdentitySeed
100000              1
*/

Checking for Identity columns in a table


The Catalog View sys.identity_columns has one record for each identity column and provides information like the seed, increment, last value generated for that column and other meta-data information.


Usage is shown in the query below:

USE AdventureWorks2008R2 ;
GO
SELECT * FROM sys.identity_columns
WHERE object_id = OBJECT_ID('Production.TransactionHistory');
GO

Until we meet next time,


Be courteous. Drive responsibly.

CAST v/s CONVERT – Is there a difference as far as SQL Server is concerned? Which is better?


Data type conversion has been one of the most used features of any programming language. Microsoft SQL Server users are familiar with the two functions: CAST() and CONVERT() and use them regularly, and more specifically, interchangeably. A few months ago, I had written a post comparing the performance of CAST v/s CONVERT, and was able to prove that there is no difference in performance of the two functions.You can read that post here: https://nakulvachhrajani.com/2011/01/03/cast-vs-convert/.

Recently, a colleague and I were discussing a particular design element and found that a data type casting was required. During this discussion, the following questions almost simultaneously came to our minds –

  • If the performance and end results are the same, what is the difference between CAST & CONVERT aside from the fact that they have different syntax elements?
  • Do we really need 2 data conversion functions?
Parameter CAST CONVERT
ANSI standard Yes No
Data-type coverage Limited Full (Date & Date Time values supported)
Performance No difference No difference
Microsoft SQL Server implementation ?? ??
Books On Line Page http://msdn.microsoft.com/en-us/library/ms187928.aspx http://msdn.microsoft.com/en-us/library/ms187928.aspx

So, I ran a little test, and today, I will share with you my findings.

The CAST v/s CONVERT Test

The test is quite simple – we have two identical T-SQL statements, one using CAST, the other using CONVERT. We will use the Properties window of SSMS to analyze the execution plan and try to see what can be found about the underlying implementation.

Let’s take a look at our test statements:

/***********************************************************
Press Ctrl+M (or go to Query->Include Actual Execution Plan)
************************************************************/
USE AdventureWorks2008R2
GO
--Use CONVERT
select CONVERT(VARCHAR(10),BusinessEntityID) FROM HumanResources.Employee

--Use CAST
select CAST(BusinessEntityID AS VARCHAR(10)) FROM HumanResources.Employee
  1. Connect to your SQL Server instance using SQL Server Management Studio (SSMS)
  2. Copy the T-SQL code above into SSMS Query editor window
  3. Run the above T-SQL statements against your SQL Server instance
  4. Change over to the Execution Plan tab in the Results pane
  5. In the results pane, notice that both CAST & CONVERT have been implemented as “Compute Scalar” operators
  6. image
  7. Press the F4 key or go to View –> Properties to launch the properties window
  8. Expand the “Defined Values” node

CONVERT implementation

CONVERT does not come up with any surprises, and has a straightforward internal implementation as demonstrated in the Properties window.

image

CAST implementation

When we move to the properties for the query using the CAST operation, we see that under the hood, SQL Server does take us for a ride. Internally, CAST is implemented as a CONVERT call. There is no difference between CAST & CONVERT besides the fact that CAST is an ANSI standard, while CONVERT is not. No wonder both CAST & CONVERT demonstrate the same performance.

image

Surprised? Try it for yourself – I was not prepared to see what I saw when I first ran through the test. I restarted the entire server and got a cup of coffee to make sure I was not dreaming!

Conclusion

I guess all I need to do now is to complete the little grid I had above:

Parameter CAST CONVERT
ANSI standard Yes No
Data-type coverage Limited Full (Date & Date Time values supported)
Performance No difference No difference
Microsoft SQL Server implementation CONVERT CONVERT
Books On Line Page http://msdn.microsoft.com/en-us/library/ms187928.aspx http://msdn.microsoft.com/en-us/library/ms187928.aspx

What would I recommend?

Although we now know that to Microsoft SQL Server, data type conversion only happens via CONVERT, I would still prefer to go with CAST unless I am converting dates or date-time values. The reason being CAST is an ANSI standard, CONVERT is not. CONVERT is implementation specific, and therefore may change in terms of the number of parameters or underlying processes. To the calling T-SQL statement, it is safe to assume that CAST would remain unchanged (unless the standard changes) and therefore, it would be the responsibility of Microsoft SQL Server to translate the CAST to a CONVERT implementation.

Do share your thoughts on the whole CAST v/s CONVERT myths and controversies floating around in the SQL Server world. I will be more than happy to research more and share my findings with you.

Until we meet next time,

Be courteous. Drive responsibility.