Category Archives: Blog

Articles related to this Blog

Optimistic Concurrency – A conceptual introduction


A quick query at the office

The other day, one of the developers at the office asked me a seemingly very simple question. They had an application with a form that presented a record to the user by default as “read-only”. If a user wishes to edit the record, they need to click on a little “edit” button and the form would magically become editable. The developer wanted to know how could they prevent a problem commonly known as the “phantom update”.

A “phantom update” is a situation that can occur if two users fetch a record at almost the same time, and one updates the record before the other. What happens is that all update that happened before the final update are lost. Here’s a small demonstration:

A demonstration of the problem

We will first begin by creating a test database and some test objects.

/* Step 01: Create a test database */
CREATE DATABASE [OptimisticConcurrency];
GO

/* Step 02: Create Test objects                                             */
/*          Source: http://support.microsoft.com/kb/917040                  */
/* Since TIMESTAMP is a deprecated feature, we are using ROWVERSION instead */
USE [OptimisticConcurrency]
GO
CREATE TABLE [MyTest] (myKey   INT PRIMARY KEY, 
                       myValue INT, 
                       RV      ROWVERSION
                      )
GO

/* Step 03: Generate Test Data                             */
/*          Source: http://support.microsoft.com/kb/917040 */
USE [OptimisticConcurrency]
GO
INSERT INTO [MyTest] (myKey, myValue) 
VALUES (1, 0), 
       (2, 0)
GO

Now, let us try to see the problem of the “phantom update”:

-- Assume that the below SELECT statement is the process of two connections fetching the data
SELECT * 
FROM [dbo].[MyTest];
GO

-- Assume that this update is happening from connection #1
UPDATE [dbo].[MyTest] 
SET [myValue] = 2 
WHERE [myKey] = 1;
GO

-- Assume that this update is happening from connection #2, a different SSMS Query window
UPDATE [dbo].[MyTest] 
SET [myValue] = 1 
WHERE [myKey] = 1;
GO

-- Now, let us see the values that have actually been committed to the database
SELECT * 
FROM [dbo].[MyTest];
GO

The following is the result set. The update setting the [dbo].[MyTest].[myValue] to 2 has been lost.

The solution

The best way to handle this scenario is to implement concurrency, which means allowing multiple users to access the same record at the same time.

In simple terms, what normally happens during an update is locking, i.e. while a particular record is being accessed by a connection, another connection cannot access the same. To prevent this, Microsoft SQL Server 2005 supports both Optimistic and Pessimistic concurrency models through T-SQL statements and also via programming interfaces like ADO, ADO.NET, OLE DB, and ODBC.

Because most systems are 80% read, 20% write systems, the probability of two connections attempting to update the same record is decreased. On the other hand, the record should be accessed by multiple connections without experiencing any locking. Optimistic concurrency uses this statistical information to always assume that the record under consideration would not be modified until the user holding the current connection decides to do so.

Optimistic concurrency can be used within Microsoft SQL Server by the use of the ROWVERSION column on a table (the test environment created above has the ROWVERSION column in it already). The Books On Line provides a wonderful, simple explanation of the ROWVERSION, which I have quoted below:

ROWVERSION is a data type that exposes automatically generated, unique binary numbers within a database. Each database has a counter that is incremented for each insert or update operation that is performed on a table that contains a ROWVERSION column within the database. This counter is the database ROWVERSION. ROWVERSION is thus generally used as a mechanism for version-stamping table rows.

In the stored procedure below, we essentially check for changing ROWVERSION values by using the ROWVERSION value in the WHERE clause.

USE [OptimisticConcurrency];
GO

CREATE PROCEDURE [dbo].[proc_ModifyMyTest]
                @tKey      INT = 0,
                @tNewValue INT = 0,
                @tRV       ROWVERSION
AS
BEGIN
    --Create a table variable to hold the updated ROWVERSION value
    DECLARE @t TABLE (myKey int);

    --Update the record as required, and output the new ROWVERION into the
    --table variable created above
    UPDATE [dbo].[MyTest] 
    SET [myValue] = @tNewValue
    OUTPUT [inserted].[myKey] INTO @t(myKey)
    WHERE [myKey] = @tKey 
      AND [RV]    = @tRV;

    --Ensure if anything changed. If we used a stale ROWVERSION value (i.e. somebody updated the record before us),
    --nothing will be updated, in which case, raise an exception to the calling program
    IF (SELECT COUNT(*) FROM @t) = 0
      BEGIN
        RAISERROR ('Error encountered while changing row with myKey = %d. Please refresh your data and try again.',
                   16, -- Severity.
                   1, -- State.
                   @tKey) -- myKey that was changed
      END
END
GO

Now that we have a stored procedure ready for us to use during the update, we will repeat what we did earlier, only this time, we will use the stored procedure to update the values in the table MyTest. The script below should be self-explanatory.

/* Step 04: Now, run the following queries (entire step) against the [OptimisticConcurrency] database           */
/*          We will read the value once (similar to two users accessing the record for reading simultaneously   */
/*          Next, we will update the record once (similar to user #1 updating the record)                       */
/*          Finally, user #2 tries to update the record without refreshing the change, and encounters the error */
/*                                                                                                              */
/*          Please follow the instructions carefully for successful demo                                        */
/*                                                                                                              */
USE [OptimisticConcurrency];
GO

-- A. Declaration
DECLARE @stmnt    NVARCHAR(200);
DECLARE @params   NVARCHAR(100);
DECLARE @tRV      VARBINARY(8);
DECLARE @tMyKey   INT;
DECLARE @tMyValue INT;

-- B. We are reading the value (this is equivalent to two users opening the record for viewing simultaneously
--    In the application, all you will need to cache is the primary key and the rowversion value. Rest comes from the UI
PRINT '<<<< INFO >>>> Two users have now read the data into their local variables...';

SET @tMyKey   = 1
SET @tMyValue = 2
SET @tRV = (SELECT CONVERT(VARBINARY(8), [RV]) 
            FROM [dbo].[MyTest]
            WHERE [myKey] = @tMyKey
           );
SET @stmnt  = N'[dbo].[proc_ModifyMyTest] @tKey, @tNewValue, @tRowValue'
SET @params = N'@tKey INT, @tNewValue INT, @tRowValue ROWVERSION'

-- C-1. Now, User #1 fires the update, and is successful
--      Using sp_executesql will execute the query under a different connection
PRINT '<<<< INFO >>>> User #1 is now firing an update...';
EXEC sp_executesql @stmnt     = @stmnt,
                   @params    = @params,
                   @tKey      = @tMyKey,
                   @tNewValue = @tMyValue,
                   @tRowValue = @tRV;

-- C-1. Now, User #2 fires the update, and is unsuccessful
--      This is because User #2 is using the same Row Version value, which is invalid
--      because Connection #1 has already changed the record
SET @tMyValue = @tMyValue + 1
PRINT '<<<< INFO >>>> User #2 is now firing an update...';
EXEC sp_executesql @stmnt     = @stmnt,
                   @params    = @params,
                   @tKey      = @tMyKey,
                   @tNewValue = @tMyValue,
                   @tRowValue = @tRV;
GO

The first update will go successful because the underlying record has not been updated before the stored procedure call. After user #1, user #2 then fires off an update using what is now stale information. The mismatch of the ROWVERSION column prevents user #2 from updating the data modified by user #1.

The execution of the above script will produce the following output when run from the SQL Server Management Studio. The exception being thrown during the second update can be trapped by the UI and a nicely decorated message shown to the user.

Msg 50000, Level 16, State 1, Procedure dbo.proc_ModifyMyTest, Line 23 [Batch Start Line 9]
Error encountered while changing row with myKey = 1. Please refresh your data and try again.
Tip:

The global variable @@DBTS returns the last-used timestamp value (VARBINARY) of the current database.

Conclusion

Optimistic concurrency is widely used in production systems. While the locking and concurrency models are huge topics to cover in a handful of articles, I hope that this post gives you a simple, conceptual overview of the Optimistic Concurrency model.

Please note:

  1. The ROWVERSION data type is just an incrementing number and does not preserve a date or a time
  2. A table can have only one ROWVERSION column
  3. ROWVERSION should not be used as keys, especially primary keys
  4. Using a SELECT INTO statement has the potential to generate duplicate ROWVERSION values
  5. Finally for those who came from SQL 2005, the TIMESTAMP is deprecated. Please avoid using this in new DDL statements
  6. Unlike TIMESTAMP, the ROWVERSION column needs a column name in the DDL statements

References

  1. Types of concurrency control – http://msdn.microsoft.com/en-us/library/ms189132(SQL.90).aspx
  2. Database concurrency and Row Versioning: http://msdn.microsoft.com/en-us/library/cc917674.aspx
  3. TIMESTAMP column (SQL 2005): http://msdn.microsoft.com/en-us/library/ms182776(v=SQL.90).aspx
  4. ROWVERSION column (SQL 2008 and above): http://msdn.microsoft.com/en-us/library/ms182776(v=SQL.100).aspx
  5. Optimistic Locking in SQL Server 2005 – http://www.mssqltips.com/tip.asp?tip=1501

Until we meet next time,
Be courteous. Drive responsibly.

The Ghost of the Y2K


This is actually a spin-off from a Question of the Day, which appeared in SQLServerCentral (December 16, 2010).

A brief History

For those of you who have been around in the industry for more than a decade, the term Y2K carries mixed feelings. It was a very much hyped and very real problem, but with a very simple solution. A lot of our software systems were found to be vulnerable, and in hindsight, it was because of the human tendency to assume. Most of the systems designers did not predict that their systems would run for 20 or so years, and were hence bitten by the famous “Y2K bug”.

For those of came in late, the Y2K issue was valid only for systems using a 2 digit year representation. Before the year 2000, most of us had a habit of using 98 to represent the year 1998 and so on. Most systems were designed keeping this in mind, and hence when the year 2000 would set in, the year 00 would thus become, 1900 – throwing all financial calculations out of the window.

The “Y2K” Solution

There were two possible solutions:

  1. Use a 4 digit year
  2. Change the assumptions:
    • Anything greater than “50” would mean we are in the 20th century (i.e. 50 is 1950, 60 is 1960 and so on)
    • Anything less than “50” is the 21st century (i.e. 00 is 2000, 10 is 2010 and so on)

Those who used 4 digit years (option #1) are covered, and will continue to remain covered till we roll over to 5 digit years, and if the systems we build today are still running. However, for those using option #2, the ghost of the Y2K is not yet gone. In the year 2049, the ghost will come back to haunt us.

Most systems, especially today’s database systems have been designed keeping both options in mind. Microsoft SQL Server also accepts two digit years and converts them into a 4 digit year value based on the logic above. The default two-digit year cutoff for Microsoft SQL Server is the year 2049.

Depending upon where and how you do business, the business requirement behind the two-digit year cutoff may vary. Microsoft SQL Server does provide a way to override this default value of 2049 via an advanced configuration option. Today, I will be demonstrating how to use the same.

A Demonstration

Consider the following example:

1

By default, Microsoft SQL Server has it’s two-digit year cutoff as 2049, which causes the two digit year “50” to be interpreted as “1950”.

Just as where there is a will, there is always a way; similarly, where there are ghosts, there are always ghost busters. The ghost buster in this case has been provided by Microsoft in the form of a configurable 2-digit year cutoff value.

The steps below show us how to change the default two digit year cutoff:

  1. The two digit year cutoff is an advanced option. Hence, turn advanced options ON
  2. Check the default/current value of the two digit year cutoff
  3. Change the value of the two digit year cutoff
  4. Check that the modifications have taken effect

NOTE: I have used the keyword “RECONFIGURE”, which specifies that the configuration setting does not require a server stop and restart, and that the currently running value should be updated. Please use this option with utmost care.

2

Now, let us run the example again:

3

Notice that the year “49” is now interpreted as the year “1949”. We have successfully changed the two digit cutoff year.

Now, let’s rollback the settings of the SQL server instance to leave it as we found it.

4

In Conclusion

As always, nothing comes without it’s own pros and cons. Before you go about changing (or not changing) the two digit cutoff value of your SQL Server, please keep the following in mind:

  1. Per MSDN (http://msdn.microsoft.com/en-us/library/ms189577.aspx), the two digit year cutoff values for SQL Server and OLE Automation objects are different. That of SQL Server is 2049, while that of OLE Automation objects is 2030. If your client applications use 2-digit years, it is advisable to have both in sync
  2. The two-digit year provision should be treated as limited-time feature. Applications should be developed using, or should plan to move to 4 digit years

The above demonstration was just for changing the behaviour of Microsoft SQL Server. Other applications (like Excel) are also affected by the “ghost of the Y2K” and separate efforts and planning should be done to avoid any issues. For Excel users, please ensure that you have the required range set in Control Panel –> Regional and Language Options –> Customize –> Date tab.

I trust that the above gives you enough of a heads-up about the upcoming ghost of Y2K and a possibly increased time window to re-engineer your system. If you need the scripts that I ran through in this post, please visit my profile on this site.

Have a good week-end!

Be courteous. Drive responsibly.