SQLServerCentral (or “SSC”, as we know it) ran a very good guest editorial by Andy Warren on Friday, December 17, 2010, on “under-appreciated features of SQL Server” (http://www.sqlservercentral.com/articles/Editorial/71788/). Andy looks back at the evolution of Microsoft SQL Server over the past decade (the one we just bid goodbye to) and the tons of features introduced across the various releases. Some of these features are still with us, and some have been deprecated; however, the editorial is targeted more at identifying those features that held a lot of promise but, for one reason or another, failed to take off within the general SQL Server community.
If you follow the discussion that took place afterwards, a lot of interesting features were listed down. I have here a summary of the same:
As I was following the discussion and compiling the list, it dawned on me that this is an excellent opportunity to write a little about each of these features, which we as a community might end up appreciating a bit more.
The major technological enhancements require a detailed study of their own. The rest, however, can be covered with very quick demos to get you started, and I am sure they will change your life at the workplace.
I will thus spin off a series of small articles and ultimately link them all back to this parent article. Finally, as a closing post to the series, I will also be pointing you towards some learning resources for the rest of the items on the list.
Do send in your feedback on how you would use (or are using) each of these items.
Data type conversion is one of the most used features of any programming language. Microsoft SQL Server users are familiar with the two functions CAST() and CONVERT(), and use them regularly, often interchangeably.
However, using CAST() and CONVERT() interchangeably is a capital mistake for a developer to make. Madhivanan, a SQL Server blogger on BeyondRelational.com (http://beyondrelational.com/blogs/madhivanan/default.aspx), does a very good job of highlighting the differences between CAST() and CONVERT() in his blog post here.
To summarize, the differences that he highlights are:
CAST is an ANSI standard, while CONVERT is specific to Microsoft SQL Server
CAST cannot be used for formatting purposes (i.e. to format a value as a character string), whereas CONVERT can, especially for the datetime and money datatypes
CAST cannot convert a string to a DATETIME value of a required format (i.e. mm/dd/yyyy or dd/mm/yyyy as required), while CONVERT can
Finally, he warns us to use a properly calculated size value when using the CAST and CONVERT functions to convert integer values to character data types.
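The formatting differences above can be illustrated with a few one-liners. This is a minimal sketch; the output comments assume a US-English default language setting:

```sql
-- CAST is ANSI standard and offers no formatting control
SELECT CAST(GETDATE() AS VARCHAR(25));       -- default format, e.g. Jan  1 2011 10:30AM

-- CONVERT accepts a style code for datetime formatting
SELECT CONVERT(VARCHAR(10), GETDATE(), 101); -- mm/dd/yyyy
SELECT CONVERT(VARCHAR(10), GETDATE(), 103); -- dd/mm/yyyy

-- CONVERT style 1 for the money datatype adds comma separators
SELECT CONVERT(VARCHAR(20), $1234567.89, 1); -- 1,234,567.89
```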
All very great points, and very, very useful. Thank you, Madhivanan, for the wonderful research and enlightenment.
A Question
We can draw a general conclusion that for computation purposes, CAST should be used, and for formatting values for display on a UI or a report, CONVERT should be used. However, the interesting question is one that came up when I was following the discussion on the blog the other day:
Tuesday, July 27, 2010 10:38 AM by cute_boboi
For data extraction purpose, with 1M+ records, which method is faster/recommended ? CAST or CONVERT from:
(i) Date to varchar
(ii) Int to varchar
Today, I try to answer that question.
The Demonstration
Preparing the Environment
Let’s start by creating a table, and filling it up with some test data (Running against the AdventureWorks2008 database gives you about 356409 rows in the table):
-- Step 01. Create a test table
CREATE TABLE CASTCONVERTTest (
    Id     INT,
    CrDate DATETIME
);
GO
-- Step 02. Generate Test Data
INSERT INTO CASTCONVERTTest
SELECT sso1.Id, sso1.crdate
FROM sys.sysobjects sso1
CROSS JOIN sys.sysobjects sso2;
GO
Now, let’s enable the Actual Execution Plan by pressing Ctrl+M in SQL Server Management Studio, or by going to Query->Include Actual Execution Plan.
Conversion of INT to VARCHAR
In order to see the performance of using CAST and CONVERT on INT to VARCHAR conversion, we will use the following script. At each stage, we will be using DBCC freeproccache to remove all elements from the plan cache.
/***********************************************************
Press Ctrl+M (or go to Query->Include Actual Execution Plan)
************************************************************/
-- Step 03. Test conversion of INT to VARCHAR
-- A. Remove all elements from the plan cache
DBCC freeproccache
GO
-- B. Check the performance of CAST
SELECT CAST(Id AS VARCHAR(11))
FROM CASTCONVERTTest
GO
-- C. Remove all elements from the plan cache
DBCC freeproccache
GO
-- D. Check the performance of CONVERT
SELECT CONVERT(VARCHAR(11),Id)
FROM CASTCONVERTTest
GO
Let’s look at the Actual Execution Plan used by SQL Server. As you can see, we get a 50-50 split, which means that both CAST and CONVERT perform equally well for the INT to VARCHAR conversion.
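The plan-cost split can be cross-checked by measuring actual execution times. A possible sketch, using the same test table (the timings you see will vary by machine):

```sql
-- Report CPU and elapsed time for each statement in the Messages tab
SET STATISTICS TIME ON;

DBCC FREEPROCCACHE;
SELECT CAST(Id AS VARCHAR(11)) FROM CASTCONVERTTest;

DBCC FREEPROCCACHE;
SELECT CONVERT(VARCHAR(11), Id) FROM CASTCONVERTTest;

SET STATISTICS TIME OFF;
```

Compare the “CPU time” and “elapsed time” figures reported for the two SELECT statements; over several runs they should be statistically indistinguishable.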
Conversion of DATETIME to VARCHAR
In order to see the performance of using CAST and CONVERT on DATETIME to VARCHAR conversion, we will use the following script. At each stage, we will be using DBCC freeproccache to remove all elements from the plan cache.
-- Step 04. Test conversion of DATETIME to VARCHAR
-- A. Remove all elements from the plan cache
DBCC freeproccache
GO
-- B. Check the performance of CAST
SELECT CAST(CrDate AS VARCHAR(25))
FROM CASTCONVERTTest
GO
-- C. Remove all elements from the plan cache
DBCC freeproccache
GO
-- D. Check the performance of CONVERT
SELECT CONVERT(VARCHAR(25),CrDate,106)
FROM CASTCONVERTTest
GO
Let’s look at the Actual Execution Plan used by SQL Server. As you can see, again we get a 50-50 split, which means that both CAST and CONVERT perform equally well for the DATETIME to VARCHAR conversion.
The Cleanup
Finally, as always, let’s clean up the environment.
-- Step 05. Cleanup!
DROP TABLE CASTCONVERTTest
In Conclusion
We can safely conclude that both CAST and CONVERT perform equally well for the following conversions:
INT to VARCHAR
DATETIME to VARCHAR
I hope the above is a satisfactory reply to the question we started with.
Armed with the research conducted by Madhivanan and the performance test described above, the choice is now up to the reader. Do share your practices with CAST and CONVERT on this blog. Also, where possible, do mention the reasons why you follow a particular rule of thumb.
It’s New Year’s Eve, and I am spending some nice family time by taking a holiday from work. However, it is not possible for me to forget the nice memories that the SQL Server community has given me.
As we wind up the year 2010, I would like to express my sincere thanks to the entire SQL Server community for their kind support and encouragement in 2010. It was fun attending the various Community Tech Days events and viewing the recorded Virtual Tech Days sessions (being work days, I was not able to participate as a live attendee).
This year, I made an attempt to start giving back to the SQL Server Community in order to help strengthen it even further. I am thankful that all my contributions have been welcomed by the community and continue to pray that I will receive the same support in the years to come.
One of the most important professional developments for me in 2010 was starting my blog, with the kind help of Jacob Sebastian and the encouragement of both local SQL Server heroes, Pinal Dave and Jacob Sebastian. Here is a summary of my other contributions to the community:
Contributed quite a few “Question Of the Day” on SQLServerCentral.com
I pray to the Almighty that I continue contributing to the community with even increased zeal and fervor in 2011! I am proud to be associated with a community as vibrant and as wonderful as the SQL Server community.
With this, I will close and allow you all to get back with your families to ring in the New Year. I wish you all a very Happy and Prosperous New Year!
The other day, one of the developers at the office asked me a seemingly very simple question. They had an application with a form that presented a record to the user as “read-only” by default. If a user wishes to edit the record, they need to click a little “edit” button, and the form magically becomes editable. The developer wanted to know how they could prevent a problem commonly known as the “phantom update”.
A “phantom update” is a situation that can occur when two users fetch a record at almost the same time and one updates the record before the other. What happens is that all updates that happened before the final update are lost. Here’s a small demonstration:
A demonstration of the problem
We will first begin by creating a test database and some test objects.
/* Step 01: Create a test database */
CREATE DATABASE [OptimisticConcurrency];
GO
/* Step 02: Create Test objects */
/* Source: http://support.microsoft.com/kb/917040 */
/* Since TIMESTAMP is a deprecated feature, we are using ROWVERSION instead */
USE [OptimisticConcurrency]
GO
CREATE TABLE [MyTest] (myKey INT PRIMARY KEY,
myValue INT,
RV ROWVERSION
)
GO
/* Step 03: Generate Test Data */
/* Source: http://support.microsoft.com/kb/917040 */
USE [OptimisticConcurrency]
GO
INSERT INTO [MyTest] (myKey, myValue)
VALUES (1, 0),
(2, 0)
GO
Now, let us try to see the problem of the “phantom update”:
-- Assume that the below SELECT statement is the process of two connections fetching the data
SELECT *
FROM [dbo].[MyTest];
GO
-- Assume that this update is happening from connection #1
UPDATE [dbo].[MyTest]
SET [myValue] = 2
WHERE [myKey] = 1;
GO
-- Assume that this update is happening from connection #2, a different SSMS Query window
UPDATE [dbo].[MyTest]
SET [myValue] = 1
WHERE [myKey] = 1;
GO
-- Now, let us see the values that have actually been committed to the database
SELECT *
FROM [dbo].[MyTest];
GO
The following is the result set. The update setting the [dbo].[MyTest].[myValue] to 2 has been lost.
The solution
The best way to handle this scenario is to implement concurrency control, which allows multiple users to access the same record at the same time while detecting conflicting updates.
In simple terms, what normally happens during an update is locking; i.e. while a particular record is being accessed by one connection, another connection cannot access the same record. To address this, Microsoft SQL Server 2005 supports both optimistic and pessimistic concurrency models through T-SQL statements and also via programming interfaces like ADO, ADO.NET, OLE DB, and ODBC.
Because most systems are roughly 80% read and 20% write, the probability of two connections attempting to update the same record at the same time is low, yet records should still be readable by multiple connections without any locking. Optimistic concurrency builds on this observation and assumes that the record under consideration will not be modified by another connection between the read and the update.
Optimistic concurrency can be implemented within Microsoft SQL Server through the use of a ROWVERSION column on a table (the test environment created above already has a ROWVERSION column). Books Online provides a wonderful, simple explanation of ROWVERSION, which I have quoted below:
ROWVERSION is a data type that exposes automatically generated, unique binary numbers within a database. Each database has a counter that is incremented for each insert or update operation that is performed on a table that contains a ROWVERSION column within the database. This counter is the database ROWVERSION. ROWVERSION is thus generally used as a mechanism for version-stamping table rows.
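The version-stamping behaviour can be seen with a quick sketch against the [MyTest] table created above; the RV value changes on every insert or update:

```sql
-- Insert a scratch row and note its initial rowversion
INSERT INTO [dbo].[MyTest] (myKey, myValue) VALUES (3, 0);
SELECT myKey, RV FROM [dbo].[MyTest] WHERE myKey = 3;

-- Any update to the row bumps the database-wide counter into RV
UPDATE [dbo].[MyTest] SET myValue = 1 WHERE myKey = 3;
SELECT myKey, RV FROM [dbo].[MyTest] WHERE myKey = 3;  -- RV has changed

-- Remove the scratch row so the demo below is unaffected
DELETE FROM [dbo].[MyTest] WHERE myKey = 3;
```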
In the stored procedure below, we essentially check for changing ROWVERSION values by using the ROWVERSION value in the WHERE clause.
USE [OptimisticConcurrency];
GO
CREATE PROCEDURE [dbo].[proc_ModifyMyTest]
@tKey INT = 0,
@tNewValue INT = 0,
@tRV ROWVERSION
AS
BEGIN
--Create a table variable to hold the updated ROWVERSION value
DECLARE @t TABLE (myKey int);
--Update the record as required, and output the new ROWVERSION into the
--table variable created above
UPDATE [dbo].[MyTest]
SET [myValue] = @tNewValue
OUTPUT [inserted].[myKey] INTO @t(myKey)
WHERE [myKey] = @tKey
AND [RV] = @tRV;
--Check whether anything changed. If we used a stale ROWVERSION value (i.e. somebody updated the record before us),
--nothing will be updated; in that case, raise an exception to the calling program
IF (SELECT COUNT(*) FROM @t) = 0
BEGIN
RAISERROR ('Error encountered while changing row with myKey = %d. Please refresh your data and try again.',
16, -- Severity.
1, -- State.
@tKey) -- myKey that was changed
END
END
GO
Now that we have a stored procedure ready for us to use during the update, we will repeat what we did earlier, only this time, we will use the stored procedure to update the values in the table MyTest. The script below should be self-explanatory.
/* Step 04: Now, run the following queries (entire step) against the [OptimisticConcurrency] database */
/* We will read the value once (similar to two users accessing the record for reading simultaneously) */
/* Next, we will update the record once (similar to user #1 updating the record) */
/* Finally, user #2 tries to update the record without refreshing the change, and encounters the error */
/* */
/* Please follow the instructions carefully for successful demo */
/* */
USE [OptimisticConcurrency];
GO
-- A. Declaration
DECLARE @stmnt NVARCHAR(200);
DECLARE @params NVARCHAR(100);
DECLARE @tRV VARBINARY(8);
DECLARE @tMyKey INT;
DECLARE @tMyValue INT;
-- B. We are reading the value (this is equivalent to two users opening the record for viewing simultaneously)
-- In the application, all you will need to cache is the primary key and the rowversion value. The rest comes from the UI
PRINT '<<<< INFO >>>> Two users have now read the data into their local variables...';
SET @tMyKey = 1
SET @tMyValue = 2
SET @tRV = (SELECT CONVERT(VARBINARY(8), [RV])
FROM [dbo].[MyTest]
WHERE [myKey] = @tMyKey
);
SET @stmnt = N'[dbo].[proc_ModifyMyTest] @tKey, @tNewValue, @tRowValue'
SET @params = N'@tKey INT, @tNewValue INT, @tRowValue ROWVERSION'
-- C-1. Now, User #1 fires the update, and is successful
-- sp_executesql executes the statement in its own batch; here it simulates the update call made by each user
PRINT '<<<< INFO >>>> User #1 is now firing an update...';
EXEC sp_executesql @stmnt = @stmnt,
@params = @params,
@tKey = @tMyKey,
@tNewValue = @tMyValue,
@tRowValue = @tRV;
-- C-2. Now, User #2 fires the update, and is unsuccessful
-- This is because User #2 is using the same Row Version value, which is invalid
-- because Connection #1 has already changed the record
SET @tMyValue = @tMyValue + 1
PRINT '<<<< INFO >>>> User #2 is now firing an update...';
EXEC sp_executesql @stmnt = @stmnt,
@params = @params,
@tKey = @tMyKey,
@tNewValue = @tMyValue,
@tRowValue = @tRV;
GO
The first update succeeds because the underlying record has not been updated before the stored procedure call. After user #1, user #2 then fires off an update using what is now stale information. The mismatch on the ROWVERSION column prevents user #2 from overwriting the data modified by user #1.
The execution of the above script will produce the following output when run from the SQL Server Management Studio. The exception being thrown during the second update can be trapped by the UI and a nicely decorated message shown to the user.
Msg 50000, Level 16, State 1, Procedure dbo.proc_ModifyMyTest, Line 23 [Batch Start Line 9] Error encountered while changing row with myKey = 1. Please refresh your data and try again.
Tip:
The system function @@DBTS returns the last-used ROWVERSION (timestamp) value of the current database, as VARBINARY.
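A quick sketch of the tip, run against the test database created earlier:

```sql
USE [OptimisticConcurrency];
GO
-- Returns the last-used rowversion value for this database
SELECT @@DBTS AS LastUsedRowVersion;
-- Any insert or update against a table with a ROWVERSION column increments this value
```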
Conclusion
Optimistic concurrency is widely used in production systems. While the locking and concurrency models are huge topics to cover in a handful of articles, I hope that this post gives you a simple, conceptual overview of the Optimistic Concurrency model.
Please note:
The ROWVERSION data type is just an incrementing number and does not preserve a date or a time
A table can have only one ROWVERSION column
ROWVERSION should not be used as a key, especially not as a primary key
Using a SELECT INTO statement has the potential to generate duplicate ROWVERSION values
Finally, for those who came from SQL Server 2005, TIMESTAMP is deprecated. Please avoid using it in new DDL statements
Unlike TIMESTAMP, a ROWVERSION column needs a column name in DDL statements
This is actually a spin-off from a Question of the Day, which appeared in SQLServerCentral (December 16, 2010).
A brief History
For those of you who have been around in the industry for more than a decade, the term Y2K carries mixed feelings. It was a much-hyped and very real problem, but with a very simple solution. A lot of our software systems were found to be vulnerable, and in hindsight, it was because of the human tendency to assume. Most systems designers did not predict that their systems would run for 20 or so years, and were hence bitten by the famous “Y2K bug”.
For those who came in late, the Y2K issue applied only to systems using a 2-digit year representation. Before the year 2000, most of us had a habit of using 98 to represent the year 1998, and so on. Most systems were designed with this in mind, and hence when the year 2000 set in, the year 00 would be interpreted as 1900, throwing all financial calculations out of the window.
The “Y2K” Solution
There were two possible solutions:
Use a 4 digit year
Change the assumptions:
Anything “50” or greater would mean we are in the 20th century (i.e. 50 is 1950, 60 is 1960 and so on)
Anything less than “50” is in the 21st century (i.e. 00 is 2000, 10 is 2010 and so on)
Those who used 4-digit years (option #1) are covered, and will remain covered until we roll over to 5-digit years, if the systems we build today are still running then. However, for those using option #2, the ghost of Y2K is not yet gone. Come the year 2050, the ghost will be back to haunt us.
Most systems, especially today’s database systems have been designed keeping both options in mind. Microsoft SQL Server also accepts two digit years and converts them into a 4 digit year value based on the logic above. The default two-digit year cutoff for Microsoft SQL Server is the year 2049.
Depending upon where and how you do business, the business requirement behind the two-digit year cutoff may vary. Microsoft SQL Server does provide a way to override this default value of 2049 via an advanced configuration option. Today, I will be demonstrating how to use the same.
A Demonstration
Consider the following example:
By default, Microsoft SQL Server has its two-digit year cutoff set to 2049, which causes the two-digit year “50” to be interpreted as “1950”.
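The original script is not reproduced here, but a minimal sketch of the default behaviour, assuming the default US date format (mdy) and language settings, might look like this:

```sql
-- With the default two digit year cutoff of 2049, "50" falls in the 20th century...
SELECT CAST('01/01/50' AS DATETIME);  -- 1950-01-01

-- ...while "49" falls in the 21st century
SELECT CAST('01/01/49' AS DATETIME);  -- 2049-01-01
```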
Just as where there is a will, there is always a way; similarly, where there are ghosts, there are always ghost busters. The ghost buster in this case has been provided by Microsoft in the form of a configurable 2-digit year cutoff value.
The steps below show us how to change the default two digit year cutoff:
The two digit year cutoff is an advanced option. Hence, turn advanced options ON
Check the default/current value of the two digit year cutoff
Change the value of the two digit year cutoff
Check that the modifications have taken effect
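The steps above can be sketched as follows. The target value 2048 is an assumption chosen so that the two-digit year 49 resolves to 1949; pick the value your business requirement dictates:

```sql
-- 1. "two digit year cutoff" is an advanced option, so show advanced options first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- 2. Check the default/current value (config_value and run_value default to 2049)
EXEC sp_configure 'two digit year cutoff';

-- 3. Change the cutoff (2048 here, so that the two-digit year 49 maps to 1949)
EXEC sp_configure 'two digit year cutoff', 2048;
RECONFIGURE;

-- 4. Verify that the modification has taken effect
EXEC sp_configure 'two digit year cutoff';
```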
NOTE: I have used the keyword RECONFIGURE, which applies the configuration change to the currently running value without requiring a server stop and restart. Please use this option with utmost care.
Now, let us run the example again:
Notice that the year “49” is now interpreted as the year “1949”. We have successfully changed the two digit cutoff year.
Now, let’s rollback the settings of the SQL server instance to leave it as we found it.
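A sketch of the rollback, restoring the instance defaults:

```sql
-- Restore the default two digit year cutoff
EXEC sp_configure 'two digit year cutoff', 2049;
RECONFIGURE;

-- Hide advanced options again
EXEC sp_configure 'show advanced options', 0;
RECONFIGURE;
```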
In Conclusion
As always, nothing comes without its own pros and cons. Before you go about changing (or not changing) the two-digit year cutoff value of your SQL Server, please keep the following in mind:
Per MSDN (http://msdn.microsoft.com/en-us/library/ms189577.aspx), the two digit year cutoff values for SQL Server and OLE Automation objects are different. That of SQL Server is 2049, while that of OLE Automation objects is 2030. If your client applications use 2-digit years, it is advisable to have both in sync
The two-digit year provision should be treated as a limited-time feature. Applications should be developed using, or should plan to move to, 4-digit years
The above demonstration was just for changing the behaviour of Microsoft SQL Server. Other applications (like Excel) are also affected by the “ghost of the Y2K” and separate efforts and planning should be done to avoid any issues. For Excel users, please ensure that you have the required range set in Control Panel –> Regional and Language Options –> Customize –> Date tab.
I trust that the above gives you enough of a heads-up about the upcoming ghost of Y2K and a possibly increased time window to re-engineer your system. If you need the scripts that I ran through in this post, please visit my profile on this site.