Tag Archives: Development

Articles on Microsoft SQL Server development

#0358 – SQL Server – Best Practices – Avoid implicit conversions (Part 02) – Implicit conversions, COALESCE and ISNULL


Last week, I talked about how implicit conversions can cause unexpected issues. In this post, I will take the example forward. Implicit conversion can create problems not only during basic data load, but also during data retrieval and manipulation operations. Today, we will see how implicit conversions can cause COALESCE to error out in a rather unexpected way.

The script below creates the test data similar to what we saw in the previous post.

USE tempdb;
GO
--Demo objects & data
IF OBJECT_ID('dbo.BewareOfImplicitConversion','U') IS NOT NULL
    DROP TABLE dbo.BewareOfImplicitConversion;
GO

CREATE TABLE dbo.BewareOfImplicitConversion
    ( RecordId                   INT         NOT NULL IDENTITY(100,10),
      RowDescription             VARCHAR(20) NOT NULL,
      IntMasqueradingAsCharacter VARCHAR(20)     NULL,
      IntValue                   INT             NULL
    );
GO

INSERT INTO dbo.BewareOfImplicitConversion (RowDescription, IntMasqueradingAsCharacter, IntValue)
VALUES ('One',   NULL, 1),
       ('Two',   NULL, 4),
       ('Three',  '3', 6),
       ('Four',   '4', 8),
       ('Five',   'V', 10);
GO

Now, let us try to select the data from the test table – we will extract the data for all rows except the last one (where we have inserted a character value (V) instead of a numeric value in the column: dbo.BewareOfImplicitConversion.IntMasqueradingAsCharacter.

USE tempdb;
GO
--With COALESCE, everything will work fine untill a character data is encountered
SELECT ic.RecordId,
       ic.RowDescription,
       COALESCE(ic.IntMasqueradingAsCharacter, ic.IntValue) AS CalculatedValue
FROM dbo.BewareOfImplicitConversion AS ic
WHERE ic.IntValue < 10;
GO

image

As we can see from the screenshot above, the SELECT worked and COALESCE was able to successfully use the values from the integer column if the column IntMasqueradingAsCharacter was NULL. Now, let us try to fetch data for the last row.

USE tempdb;
GO
SELECT ic.RecordId,
       ic.RowDescription,
       COALESCE(ic.IntMasqueradingAsCharacter, ic.IntValue) AS CalculatedValue
FROM dbo.BewareOfImplicitConversion AS ic
WHERE ic.IntValue = 10;
GO

The following error is returned:

image

Msg 245, Level 16, State 1, Line 1
Conversion failed when converting the varchar value ‘V’ to data type int.

The reason is quite simple – as discussed in my previous post, data type precedence comes into effect and the  character data is implicitly converted to an integer.

Root Cause

Per MSDN (see references below), COALESCE follows the CASE expression rules and returns the data type of value with the highest precedence. This means that COALESCE would have attempted to convert supplied values to the data-type with highest precedence, which in this case is INT (the data-type of the column IntValue). Because the value ‘V’ cannot be implicitly converted to an integer, the COALESCE fails and returns the data-type conversion error.

A possible solution

One of the solutions is to use the ISNULL function instead. ISNULL() uses the data-type of the 1st supplied parameter (and all other parameters are converted to it accordingly). What this means is that in this case, all parameters will be treated as character values even though a data-type with higher precedence exists in the inputs.

Here’s the output when we use ISNULL() instead of COALESCE().

USE tempdb;
GO
--With ISNULL, the return datatype is always same as the data-type of the first expression.
SELECT ic.RecordId,
       ic.RowDescription,
       ISNULL(ic.IntMasqueradingAsCharacter, ic.IntValue) AS CalculatedValue
FROM dbo.BewareOfImplicitConversion AS ic
WHERE ic.IntValue < 10;

SELECT ic.RecordId,
       ic.RowDescription,
       ISNULL(ic.IntMasqueradingAsCharacter, ic.IntValue) AS CalculatedValue
FROM dbo.BewareOfImplicitConversion AS ic
WHERE ic.IntValue = 10;
GO

image

Conclusion

As I end this 2-part series, the only point I want to make is that one should be very, very careful in the initial database design and during code review in choosing the right data-types and data-type combinations.

These issues are very data centric and the analogy that I often like to quote is that these issues are like ghosts under the bed – they lie dormant for most of the time, but when the right data conditions are available, they raise their head  – simply because the development teams did not keep the necessary conversions and checks in place.

References

  • COALESCE and it’s comparison with ISNULL [MSDN Link]

Until we meet next time,
Be courteous. Drive responsibly.

#0357 – SQL Server – Best Practices – Avoid Implicit conversions as they may cause unexpected issues (Part 01)


Often, I see database developers relying on implicit conversions more than they should. Implicit conversions can use unexpected problems if not used judiciously.

In this post, I will demonstrate how implicit conversions can cause issues when trying to insert data in a table using the VALUES clause. Assume the following sample object into which we want to insert the data.

USE tempdb;
GO
--Demo objects
BEGIN
    IF OBJECT_ID('dbo.BewareOfImplicitConversion','U') IS NOT NULL
        DROP TABLE dbo.BewareOfImplicitConversion;

    CREATE TABLE dbo.BewareOfImplicitConversion
        ( RecordId                   INT         NOT NULL IDENTITY(100,10),
          RowDescription             VARCHAR(20) NOT NULL,
          IntMasqueradingAsCharacter VARCHAR(20)     NULL,
          IntValue                   INT             NULL
        );
END
GO

The column – dbo.BewareOfImplicitConversion.IntMasqueradingAsCharacter has been specifically named because we will be trying to insert numerical data into this column (which is supposed to hold character data) – it is perhaps one of the most common scenarios as far as integrating systems are concerned, however, we will be adding a twist – we will be inserting numeric data as-is, i.e. as integer values.

INSERT INTO dbo.BewareOfImplicitConversion (RowDescription, IntMasqueradingAsCharacter, IntValue)
VALUES ('One',   NULL, 1),
       ('Two',   NULL, 4),
       ('Three',    3, 6),
       ('Four',     4, 8),
       ('Five',   'V', 10);
GO

What we also did was to try and insert, in the same batch, one row of data that actually has character data in the column of interest. When executed, we end up in the following error:

Msg 245, Level 16, State 1, Line 3
Conversion failed when converting the varchar value ‘V’ to data type int.

However, when executed as shown below, the statements work just fine and insert all the 5 records into the table (dbo.BewareOfImplicitConversion).

INSERT INTO dbo.BewareOfImplicitConversion (RowDescription, IntMasqueradingAsCharacter, IntValue)
VALUES ('One',   NULL, 1),
       ('Two',   NULL, 4),
       ('Three',    3, 6),
       ('Four',     4, 8);

INSERT INTO dbo.BewareOfImplicitConversion (RowDescription, IntMasqueradingAsCharacter, IntValue)
VALUES ('Five',   'V', 10);
GO

SELECT * FROM dbo.BewareOfImplicitConversion;

Result:

image

Root Cause

The root cause here is very simple – before inserting data into the object, SQL Server has to first parse the queries populating the source data. During this process, the rules of data-type precedence come into the picture.

Data type precedence states that whenever an operator combines data from two expressions of different data types, the data-type with the lower precedence (in this case, it is the character data type – VARCHAR) is converted implicitly to a data-type with higher precedence (in this case – INT). When an implicit conversion fails, SQL Server will return an error.

In our example, the implicit conversion worked for the first 4 records being inserted because they did not require any implicit conversion, Implicit conversion is implemented for the data in the 5th record. Because they are part of the same statement, SQL Server will try to convert the string data implicitly to an integer value – causing the failure.

Solution/Best Practice:

Always use proper data-type casting and reference styles as shown in the query below and it is guaranteed to work.

INSERT INTO dbo.BewareOfImplicitConversion (RowDescription, IntMasqueradingAsCharacter, IntValue)
VALUES ('One',   NULL, 1),
       ('Two',   NULL, 4),
       ('Three',  '3', 6),
       ('Four',   '4', 8),
       ('Five',   'V', 10);
GO

Further Reading:

Until we meet next time,
Be courteous. Drive responsibly.

#0356 – SQL Server – Search with the LIKE operator and the underscore (_) wildcard character


String searches in most applications are based on the “contains” or “begins-with” paradigms – which are implemented by the use of the LIKE logical operator. The LIKE operator allows for the use of wildcard characters (underscore being one of them), which is the part that most people unknowingly overlook when they have been working with T-SQL for quite some time.

One such incident happened the other day at work when I was asked by a colleague to help him out with a query that appeared correct, but failed to give the expected results. For the sake of brevity, I have condensed the issue into a test scenario below:

DECLARE @testTbl TABLE (Value VARCHAR(50));

INSERT INTO @testTbl (Value)
VALUES ('SQL 2012'),
('SQL_2014');

SELECT * FROM @testTbl WHERE Value LIKE 'SQL_20%';

image

As can be seen in the screenshot above, the results are not what one would expect them to be at the first glance. One would have expected it to return only the string “SQL_2014” and yet the T-SQL query is returning both the records.

Normally, data coming in via flat-files from raw-data collection systems would require such kinds of string searches and manipulations. A well designed system would have these static values as a look-up so that all references look at the same version of the string value

Solution(s)

There are not one, but two possible solutions to choose from to work around this issue.

Using square brackets ([]) around the wildcard character

Because we want to consider the underscore as a character and not as a wildcard, the easiest thing to do would be to surround it with square brackets, similar to what is shown in the query below.

SELECT * FROM @testTbl WHERE Value LIKE 'SQL[_]20%';

image

Using the ESCAPE keyword

The other option, which I have discussed in one of my earlier posts, is to use the ESCAPE keyword. The ESCAPE keyword works on the basis of a user specified escape sequence. When encountered, the query engine will simply ignore the wildcard character and treat it as a normal character.

SELECT * FROM @testTbl WHERE Value LIKE 'SQL*_20%' ESCAPE '*';

image

Conclusion

The moral here is that one must always remember the basics – this thumb rule not only applies to software, but in everything that we go about doing in our day to day lives.

The other key learning that I encourage everyone to take from this is that Occam’s Razor is real – from the multiple explanations that a problem may have, the explanation with the least number of assumptions should be selected as the answer. In this case, the simplest theory was –  human error, which was indeed the case.

References

Until we meet next time,
Be courteous. Drive responsibly.

#0355 – SQL Server – Implementing Case Sensitive Search in a case Insensitive database – Regular Expressions & PATINDEX


In my previous post (#354)on implementing case sensitive search in a database that was case insensitive, I had presented one possible implementation approach that used the COLLATE keyword to temporary convert the column of interest to a case sensitive collation.

I had mentioned that the mechanism internally used the CONVERT() function with the style parameter = 0, which means that it was performing a binary to character conversion and then executing the comparison, resulting in a performance overhead to the system.

As I was asking for feedback from friends and colleagues, I received a couple of interesting ideas, the top 2 being:

  • Regular Expressions
  • PATINDEX

The reason that these caught my attention is quite simple – logic would say that they have to be natively case sensitive – one is a regular expression (which is case sensitive in other programming languages) and the other looks for a pattern within the supplied sting. However, there were serious doubts on whether the underlying column collation will have any effect on the behaviour of these options or not. I therefore decided to test out both the options.

Generating the Test Data

Let us create the same test table that was used in the previous post and also allow it to use the same case insensitive collation as the AdventureWorks2012 database.

USE AdventureWorks2012;
GO

--Step 01: Create test data
IF OBJECT_ID('dbo.Product','U') IS NOT NULL
    DROP TABLE dbo.Product;
GO
CREATE TABLE dbo.Product 
    (ProductId INT NOT NULL IDENTITY(1,1),
     ProductName VARCHAR(100) NOT NULL,
     VendorName VARCHAR(100)
    );
GO

--Add some test data
INSERT INTO dbo.Product (ProductName, VendorName)
VALUES ('Cycles','Contosso'),
       ('cBike', 'AdventureWorks'),
       ('SomeOtherProduct','SomeOtherVendor');
GO

Using Regular Expressions

Now, the regular expression to execute a search for all products whose name begins with “c” (small case) is “[c]%”.

Let me try to use this regular expression in my search query:

USE AdventureWorks2012;
GO
SELECT ProductId,
       ProductName,
       VendorName
FROM dbo.Product 
WHERE ProductName LIKE '[c]%';
GO

The result is shown in the screenshot below. Records for all products beginning with the letter “c” were returned – irrespective of the case.

image

Using PATINDEX

The second search mechanism that I will be evaluating is the PATINDEX, which basically returns the index or the position where a particular character pattern is found in a given string. If the pattern is not found, the value returned is 0.

USE AdventureWorks2012;
GO
SELECT ProductId,
       ProductName,
       VendorName
FROM dbo.Product 
WHERE PATINDEX('c%',ProductName) > 0;
GO

Again the result is as shown in the screenshot below – a case in-sensitive search was performed by SQL Server.

image

Conclusion

The collation of a particular column determines whether a search on a column will be case sensitive or  case in-sensitive. This is not dependent upon the mechanism used for the comparison (i.e. a simple query v/s regular expressions v/s pattern matching).

Do you have any ideas on implementing  case sensitive search on a case in-sensitive database column? If so, please share in the blog comments.

Further Reading

Until we meet next time,
Be courteous. Drive responsibly.

#0354 – SQL Server – Implementing case sensitive search in a case insensitive database


Searching data from a column is a very common requirement – but with unique variants in each application. Recently, I came across one such requirement:

  1. Underlying database, table and columns use a case in-sensitive (CI) collation
  2. Application must be able to switch between a case sensitive v/s a  case insensitive comparison

The ability to switch between a case sensitive and case insensitive comparison is the challenge because it needs to be done on a database that inherently does not discriminate between the two.

In this post, I will present one method of implementing this requirement. If you have any others that you would like to share, please feel free to do so in the comments area.

Temporarily using Case Sensitive Collations

The most common implementation pattern to realize the case sensitive search requirement is to temporarily use a collation that is case sensitive as shown below.

To begin with, let us create a test table and insert some fictional test data.

USE AdventureWorks2012;
GO

--Step 01: Create test data
IF OBJECT_ID('dbo.Product','U') IS NOT NULL
    DROP TABLE dbo.Product;
GO
CREATE TABLE dbo.Product 
    (ProductId INT NOT NULL IDENTITY(1,1),
     ProductName VARCHAR(100) NOT NULL,
     VendorName VARCHAR(100)
    );
GO

--Add some test data
INSERT INTO dbo.Product (ProductName, VendorName)
VALUES ('Cycles','Contosso'),
       ('cBike', 'AdventureWorks'),
       ('SomeOtherProduct','SomeOtherVendor');
GO

Next, let us fetch the collation of the database where our test table resides. You may also want to use the system stored procedure (sp_help) to study the collation of the table columns (which by default would be same as that of the database in which the table resides). On my server, the collation was – [SQL_Latin1_General_CP1_CI_AS] which is case in-sensitive (notice the CI in the collation name).

--Step 02: Check the database collation
USE AdventureWorks2012;
GO
SELECT DATABASEPROPERTYEX(N'AdventureWorks2012',
                          N'Collation');
GO
--For AdvenutreWorks2012 
--SQL_Latin1_General_CP1_CI_AS

--Check the collation of the table columns
sp_help [dbo.Product];
GO

Now, attempt to query the table for all products whose names begin with the letter ‘c’ (small case). To do so, we need to temporarily COLLATE the column – dbo.Product.ProductName to a case sensitive collation (in my case, this is SQL_Latin1_General_CP1_CS_AS).

USE AdventureWorks2012;
GO
SELECT ProductId,
       ProductName,
       VendorName
FROM dbo.Product 
WHERE ProductName 
    COLLATE SQL_Latin1_General_CP1_CS_AS
    LIKE 'c%';
GO

An observation

As can be seen from the image below, desired results are obtained. However, if one has the actual execution plan opened up, an interesting observation can be made if we look at the properties of the SELECT operation.

Results of case-sensitive comparison

image

We can observe that SQL Server internally uses the CONVERT function with a style of 0. Per Books On Line (see references below), this indicates a binary to character conversion. This conversion is performed for all records in the table, which in-turn affects performance.

Conclusion

Because we have the overhead of applying a conversion on all rows of a particular table, this method should be used only when the column cannot be modified to permanently use a case sensitive collation and a business need exists to provide both case sensitive & insensitive search.

Further Reading

Until we meet next time,
Be courteous. Drive responsibly.