#0355 – SQL Server – Implementing Case Sensitive Search in a case Insensitive database – Regular Expressions & PATINDEX


In my previous post (#354)on implementing case sensitive search in a database that was case insensitive, I had presented one possible implementation approach that used the COLLATE keyword to temporary convert the column of interest to a case sensitive collation.

I had mentioned that the mechanism internally used the CONVERT() function with the style parameter = 0, which means that it was performing a binary to character conversion and then executing the comparison, resulting in a performance overhead to the system.

As I was asking for feedback from friends and colleagues, I received a couple of interesting ideas, the top 2 being:

  • Regular Expressions
  • PATINDEX

The reason that these caught my attention is quite simple – logic would say that they have to be natively case sensitive – one is a regular expression (which is case sensitive in other programming languages) and the other looks for a pattern within the supplied sting. However, there were serious doubts on whether the underlying column collation will have any effect on the behaviour of these options or not. I therefore decided to test out both the options.

Generating the Test Data

Let us create the same test table that was used in the previous post and also allow it to use the same case insensitive collation as the AdventureWorks2012 database.

USE AdventureWorks2012;
GO

--Step 01: Create test data
IF OBJECT_ID('dbo.Product','U') IS NOT NULL
    DROP TABLE dbo.Product;
GO
CREATE TABLE dbo.Product 
    (ProductId INT NOT NULL IDENTITY(1,1),
     ProductName VARCHAR(100) NOT NULL,
     VendorName VARCHAR(100)
    );
GO

--Add some test data
INSERT INTO dbo.Product (ProductName, VendorName)
VALUES ('Cycles','Contosso'),
       ('cBike', 'AdventureWorks'),
       ('SomeOtherProduct','SomeOtherVendor');
GO

Using Regular Expressions

Now, the regular expression to execute a search for all products whose name begins with “c” (small case) is “[c]%”.

Let me try to use this regular expression in my search query:

USE AdventureWorks2012;
GO
SELECT ProductId,
       ProductName,
       VendorName
FROM dbo.Product 
WHERE ProductName LIKE '[c]%';
GO

The result is shown in the screenshot below. Records for all products beginning with the letter “c” were returned – irrespective of the case.

image

Using PATINDEX

The second search mechanism that I will be evaluating is the PATINDEX, which basically returns the index or the position where a particular character pattern is found in a given string. If the pattern is not found, the value returned is 0.

USE AdventureWorks2012;
GO
SELECT ProductId,
       ProductName,
       VendorName
FROM dbo.Product 
WHERE PATINDEX('c%',ProductName) > 0;
GO

Again the result is as shown in the screenshot below – a case in-sensitive search was performed by SQL Server.

image

Conclusion

The collation of a particular column determines whether a search on a column will be case sensitive or  case in-sensitive. This is not dependent upon the mechanism used for the comparison (i.e. a simple query v/s regular expressions v/s pattern matching).

Do you have any ideas on implementing  case sensitive search on a case in-sensitive database column? If so, please share in the blog comments.

Further Reading

Until we meet next time,
Be courteous. Drive responsibly.

Advertisements

2 thoughts on “#0355 – SQL Server – Implementing Case Sensitive Search in a case Insensitive database – Regular Expressions & PATINDEX

  1. Jeff Moden

    Good follow up, Nakul!

    Just to add to this, even if the PATINDEX method had worked, it would not have been SARGable and, thus, would not have been able to take advantage of Index SEEKs.

    Also, (maybe I’ve not looked hard enough), I don’t see your good name anywhere on this blog except under “Tweets”.

    –Jeff Moden

    Liked by 1 person

    Reply
  2. Solomon

    Hi there. I should point out two technicalities here:

    1) Using “[c]%” is actually not a Regular Expression. The square brackets are a single character wildcard for the LIKE operator ( https://msdn.microsoft.com/en-us/library/ms179859.aspx ). This is certainly a similar syntax to what is available in a Regular Expression, but it very much is not a Regular Expression on any level. As such, it is built into SQL Server and follows the same rules as PATINDEX and all other functions and operators (i.e. =, , etc) with regards to COLLATE.

    True Regular Expressions are only available, in SQL Server, via SQLCLR, in which case they are naturally case-sensitive unless the option for Case Insensitive matching is used.

    2) Case-sensitivity, when achieved via a collation labeled as “CS” does not always work as most people are expecting. Equality matches type do work as expected, but range searches are not exclusive to upper or lower case. You might need to use a binary collation (i.e. one ending in either _BIN or _BIN2) to achieve explicit case sensitivity.

    I explain this issue in more detail in this Stack Overflow answer ( http://stackoverflow.com/questions/26176370/why-does-filtering-on-a-range-match-the-wrong-case-when-using-a-case-sensitive-c/26185647#26185647 ) which I will hopefully soon have time to expand into a full article.

    I hope this helps. Take care, Solomon…

    Like

    Reply

Let me know what you think about this post by leaving your feedback here!

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s