In my previous post (#354)on implementing case sensitive search in a database that was case insensitive, I had presented one possible implementation approach that used the COLLATE keyword to temporary convert the column of interest to a case sensitive collation.
I had mentioned that the mechanism internally used the CONVERT() function with the style parameter = 0, which means that it was performing a binary to character conversion and then executing the comparison, resulting in a performance overhead to the system.
As I was asking for feedback from friends and colleagues, I received a couple of interesting ideas, the top 2 being:
The reason that these caught my attention is quite simple – logic would say that they have to be natively case sensitive – one is a regular expression (which is case sensitive in other programming languages) and the other looks for a pattern within the supplied sting. However, there were serious doubts on whether the underlying column collation will have any effect on the behaviour of these options or not. I therefore decided to test out both the options.
Generating the Test Data
Let us create the same test table that was used in the previous post and also allow it to use the same case insensitive collation as the AdventureWorks2012 database.
USE AdventureWorks2012; GO --Step 01: Create test data IF OBJECT_ID('dbo.Product','U') IS NOT NULL DROP TABLE dbo.Product; GO CREATE TABLE dbo.Product (ProductId INT NOT NULL IDENTITY(1,1), ProductName VARCHAR(100) NOT NULL, VendorName VARCHAR(100) ); GO --Add some test data INSERT INTO dbo.Product (ProductName, VendorName) VALUES ('Cycles','Contosso'), ('cBike', 'AdventureWorks'), ('SomeOtherProduct','SomeOtherVendor'); GO
Using Regular Expressions
Now, the regular expression to execute a search for all products whose name begins with “c” (small case) is “[c]%”.
Let me try to use this regular expression in my search query:
USE AdventureWorks2012; GO SELECT ProductId, ProductName, VendorName FROM dbo.Product WHERE ProductName LIKE '[c]%'; GO
The result is shown in the screenshot below. Records for all products beginning with the letter “c” were returned – irrespective of the case.
The second search mechanism that I will be evaluating is the PATINDEX, which basically returns the index or the position where a particular character pattern is found in a given string. If the pattern is not found, the value returned is 0.
USE AdventureWorks2012; GO SELECT ProductId, ProductName, VendorName FROM dbo.Product WHERE PATINDEX('c%',ProductName) > 0; GO
Again the result is as shown in the screenshot below – a case in-sensitive search was performed by SQL Server.
The collation of a particular column determines whether a search on a column will be case sensitive or case in-sensitive. This is not dependent upon the mechanism used for the comparison (i.e. a simple query v/s regular expressions v/s pattern matching).
Do you have any ideas on implementing case sensitive search on a case in-sensitive database column? If so, please share in the blog comments.
- Using Regular Expressions in CHECK constraints: http://beyondrelational.com/modules/2/blogs/77/Posts/19603/0285-sql-server-using-regular-expressions-with-check-constraints.aspx
- Identify records with special characters using Regular expressions: http://beyondrelational.com/modules/2/blogs/77/Posts/19775/0314-sql-server-identify-records-with-special-characters-using-regular-expressions.aspx
Until we meet next time,
Be courteous. Drive responsibly.