In my previous post (#354)on implementing case sensitive search in a database that was case insensitive, I had presented one possible implementation approach that used the COLLATE keyword to temporary convert the column of interest to a case sensitive collation.
I had mentioned that the mechanism internally used the CONVERT() function with the style parameter = 0, which means that it was performing a binary to character conversion and then executing the comparison, resulting in a performance overhead to the system.
As I was asking for feedback from friends and colleagues, I received a couple of interesting ideas, the top 2 being:
The reason that these caught my attention is quite simple – logic would say that they have to be natively case sensitive – one is a regular expression (which is case sensitive in other programming languages) and the other looks for a pattern within the supplied sting. However, there were serious doubts on whether the underlying column collation will have any effect on the behaviour of these options or not. I therefore decided to test out both the options.
Generating the Test Data
Let us create the same test table that was used in the previous post and also allow it to use the same case insensitive collation as the AdventureWorks2012 database.
USE AdventureWorks2012; GO --Step 01: Create test data IF OBJECT_ID('dbo.Product','U') IS NOT NULL DROP TABLE dbo.Product; GO CREATE TABLE dbo.Product (ProductId INT NOT NULL IDENTITY(1,1), ProductName VARCHAR(100) NOT NULL, VendorName VARCHAR(100) ); GO --Add some test data INSERT INTO dbo.Product (ProductName, VendorName) VALUES ('Cycles','Contosso'), ('cBike', 'AdventureWorks'), ('SomeOtherProduct','SomeOtherVendor'); GO
Using Regular Expressions
Now, the regular expression to execute a search for all products whose name begins with “c” (small case) is “[c]%”.
Let me try to use this regular expression in my search query:
USE AdventureWorks2012; GO SELECT ProductId, ProductName, VendorName FROM dbo.Product WHERE ProductName LIKE '[c]%'; GO
The result is shown in the screenshot below. Records for all products beginning with the letter “c” were returned – irrespective of the case.
The second search mechanism that I will be evaluating is the PATINDEX, which basically returns the index or the position where a particular character pattern is found in a given string. If the pattern is not found, the value returned is 0.
USE AdventureWorks2012; GO SELECT ProductId, ProductName, VendorName FROM dbo.Product WHERE PATINDEX('c%',ProductName) > 0; GO
Again the result is as shown in the screenshot below – a case in-sensitive search was performed by SQL Server.
The collation of a particular column determines whether a search on a column will be case sensitive or case in-sensitive. This is not dependent upon the mechanism used for the comparison (i.e. a simple query v/s regular expressions v/s pattern matching).
Do you have any ideas on implementing case sensitive search on a case in-sensitive database column? If so, please share in the blog comments.
- Using Regular Expressions in CHECK constraints: http://beyondrelational.com/modules/2/blogs/77/Posts/19603/0285-sql-server-using-regular-expressions-with-check-constraints.aspx
- Identify records with special characters using Regular expressions: http://beyondrelational.com/modules/2/blogs/77/Posts/19775/0314-sql-server-identify-records-with-special-characters-using-regular-expressions.aspx
Until we meet next time,
Be courteous. Drive responsibly.
Good follow up, Nakul!
Just to add to this, even if the PATINDEX method had worked, it would not have been SARGable and, thus, would not have been able to take advantage of Index SEEKs.
Also, (maybe I’ve not looked hard enough), I don’t see your good name anywhere on this blog except under “Tweets”.
LikeLiked by 1 person
Hi there. I should point out two technicalities here:
1) Using “[c]%” is actually not a Regular Expression. The square brackets are a single character wildcard for the LIKE operator ( https://msdn.microsoft.com/en-us/library/ms179859.aspx ). This is certainly a similar syntax to what is available in a Regular Expression, but it very much is not a Regular Expression on any level. As such, it is built into SQL Server and follows the same rules as PATINDEX and all other functions and operators (i.e. =, , etc) with regards to COLLATE.
True Regular Expressions are only available, in SQL Server, via SQLCLR, in which case they are naturally case-sensitive unless the option for Case Insensitive matching is used.
2) Case-sensitivity, when achieved via a collation labeled as “CS” does not always work as most people are expecting. Equality matches type do work as expected, but range searches are not exclusive to upper or lower case. You might need to use a binary collation (i.e. one ending in either _BIN or _BIN2) to achieve explicit case sensitivity.
I explain this issue in more detail in this Stack Overflow answer ( http://stackoverflow.com/questions/26176370/why-does-filtering-on-a-range-match-the-wrong-case-when-using-a-case-sensitive-c/26185647#26185647 ) which I will hopefully soon have time to expand into a full article.
I hope this helps. Take care, Solomon…