A long time ago, I wrote a piece on Implementing Windows OS-style Wild Card Search in SQL Server. Today’s post is a note on the possible performance implications on the system when implementing the search.
Normally, there are two variants of search operations supported in a system:
- A “begins-with” search – this search identifies all records with values that begin with the search string/criteria. For example, a search criteria “SQL” would identify “SQL Server”, but not “Microsoft SQL Server”
- An “ends-with” search – this search identifies all records with values that end with the search string/criteria. For example, a search criteria “2012” would identify “SQL Server 2012”, but not “Microsoft Windows 2012 Server”
- A “contains” search – this search identifies all records with values that contain the search string/criteria. For example, a search criteria “SQL” would identify both “SQL Server” and “Microsoft SQL Server”
Although “contains” and “ends with” searches are very lucrative as a feature, they will have performance implications. Today, I present a very simple demonstration which shows a problem that the these search modes will create:
To begin with, I will create a table that has some test data and a non-clustered index on the column FileNameValue, which is to be searched upon.
USE tempdb ; GO IF OBJECT_ID('dbo.WildCardSearchDemoTable','U') IS NOT NULL DROP TABLE dbo.WildCardSearchDemoTable; GO CREATE TABLE dbo.WildCardSearchDemoTable ( Id INT IDENTITY(1, 1), FileNameValue VARCHAR(100), CONSTRAINT pk_WildCardSearch PRIMARY KEY CLUSTERED ( Id ) ) ; GO CREATE NONCLUSTERED INDEX nidx_WildCardSearchDemoTable ON dbo.WildCardSearchDemoTable (FileNameValue); GO INSERT INTO dbo.WildCardSearchDemoTable ( FileNameValue ) VALUES ( 'Nakul Vachhrajani.xls' ), ( 'BR.xls' ), ( 'Blogs.docx' ), ( 'Swipe Times for Nakul.xls' ), ( 'Sample%File.txt' ), ( 'SampleFile.txt' ); GO
Now, I will turn on the Actual Execution plan (for those who like to use the keyboard, press Ctrl + M within the Query Editor window) and run the following queries:
USE tempdb; GO --"Contains " search SELECT 'Contains' AS SearchType, dt.Id, dt.FileNameValue FROM dbo.WildCardSearchDemoTable AS dt WHERE dt.FileNameValue LIKE '%Nakul%'; GO --"Ends With" search SELECT 'Ends With' AS SearchType, dt.Id, dt.FileNameValue FROM dbo.WildCardSearchDemoTable AS dt WHERE dt.FileNameValue LIKE '%Nakul.xls'; GO --"Begins With" search SELECT 'Begins With' AS SearchType, dt.Id, dt.FileNameValue FROM dbo.WildCardSearchDemoTable AS dt WHERE dt.FileNameValue LIKE 'Nakul%.xls'; GO
The below screenshots show the results and the actual execution plan for each of the above queries:
As can be seen from the actual execution plan, both the “Contains” and the “Ends with” search produced a plan that involved an Index Scan, whereas the “Begins-with” search generated a plan involving an Index Seek, which as we know is better from a performance perspective when a large amount of data is concerned.
Conclusion
Almost all applications have some sort of search functionality built into them. However, when the time comes to define the specifications for and implement the search functionality, I would suggest you work with the business teams to really explain to them the various types of search and what their implications will be from the application performance perspective. Once they understand this, then make a well informed decision as to which search should be implemented as a general feature in your application.
Further Reading
- Implementing Wildcard Search-ESCAPE keyword [Link]
Until we meet next time,
Pingback: #0356 – SQL Server – Search with the LIKE operator and the underscore (_) wildcard character | SQLTwins by Nakul Vachhrajani