Category Archives: #SQLServer

All about Microsoft SQL Server

#0306 – SQL Server – Best Practices – Avoid wild-card characters at the beginning of a search string


A long time ago, I wrote a piece on Implementing Windows OS-style Wild Card Search in SQL Server. Today’s post is a note on the possible performance implications on the system when implementing the search.


Normally, there are two variants of search operations supported in a system:



  1. A “begins-with” search – this search identifies all records with values that begin with the search string/criteria. For example, a search criteria “SQL” would identify “SQL Server”, but not “Microsoft SQL Server”
  2. An “ends-with” search – this search identifies all records with values that end with the search string/criteria. For example, a search criteria “2012” would identify “SQL Server 2012”, but not “Microsoft Windows 2012 Server”
  3. A “contains” search – this search identifies all records with values that contain the search string/criteria. For example, a search criteria “SQL” would identify both “SQL Server” and “Microsoft SQL Server”

Although “contains” and “ends with” searches are very lucrative as a feature, they will have performance implications. Today, I present a very simple demonstration which shows a problem that the these search modes will create:


To begin with, I will create a table that has some test data and a non-clustered index on the column FileNameValue, which is to be searched upon.

USE tempdb ;
GO

IF OBJECT_ID('dbo.WildCardSearchDemoTable','U') IS NOT NULL
    DROP TABLE dbo.WildCardSearchDemoTable;
GO

CREATE TABLE dbo.WildCardSearchDemoTable
    (
      Id INT IDENTITY(1, 1),
      FileNameValue VARCHAR(100),
      CONSTRAINT pk_WildCardSearch PRIMARY KEY CLUSTERED ( Id )
    ) ;
GO

CREATE NONCLUSTERED INDEX nidx_WildCardSearchDemoTable 
ON dbo.WildCardSearchDemoTable (FileNameValue);
GO

INSERT  INTO dbo.WildCardSearchDemoTable ( FileNameValue )
VALUES  ( 'Nakul Vachhrajani.xls' ),
        ( 'BR.xls' ),
        ( 'Blogs.docx' ),
        ( 'Swipe Times for Nakul.xls' ),
        ( 'Sample%File.txt' ),
        ( 'SampleFile.txt' );
GO

Now, I will turn on the Actual Execution plan (for those who like to use the keyboard, press Ctrl + M within the Query Editor window) and run the following queries:

USE tempdb;
GO

--"Contains " search
SELECT  'Contains' AS SearchType,
        dt.Id,
        dt.FileNameValue
FROM    dbo.WildCardSearchDemoTable AS dt
WHERE   dt.FileNameValue LIKE '%Nakul%';
GO

--"Ends With" search
SELECT  'Ends With' AS SearchType,
        dt.Id,
        dt.FileNameValue
FROM    dbo.WildCardSearchDemoTable AS dt
WHERE   dt.FileNameValue LIKE '%Nakul.xls';
GO

--"Begins With" search
SELECT  'Begins With' AS SearchType,
        dt.Id,
        dt.FileNameValue
FROM    dbo.WildCardSearchDemoTable AS dt
WHERE   dt.FileNameValue LIKE 'Nakul%.xls';
GO

The below screenshots show the results and the actual execution plan for each of the above queries:


image


image


As can be seen from the actual execution plan, both the “Contains” and the “Ends with” search produced a plan that involved an Index Scan, whereas the “Begins-with” search generated a plan involving an Index Seek, which as we know is better from a performance perspective when a large amount of data is concerned.


Conclusion


Almost all applications have some sort of search functionality built into them. However, when the time comes to define the specifications for and implement the search functionality, I would suggest you work with the business teams to really explain to them the various types of search and what their implications will be from the application performance perspective. Once they understand this, then make a well informed decision as to which search should be implemented as a general feature in your application.


Further Reading



  • Implementing Wildcard Search-ESCAPE keyword [Link]

Until we meet next time,


Be courteous. Drive responsibly.

#0305 – SQL Server – Myths – UNION and UNION ALL – need same name, number, sequence & data-type of columns


Recently, we were working on a stored procedure which involved combining similar result sets from various queries into a single result set with the help of the UNION operator. As I was reviewing the query written by one of the team members, I noticed that they had explicitly specified column aliases for all the result-sets being consumed in the UNION.


In order to test the team’s understanding about the fundamentals, I simply mentioned that the same column name across all result sets in the UNION was not required. The conversation that followed went something like this:


Sample query being reviewed:

USE tempdb;
GO

--Table variables to simulate the result sets 
--being combiend using UNION
DECLARE @productSet1 TABLE (ProductId INT 
                                      NOT NULL 
                                      IDENTITY(1,1),
                            ProductName VARCHAR(50) 
                                        NULL
                           );

DECLARE @productSet2 TABLE (Id INT 
                               NOT NULL 
                               IDENTITY(1,1),
                            Product VARCHAR(50) 
                                    NULL
                           );

--Insert some test data
INSERT INTO @productSet1 (ProductName)
VALUES ('Camshaft'),
       ('Connecting Rod'),
       ('Starter Motor');

--Insert some test data
INSERT INTO @productSet2 (Product)
VALUES ('Piston'),
       ('Distributor'),
       ('10A Fuse'),
       ('16A Fuse');

--Query that was written by the developers
SELECT ps1.ProductId AS ProductId,
       ps1.ProductName AS ProductName
FROM @productSet1 AS ps1
UNION
SELECT ps2.Id AS ProductId,
       ps2.Product AS ProductName
FROM @productSet2 AS ps2;
GO

Me: What has been done is ideal in the sense that ensuring the same column names in all the result sets makes the query much more readable. But, UNION does not need that column names are the same across all result sets.


Team: No? But then how will SQL Server combine the result sets if it does not know which column in the subsequent result sets to map with which column from the first result set?


Me: All SQL Server needs is that the number and order (i.e. sequence) of columns in the result sets being consumed by the UNION are the same and the data types are compatible. It is not required that the column names are the same.


Team: In that case, what will be the column names in the result set?


Me: That’s quite simple – the final result set follows the column names from the first result set.


Team: Hmm… can you show an example?


Me: Sure! Let’s manipulate the given script itself for the quick demo.

--Assumption: Table variables referenced by the query already exist.

SELECT ps1.ProductId AS ProductId,
       ps1.ProductName AS ProductName
FROM @productSet1 AS ps1
UNION
SELECT ps2.Id,
       ps2.Product
FROM @productSet2 AS ps2;
GO

If we run this statement, we will see that the  UNION succeeds and the results follow the column names from the first result set:


image


Team: This is very interesting! So we can name the columns of result sets involved in a UNION to names of our choice.


Me: Yes. However, it may have an impact on the readability of the query. It’s better to have the name, number, sequence and data-types of the columns involved in a UNION same, but it’s not necessary to do so.


Team: Ok, now we get it. Thank-you, Nakul!


Me: You’re welcome!


Conclusion


Whenever two or more result sets are being combined using a UNION or UNION ALL, all one has to ensure is that:



  • The number & order of the columns being consumed in the UNION are the same across all queries
  • The data types of the columns across the queries must be compatible

It is NOT required (but is recommended from a readability perspective) to have the same column names across all queries.


Further Reading



  • UNION operator [Books On Line, Link]

Until we meet next time,


Be courteous. Drive responsibly.

#0304 – SQL Server – Key constraints are implemented as Indexes


Today’s post is about a SQL Server design aspect that’s not intuitive, yet is quite logical when you think about it.



All Key Constraints in Microsoft SQL Server are implemented as Indexes.


What the above statement means is that whenever we create a key constraint on a table (primary/foreign or unique key), internally an index is created by Microsoft SQL Server. We can see the use of these indexes when executing queries against the table when we look at the execution plan. When we think about it, we create key constraints so that they help us in establishing referential and data integrity. These keys therefore help us in retrieving data efficiently – which is also the job of indexes on the table. Hence, it quite logical that SQL Server implements key constraints as indexes.


Demo


To demonstrate this point, let me take a small example. The script below creates a table within the AdventureWorks2012 database, and create primary, foreign and unique key constraints on the table (Do keep in mind that this particular table design is for demonstration purposes only).

USE AdventureWorks2012;
GO
–Safety Check
IF OBJECT_ID(‘dbo.KeysAsIndexTest’,’U’) IS NOT NULL
DROP TABLE dbo.KeysAsIndexTest;
GO

–Create Table
CREATE TABLE dbo.KeysAsIndexTest (RecordId INT NOT NULL,
EmployeeId INT NOT NULL,
RecordValue VARCHAR(50) NOT NULL
);

–Primary Key
ALTER TABLE dbo.KeysAsIndexTest
ADD CONSTRAINT pk_KeysAsIndexTest
PRIMARY KEY CLUSTERED (RecordId);
GO

–Foreign Key
ALTER TABLE dbo.KeysAsIndexTest
ADD CONSTRAINT fk_Employee_KeysAsIndexTest
FOREIGN KEY (EmployeeId)
REFERENCES HumanResources.Employee (BusinessEntityId);
GO

–Unique Key
ALTER TABLE dbo.KeysAsIndexTest
ADD CONSTRAINT uk_KeysAsIndexTest
UNIQUE (RecordValue);
GO


After creating the table and the constraints, let us query the sys.indexes catalog view to see how many indexes have been created against the table:

USE AdventureWorks2012;
GO
SELECT OBJECT_NAME(si.object_id) AS ObjectName,
si.name AS IndexName,
si.type_desc AS IndexType,
si.is_unique AS IsUnique,
si.is_primary_key AS IsPrimaryKey,
si.is_unique_constraint AS IsUniqueConstraint,
si.index_id AS IndexId
FROM sys.indexes AS si
WHERE si.object_id = OBJECT_ID(‘dbo.KeysAsIndexTest’,’U’);
GO

The output of the above query is shown in the screenshot below:


image


As you can see, both the primary key and the unique key are implemented as indexes on the test table – dbo.KeysAsIndexTest.


However, where’s the index for the Foreign key?


The answer is quite simple and lies in the definition of a foreign key. Per Books On Line, “FOREIGN KEY constraints can reference only columns in PRIMARY KEY or UNIQUE constraints in the referenced table or in a UNIQUE INDEX on the referenced table.” This means that if a foreign key constraint was successfully defined, it will be referring to a primary key or a unique key on the referenced table which in-turn would have been implemented as indexes.


Foreign keys help ensure that if a value exists in the table, it must exist first in the referenced table column. Hence, the real index and lookup requirements are on the referenced table and not on the referencing table. So, as far as foreign keys are concerned, SQL Server goes the smart way by not creating what would be a duplicate index.


Concluding…


To me, it is an interesting aspect on the overall SQL Server design – it tells me how SQL Server works and how to get the most advantage by defining the correct keys. Because key constraints are implemented as indexes, they should also be defined to be the most selective in a join or a real-world scenario – an design point which many designs consider when defining primary and foreign keys, but miss when creating unique keys.


Further Reading



  • Clustered Indexes may not be unique, and unique indexes may not be clustered [Link]

  • Primary Keys without clustered indexes [Link]

  • Create Table [Books On Line, Link]

  • sys.indexes Catalog View [Books On Line, Link]

Until we meet next time,


Be courteous. Drive responsibly.

#0303 – SQL Server – TRUSTWORTHY property is reset to OFF by database attach or restore activity


The databases that I work with use a considerable amount of SQL-CLR functionality implemented via assemblies registered on the SQL Server instance. Recently, as we were closing up the day, we received a phone call with the news that one of our databases was down with a permissions error similar to the one shown below.

An error occurred in the Microsoft .NET Framework while trying to load assembly id nnnnn. The server may be running out of resources, or the assembly may not be trusted with PERMISSION_SET = EXTERNAL_ACCESS or UNSAFE. Run the query again, or check documentation to see how to solve the assembly trust issues.

Over the phone itself, we were immediately able to help the support teams resolve the issue by setting the TRUSTWORTHY property of our database to ON (as was required by our design). But, this incident triggered a bigger question:

What caused the TRUSTWORTHY property to be set to OFF automatically?

Of course, none of the involved administrators admitted to anything being changed on the server or on the SQL Server instance. Upon research and hanging around on the #sqlhelp community on Twitter, I was able to confirm that the TRUSTWORTHY property is reset to OFF by the following operations:

  • Restoring a database
  • Attaching a database to a SQL Server instance

Provided below is a very simple script that does the following:

  1. Creates a database
  2. Sets the TRUSTWORTHY property to ON
  3. Backup the database
  4. Detach the database
  5. Attach the database
  6. Drop the database so that it can be restored from the backup taken earlier

After the attach and the restore process, I query the sys.databases catalog view to check the value of the is_trustworthy_on column.

--Safety Check
USE master;
GO
IF EXISTS (SELECT * FROM sys.databases AS sd WHERE sd.name = 'TrustCheck')
BEGIN
    ALTER DATABASE TrustCheck SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE TrustCheck;
END
GO

--Create the database
CREATE DATABASE TrustCheck;
GO

ALTER DATABASE TrustCheck SET TRUSTWORTHY ON;
GO

--Check the database properties
SELECT 'Check after Database Creation',
       sd.name,
       sd.database_id,
       sd.create_date,
       sd.is_trustworthy_on
FROM sys.databases AS sd 
WHERE sd.name = 'TrustCheck';
GO

--Backup the database
BACKUP DATABASE TrustCheck 
TO DISK = 'C:Database FilesTrustCheck.bak' 
WITH INIT, 
     COMPRESSION;
GO

--Detach the database
EXEC sp_detach_db @dbname = 'TrustCheck',
                  @skipchecks = 'false';
GO

--Ensure that the database is no longer "available" on this instance
SELECT 'Check after Detach',
       sd.name,
       sd.database_id,
       sd.create_date,
       sd.is_trustworthy_on
FROM sys.databases AS sd 
WHERE sd.name = 'TrustCheck';
GO

USE master;
GO
--Attach the database
CREATE DATABASE TrustCheck
    ON  PRIMARY ( NAME = N'TrustCheck_Data', 
              FILENAME = N'C:Program FilesMicrosoft SQL ServerMSSQL11.SQL2K12MSSQLDATATrustCheck.mdf')
    FOR ATTACH
GO

--Check the database properties
SELECT 'Check after Database Attach',
       sd.name,
       sd.database_id,
       sd.create_date,
       sd.is_trustworthy_on
FROM sys.databases AS sd 
WHERE sd.name = 'TrustCheck';
GO

--Now drop the database explicitly to see what happens when we restore the database
USE master;
GO
IF EXISTS (SELECT * FROM sys.databases AS sd WHERE sd.name = 'TrustCheck')
BEGIN
    ALTER DATABASE TrustCheck SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE TrustCheck;
END
GO


--Restore the database
RESTORE DATABASE TrustCheck
FROM DISK = 'C:Database FilesTrustCheck.bak'
WITH RECOVERY;

--Check the database properties
SELECT 'Check after Database Restore',
       sd.name,
       sd.database_id,
       sd.create_date,
       sd.is_trustworthy_on
FROM sys.databases AS sd 
WHERE sd.name = 'TrustCheck';
GO

--Cleanup
USE master;
GO
IF EXISTS (SELECT * FROM sys.databases AS sd WHERE sd.name = 'TrustCheck')
BEGIN
    ALTER DATABASE TrustCheck SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE TrustCheck;
END
GO

The output of the query above is shown below.

image

Conclusion

If the TRUSTWORTHY setting was mysteriously reset to OFF, check whether the database was recently restored and/or attached.

Further Reading

  • Find DMVs in SSMS/Object Explorer [Link]
  • Catalog Views [Books on Line Link]
  • sys.databases Catalog View [Books On Line Link]
  • ALTER DATABASE {Books On Line Link]

Until we meet next time,

Be courteous. Drive responsibly.

#0302 – SQL Server – Myths – ORDER BY does not impact performance


One of the most common mistakes that I see developers do is the use the ORDER BY clause in every stored procedure – irrespective of whether the application can sort the data independently or not. There have been many posts written on this topic, and I wanted to do my bit to spread awareness by busting this myth.

The truth is that the ORDER BY clause consumes precious server resources. As we know, the ORDER BY clause is applied after the select. What I mean by this is that once the result set of a query is generated, SQL Server will then perform the sort. In order to do the sort operation, SQL Server will need to use some temporary disk space from the tempdb and also consume processor time. Just imagine the amount of resources being consume to service hundreds of concurrent connections – each requiring a sorted output.

Here’s a simple example, executed by keeping the “Actual Execution plan” ON:

USE AdventureWorks2012;
GO
--Query without ORDER BY
SELECT hredh.BusinessEntityID,
       hre.JobTitle,
       hredh.DepartmentID,
       hrd.Name,
       hredh.StartDate,
       hredh.EndDate,
       hredh.ShiftID,
       hrs.Name
FROM HumanResources.Shift AS hrs 
INNER JOIN HumanResources.EmployeeDepartmentHistory AS hredh ON hredh.ShiftID = hrs.ShiftID
INNER JOIN HumanResources.Employee AS hre ON hredh.BusinessEntityID = hre.BusinessEntityID
INNER JOIN HumanResources.Department AS hrd ON hredh.DepartmentID = hrd.DepartmentID;
GO

USE AdventureWorks2012;
GO
--Query with ORDER BY
SELECT hredh.BusinessEntityID,
       hre.JobTitle,
       hredh.DepartmentID,
       hrd.Name,
       hredh.StartDate,
       hredh.EndDate,
       hredh.ShiftID,
       hrs.Name
FROM HumanResources.Shift AS hrs 
INNER JOIN HumanResources.EmployeeDepartmentHistory AS hredh ON hredh.ShiftID = hrs.ShiftID
INNER JOIN HumanResources.Employee AS hre ON hredh.BusinessEntityID = hre.BusinessEntityID
INNER JOIN HumanResources.Department AS hrd ON hredh.DepartmentID = hrd.DepartmentID
ORDER BY hredh.DepartmentID;
GO

image

As can be seen from the execution plan above, the query with the ORDER BY consumes more processing time (it increased from 46% to 54% of the batch) due to the introduction of the SORT operator.

Conclusion

Most applications only need pre-sorted output in about 20% of the cases. In all other cases, the sort operations should be avoided – SQL Server works best when processing the entire set of records, so why sort them?

Until we meet next time,

Be courteous. Drive responsibly.