Tag Archives: Administration

Articles related to Microsoft SQL Server Administration


#0381 – SQL Server – Table design – Is it better to use NEWID or NEWSEQUENTIALID when defining the key as a UNIQUEIDENTIFIER?


Database schema design involves defining the clustered key (generally the primary key) on a table, and one of the main decisions to be made is whether to base that key on a UNIQUEIDENTIFIER/ROWGUID or on an INTEGER.

Generally, an INTEGER is the default choice for the clustered key column. Compared to a 4-byte INTEGER, a GUID takes up a lot of space (16 bytes on disk; 36 characters in its string form – see the quick check after the list below). Hence the decision to use a UNIQUEIDENTIFIER/ROWGUID depends a lot upon the needs of the application:

  • Whether the amount of data to be stored in the table will ever exceed the limits of an integer key value
  • The code which is going to consume the data (it helps if the underlying storage uses keys in the same format as the code, as in the case of the .NET Entity Framework)
  • The nature of integration/DR implemented (e.g. external Id keys used in the integrated system, replication, etc.)
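
A quick way to verify the size difference between the two data types:

--A UNIQUEIDENTIFIER is stored in 16 bytes; an INT in 4
SELECT DATALENGTH(NEWID())        AS GuidBytes,   --16
       DATALENGTH(CAST(1 AS INT)) AS IntBytes;    --4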

If the requirements do require that UNIQUEIDENTIFIER is to be used, the next question is:

Which is better to use as the default value for a GUID key: NEWID or NEWSEQUENTIALID?

NEWSEQUENTIALID() and NEWID() both generate GUIDs, however NEWSEQUENTIALID() has certain advantages:

  • NEWID() generates random values, which cause inserts to land on random pages and trigger random I/O; NEWSEQUENTIALID() generates values in sequence, and hence inserts are faster
  • Because NEWSEQUENTIALID() values are sequential, they fill each data page completely before a new one is allocated
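
One usage difference to keep in mind: NEWID() can be called anywhere an expression is allowed, whereas NEWSEQUENTIALID() can only be used in a DEFAULT constraint on a UNIQUEIDENTIFIER column. A quick illustration:

--Works: NEWID() can be called directly
SELECT NEWID() AS RandomGuid;
GO

--Fails with Msg 302: NEWSEQUENTIALID() can only be used
--in a DEFAULT expression for a UNIQUEIDENTIFIER column
SELECT NEWSEQUENTIALID() AS SequentialGuid;
GO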

In other words,

NEWID() generates more fragmentation when compared to NEWSEQUENTIALID()

To test this, I ran the following test:

  1. Create two tables – one with NEWID() as the default value for the key and one with NEWSEQUENTIALID()
  2. Ensure that MAXDOP is set to 0 so that data can be inserted into the tables in parallel (to simulate inserts into a table from multiple threads)
  3. Repeat this process multiple times
  4. Once the insert is complete, check the average fragmentation on the tables

Allow me to run through the test step-by-step.

The script below creates two tables:

USE tempdb;
GO
SET NOCOUNT ON;
GO
--Safety Check
IF OBJECT_ID('dbo.SequentialIdCheck','U') IS NOT NULL
BEGIN
   DROP TABLE dbo.SequentialIdCheck;
END
GO

IF OBJECT_ID('dbo.NonSequentialIdCheck','U') IS NOT NULL
BEGIN
   DROP TABLE dbo.NonSequentialIdCheck;
END
GO

--Create the tables (SequentialId Check)
CREATE TABLE dbo.SequentialIdCheck (RowId UNIQUEIDENTIFIER NOT NULL 
                                          CONSTRAINT df_SequentialRowId 
                                          DEFAULT NEWSEQUENTIALID(),
                                    ObjectId INT NOT NULL,
                                    RowValue NVARCHAR(200) NULL,
                                    CONSTRAINT pk_SequentialIdCheck
                                    PRIMARY KEY CLUSTERED (RowId)
                                   );
GO

--Create the tables (Non SequentialId Check)
CREATE TABLE dbo.NonSequentialIdCheck (RowId UNIQUEIDENTIFIER NOT NULL
                                             CONSTRAINT df_NonSequentialRowId 
                                             DEFAULT NEWID(),
                                       ObjectId INT NOT NULL,
                                       RowValue NVARCHAR(200) NULL,
                                       CONSTRAINT pk_NonSequentialIdCheck
                                       PRIMARY KEY CLUSTERED (RowId)
                                      );
GO

Now, I will ensure that parallelism is enabled by setting the max degree of parallelism to 0 (i.e. unrestricted).

sp_configure 'show advanced options',1
RECONFIGURE
GO
sp_configure 'max degree of parallelism',0
RECONFIGURE
GO

Next, I will insert some test data. I will repeat the insert multiple times to simulate an extremely large data-set over multiple inserts.

USE tempdb;
GO
--Insert some test data
--Run the insert 5 times
INSERT INTO dbo.SequentialIdCheck (ObjectId, RowValue)
SELECT sao1.[object_id] AS [ObjectId],
       sao1.[name] AS [RowValue]
FROM sys.all_objects AS sao1
CROSS JOIN sys.all_objects AS sao2;

INSERT INTO dbo.NonSequentialIdCheck (ObjectId, RowValue)
SELECT sao1.[object_id] AS [ObjectId],
       sao1.[name] AS [RowValue]
FROM sys.all_objects AS sao1
CROSS JOIN sys.all_objects AS sao2;
GO 5
Beginning execution loop
Batch execution completed 5 times.

Finally, I check the average fragmentation on the tables by checking the fragmentation of the clustered index.

USE tempdb;
GO
--Check the fragmentation
SELECT OBJECT_NAME(ps.[object_id]) AS ObjectName,
       ps.[index_type_desc],
       ps.[avg_fragmentation_in_percent],
       ps.[fragment_count],
       ps.[page_count],
       ps.[database_id],
       ps.[object_id]
FROM sys.dm_db_index_physical_stats(DB_ID('tempdb'),
                                    OBJECT_ID('dbo.SequentialIdCheck'),
                                    DEFAULT,
                                    DEFAULT,
                                    DEFAULT
                                   ) AS ps;
GO

SELECT OBJECT_NAME(ps.[object_id]) AS ObjectName,
       ps.[index_type_desc],
       ps.[avg_fragmentation_in_percent],
       ps.[fragment_count],
       ps.[page_count],
       ps.[database_id],
       ps.[object_id]
FROM sys.dm_db_index_physical_stats(DB_ID('tempdb'),
                                    OBJECT_ID('dbo.NonSequentialIdCheck'),
                                    DEFAULT,
                                    DEFAULT,
                                    DEFAULT
                                   ) AS ps;
GO
Comparison of the index physical stats when the table uses NEWSEQUENTIALID() vs. NEWID(): NEWID() results in higher fragmentation and consumes more pages on disk.

As the screenshot above shows, the table with the NEWID() default key value has a higher fragmentation value than the table with NEWSEQUENTIALID().

We also see that the table with the NEWID() default key value consumes far more pages, resulting in more space being occupied on disk.

The underlying reason is that NEWSEQUENTIALID() generates GUIDs that are in sequence for the given batch, whereas the GUIDs generated by NEWID() are random. When used as a clustered key, values that arrive in sequence fill each page completely before moving to the next, instead of splitting existing pages, which keeps fragmentation low.

Conclusion

In most cases, an INTEGER-based key on a table is sufficient. However, when a GUID is required by design, it is important to keep in mind that using NEWID() causes more fragmentation in the underlying data, resulting in poor system performance. Because non-sequential GUIDs cause fragmentation, they are (generally) not a good choice for use as a clustered index key unless the business/application design requires it.

If NEWSEQUENTIALID() is to be used, please do keep in mind that the key values must be generated by the database engine, which makes it tricky to use with Entity Framework, where the code requires the key value in order to instantiate an entity.
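
If the application does need the engine-generated key back, one option is to capture it at insert time with the OUTPUT clause. A minimal sketch, reusing the dbo.SequentialIdCheck table created above:

--Capture the engine-generated sequential GUID at insert time
DECLARE @generatedKeys TABLE (RowId UNIQUEIDENTIFIER NOT NULL);

INSERT INTO dbo.SequentialIdCheck (ObjectId, RowValue)
OUTPUT inserted.RowId INTO @generatedKeys (RowId)
VALUES (1, N'Sample row');

--The application can now read back the generated key(s)
SELECT RowId FROM @generatedKeys;
GO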

Until we meet next time,
Be courteous. Drive responsibly.

#0378 – SQL Server – Performance – CASE evaluates all the input result expressions


Recently, I was asked to troubleshoot a performance issue with a stored procedure that was being used for reporting purposes. Looking at the execution plan, I realized that while the joins and the filters were as expected, the core bottleneck was the set of sub-queries inside the CASE expression. In order to execute the query, SQL Server needs to evaluate all the input result expressions and then return the value in the output result set based on the switch (when expression).

If one of these input result expressions refers to a large table, or to a table that is locked, it can compromise the performance of the entire statement – even when the switch value is such that the table's branch is never the one returned (which is what was happening in our case).

The script below demonstrates the behaviour with an example. In the script, the CASE expression returns the values from one of 3 tables in the AdventureWorks database – Production.Product, Person.Person and Sales.SalesOrderHeader.

USE AdventureWorks2012;
GO

DECLARE @caseSwitch INT = 1;

SELECT CASE @caseSwitch 
            WHEN 1 THEN (SELECT TOP 1 
                                pp.Name
                            FROM Production.Product AS pp
                        )
            WHEN 2 THEN (SELECT TOP 1 
                                per.LastName + ', ' + per.FirstName
                            FROM Person.Person AS per
                        )
            WHEN 3 THEN (SELECT TOP 1 
                                soh.Comment
                            FROM Sales.SalesOrderHeader AS soh
                        )
            ELSE 'Default Value'
       END;
GO

When we execute the script with the “Show Actual Execution Plan” (Ctrl + M) turned on, we can see that all three tables were accessed.

A CASE expression evaluates all the input result expressions

If this behaviour presents a lot of performance issues in the system, the solution is to re-engineer the way the system is queried, such that the required set of data is staged into temporary tables, or such that only the relevant branch is executed, to avoid loading the underlying tables. A sketch of the latter approach follows.
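
For instance, here is a minimal rework of the demo script above using IF…ELSE, so that only the branch matching the switch value runs and only one table is accessed per execution (an illustrative sketch, not the original reporting procedure):

USE AdventureWorks2012;
GO

DECLARE @caseSwitch INT = 1;
DECLARE @result NVARCHAR(200);

--Only the matching branch executes, so only one table is accessed
IF (@caseSwitch = 1)
    SELECT TOP 1 @result = pp.Name
    FROM Production.Product AS pp;
ELSE IF (@caseSwitch = 2)
    SELECT TOP 1 @result = per.LastName + N', ' + per.FirstName
    FROM Person.Person AS per;
ELSE IF (@caseSwitch = 3)
    SELECT TOP 1 @result = soh.Comment
    FROM Sales.SalesOrderHeader AS soh;
ELSE
    SET @result = N'Default Value';

SELECT @result AS Result;
GO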

Until we meet next time,
Be courteous. Drive responsibly.

#0376 – SQL Server – Msg 2714: There is already an object named ‘#tableName’ in the database.


Recently, I came across the same question a couple of times in one of the forums I participate in, around the usage of temporary tables. The questions revolved around attempting to create/use temporary tables in the following scenarios within the same batch:

  1. Same temporary table name, but a different definition
  2. Create the temporary table conditionally with the same definition multiple times (i.e. trying to create the temporary table only if necessary)
  3. Create the temporary table conditionally with possibly different definitions (i.e. in cases when the definition is unknown upfront, e.g. data dumps/exports, etc.)

In all of these cases, users were encountering the following error:

Msg 2714, Level 16, State 1...
There is already an object named '#tableName' in the database.

The scenario can be recreated in your development/study environment using the script provided below.

USE AdventureWorks2012;
GO
--Safety Check 
IF OBJECT_ID('tempdb..#intermediateDataStore','U') IS NOT NULL
BEGIN
    DROP TABLE #intermediateDataStore;
END
GO

DECLARE @isEmployeeList BIT = 1;

--Generate a list of employees, 
--else generate a list of customer contacts
IF (@isEmployeeList = 1)
BEGIN
    SELECT pp.BusinessEntityID,
            pp.FirstName,
            pp.LastName
    INTO #intermediateDataStore
    FROM Person.Person AS pp
    INNER JOIN HumanResources.Employee AS he 
        ON pp.BusinessEntityID = he.BusinessEntityID;
END
ELSE
BEGIN
    SELECT pp.BusinessEntityID,
            pp.FirstName,
            pp.LastName
    INTO #intermediateDataStore
    FROM Person.Person AS pp
    LEFT OUTER JOIN HumanResources.Employee AS he 
        ON pp.BusinessEntityID = he.BusinessEntityID
    WHERE pp.BusinessEntityID IS NOT NULL
        AND he.BusinessEntityID IS NULL;
END

SELECT BusinessEntityID,
        FirstName,
        LastName
FROM #intermediateDataStore;
GO

Root Cause

In the context of non-temporary user tables, the root cause is quite simple – one has another table with the same name already created in the database.
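
For permanent tables, the repro is trivial (the table name here is hypothetical):

CREATE TABLE dbo.DuplicateNameDemo (Id INT);
GO
--Fails: Msg 2714 - There is already an object named 'DuplicateNameDemo' in the database.
CREATE TABLE dbo.DuplicateNameDemo (Id INT);
GO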

However, in the context of temporary tables, the root cause of the behaviour is a SQL Server design aspect called “Deferred Name Resolution” (DNR) (it has nothing to do with the medical protocol that shares the same abbreviation).

What is Deferred Name Resolution (DNR)?

If one observes carefully, the statement failed at the time of parsing, and not during execution (one can easily verify this by trying to generate an estimated execution plan, which will also fail).

Here is the sequence of events that caused the script to fail during parsing due to deferred name resolution.

  1. Drop any existing temporary tables with the name #intermediateDataStore
    • Compiles OK
  2. Definition of variable @isEmployeeList
    • Compiles OK
  3. First SELECT…INTO statement for storing a simple list of employees
    • Temporary table does not exist here, and hence is marked for DNR
  4. ELSE block – Second SELECT…INTO statement for storing a simple list of non-employees/contacts
    • Temporary table does not exist here, and hence attempt is made to mark for DNR
    • However, another temporary table with the same name is already marked for DNR within the same batch, resulting in a conflict

Why is DNR required?

Deferred Name Resolution is not a bug, but a design feature – it allows the SQL Server database engine to continue parsing a batch that references objects whose details are only available at runtime.

In the case of temporary tables, the actual name under which a table is stored in tempdb differs slightly from the name used in code (a uniquifying suffix is appended, as the sketch below shows). It is therefore impossible to know the actual name of the table during compilation, which is why the name resolution needs to be deferred.
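
One can see this decorated name by querying tempdb's catalog while the temporary table exists (the table name below is hypothetical):

--The actual name in tempdb carries a padded, session-unique suffix
CREATE TABLE #demoTable (Id INT);

SELECT [name]
FROM tempdb.sys.tables
WHERE [name] LIKE '#demoTable%';
--Returns something like: #demoTable_______________..._____00000000001A

DROP TABLE #demoTable;
GO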

This is also the reason why stored procedures may show compilation errors only when executed for the first time, and not during deployment.
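
A quick illustration of the same effect with a stored procedure (both object names here are hypothetical):

--The CREATE succeeds even though the referenced table does not exist;
--resolution of the table name is deferred until execution
CREATE PROCEDURE dbo.DnrDemo
AS
    SELECT * FROM dbo.TableThatDoesNotExist;
GO

--Fails only now, at first execution, with
--Msg 208: Invalid object name 'dbo.TableThatDoesNotExist'.
EXEC dbo.DnrDemo;
GO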

The Solution

There are multiple ways to arrive at a solution to this problem, but all of them revolve around the following two points:

  1. Either separate the offending statements into different batches OR
  2. If the structure is known to be the same/similar, define the temporary table in advance and refer to that already-created instance wherever required

In our case, it is possible to know the structure of the temporary table in advance and hence the solution for our problem is as below:

USE AdventureWorks2012;
GO
--Safety Check 
IF OBJECT_ID('tempdb..#intermediateDataStore','U') IS NOT NULL
BEGIN
    DROP TABLE #intermediateDataStore;
END
GO

--Because the structure is known, create the temp table here
--Else, one can try to split the application logic into 
--multiple distinct parts 
CREATE TABLE #intermediateDataStore 
    (BusinessEntityID INT           NOT NULL,
     FirstName        NVARCHAR(100)     NULL,
     LastName         NVARCHAR(100)     NULL
    )

DECLARE @isEmployeeList BIT = 1;

--Generate a list of employees, 
--else generate a list of customer contacts
IF (@isEmployeeList = 1)
BEGIN
    --Rather than using a SELECT...INTO, we are now
    --using an INSERT INTO...SELECT
    INSERT INTO #intermediateDataStore (BusinessEntityID,
                                        FirstName,
                                        LastName
                                       )
    SELECT pp.BusinessEntityID,
            pp.FirstName,
            pp.LastName
    FROM Person.Person AS pp
    INNER JOIN HumanResources.Employee AS he 
        ON pp.BusinessEntityID = he.BusinessEntityID;
END
ELSE
BEGIN
    INSERT INTO #intermediateDataStore (BusinessEntityID,
                                        FirstName,
                                        LastName
                                       )
    SELECT pp.BusinessEntityID,
            pp.FirstName,
            pp.LastName
    FROM Person.Person AS pp
    LEFT OUTER JOIN HumanResources.Employee AS he 
        ON pp.BusinessEntityID = he.BusinessEntityID
    WHERE pp.BusinessEntityID IS NOT NULL
        AND he.BusinessEntityID IS NULL;
END

SELECT BusinessEntityID,
        FirstName,
        LastName
FROM #intermediateDataStore;
GO

Conclusion

When I was researching the solution, it actually reminded me of the Single-Responsibility Principle of the SOLID design principles. The DNR problems demonstrated in this example would never have come up if the application had two separate modules – one to report for employees and the other to report for contacts.

Other than the obvious conclusion about the solution to the deferred name resolution message, the hidden conclusion is that design and architectural principles are independent of the programming language and platform – they stand equally true whether you are developing a piece of C# code, or writing a T-SQL stored procedure.

Further Reading

Deferred Name Resolution

  • Microsoft TechNet: Deferred Name Resolution and Compilation [Link]
  • Microsoft Connect: Creating a temporary table within a stored procedure with the same name as a temporary table created in the calling stored procedure [Link]

Fun with Temporary Tables

  • Temporary Table Names – maximum allowed length [Link]
  • Working with Temporary Tables (L100) [Link]
  • Temporary Tables and the impact of ANSI_NULLS_DFLT_ON [Link]
  • Temporary Tables and Foreign Keys [Link]
  • Temporary Tables – Named Constraints, Msg 2714, 1750 [Link]
  • Stored Procedures and Temporary Tables [Link]

Until we meet next time,
Be courteous. Drive responsibly.

#0375 – SQL Server – Collation conflicts may occur when working with string functions (REPLACE, SUBSTRING, etc.)


Recently, I was called upon to troubleshoot an interesting issue that came up with manipulating string termination characters (“\0”) using REPLACE. I wrote about the issue here.

As I was writing about the incident, I realized that if a data conditioning/manipulation script stages a “fixed” copy of the data, one may end up with collation conflicts if the column collations are not specified as part of the column definitions of the staging tables.

Allow me to demonstrate with an example.

What the script below does is create a test table with some manufacturer and product data. I then attempt to combine the ManufacturerName and the ProductName into a column called FullName. In order to do this, I stage the fixed/updated data into a staging table and later join it to the main table for the update.

--1. Safety check - drop before we recreate
IF OBJECT_ID('tempdb..#product','U') IS NOT NULL
DROP TABLE tempdb..#product;
GO

IF OBJECT_ID('tempdb..#tempProduct','U') IS NOT NULL
DROP TABLE tempdb..#tempProduct;
GO

--2. Create the table to be fixed
CREATE TABLE #product
    (ManufacturerName VARCHAR(50)  COLLATE Latin1_General_CI_AS,
     ProductName      VARCHAR(50)  COLLATE Latin1_General_CI_AS,
     FullName         VARCHAR(100) COLLATE Latin1_General_CI_AS
    );
GO

--3. Insert some test data
INSERT INTO #product (ManufacturerName,
                      ProductName,
                      FullName)
VALUES ('Microsoft-', 'SQL Server', NULL),
       ('Microsoft-', 'Windows', 'Microsoft Windows'),
       ('Google', 'Chrome', 'Google Chrome'),
       ('Microsoft-', 'Azure', NULL);
GO

--4. Fix the data and store it to a staging table
SELECT p.ManufacturerName,
       p.ProductName,
       (REPLACE(p.ManufacturerName COLLATE Latin1_General_CI_AI, '-', ' ')
        + p.ProductName) AS FullName
INTO #tempProduct
FROM #product AS p;
GO

The key points to observe are:

  1. When doing the string manipulation, I explicitly specified a collation that is different from the collation of the original table (this may be required in case you are working on strings that need special handling)
  2. I did not explicitly specify the column definition for the staging table

What happens is that the collation of the output of a string function (REPLACE in this case) is the same as that of its input string. In this example, that collation differs from the collation of the main table, and because SELECT…INTO infers column definitions from the query, the temporary staging table is created with this different collation.
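
One can verify the collation of a string expression directly, without staging it into a table, using SQL_VARIANT_PROPERTY (a quick sketch):

--The REPLACE output inherits the explicit CI_AI collation of its input
SELECT SQL_VARIANT_PROPERTY(
           REPLACE('Microsoft-' COLLATE Latin1_General_CI_AI, '-', ' '),
           'Collation') AS OutputCollation;
--Returns: Latin1_General_CI_AI
GO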

--5. Check out the meta-data of both the tables
SELECT pisc.COLUMN_NAME,
       pisc.COLLATION_NAME AS SourceCollation,
       tpisc.COLLATION_NAME AS StagingCollation
FROM tempdb.INFORMATION_SCHEMA.COLUMNS AS pisc
INNER JOIN tempdb.INFORMATION_SCHEMA.COLUMNS AS tpisc
    ON pisc.COLUMN_NAME = tpisc.COLUMN_NAME
WHERE pisc.TABLE_NAME LIKE '#product%'
  AND tpisc.TABLE_NAME LIKE '#tempProduct%';
GO

Collations in the main table vs. the staging table

Now, let us see what happens when we try to execute the update:

--6. Try the update and see what happens
UPDATE p
SET p.FullName = tp.FullName
FROM #product AS p
INNER JOIN #tempProduct AS tp
    ON p.ManufacturerName = tp.ManufacturerName
    AND p.ProductName = tp.ProductName
    AND p.FullName <> tp.FullName;
GO
Msg 468, Level 16, State 9, Line 44
Cannot resolve the collation conflict between "Latin1_General_CI_AI" and "Latin1_General_CI_AS" in the not equal to operation.

The update fails with a collation conflict error, which could easily have been avoided by specifying the column definitions (with the appropriate collations) for the staging table. A sketch of the fix follows.
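
A minimal sketch of the fix, reusing the demo tables above: define #tempProduct explicitly so that its collations match the main table, and use an INSERT INTO…SELECT instead of a SELECT…INTO:

--Fix: define the staging table up front so its collations
--match the main table (nothing is inferred from the query)
CREATE TABLE #tempProduct
    (ManufacturerName VARCHAR(50)  COLLATE Latin1_General_CI_AS,
     ProductName      VARCHAR(50)  COLLATE Latin1_General_CI_AS,
     FullName         VARCHAR(100) COLLATE Latin1_General_CI_AS
    );

--The CI_AI expression is implicitly converted on insert;
--later comparisons against the main table no longer conflict
INSERT INTO #tempProduct (ManufacturerName, ProductName, FullName)
SELECT p.ManufacturerName,
       p.ProductName,
       (REPLACE(p.ManufacturerName COLLATE Latin1_General_CI_AI, '-', ' ')
        + p.ProductName) AS FullName
FROM #product AS p;
GO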

The problem demonstrated above can be reproduced with other string manipulation functions as well (e.g. SUBSTRING).

The moral of the story: always follow best practices and specify complete column definitions (including collations) when defining a table – permanent, staging, or temporary.

Until we meet next time,
Be courteous. Drive responsibly.


#0372 – SQL Server – SSIS – VSTA, Could not load file or assembly ‘Microsoft.VisualStudio.Tools.Applications.Core’


We use SQL Server Integration Services (SSIS) for all of our data movement and upgrades. Recently, when running a particular SSIS package, one of our new servers threw a strange error:

There was an exception while loading Script Task from XML: System.IO.FileNotFoundException: Could not load file or assembly ‘Microsoft.VisualStudio.Tools.Applications.Core’

I was on call and one of the first things I did was to cover the basics – checking out the environment. The SSIS package file (.dtsx) was located on the file system exactly where I had expected it to be and hence, I was stumped to see the “File Not Found” exception. The strange thing was that the same package worked fine when run on the staging environment and also in my local development environment.

After checking out various things and taking a break, I refocused my attention on the error message. That’s when the words simply popped out at me – “…while loading Script Task from XML…” – the script task was where the problem was. Script tasks are most commonly used to set connections on the connection managers, and that’s exactly what we were doing.

The problem

Script tasks in SSIS leverage Visual Studio Tools for Applications (VSTA). I therefore headed over to Add/Remove Programs to confirm that VSTA was indeed installed. Only the 64-bit version was installed on the affected server. As with all things SSIS on a 64-bit system, I had expected to see both the x64 and the x86 versions of VSTA installed. Below is a screenshot of Add/Remove Programs from one of my test VMs, which is a 32-bit environment (I could not get the actual screenshot because it was a production server).

Sample VSTA installation on a 32-bit environment (non-production). In a 64-bit environment, one would expect to see two entries – one for the 64-bit version and one for the 32-bit version.

The Solution

I immediately contacted the on-call IT team, who kindly shared the SQL Server media from which the installation had been done. Navigating to the following path on the media gave me the 32-bit installable for VSTA. We installed the 32-bit version of VSTA and the issue was resolved.

Path on the SQL Server media to get VSTA: ..\redist\VSTA\runtime\x86

Have you ever encountered this or other interesting issues in your deployment experience? How did you troubleshoot them? Do share via the blog comments – I would love to know!

Until we meet next time,
Be courteous. Drive responsibly.