Tag Archives: Guidance

Articles intended to provide guidance to the readers based on my past experiences.

#0381 – SQL Server – Table design – Is it better to use NEWID or NEWSEQUENTIALID when defining the key as a UNIQUEIDENTIFIER?

Database schema design involves defining the clustered keys (generally the primary key) on a table, and one of the main decisions to be taken is whether to use a clustered key based on a UNIQUEIDENTIFIER/ROWGUID or an INTEGER?

Generally, an INTEGER is a default choice for the clustered key column. Compared to an INTEGER, a GUID takes up a lot of space (36 characters!). Hence the decision to use UNIQUEIDENTIFIER/ROWGUID depends a lot upon the desired application:

Whether the amount of data to be stored in the table is ever going to exceed the limits of an integer key value?
The code which is going to consume the data (it helps if the underlying storage uses keys in the same format as the code, as in the case of the .NET Enterprise Framework)
Nature of integration/DR implemented (e.g. External Id keys used in the integrated system, replication, etc)

If the requirements do require that UNIQUEIDENTIFIER is to be used, the next question is:

What is better to use as the default value for a GUID? NEWID or NEWSEQUENTIALID?

NEWSEQUENTIALID() and NEWID() both generate GUIDs, however NEWSEQUENTIALID() has certain advantages:

NEWID() generates a lot of random activity, whereas NEWSEQUENTIALID() does not. Hence, NEWSEQUENTIALID() is faster
Because NEWSEQUENTIALID() is sequential, it helps to fill the data pages faster

In order words,

NEWID() generates more fragmentation when compared to NEWSEQUENTIALID()

NEWID() generates more fragments

To test this, I ran the following test:

Create two tables – one with NEWID() as the default value for the key and one with NEWSEQUENTIALID()
Ensure that MAXDOP is set to 0 so that I can insert data into the tables in parallel (to simulate inputs into a table from multiple threads)
Repeat this process multiple times
Once the insert is complete, check the average fragmentation on the tables

Allow me to run through the test step-by-step.

The script below creates two tables:

USE tempdb;
GO
SET NOCOUNT ON;
GO
--Safety Check
IF OBJECT_ID('dbo.SequentialIdCheck','U') IS NOT NULL
BEGIN
   DROP TABLE dbo.SequentialIdCheck;
END
GO

IF OBJECT_ID('dbo.NonSequentialIdCheck','U') IS NOT NULL
BEGIN
   DROP TABLE dbo.NonSequentialIdCheck;
END
GO

--Create the tables (SequentialId Check)
CREATE TABLE dbo.SequentialIdCheck (RowId UNIQUEIDENTIFIER NOT NULL 
                                          CONSTRAINT df_SequentialRowId 
                                          DEFAULT NEWSEQUENTIALID(),
                                    ObjectId INT NOT NULL,
                                    RowValue NVARCHAR(200) NULL,
                                    CONSTRAINT pk_SequentialIdCheck
                                    PRIMARY KEY CLUSTERED (RowId)
                                   );
GO

--Create the tables (Non SequentialId Check)
CREATE TABLE dbo.NonSequentialIdCheck (RowId UNIQUEIDENTIFIER NOT NULL
                                             CONSTRAINT df_NonSequentialRowId 
                                             DEFAULT NEWID(),
                                       ObjectId INT NOT NULL,
                                       RowValue NVARCHAR(200) NULL,
                                       CONSTRAINT pk_NonSequentialIdCheck
                                       PRIMARY KEY CLUSTERED (RowId)
                                      );
GO

Now, I will ensure that max degree of parallelism is turned ON.

sp_configure 'show advanced options',1
RECONFIGURE
GO
sp_configure 'max degree of parallelism',0
RECONFIGURE
GO

Next, I will insert some test data. I will repeat the insert multiple times to simulate an extremely large data-set over multiple inserts.

USE tempdb;
GO
--Insert some test data
--Run the insert 5 times
INSERT INTO dbo.SequentialIdCheck (ObjectId, RowValue)
SELECT sao1.[object_id] AS [ObjectId],
       sao1.[name] AS [RowValue]
FROM sys.all_objects AS sao1
CROSS JOIN sys.all_objects AS sao2;

INSERT INTO dbo.NonSequentialIdCheck (ObjectId, RowValue)
SELECT sao1.[object_id] AS [ObjectId],
       sao1.[name] AS [RowValue]
FROM sys.all_objects AS sao1
CROSS JOIN sys.all_objects AS sao2;
GO 5

Beginning execution loop
Batch execution completed 5 times.

Finally, I check the average fragmentation on the tables by checking the fragmentation of the clustered index.

USE tempdb;
GO
--Check the fragmentation
SELECT OBJECT_NAME(ps.[object_id]) AS ObjectName,
       ps.[index_type_desc],
       ps.[avg_fragmentation_in_percent],
       ps.[fragment_count],
       ps.[page_count],
       ps.[database_id],
       ps.[object_id]
FROM sys.dm_db_index_physical_stats(DB_ID('tempdb'),
                                    OBJECT_ID('dbo.SequentialIdCheck'),
                                    DEFAULT,
                                    DEFAULT,
                                    DEFAULT
                                   ) AS ps;
GO

SELECT OBJECT_NAME(ps.[object_id]) AS ObjectName,
       ps.[index_type_desc],
       ps.[avg_fragmentation_in_percent],
       ps.[fragment_count],
       ps.[page_count],
       ps.[database_id],
       ps.[object_id]
FROM sys.dm_db_index_physical_stats(DB_ID('tempdb'),
                                    OBJECT_ID('dbo.NonSequentialIdCheck'),
                                    DEFAULT,
                                    DEFAULT,
                                    DEFAULT
                                   ) AS ps;
GO

Comparison between the index physical stats when the table uses NEWSEQUENTIALID() v/s NEWID(). We can see that NEWID() results in higher fragmentation and consumes higher number of pages on disk when compared to NEWSEQUENTIALID().

NEWID() results in higher fragmentation and higher pages consumed on disk.

As can be seen clearly from the screenshot above, we see that the table with the NEWID() default key value has a higher fragmentation value when compared to the table with NEWSEQUENTIALID().

We also see that the table with NEWID() default key value has taken a lot of pages – resulting in more space being occupied on disk.

The underlying reason for this is that NEWSEQUENTIALID() generates GUIDs that are in sequence for the given batch, whereas the GUIDs generated by NEWID() are random. When used as a clustered key, having values in sequence helps fill the pages faster and reduce fragmentation.

Conclusion

In most cases, an INTEGER based key on a table is sufficient. However, when a GUID is required by design, it is important to keep in mind that using NEWID() causes more fragmentation in the underlying data resulting in poor system performance. Because non-sequential GUIDs cause fragmentation, they are (generally) not a good choice for using as a clustered index key unless it is required to do so by the business/application design.

If NEWSEQUENTIALID() is to be used, please do keep in mind that the keys need to be generated by the database engine making it tricky when using with Entity Frameworks where the key value is required by the code in order to instantiate an entity.

#0378 – SQL Server – Performance – CASE evaluates all the input result expressions

Leave a reply

Recently, I asked to troubleshoot a performance issue with a stored procedure that was being used for reporting purposes. Looking at the execution plan, I realized that while the joins and the filters were as expected, the core bottleneck were sub-queries in the CASE expression. In order to execute the query, SQL Server needs to evaluate all the input result expressions and then return the value in the output result set based on the switch (when expression).

In case one of these input result expressions refer a large table or a table that’s locked, it could compromise the performance of the entire statement – even though the conditions are such that the table is not directly accessed (which is what was happening in our case).

The script below demonstrates the behaviour with an example. In the script, the CASE expression returns the values from one of 3 tables in the AdventureWorks database – Production.Product, Person.Person and Sales.SalesOrderHeader.

USE AdventureWorks2012;
GO

DECLARE @caseSwitch INT = 1;

SELECT CASE @caseSwitch 
            WHEN 1 THEN (SELECT TOP 1 
                                pp.Name
                            FROM Production.Product AS pp
                        )
            WHEN 2 THEN (SELECT TOP 1 
                                per.LastName + ', ' + per.FirstName
                            FROM Person.Person AS per
                        )
            WHEN 3 THEN (SELECT TOP 1 
                                soh.Comment
                            FROM Sales.SalesOrderHeader AS soh
                        )
            ELSE 'Default Value'
       END;
GO

When we execute the script with the “Show Actual Execution Plan” (Ctrl + M) turned on, we can see that all three tables were accessed.

A CASE expressions evaluates all the input result expressions

If this behaviour presents a lot of performance issues in the system, the solution is to re-engineer the way the system is queried such that the required set of data is staged into temporary tables to avoid loading the underlying tables.

Until we meet next time,
Be courteous. Drive responsibly.

#0373 – SQL Server – Interview Questions – Parameters required for server-side paging mechanisms

Leave a reply

For a mid-level SQL developer profile, this is one of my favourite interview questions if their current projects involve some sort of paging mechanism while displaying results on a grid.

What is the minimum number of parameters that your stored procedure needs to implement server-side paging?

Despite the multiple blog posts around this topic, this question surprisingly still stumps most of the developers.

The answer is quite simple – at a basic level, you absolutely need two (2) input parameters on your stored procedure to implement server-side paging:

The page number
The page size, i.e. the number of records to be displayed on each page

Of course, the implementation may vary – one can have multiple additional parameters like the sort column and the sort order, but at the very minimum one needs the two (2) parameters listed above.

For reference, here is a sample implementation (for SQL Server 2008 R2 and SQL Server 2012):

USE AdventureWorks2012 ;
GO
SET NOCOUNT ON ;

DECLARE @pageSize INT = 100 ;
DECLARE @pageNumber INT = 3 ;

--SQL 2008 R2 Method
--Source Data, fetched with paging
;
WITH    SourceDataCTE ( CarrierTrackingNumber, SalesOrderId, TotalShipmentValue, RowNumber )
          AS ( SELECT   sod.CarrierTrackingNumber,
                        sod.SalesOrderId,
                        SUM(sod.LineTotal) AS TotalShipmentValue,
                        ROW_NUMBER() OVER ( ORDER BY sod.CarrierTrackingNumber, sod.SalesOrderId ) AS RowNumber
               FROM     Sales.SalesOrderHeader AS soh
                        INNER JOIN Sales.SalesOrderDetail AS sod ON sod.SalesOrderID = soh.SalesOrderID
               WHERE    soh.ShipDate IS NOT NULL
                        AND sod.CarrierTrackingNumber IS NOT NULL
               GROUP BY sod.CarrierTrackingNumber,
                        sod.SalesOrderId
             )
    SELECT TOP ( @pageSize )
            sdCTE.CarrierTrackingNumber,
            sdCTE.SalesOrderId,
            sdCTE.TotalShipmentValue,
                     sdCTE.RowNumber
    FROM    SourceDataCTE AS sdCTE
    WHERE   sdCTE.RowNumber BETWEEN ( ( @pageNumber * @pageSize ) + 1 )
                            AND     ( ( @pageNumber + 1 ) * @pageSize ) ;

--SQL 2012 Method
SELECT   sod.CarrierTrackingNumber,
        sod.SalesOrderId,
        --SUM(sod.LineTotal) OVER ( ORDER BY sod.CarrierTrackingNumber, sod.SalesOrderId ) AS TotalShipmentValue,
        SUM(sod.LineTotal) AS TotalShipmentValue,
        ROW_NUMBER() OVER ( ORDER BY sod.CarrierTrackingNumber, sod.SalesOrderId ) AS RowNumber
FROM     Sales.SalesOrderHeader AS soh
        INNER JOIN Sales.SalesOrderDetail AS sod ON sod.SalesOrderID = soh.SalesOrderID
WHERE    soh.ShipDate IS NOT NULL
        AND sod.CarrierTrackingNumber IS NOT NULL
GROUP BY sod.CarrierTrackingNumber,
        sod.SalesOrderId
ORDER BY sod.CarrierTrackingNumber
OFFSET (@pageNumber * @pageSize) ROWS
FETCH NEXT (@pageSize) ROWS ONLY
GO

Here’s the result of the code snippet provided above:

Shows the output of the various server-side paging mechanisms available for Microsoft SQL Server

SQL Server Paging Mechanisms

Until we meet next time,
Be courteous. Drive responsibly.

#0370 – SQL Server – Myths – CREATE DATABASE FOR ATTACH_REBUILD_LOG will not work for read-only databases

Leave a reply

In the past, I have written about attaching data files to a SQL Server instance when log files are missing by the use of the FOR ATTACH_REBUILD_LOG clause of the CREATE DATABASE statement. Recently, I was referring the CREATE DATABASE documentation on MSDN. It’s a comprehensive document and and the following line for read-only databases caught my attention.

For a read-only database, the log cannot be rebuilt because the primary file cannot be updated. Therefore, when you attach a read-only database with a log that is unavailable, you must provide the log files, or the files in the FOR ATTACH clause.

This statement was contrary to what I had observed before. Hence, I decided to re-validate the findings via a demo.

The first step, of course is to create a database.

USE [master];
GO
CREATE DATABASE ReadOnlyDBForAttach
ON PRIMARY (NAME = ReadOnlyDBForAttach_Data,
            FILENAME='C:\SQLData\MSSQL12.SQL2014\MSSQL\DATA\ReadOnlyDBForAttach_Data.mdf'),
   FILEGROUP RODefault (NAME = ReadOnlyDBForAttach_RODefault,
              FILENAME='C:\SQLData\MSSQL12.SQL2014\MSSQL\DATA\ReadOnlyDBForAttach_RODefault.mdf')
LOG ON (NAME = ReadOnlyDBForAttach_Log,
        FILENAME='C:\SQLData\MSSQL12.SQL2014\MSSQL\DATA\ReadOnlyDBForAttach_Log.ldf');
GO

As can be seen from the script referenced above, the database we created as 2 data file groups, one is the default PRIMARY filegroup. Because we cannot make the PRIMARY filegroup READ_ONLY, we will be making the other filegroup (RODefault) READ_ONLY. To add complexity, we will also make it the default filegroup, i.e. any new objects created without specifying a filegroup, it will be created in the filegroup marked as default.

Finally, we will also mark the database as READ_ONLY.

USE [master];
GO
ALTER DATABASE ReadOnlyDBForAttach
MODIFY FILEGROUP [RODefault] DEFAULT;
GO

ALTER DATABASE ReadOnlyDBForAttach
MODIFY FILEGROUP [RODefault] READ_ONLY;
GO

ALTER DATABASE ReadOnlyDBForAttach SET READ_ONLY;
GO

Now, let us check out the database and filegroup properties.

USE ReadOnlyDBForAttach;
GO
SELECT 'Database Properties',
       sd.[name],
       sd.is_read_only,
       sd.is_cleanly_shutdown
FROM sys.databases AS sd
WHERE sd.[name] = 'ReadOnlyDBForAttach';

SELECT 'File properties',
       sdf.file_id,
       sdf.type,
       sdf.data_space_id,
       sdf.type_desc,
       sdf.name,
       sdf.is_read_only,
       sdf.is_media_read_only
FROM sys.database_files AS sdf;
GO

Image showing database and database file properties for the read-only database - ReadOnlyDBForAttach

Read Only database and database file properties

Now, let us detach the database, and delete the log file. (NOTE: We are removing the log file for the purposes of this demo only. Please do not do this in your QA or production environments).

USE [master];
GO
EXEC sp_detach_db @dbname = 'ReadOnlyDBForAttach';
GO

Image showing the log file physically removed from the file system

Read Only Database – Log File Removed

Finally, let us attach the database back to the SQL Server instance using the CREATE DATABASE…FOR ATTACH_REBUILD_LOG clause.

USE [master]
GO
CREATE DATABASE [ReadOnlyDBForAttach]
ON  PRIMARY ( NAME = N'ReadOnlyDBForAttach_Data',
                FILENAME = N'C:\SQLData\MSSQL12.SQL2014\MSSQL\DATA\ReadOnlyDBForAttach_Data.mdf')
FOR ATTACH_REBUILD_LOG
GO

File activation failure. The physical file name "C:\SQLData\MSSQL12.SQL2014\MSSQL\DATA\ReadOnlyDBForAttach_Log.ldf" may be incorrect.
New log file 'C:\SQLData\MSSQL12.SQL2014\MSSQL\DATA\ReadOnlyDBForAttach_log.ldf' was created.

As can be seen from the message above, the log file was successfully created and the database was successfully attached to the SQL Server instance.

Let us cross-check the database and file properties again:

USE ReadOnlyDBForAttach;
GO
SELECT 'Database Properties',
       sd.[name],
       sd.is_read_only,
       sd.is_cleanly_shutdown
FROM sys.databases AS sd
WHERE sd.[name] = 'ReadOnlyDBForAttach';

SELECT 'File properties',
       sdf.file_id,
       sdf.type,
       sdf.data_space_id,
       sdf.type_desc,
       sdf.name,
       sdf.is_read_only,
       sdf.is_media_read_only
FROM sys.database_files AS sdf;
GO

Read Only Database and Data File Properties After Attach

Conclusion

Contrary to the MSDN remark, a read-only database can be successfully attached to a SQL Server instance even when the log file does not exist by the use of CREATE DATABASE…FOR ATTACH_REBUILD_LOG
A read-only database becomes a read/write database if it has been attached to the SQL Server using FOR ATTACH_REBUILD_LOG and the log file was rebuilt

My findings above do not agree with the MSDN remark. At the moment, I am inclined to believe that this is a bug in the documentation as I have found a couple of years ago as well (see references). Please do share your views on the same in the post comments.

References/Further Reading

SQL Server Myth: Log files are removed when a database is made READ_ONLY [Link]
Creating a database without the log backup file – Error Msg. 5120 [Link]
CREATE DATABASE…ATTACH_REBUILD_LOG resets database recovery mode to SIMPLE [Link]
Setting database to READ_ONLY does not change the file-group properties [Link]
sp_attach_db v/s CREATE DATABASE FOR ATTACH – Number of files limitation; Msg. 8144 (Procedure has too many arguments specified) [Link]
CREATE DATABASE [MSDN Documentation]

Until we meet next time,
Be courteous. Drive responsibly.

#0369 – Visual Studio Solutions – SSIS and Coded UI projects – encounter “System.ArgumentNullException” and tests do not build successfully

Leave a reply

Recently, we undertook a project to write a couple of automation tools for the utilities that we use to move data around.

The automation solution involved a couple of SSIS packages that created a few data views, generated some key columns and cross-referenced records with predefined static data to facilitate data comparison. In addition, we also wrote a couple of Coded UI tests which would read configuration parameters from a simple Excel file and run through the utilities so that they can run independently without any need of human interaction.

The issue

Unfortunately, it seems that at least in Visual Studio 2012, one cannot have both – an SSIS project and a Coded UI project. This is because the Coded UI tests fail to build with a “System.ArgumentNullException” error.

Here are the simple steps to reproduce the issue:

Launch VS2012
In an new solution, add a new Integration Services project. Leave defaults and save the solution
In this solution, add a new Coded UI Test project. Leave defaults and save the solution. The solution now has the 2 projects as required
Go to Test -> Windows -> Test Explorer
In the solution explorer, right click on the solution and choose “Build Solution”
Build fails unexpectedly with the following error:

System.ArgumentNullException: Value cannot be null.
Parameter name: source
at Microsoft.VisualStudio.Shell.ValidateArg.NotNull[T](T arg, String parameterName)
at Microsoft.VisualStudio.Shell.ValidateArg.NotNullOrEmpty[T](IEnumerable`1 arg, String parameterName)
at Microsoft.VisualStudio.Shell.Interop.VSProjectExtensions.CompareNormalizedPath(String source, String target)
at Microsoft.VisualStudio.Shell.Interop.VSProjectExtensions.GetKeyOutputForIVsOutputGroup(IVsOutputGroup outputGroup)
at Microsoft.VisualStudio.Shell.Interop.VSProjectExtensions.GetProjectOutputPath(IVsProjectCfg2 projectConfig, String groupName)
at Microsoft.VisualStudio.Shell.Interop.VSProjectExtensions.GetKeyOutputForGroup(IVsSolutionBuildManager buildManager, IVsProject project, String groupName)
at Microsoft.VisualStudio.TestWindow.VsAdapters.VsTestContainer.get_Source()
at Microsoft.VisualStudio.TestWindow.VsAdapters.VsTestContainer.CreateTimeStamp()
at Microsoft.VisualStudio.TestWindow.VsAdapters.VsTestContainer..ctor(ITestContainerDiscoverer discoverer, ILogger log, IVsProject project, IServiceProvider serviceProvider, IVsAppContainerUtilities appContainerUtilities)
at Microsoft.VisualStudio.TestWindow.VsAdapters.VsProjectOutputContainerDiscoverer.<get_TestContainers>b__0(IVsProject p)
at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
at System.Linq.Enumerable.WhereEnumerableIterator`1.MoveNext()
at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
at Microsoft.VisualStudio.TestWindow.Controller.TestContainerDiscovererExtension.GetSortedContainers(ITestContainerDiscoverer discoverer)
at Microsoft.VisualStudio.TestWindow.Controller.TestContainerProvider.GetContainersFromDiscoverer(ITestContainerDiscoverer discoverer)

The Workaround

The workaround to build and run the Coded UI tests is to simply unload (removal of the project is not necessary) the SSIS project(s) for the time being. Once the Coded UI tests are done, the SSIS project(s) can be loaded back into the solution before checking it back into source control.

MS Connect Case

While the workaround is simple enough, it can be slightly inconvenient when there are multiple team members who consume the solution. I therefore wrote up a Connect case for this issue (ID# 2194579). Here’s the link: https://connect.microsoft.com/VisualStudio/feedback/details/2194579/solution-with-both-coded-ui-and-integration-services-projects-encounters-system-argumentnullexception-and-tests-do-not-build-successfully

If you encounter the same issue, or if you believe it should be fixed, please “up” vote on the MS connect item linked above.

Until we meet next time,
Be courteous. Drive responsibly.

SQLTwins by Nakul Vachhrajani

SQL Server tips and experiences dedicated to my twin daughters.

Tag Archives: Guidance

#0381 – SQL Server – Table design – Is it better to use NEWID or NEWSEQUENTIALID when defining the key as a UNIQUEIDENTIFIER?

NEWID() generates more fragments

Conclusion

Further Reading

#0378 – SQL Server – Performance – CASE evaluates all the input result expressions

#0373 – SQL Server – Interview Questions – Parameters required for server-side paging mechanisms

#0370 – SQL Server – Myths – CREATE DATABASE FOR ATTACH_REBUILD_LOG will not work for read-only databases

Conclusion

References/Further Reading

#0369 – Visual Studio Solutions – SSIS and Coded UI projects – encounter “System.ArgumentNullException” and tests do not build successfully

The issue

The Workaround

MS Connect Case