Tag Archives: #TSQL

Articles on T-SQL. This can be a script or a syntax element

#0380 – SQL Server – Basics – Specify Scale and Precision when defining Decimal and Numeric datatypes

I had an interesting conversation during a code review that I was asked to conduct for a simple query that a team had written to support their project monitoring. The team specialized in quality assurance and had minimal development experience. They had used variables of decimal data types in their script, but had declared them without any precision or scale. When I gave a review comment on the declaration of variables, I was asked the question:

Does it make a difference if we do not specify scale and precision when defining variables of decimal or numeric datatypes?

I often look forward to such encounters for two reasons:

When I answer their questions, the process reinforces my concepts and learnings
It helps me contribute to the overall community by writing a blog about my experience

When the question was asked, I honestly admitted that I did not have a specific answer other than it was the best practice to do so from a long-term maintainability standpoint. Post lunch, I did a small test which I showed the team and will be presenting today.

The Problem

In the script below, I take a decimal variable (declared without a fixed scale or precision) with value (20.16) and multiply it by a constant number (100) and then by another constant decimal (100.0). If one uses a calculator, the expected result is:

20.16 * 100 = 2016
20.16 * 100.0 = 2016

When we multiply a Decimal (20.16) with another number (100) using the calculator, the result is as expected (2016)

Expected results when we multiply a Decimal with another number using the calculator

However, when we perform the same test via SQL Server, we are in for a surprise:

DECLARE @dVal1 DECIMAL = 20.16;

SELECT (@dVal1 * 100)   AS DecimalMultipliedByAnInteger, 
       (@dVal1 * 100.0) AS DecimalMultipliedByADecimal;
GO

As can be seen from the results, we do not get the expected value of 2016. Instead, we find that the decimal value was rounded off before the multiplication took place.

Although the test input value is declared as a decimal, the result appears to be based only on the integer part, not the fractional part, of the input.

Root Cause

The reason behind this behaviour is hidden in the following lines of the SQL Server online documentation on MSDN (formerly known as “Books-On-Line”) for decimal and numeric data-types available here: https://msdn.microsoft.com/en-us/library/ms187746.aspx.

…s (scale)
The number of decimal digits that will be stored to the right of the decimal point….Scale can be specified only if precision is specified. The default scale is 0…

The real reason however is a few lines below – rounding.

Converting decimal and numeric Data

…By default, SQL Server uses rounding when converting a number to a decimal or numeric value with a lower precision and scale….

What SQL Server appears to be doing here is that when a variable of DECIMAL datatype is declared without a precision and scale value, the defaults apply: a precision of 18 and a scale of zero (0). Hence, the test value of 20.16 is rounded to the nearest integer, 20.
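
One way to see the defaults directly, rather than inferring them from the rounded result, is to ask SQL Server what it stored. The sketch below is my own addition and assumes the built-in SQL_VARIANT_PROPERTY function is acceptable for this kind of inspection; the column aliases are purely illustrative.

```sql
--Declared without precision and scale, exactly as in the test above
DECLARE @dVal1 DECIMAL = 20.16;

--Inspect what was actually stored and with which precision/scale
SELECT @dVal1                                    AS StoredValue,
       SQL_VARIANT_PROPERTY(@dVal1, 'BaseType')  AS BaseType,
       SQL_VARIANT_PROPERTY(@dVal1, 'Precision') AS EffectivePrecision,
       SQL_VARIANT_PROPERTY(@dVal1, 'Scale')     AS EffectiveScale;
GO
```

Given the documented defaults, the scale column should report 0, which is consistent with the stored value having lost its fractional part.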

To confirm that rounding is indeed taking place, I swapped the digits in the input value from 20.16 to 20.61 and re-ran the same test.

DECLARE @dVal1 DECIMAL = 20.61;

SELECT (@dVal1 * 100)   AS DecimalMultipliedByAnInteger, 
       (@dVal1 * 100.0) AS DecimalMultipliedByADecimal;
GO

Now, the result was 2100 instead of 2000 because the input test value of 20.61 was rounded to 21 before the multiplication took place.

Because the test input value was declared as a decimal without precision and scale, rounding took place, resulting in a different result.

By this time, my audience was awestruck as they realized the impact this behaviour would have had on their project monitoring numbers.

The Summary – A Best Practice

We can summarize the learning into a single sentence:

It is a best practice for ensuring data quality to always specify a precision and scale when working with variables of the numeric or decimal data types.

To confirm, here’s a version of the same test as we saw earlier. The only difference is that this time, we have explicitly specified the precision and scale on our input values.

DECLARE @dVal1 DECIMAL(19,4) = 20.16;

SELECT (@dVal1 * 100)   AS DecimalMultipliedByAnInteger, 
       (@dVal1 * 100.0) AS DecimalMultipliedByADecimal;
GO

When we look at the results, we see that the output is exactly what we wanted to see, i.e. 2016.

Because the test input value was declared as a decimal with precision and scale, no rounding took place and we got the expected result.
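
The post only tests variables, but the same default scale should apply wherever the datatype is written without precision and scale, such as in a CAST or a column definition. The following is a small sketch of that assumption; the table variable @demo and its column names are hypothetical and exist only for this illustration.

```sql
--CAST without precision/scale versus CAST with an explicit precision/scale
SELECT CAST(20.16 AS DECIMAL)       AS CastWithoutScale,
       CAST(20.16 AS DECIMAL(19,4)) AS CastWithScale;

--The same comparison for columns of a (hypothetical) table variable
DECLARE @demo TABLE (WithoutScale DECIMAL,
                     WithScale    DECIMAL(19,4)
                    );

INSERT INTO @demo (WithoutScale, WithScale)
VALUES (20.16, 20.16);

SELECT WithoutScale,
       WithScale
FROM @demo;
GO
```

If the default scale of 0 applies here as well, the columns declared without a scale lose the fractional part while the DECIMAL(19,4) ones retain it.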

#0379 – SQL Server – Basics- Declaring multiple variables in a single statement

Making a switch between technologies is sometimes difficult and it always helps to establish parallels between them during the learning phase. Recently, I met someone who had worked on object-oriented programming languages like C# and had to start learning T-SQL in order to work on a new Agile project that was coming his way.

In order to help him get started, the first thing I did was to establish a parallel on how to declare new variables in a module/script. Just as one can declare more than one variable in a single statement in C#, one can do so in T-SQL.

This actually came as a surprise to a few of my team-mates, which is why I decided to write it up as a T-SQL basics post.

So, here’s how to declare multiple variables spanning multiple data-types in a single DECLARE statement:

USE tempdb;
GO
DECLARE @iVar1 INT = 10,
        @iVar2 INT = 05,
        @dVar  DECIMAL(19,4) = 10.05,
        @sVar  VARCHAR(20) = 'Ten';

SELECT @iVar1 AS IntegerValue1, 
       @iVar2 AS IntegerValue2, 
       @dVar  AS DecimalValue,
       @sVar  AS StringValue;
GO

Declaring multiple variables and assigning values to them in a single statement

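Do keep in mind, though, that initializing variables inline is supported only from SQL Server 2008 onwards, and that a DECLARE statement which combines declaration and initialization/assignment can raise exceptions (for example, when the assigned value cannot be converted to the declared datatype).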
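
To illustrate the kind of exception an initializing DECLARE can raise, here is a minimal sketch of my own. It assumes a conversion failure is a representative case; the variable names are illustrative and the TRY...CATCH wrapper is there only to surface the error details.

```sql
USE tempdb;
GO
DECLARE @sVar VARCHAR(20) = 'Ten';

BEGIN TRY
    --The assignment is part of the DECLARE, so the conversion
    --failure is raised by the DECLARE statement itself
    DECLARE @iVar INT = @sVar;

    SELECT @iVar AS IntegerValue;
END TRY
BEGIN CATCH
    SELECT ERROR_NUMBER()  AS ErrorNumber,
           ERROR_MESSAGE() AS ErrorMessage;
END CATCH;
GO
```

The practical consequence is that a DECLARE with initialization deserves the same error handling consideration as any other assignment.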

Until we meet next time,
Be courteous. Drive responsibly.

#0378 – SQL Server – Performance – CASE evaluates all the input result expressions

Recently, I was asked to troubleshoot a performance issue with a stored procedure that was being used for reporting purposes. Looking at the execution plan, I realized that while the joins and the filters were as expected, the core bottleneck was the set of sub-queries in the CASE expression. In order to execute the query, SQL Server needs to evaluate all the input result expressions and then return the value in the output result set based on the switch (when expression).

If one of these input result expressions refers to a large table or a table that’s locked, it could compromise the performance of the entire statement – even though the conditions are such that the table is not directly accessed (which is what was happening in our case).

The script below demonstrates the behaviour with an example. In the script, the CASE expression returns the values from one of 3 tables in the AdventureWorks database – Production.Product, Person.Person and Sales.SalesOrderHeader.

USE AdventureWorks2012;
GO

DECLARE @caseSwitch INT = 1;

SELECT CASE @caseSwitch 
            WHEN 1 THEN (SELECT TOP 1 
                                pp.Name
                            FROM Production.Product AS pp
                        )
            WHEN 2 THEN (SELECT TOP 1 
                                per.LastName + ', ' + per.FirstName
                            FROM Person.Person AS per
                        )
            WHEN 3 THEN (SELECT TOP 1 
                                soh.Comment
                            FROM Sales.SalesOrderHeader AS soh
                        )
            ELSE 'Default Value'
       END;
GO

When we execute the script with the “Show Actual Execution Plan” (Ctrl + M) turned on, we can see that all three tables were accessed.

A CASE expression evaluates all the input result expressions

If this behaviour presents a lot of performance issues in the system, the solution is to re-engineer the way the system is queried such that the required set of data is staged into temporary tables to avoid loading the underlying tables.
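
Staging data is the approach suggested above. As a different, lighter-weight option that the post does not cover, one could branch with IF...ELSE so that only the statement for the selected branch is executed. The sketch below is an assumption on my part rather than the fix that was applied in the actual system; the @result variable is hypothetical.

```sql
USE AdventureWorks2012;
GO

DECLARE @caseSwitch INT = 1;
DECLARE @result     NVARCHAR(200);

--Only the statement in the branch that is taken gets executed
IF (@caseSwitch = 1)
BEGIN
    SELECT TOP 1 @result = pp.Name
    FROM Production.Product AS pp;
END
ELSE IF (@caseSwitch = 2)
BEGIN
    SELECT TOP 1 @result = per.LastName + ', ' + per.FirstName
    FROM Person.Person AS per;
END
ELSE IF (@caseSwitch = 3)
BEGIN
    SELECT TOP 1 @result = soh.Comment
    FROM Sales.SalesOrderHeader AS soh;
END
ELSE
BEGIN
    SET @result = 'Default Value';
END

SELECT @result AS CaseOutput;
GO
```

This only works when the switch is known before the query runs (as with a variable or parameter); it does not help when the CASE is evaluated per row.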

Until we meet next time,
Be courteous. Drive responsibly.

#0377 – SQL Server – Msg 206; Operand Type Clash; Return type of a CASE expression follows datatype precedence

While working on an SSIS package that was extracting data from a legacy SQL Server database, we kept running into meta-data and datatype mismatch issues on a couple of databases. It was only later that we realized that we were overlooking a basic aspect of the CASE expression – the return type.

When the CASE expression is evaluated, it is possible that different result expressions (what’s written in the THEN argument) are of different data-types. In such cases, the CASE expression follows datatype precedence and returns the datatype with the highest precedence.

The example below is a simple CASE expression that evaluates 3 different inputs – an integer, a decimal and a character value. To demonstrate the return type, I will simply use the SELECT…INTO paradigm to create a new table on the fly.

USE tempdb;
GO
--Safety Check
IF OBJECT_ID('[dbo].[sqlTwinsTable]','U') IS NOT NULL
BEGIN
    DROP TABLE [dbo].[sqlTwinsTable];
END
GO

--Use 3 variables of different datatypes and 
--use a CASE expression to return from them
DECLARE @integerValue INT = 8;
DECLARE @decimalValue DECIMAL(19,4) = 20.16;
DECLARE @characterValue VARCHAR(200) = 'Character Value';

DECLARE @caseSwitch INT = 1;

SELECT CASE @caseSwitch WHEN 1 THEN @integerValue
                        WHEN 2 THEN @decimalValue
                        WHEN 3 THEN @characterValue
                        ELSE 'Undefined Value'
       END AS CaseOutput
INTO [dbo].[sqlTwinsTable]
GO

Checking the table definition of the newly created table, we see that it was created with the datatype “decimal” – even though the switch/input expression was such that it returns an integer.

USE tempdb;
GO
--Check the table definition
SELECT isc.TABLE_SCHEMA,
       isc.TABLE_NAME,
       isc.COLUMN_NAME,
       isc.DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS AS isc
WHERE isc.TABLE_SCHEMA = 'dbo'
  AND isc.TABLE_NAME = 'sqlTwinsTable'
GO
--Check the returned value
SELECT CaseOutput
FROM [dbo].[sqlTwinsTable];
GO

As can be seen from the results of these queries, the return type of the CASE expression is the highest precedence datatype from the input result expressions across all the switch branches. Not considering this behaviour may cause issues if the calling application is datatype sensitive.
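
If creating a table just to inspect the return type feels heavy, a quicker check is possible. The sketch below is my own addition and assumes SQL_VARIANT_PROPERTY is suitable for inspecting the type of an expression; it uses only the integer and decimal inputs from the example above.

```sql
DECLARE @integerValue INT = 8;
DECLARE @decimalValue DECIMAL(19,4) = 20.16;

DECLARE @caseSwitch INT = 1;

--The branch taken returns the integer, yet the type of the
--whole CASE expression is decided by datatype precedence
SELECT SQL_VARIANT_PROPERTY(CASE @caseSwitch WHEN 1 THEN @integerValue
                                             WHEN 2 THEN @decimalValue
                            END, 'BaseType') AS CaseReturnType;
GO
```

This should report a decimal base type, in line with the table definition observed earlier.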

Word of caution

A classic example of this behaviour causing unexpected issues is when one has incompatible datatypes (which cannot be implicitly converted from one type to another) mixed in the same expression, as in this case.

--Safety Check
IF OBJECT_ID('[dbo].[sqlTwinsTable]','U') IS NOT NULL
BEGIN
    DROP TABLE [dbo].[sqlTwinsTable];
END
GO

--Use 3 variables of different datatypes and 
--use a CASE expression to return from them
DECLARE @integerValue INT = 8;
DECLARE @decimalValue DECIMAL(19,4) = 20.16;
DECLARE @characterValue VARCHAR(200) = 'Character Value';
DECLARE @dateValue DATE = '2016-07-18';

DECLARE @caseSwitch INT = 1;

--The DATE datatype has a higher precedence
--However, all inputs cannot be implicitly converted to DATE
--We will therefore get a data-type conversion error
SELECT CASE @caseSwitch WHEN 1 THEN @integerValue
                        WHEN 2 THEN @decimalValue
                        WHEN 3 THEN @characterValue
                        WHEN 4 THEN @dateValue
                        ELSE 'Undefined Value'
       END AS CaseOutput
INTO [dbo].[sqlTwinsTable]
GO

Msg 206, Level 16, State 2, Line 62
Operand type clash: int is incompatible with date
Msg 206, Level 16, State 2, Line 62
Operand type clash: decimal is incompatible with date
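
The post stops at the error. One possible way around it, offered here as a sketch rather than as a prescribed fix, is to convert every branch explicitly to a single common datatype so that no implicit conversion to DATE is attempted. The choice of VARCHAR(200) and of style 23 (yyyy-mm-dd) for the date are my assumptions.

```sql
DECLARE @integerValue INT = 8;
DECLARE @decimalValue DECIMAL(19,4) = 20.16;
DECLARE @characterValue VARCHAR(200) = 'Character Value';
DECLARE @dateValue DATE = '2016-07-18';

DECLARE @caseSwitch INT = 4;

--Every branch now returns VARCHAR(200), so there is
--no implicit conversion to a higher precedence datatype
SELECT CASE @caseSwitch WHEN 1 THEN CAST(@integerValue AS VARCHAR(200))
                        WHEN 2 THEN CAST(@decimalValue AS VARCHAR(200))
                        WHEN 3 THEN @characterValue
                        WHEN 4 THEN CONVERT(VARCHAR(200), @dateValue, 23)
                        ELSE 'Undefined Value'
       END AS CaseOutput;
GO
```

The trade-off is that the caller receives character data and has to convert it back if it needs the original datatype.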

#0376 – SQL Server – Msg 2714: There is already an object named ‘#tableName’ in the database.

Recently, I came across the same question a couple of times in one of the forums I participate in, all around the usage of temporary tables. The questions revolved around attempting to create/use temporary tables in the following scenarios within the same batch:

Same temporary table name, but a different definition
Create the temporary table conditionally with the same definition multiple times (i.e. trying to create the temporary table only if necessary)
Create the temporary table conditionally with possibly different definitions (i.e. in cases when the definition is unknown, e.g. data dumps/exports, etc)

In all of these cases, users were encountering the following error:

Msg 2714, Level 16, State 1...
There is already an object named '#tableName' in the database.

The scenario can be recreated in your development/study environment using the script provided below.

USE AdventureWorks2012;
GO
--Safety Check (temporary tables live in tempdb, hence the qualified name)
IF OBJECT_ID('tempdb..#intermediateDataStore','U') IS NOT NULL
BEGIN
    DROP TABLE #intermediateDataStore;
END
GO

DECLARE @isEmployeeList BIT = 1;

--Generate a list of employees, 
--else generate a list of customer contacts
IF (@isEmployeeList = 1)
BEGIN
    SELECT pp.BusinessEntityID,
            pp.FirstName,
            pp.LastName
    INTO #intermediateDataStore
    FROM Person.Person AS pp
    INNER JOIN HumanResources.Employee AS he 
        ON pp.BusinessEntityID = he.BusinessEntityID;
END
ELSE
BEGIN
    SELECT pp.BusinessEntityID,
            pp.FirstName,
            pp.LastName
    INTO #intermediateDataStore
    FROM Person.Person AS pp
    LEFT OUTER JOIN HumanResources.Employee AS he 
        ON pp.BusinessEntityID = he.BusinessEntityID
    WHERE pp.BusinessEntityID IS NOT NULL
        AND he.BusinessEntityID IS NULL;
END

SELECT BusinessEntityID,
        FirstName,
        LastName
FROM #intermediateDataStore;
GO

Root Cause

In the context of non-temporary user tables, the root cause is quite simple – one has another table with the same name already created in the database.

However, in the context of temporary tables, the root cause of the behaviour is a SQL Server design aspect called “Deferred Name Resolution” (DNR) (it has nothing to do with a medical protocol that shares the same abbreviation).

What is Deferred Name Resolution (DNR)?

If one observes carefully, the statement failed at compile time, and not during execution (one can easily verify this by trying to generate an estimated execution plan, which will also fail).

Here is the sequence of events that caused the script to fail during compilation due to deferred name resolution.

Drop any existing temporary tables with the name #intermediateDataStore
- Compiles OK
Definition of variable @isEmployeeList
- Compiles OK
First SELECT…INTO statement for storing a simple list of employees
- Temporary table does not exist here, and hence is marked for DNR
ELSE block – Second SELECT…INTO statement for storing a simple list of non-employees/contacts
- Temporary table does not exist here, and hence attempt is made to mark for DNR
- However, another temporary table with the same name is already marked for DNR within the same batch, resulting in a conflict
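
The sequence above can be boiled down to a much smaller script that does not need AdventureWorks. This is a minimal sketch of my own; the names @flag and #dnrDemo are hypothetical. Only one branch could ever run, yet the batch is expected to fail with Msg 2714 before anything executes.

```sql
USE tempdb;
GO
DECLARE @flag BIT = 1;

IF (@flag = 1)
BEGIN
    --First SELECT...INTO for the temporary table
    SELECT 1 AS Id
    INTO #dnrDemo;
END
ELSE
BEGIN
    --Second SELECT...INTO for the same name in the same batch
    SELECT 2 AS Id
    INTO #dnrDemo;
END
GO
```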

Why is DNR required?

Deferred Name Resolution is not a bug, but a design feature – it allows the SQL Server database engine to continue with the parsing of the batch even when it needs information that it can only get at runtime.

In the case of temporary tables, they have a slightly different actual name during execution. It is therefore impossible to have an idea about what the actual name of the table will be during compilation, which is why the name resolution needs to be deferred.
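
To see the "slightly different actual name" for yourself, you can look the table up in the tempdb catalog. The sketch below is my own illustration with a hypothetical table #nameDemo, and it assumes you have permission to read tempdb.sys.tables.

```sql
USE tempdb;
GO
CREATE TABLE #nameDemo (Id INT NOT NULL);

--The name stored in tempdb is not simply '#nameDemo';
--SQL Server pads it and appends a suffix to keep it unique
SELECT t.name,
       LEN(t.name) AS NameLength
FROM tempdb.sys.tables AS t
WHERE t.name LIKE '#nameDemo%';

DROP TABLE #nameDemo;
GO
```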

This is also the reason why stored procedures may show compilation errors only when executed for the first time, and not during deployment.
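
The stored procedure behaviour can be demonstrated with a short sketch. The procedure dbo.usp_DnrDemo and the table dbo.TableThatDoesNotExist are hypothetical names introduced only for this illustration; the assumption is that no such table exists in tempdb.

```sql
USE tempdb;
GO
--Safety Check
IF OBJECT_ID('[dbo].[usp_DnrDemo]','P') IS NOT NULL
BEGIN
    DROP PROCEDURE [dbo].[usp_DnrDemo];
END
GO

--Deployment succeeds even though the table does not exist,
--because resolution of the table name is deferred
CREATE PROCEDURE [dbo].[usp_DnrDemo]
AS
BEGIN
    SELECT *
    FROM [dbo].[TableThatDoesNotExist];
END
GO

--The error only shows up when the procedure is executed
BEGIN TRY
    EXEC [dbo].[usp_DnrDemo];
END TRY
BEGIN CATCH
    SELECT ERROR_NUMBER()  AS ErrorNumber,
           ERROR_MESSAGE() AS ErrorMessage;
END CATCH;
GO

DROP PROCEDURE [dbo].[usp_DnrDemo];
GO
```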

The Solution

There are multiple ways to arrive at the solutions to this problem, but all solutions revolve around the following two points:

Either separate the offending statements into different batches OR
If the structure is known to be the same/similar, try to define the structure of the temporary table in advance and just refer to the already created instance of the temporary table when required

In our case, it is possible to know the structure of the temporary table in advance and hence the solution for our problem is as below:

USE AdventureWorks2012;
GO
--Safety Check (temporary tables live in tempdb, hence the qualified name)
IF OBJECT_ID('tempdb..#intermediateDataStore','U') IS NOT NULL
BEGIN
    DROP TABLE #intermediateDataStore;
END
GO

--Because the structure is known, create the temp table here
--Else, one can try to split the application logic into 
--multiple distinct parts 
CREATE TABLE #intermediateDataStore 
    (BusinessEntityID INT           NOT NULL,
     FirstName        NVARCHAR(100)     NULL,
     LastName         NVARCHAR(100)     NULL
    )

DECLARE @isEmployeeList BIT = 1;

--Generate a list of employees, 
--else generate a list of customer contacts
IF (@isEmployeeList = 1)
BEGIN
    --Rather than using a SELECT...INTO, we are now
    --using an INSERT INTO...SELECT
    INSERT INTO #intermediateDataStore (BusinessEntityID,
                                        FirstName,
                                        LastName
                                       )
    SELECT pp.BusinessEntityID,
            pp.FirstName,
            pp.LastName
    FROM Person.Person AS pp
    INNER JOIN HumanResources.Employee AS he 
        ON pp.BusinessEntityID = he.BusinessEntityID;
END
ELSE
BEGIN
    INSERT INTO #intermediateDataStore (BusinessEntityID,
                                        FirstName,
                                        LastName
                                       )
    SELECT pp.BusinessEntityID,
            pp.FirstName,
            pp.LastName
    FROM Person.Person AS pp
    LEFT OUTER JOIN HumanResources.Employee AS he 
        ON pp.BusinessEntityID = he.BusinessEntityID
    WHERE pp.BusinessEntityID IS NOT NULL
        AND he.BusinessEntityID IS NULL;
END

SELECT BusinessEntityID,
        FirstName,
        LastName
FROM #intermediateDataStore;
GO
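
A variant of the same idea, for cases where you would rather not spell out the column datatypes by hand, is to let a single SELECT...INTO that returns no rows create the structure, and then use INSERT INTO...SELECT in the branches. This is a sketch of my own and not part of the original solution; it assumes both branches read the same columns from the same source.

```sql
USE AdventureWorks2012;
GO
--Safety Check
IF OBJECT_ID('tempdb..#intermediateDataStore','U') IS NOT NULL
BEGIN
    DROP TABLE #intermediateDataStore;
END
GO

--The only SELECT...INTO in the batch; TOP (0) copies
--the column definitions without copying any rows
SELECT TOP (0)
       pp.BusinessEntityID,
       pp.FirstName,
       pp.LastName
INTO #intermediateDataStore
FROM Person.Person AS pp;

DECLARE @isEmployeeList BIT = 1;

IF (@isEmployeeList = 1)
BEGIN
    INSERT INTO #intermediateDataStore (BusinessEntityID, FirstName, LastName)
    SELECT pp.BusinessEntityID,
           pp.FirstName,
           pp.LastName
    FROM Person.Person AS pp
    INNER JOIN HumanResources.Employee AS he 
        ON pp.BusinessEntityID = he.BusinessEntityID;
END
ELSE
BEGIN
    INSERT INTO #intermediateDataStore (BusinessEntityID, FirstName, LastName)
    SELECT pp.BusinessEntityID,
           pp.FirstName,
           pp.LastName
    FROM Person.Person AS pp
    LEFT OUTER JOIN HumanResources.Employee AS he 
        ON pp.BusinessEntityID = he.BusinessEntityID
    WHERE he.BusinessEntityID IS NULL;
END

SELECT BusinessEntityID,
       FirstName,
       LastName
FROM #intermediateDataStore;
GO
```

Because the temporary table is created exactly once in the batch, the conflict described above should not arise.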

Conclusion

When I was researching the solution, it actually reminded me of the Single-Responsibility Principle of the SOLID design principles. The DNR problems demonstrated in this example would never have come up if the application had two separate modules – one to report for employees and the other to report for contacts.

Other than the obvious conclusion about the solution to the deferred name resolution message, the hidden conclusion is that design and architectural principles are independent of the programming language and platform – they stand equally true whether you are developing a piece of C# code, or writing a T-SQL stored procedure.

SQLTwins by Nakul Vachhrajani

SQL Server tips and experiences dedicated to my twin daughters.
