Tag Archives: Best Practices

Articles related to performance and usability best practices in Microsoft SQL Server.

#0356 – SQL Server – Search with the LIKE operator and the underscore (_) wildcard character


String searches in most applications are based on the “contains” or “begins-with” paradigms – which are implemented by the use of the LIKE logical operator. The LIKE operator allows for the use of wildcard characters (underscore being one of them), which is the part that most people unknowingly overlook when they have been working with T-SQL for quite some time.

One such incident happened the other day at work when I was asked by a colleague to help him out with a query that appeared correct, but failed to give the expected results. For the sake of brevity, I have condensed the issue into a test scenario below:

DECLARE @testTbl TABLE (Value VARCHAR(50));

INSERT INTO @testTbl (Value)
VALUES ('SQL 2012'),
('SQL_2014');

SELECT * FROM @testTbl WHERE Value LIKE 'SQL_20%';

image

As can be seen in the screenshot above, the results are not what one would expect them to be at the first glance. One would have expected it to return only the string “SQL_2014” and yet the T-SQL query is returning both the records.

Normally, data coming in via flat-files from raw-data collection systems would require such kinds of string searches and manipulations. A well designed system would have these static values as a look-up so that all references look at the same version of the string value

Solution(s)

There are not one, but two possible solutions to choose from to work around this issue.

Using square brackets ([]) around the wildcard character

Because we want to consider the underscore as a character and not as a wildcard, the easiest thing to do would be to surround it with square brackets, similar to what is shown in the query below.

SELECT * FROM @testTbl WHERE Value LIKE 'SQL[_]20%';

image

Using the ESCAPE keyword

The other option, which I have discussed in one of my earlier posts, is to use the ESCAPE keyword. The ESCAPE keyword works on the basis of a user specified escape sequence. When encountered, the query engine will simply ignore the wildcard character and treat it as a normal character.

SELECT * FROM @testTbl WHERE Value LIKE 'SQL*_20%' ESCAPE '*';

image

Conclusion

The moral here is that one must always remember the basics – this thumb rule not only applies to software, but in everything that we go about doing in our day to day lives.

The other key learning that I encourage everyone to take from this is that Occam’s Razor is real – from the multiple explanations that a problem may have, the explanation with the least number of assumptions should be selected as the answer. In this case, the simplest theory was –  human error, which was indeed the case.

References

Until we meet next time,
Be courteous. Drive responsibly.

#0347 – SQL Server – Basics – A SELECT that does not fetch anything also does not SET anything


In today’s post, I go all the way back to the basics because of a logical bug I saw in production code recently.

It is always said to be a good practice to initialize variables within procedures so that you know what the default value is and what are valid and invalid values for a variable. For example, if the procedure does not have to deal with NULLs, then it is a good idea to initialize the variables as NULL so that a simple = NULL check is sufficient to tell whether we still have a default, invalid value or not.

The code below attempts to fetch the vacation hours for a given employee. The variables have default values defined, so it follows the variable initialization best practice mentioned above. In all probability, static code analysis/review would have never caught a logical bug hidden in the code below which raises it’s head when the code is run with an incorrect employee Id.

USE AdventureWorks2012;
GO

DECLARE @employeeVacationHours INT = -1;
DECLARE @employeeId INT;

--To be filled in by user
SELECT @employeeId = 777;

SELECT @employeeVacationHours = VacationHours
FROM HumanResources.Employee
WHERE BusinessEntityID = @employeeId;

IF (@employeeVacationHours < 0)
PRINT 'Employee has invalid vacation hours.';

SELECT @employeeVacationHours AS VacationHours;

/* RESULT:

VacationHours
-------------
-1

*/

In this case, the SELECT statement which is supposed to fetch the vacation hours for the employee will not return any records. Because of this, the variable will continue to preserve it’s value and will not reset to the default value or NULL.

A SELECT statement that does not fetch anything also does not SET anything.

The scenario mentioned here is a simplified one – in the production code, this logical bug caused the procedure to go down a completely different code path causing various other “random” issues.

The solution to this problem? In this example, it’s quite simple – change the initialization value for the @employeeVacationHours variable to NULL and then check for NULL after the SELECT operation is complete.

Until we meet next time,

Be courteous. Drive responsibly.

#0343 – SQL Server – String values and effect on DateTime objects


One of the things that I was frequently told during my schooling was to initialize the variables that I use in my programs and scripts. The obvious benefit of having such mentors is that it has now become a matter of habit for me to initialize every variable in my SQL scripts.

One script required me to use the DATETIME datatype and I added an initialization to the variable explicitly. The script was simple enough and I had it all setup as pseudocode on my little notepad. However, in translating my thoughts from the notepad to the SQL script, I made a fatal error – I ended up initializing the DATETIME variable with a string. What follows is an account of the rather painful realization about the impact of my error.

Normally, when we attempt to assign a string value to a DATETIME variable, it results into an error.

DECLARE @myDate DATETIME;
SET @myDate = 'abcd';

SELECT @myDate;
GO

Immediately, we receive the following error:

Msg 241, Level 16, State 1, Line 2
Conversion failed when converting date and/or time from character string.

However, supplying valid dates as string does work.

--Passing valid dates as strings does work
DECLARE @myDate1 DATETIME, 
        @myDate2 DATETIME;
SET @myDate1 = '08/25/2014'
SET @myDate2 = 'August 25, 2014';

SELECT @myDate1 AS MyDate1, 
       @myDate2 AS MyDate2;
GO

/* RESULTS:
MyDate1                 MyDate2
----------------------- -----------------------
2014-08-25 00:00:00.000 2014-08-25 00:00:00.000
*/

However, I had initialized the variable with an empty, i.e. blank string. This does not return any error. However, it does do something different – something that might cause you to go around in circles unless maybe the code is reviewed by another set of eyes (which is what happened in my case) or after you take a much deserved break.

This is what I had done:

DECLARE @myDate DATETIME;
SET @myDate = '';

And the result was something that threw me off because all subsequent calculations that I had done on that particular date variable were returning incorrect results. After a couple of minutes and some help in reviewing the code again, I realized that:

Initializing a DATETIME, DATE or TIME variable with empty string sets it to 01/01/1900 (default value for DATE & TIME data-types).

Initialization with an empty string had initialized my date variable with the value – 00hrs on 01/01/1900, which is the default value for DATE & TIME data-types.

DECLARE @myDate DATETIME;
SET @myDate = '';

SELECT @myDate AS MyDate;
GO

/* RESULT
MyDate
-----------------------
1900-01-01 00:00:00.000
*/

NOTE: Attempting to use an empty string as default value for DATE and TIME data-types in the ISNULL() and COALESCE() functions also results into 01/01/1900 being returned as the default value.

Moral of the story

  • Always initialize all variables – it serves a dual purpose by both preventing and (as in this case) identifying potential errors
  • Be careful about the value being used for the initialization – it is possible that one may end up with an undesired effect of an incorrect initialization
  • It’s always good to have code reviews!

Further Reading

  • DATE and TIME data-types [MSDN Link]

Until we meet next time,

Be courteous. Drive responsibly.

#0341 – SQL Server – Random “String or binary data would be truncated” errors during bulk data loads


Interfacing multiple systems is always a great source of learning. In the various integrations that I have been a part of over the years, one of the many things that I have learnt is that no two systems have the same world view of the data that they store.

To take a very simple example, assume that we are interfacing a home-grown warehousing system with a 3rd party front desk PoS system. One of the first problems that most people will run into is that the supported length for product name is different – the warehousing system might be accepting a product name of let’s say 60 characters whereas the PoS may have a limit of 50 characters.

Integrations need to take care of this sort of a mismatch, but it’s not always easy. Sometimes, the interface specifications are vague or in case of home-grown legacy systems, possibly non-existent. In most enterprise integrations, a pre-production run is often done with a backup of the existing production data. When doing a data load from one system to another, it is quite possible that we run into situations wherein the tests run fine, but a production run fails with the following error:

Msg 8152, Level 16, State 14, Line 17
String or binary data would be truncated.

As the error suggests, it is what can very simply be termed as a buffer overflow – we are attempting to insert a larger string into a smaller container (table column). Because the data was the same, the question that comes up is:

Why was the “String or binary data would be truncated” error random and not reproducible at will?

Connection Settings – ANSI_WARNINGS

The answer to the puzzle lies in the connection settings for the SET option – ANSI_WARNINGS.

If the ANSI_WARNINGS is set to OFF, attempting to insert a longer string into a smaller column automatically truncates the string. The following test confirms the behaviour:

USE tempdb;
GO

--Create Test table
IF OBJECT_ID('dbo.AnsiWarningsTest','U') IS NOT NULL
DROP TABLE dbo.AnsiWarningsTest;
GO

CREATE TABLE dbo.AnsiWarningsTest (FourtyCharacterTest VARCHAR(40));
GO

--Default value, resetting for safety
SET ANSI_WARNINGS OFF;

--Check current value of ANSI_WARNINGS
IF (@@OPTIONS & 8) = 8
PRINT 'ANSI_WARNINGS are ON';
ELSE
PRINT 'ANSI_WARNINGS are OFF';

--Attempt to insert data longer than available column value
--This should work fine
INSERT INTO dbo.AnsiWarningsTest (FourtyCharacterTest)
SELECT REPLICATE('a',100);
GO

SELECT AnsiWarningsTest.FourtyCharacterTest
FROM dbo.AnsiWarningsTest;
GO

As can be seen from the screenshot below, the only 40 characters were successfully inserted into the table – dbo.AnsiWarningsTest. The string truncation was handled automatically by Microsoft SQL Server.

image

Now, we turn ANSI_WARNINGS OFF and repeat the same test.

--Set ANSI_WARNINGS to ON
SET ANSI_WARNINGS ON;

--Check current value of ANSI_WARNINGS
IF (@@OPTIONS & 8) = 8
PRINT 'ANSI_WARNINGS are ON';
ELSE
PRINT 'ANSI_WARNINGS are OFF';

--Attempt to insert data longer than available column value
--This SHOULD result into an exception
INSERT INTO dbo.AnsiWarningsTest (FourtyCharacterTest)
SELECT REPLICATE('b',100);
GO

Attempting to execute this immediately results into the following error in the “Messages” tab of the SSMS.

ANSI_WARNINGS are ON
Msg 8152, Level 16, State 14, Line 12
String or binary data would be truncated.

Conclusion

When working with bulk data imports, it is important to know the differences in the storage design of both systems. Once known, mismatches that may result into potential string termination errors should be explicitly handled by the SUSBTRING clause.

Further Reading

  • ANSI_WARNINGS [MSDN Link]
  • SUBSTRING [MSDN Link]

Until we meet next time,

Be courteous. Drive responsibly.

#0170-SQL Server-Deprecated Features-Column Alias defined by a string enclosed in quotation marks


I recently wrote a piece about the impact of table and column aliases on query performance and plan cache size. As I was writing the post, I recalled that there are multiple ways in which column aliases can be defined in a T-SQL query, depending upon the developer’s preference. However, one of these methods is marked for deprecation in one of the future releases of Microsoft SQL Server (it is still a valid method for SQL Server 2012, so there is no cause of worry in the immediate future).

The available methods for column aliasing and their status is shown in the table below:

Format Status
‘column_alias’ = expression Deprecated
expression AS column_alias Active
expression AS ”column_alias” Active
expression AS [column_alias] Active
[column_alias] = expression Active
expression AS “column_alias” Active

All of these methods are demonstrated in the query below:

USE AdventureWorks2012;
GO

SELECT                                            --           Format            ;   Status
    "Employee Birth Date" = Employee.BirthDate,   -- "column_alias" = expression ; Deprecated
    Employee.HireDate AS JoiningDate,             -- expression AS column_alias  ; Active
    Employee.BusinessEntityID AS "EmployeeId",    -- expression AS "column_alias"; Active
    Employee.OrganizationLevel AS [Org. Level],   -- expression AS [column_alias]; Active
    [Salary Flag] = Employee.SalariedFlag,        -- [column_alias] = expression ; Active
    Employee.SickLeaveHours AS "SickHours",       -- expression AS “column_alias"; Active
    "VacationHours" = Employee.VacationHours      -- "column_alias" = expression ; Deprecated
FROM HumanResources.Employee AS Employee;
GO

My recommendations

My recommendations around any such feature is:

  • To have one standard for the entire organization/product
  • The element adopted as the standard should not have been marked for deprecation for at least the next 2 major releases of Microsoft SQL Server
  • Finally, and the most important consideration is that the feature should help the team become more productive and efficient

I personally use either expression AS [column_alias] OR [column_alias] = expression methods.

Before you leave, do share your preferred format for column aliasing.

Until we meet next time,

Be courteous. Drive responsibly.