Tag Archives: Development

Articles on Microsoft SQL Server development

#0406 – SQL Server – Remember that spaces and blank strings are the same


It was recently brought to my attention that a particular script was passing spaces when it should not. Here’s an example:

DECLARE @spaceCharacter NVARCHAR(1) = N' ';
DECLARE @blankCharacter NVARCHAR(1) = N'';

--Confirm that we are looking at different values
--The ASCII codes are different!
SELECT ASCII(@spaceCharacter) AS ASCIICodeForSpace,
       ASCII(@blankCharacter) AS ASCIICodeForBlankString;

--Compare a blank string with spaces
IF (@spaceCharacter = @blankCharacter)
    SELECT 'Yes' AS IsSpaceSameAsBlankString;
ELSE 
    SELECT 'No' AS IsSpaceSameAsBlankString;
GO

/* RESULTS
ASCIICodeForSpace ASCIICodeForBlankString
----------------- -----------------------
32                NULL

IsSpaceSameAsBlankString
------------------------
Yes
*/

01_Symptom

We then checked the LENGTH and DATALENGTH of both strings and noticed something interesting – the check on the LENGTH was trimming out trailing spaces whereas the check on the DATALENGTH was not.

DECLARE @spaceCharacter NVARCHAR(1) = N' ';
DECLARE @blankCharacter NVARCHAR(1) = N'';

--Check the length
SELECT LEN(@spaceCharacter) AS LengthOfSpace, 
       LEN(@blankCharacter) AS LengthOfBlankCharacter,
       DATALENGTH(@spaceCharacter) AS DataLengthOfSpace, 
       DATALENGTH(@blankCharacter) AS DataLengthOfBlankCharacter;
GO

/* RESULTS
LengthOfSpace LengthOfBlankCharacter DataLengthOfSpace DataLengthOfBlankCharacter
------------- ---------------------- ----------------- --------------------------
0             0                      2                 0
*/

02_LengthAndDataLength

Often, we loose sight of the most basic concepts – they hide in our subconscious. This behaviour of SQL Server is enforced by the SQL Standard (specifically SQL ’92) based on which most RDBMS systems are made of.

The ideal solution for an accurate string comparison was therefore to also compare the data length in addition to a normal string comparison.

DECLARE @spaceCharacter NVARCHAR(1) = N' ';
DECLARE @blankCharacter NVARCHAR(1) = N'';

--The Solution
IF (@spaceCharacter = @blankCharacter) 
   AND (DATALENGTH(@spaceCharacter) = DATALENGTH(@blankCharacter))
    SELECT 'Yes' AS IsSpaceSameAsBlankString;
ELSE 
    SELECT 'No' AS IsSpaceSameAsBlankString;
GO

/* RESULTS
IsSpaceSameAsBlankString
------------------------
No
*/

03_Solution

Further Reading

  • How SQL Server Compares Strings with Trailing Spaces [KB316626]

Until we meet next time,

Be courteous. Drive responsibly.

#0404 – SQL Server – Interview Question – What is logical data integrity?


Recently, I encountered an interesting question in one of the forums:

What is logical data integrity?

The person who posted the question was reading about SQL Server and databases in general, when this term was encountered. Because the answer to this question can help clarify one’s understanding of data design  concepts, I thought it would also make a very interesting interview question as well.

Today, I try to describe that data integrity is.

What is data integrity?

Data is a critical part of any business. But, data by itself holds no value. For data to be information of business value, it needs to be valid with respect to the business domain.

A piece of data may be perfectly acceptable from the physical design perspective, but may be still be invalid for the domain.

Let’s take an example – a rate of 2000 is perfectly acceptable for an integer. That is physical data integrity – the value is valid with respect to the physical design of the database. But, if  we are talking  about an application that captures and analyzes patient/medicinal data, the rate of 2000 is totally invalid and indicates some sort of logical bug/corruption.

Other examples would be a meeting end date that’s less than the meeting start date or a business/person without a name.

A data point may not be acceptable within the business rules defined for a domain. Similarly, what’s valid as a data point for one domain may be invalid for another domain. Ensuring that your database only accepts valid values with respect to your domain is what I call logical data integrity”.

Types of Data Integrity

Logical data integrity can be enforced in two ways:

Declarative Data Integrity

If data  integrity is enforced via the data model (implemented via the Data-Definition-Language, i.e. DDL), it is declarative data  integrity. One would enforce declarative integrity via the elements of the table definition:

  • Appropriate Data-Types
    • In our example for the medical domain, it would limit the possibility of corruption if a TINYINT is used to store the heart rate instead of an INT
  • Primary Keys
    • Avoid the insertion of duplicate data!
  • Foreign Keys
    • Ensures that all references are known (it is a valid primary key in another table)
  • Default, Check, Unique and Not-NULL constraints
    • Unique and Not-NULL constraints help maintain uniqueness and avoid insertion of unknown (NULL) data
    • Usage of default constraints ensure that by default unknown (NULL) values are replaced by valid default values
    • Check constraints help ensure that data meets the valid range defined by the business (e.g. a check constraint would help ensure that the meeting end date is greater than or equal to the start date)

Procedural Data Integrity

Legacy applications (I have worked on a few that match this description) which were originally developed in the days of flat-file databases, often used procedural code to enforce data integrity.

When these were migrated to Microsoft SQL Server, the integrity was enforced via stored procedures and triggers to avoid re-engineering the database structure and changing the application code to match the new structure.

Data integrity enforced via code, i.e. via stored procedures, triggers and/or functions is called procedural data integrity.

My take: Procedural code can be disabled, fail or have bugs. This may cause the application code to generate bad/invalid data rather than prevent it.

I believe procedural data integrity is acceptable as long  as it is used as a “fail-safe” mechanism. The primary mechanism to ensure logical data integrity should be declarative in nature, in my humble opinion.

The above is my take on logical data integrity. I welcome your thoughts on the subject in the space below.

Until we meet next time,

Be courteous. Drive responsibly.