Category Archives: Imported from BeyondRelational

These posts are imported from my old blog page: http://beyondrelational.com/modules/2/blogs/77/nakuls-blog.aspx

#0314-SQL Server-Identify records with special characters using regular expressions


One of my team mates was recently faced with an interesting problem – a few records were inserted/updated into the database via some backend data append and bulk insert tasks. These records apparently had some special characters which was causing the application to crash. While a application code fix was being developed, tested and deployed it was essential to be able to identify the bad data and clean it up as a stop-gap arrangement. The problem was that the offending characters were not available on the keyboard! (i.e. they were “special” characters). I was approached to help with this task and this is an account of a solution that I came up with.


Demo


To begin with the demo, let us assume that we have a table with a text column that has both – “bad” and “good” data. To simulate the bad data, I am using special characters from the ASCII table which are not generally visible on the console (e.g. a vertical tab and others). Characters like Space and Horizontal tabs are quite normal in almost all documents and are therefore considered to be valid characters for the purposes of this demo.


Once the table has been created and test data created, we go about the task of identifying the offending records. To do so, we simply build a lookup table which would help us create a string of “bad” characters which can be used as part of a regular expression.


That’s it! Once the regular expression is ready, it’s a matter of writing a simple SELECT statement to identify the records matching the expression.


The complete script, with the output is shown below:

USE tempdb;
GO
–Create the test table
IF OBJECT_ID(‘tempdb..#hiddenCharacterValues’,’U’) IS NOT NULL
DROP TABLE #hiddenCharacterValues;
GO

CREATE TABLE #hiddenCharacterValues
(RecordId INT IDENTITY(1,1),
RecordValue VARCHAR(25)
);
GO

–Insert some test data into the table
INSERT INTO #hiddenCharacterValues
(RecordValue)
VALUES (‘StandardSpace’ + CHAR(32)), –Good
(‘GroupSeparator’ + CHAR(29)), –Bad
(‘HorizontalTab’ + CHAR(9)), –Good
(‘VerticalTab’ + CHAR(11)), –Bad
(‘ExtendedASCII’ + CHAR(176)); –Bad
GO

–Build a comma separated list of known “bad characters”
DECLARE @regExString VARCHAR(MAX);
DECLARE @invalidCharLookUp TABLE (InvalidChars VARCHAR(2));
INSERT INTO @invalidCharLookUp (InvalidChars)
VALUES (CHAR(11)),
(CHAR(29)),
(CHAR(176));

SELECT @regExString = COALESCE(@regExString,”)
+ iclu.InvalidChars
FROM @invalidCharLookUp AS iclu;

–Finally select records which contain these characters
SELECT hcv.RecordId,
hcv.RecordValue
FROM #hiddenCharacterValues AS hcv
WHERE hcv.RecordValue LIKE ‘%[‘ + @regExString + ‘]%’;
GO


image


Now that the records are identified, they can either be manually updated or deleted (as required by the business).


Related Posts



  • Using Regular Expressions with CHECK constraints [Link]

Are there any other methods that you would use (or are using) to address such an issue? Please do let me know.


Until we meet next time,


Be courteous. Drive responsibly.

#0313 – SQL Server – sp_help returns double the field length for Unicode characters


Recently, I was asked a good question by an intern. As he was going through our database design, he noticed that the column lengths of character columns were coming out to be exactly the double of what was mentioned in our documentation. When he approached me, I asked him just one question – “What are you using to validate the column length?” As expected the answer was – “The System stored procedure – sp_help”. The question therefore is:

Why would sp_help report a column length that is double than what is expected for a string column value?

The answer lies in knowing whether a particular character column supports Unicode or not. When working with Unicode data, if the collation code page does not use double-byte characters, the underlying storage requirement is two times the string length defined when defining the column.

An Example

The AddressLine1 and AddressLine2 fields in the Person.Address table of the AdventureWorks2012 sample database are defined to be 60 characters in length (see screenshot below), but when checked using sp_help, they show the length as being 120.

image

USE AdventureWorks2012 ;
GO
SELECT  DATABASEPROPERTYEX(DB_NAME(), 'Collation') 
        AS DBCollationName ;
GO

sp_help [Person.Address] ;
GO

The results of this query are shown below, which clearly show the doubled length value.

image

The reason for this discrepancy is that sp_help does not report the number of characters, but reports on the maximum number of bytes that the column can occupy.

In this case, both the AddressLine1 and AddressLine2 columns are 60 characters in length, but because the column is a NVARCHAR column, it occupies 120 bytes which is reported by the sp_help system stored procedure.

Further Reading

Until we meet next time,

Be courteous. Drive responsibly.

#0312 – SQL Server – List of SQL 2008 R2 and SQL 2012 SP1 based cumulative updates


As some of you may be aware, Microsoft has released some important performance and stability updates to both SQL Server 2008 R2 (CU9) and SQL Server 2012 SP1 (CU7) via the cumulative updates release vehicle in the last couple of months.

As I raised IT tickets for their download and installation on our development servers, someone from the team asked me a question:

SQL Server CUs are not part of the Microsoft update that downloads patches on “patch Tuesday”. How do you know when Microsoft releases cumulative updates for download?

The answer is quite simple – Microsoft publicly announces the availability of the CUs on their website!

If you want to remain up-to-date with the latest of cumulative patches, please feel free to visit the following pages every now and then:

  • The SQL Server 2008 R2 builds that were released after SQL Server 2008 R2 Service Pack 2 was released [Link]
  • The SQL Server 2012 builds that were released after SQL Server 2012 Service Pack 1 was released [Link]

Please note that as soon as a service pack is released, subsequent CUs will be listed on a new page. Hence, a new page will be available if and when SQL Server 2012 SP2 will be released. This is because of Microsoft’s patch naming strategy which will make the next CU to be SQL 2012 SP2 CU1.

Finally, if you ever misplace these links just type the following in your favourite search engine (it has always worked for me!) – “List of SQL Server 2012 SP1 cumulative updates” (update the string to have the required SQL Server version and patch level as necessary).

Further Reading:

  • Naming schema for Microsoft SQL Server software update packages [Link]

Until we meet next time,

Be courteous. Drive responsibly.

#0311 – SQL Server – Database files upgraded from version nnnn to mmm – why databases cannot be restored backwards?


Recently, we were upgrading our SQL Server instances to the latest cumulative upgrades from Microsoft. All our development and QA environments were upgraded. After a couple of days, I received the following query from one of the interns:



I’m restoring the backup of an existing database. During the restore, the Results pane in the SQL Server Management Studio displays the following message. Am I doing anything wrong?


Converting database ‘FileVersionUpgrade’ from version 661 to the current version 706.
Database ‘FileVersionUpgrade’ running the upgrade step from version 661 to version 668.
…..
Database ‘FileVersionUpgrade’ running the upgrade step from version 705 to version 706.


This message is perfectly normal and simply indicates that the database being restored was backed up in a version of SQL Server that has a major version (e.g. major version of SQL Server 2008 is 10, that of SQL 2012 is 11 and so on) different from the one to which it is being restored or is on a different patch level.


In fact, the database files have an underlying  structure that is unique to the version of SQL Server (major versions may also have major changes to the file structure). When a database is restored, SQL Server upgrades the underlying file structure step-by-step from the supported source version to the destination version – which are the informational messages that are seen on the SSMS Messages tab.


Unfortunately, no documentation exists on Books On Line that would provide a mapping between the file version number and the corresponding SQL Server version. One can build such a table by running the following query against the master database for each SQL Server version:

DBCC DBINFO(‘master’) WITH TABLERESULTS

The upgrade in the internal file structure is also the reason why a database backup taken on a newer release/patch level of SQL Server cannot be restored on a previous release/patch level of SQL Server.


As closing notes, please note that:



  • This is seen during attach and restore operations both

  • These file versions are the internal storage format, and have nothing to do with the compatibility level of the database

Further Reading:



  • DBCC DBINFO WTIH TABLERESULTS – In which SQL Server version was this database created? [Link]
  • Restore databases from a backup file across a network [Link]

Until we meet next time,


Be courteous. Drive responsibly.

#0310 – SQL Server – CONCAT_NULL_YIELDS_NULL property


In the world of electronics and even in case of databases, we know that a NULL is different from a 0 value or a blank string. A NULL stands for undefined value, however a 0 or a blank string is some definite value.


I was recently asked how to prevent string concatenation operations with NULL from resulting into NULL.


When I first heard about the question, my response was – “NULL and blank strings have separate meanings. Why would you want to treat these values as one and the same during a concatenation operation?” I was told that the script had to generate a single address field by concatenating Address Lines 1 & 2, City, State, Postal Code and related information. Because fields like the Address Line 2 may be NULL, then the default behaviour would be that the entire result will become NULL.


The team were looking for validating one of the options that they had come up with, which is what prompted this post.


Let’s take an example:

USE AdventureWorks2012;
GO
SELECT pp.FirstName,
pp.LastName,
pbea.AddressTypeID,
pa.AddressLine1 AS AddressLine1,
pa.AddressLine2 AS AddressLine2,
pa.AddressLine1 + CHAR(10) +
pa.AddressLine2 + CHAR(10) +
psp.StateProvinceCode + ‘ – ‘ +
psp.CountryRegionCode AS PersonAddress
FROM HumanResources.Employee AS hre
INNER JOIN Person.Person AS pp ON hre.BusinessEntityID = pp.BusinessEntityID
INNER JOIN Person.BusinessEntityAddress AS pbea ON hre.BusinessEntityID = pbea.BusinessEntityID
INNER JOIN Person.Address AS pa ON pbea.AddressID = pa.AddressID
INNER JOIN Person.StateProvince AS psp ON pa.StateProvinceID = psp.StateProvinceID;
GO

The AddressLine2 column in the Person.Address table is NULL for a lot of the records in the AdventureWorks2012 sample database. Do note that AddressLine1 is not a NULL value. The screenshot below shows the default behaviour of SQL Server when we execute the query provided above.


image


Changing concatenation behaviour of NULL at the query level


To change the behaviour of string concatenation with respect to NULL, one can use the SET option – CONCAT_NULL_YIELDS_NULL to OFF. Here’s a quick modification to the script above that demonstrates the usage of this SET option.

USE AdventureWorks2012;
GO

SET CONCAT_NULL_YIELDS_NULL OFF

SELECT pp.FirstName,
pp.LastName,
pbea.AddressTypeID,
pa.AddressLine1 AS AddressLine1,
pa.AddressLine2 AS AddressLine2,
pa.AddressLine1 + CHAR(10) +
pa.AddressLine2 + CHAR(10) +
psp.StateProvinceCode + ‘ – ‘ +
psp.CountryRegionCode AS PersonAddress
FROM HumanResources.Employee AS hre
INNER JOIN Person.Person AS pp ON hre.BusinessEntityID = pp.BusinessEntityID
INNER JOIN Person.BusinessEntityAddress AS pbea ON hre.BusinessEntityID = pbea.BusinessEntityID
INNER JOIN Person.Address AS pa ON pbea.AddressID = pa.AddressID
INNER JOIN Person.StateProvince AS psp ON pa.StateProvinceID = psp.StateProvinceID;

–Safety Check – turn CONCAT_NULL_YEILDS_NULL back ON
SET CONCAT_NULL_YIELDS_NULL ON
GO


The result of the query execution is shown below.


image


Changing concatenation behaviour of NULL at the database level


If the entire database requires that concatenation with a NULL should yield NULL, then it would be a good idea to also set the option on the database to OFF using the ALTER DATABASE statement shown below:

ALTER DATABASE AdventureWorks2012 SET CONCAT_NULL_YIELDS_NULL OFF
GO

The Correct Way


Personally, I would not go with either of the two options provided above for two reasons:



  1. At the end of the day, NULL is different from a 0 or blank string. Both have distinct meanings and must not be mixed with each other
  2. The second reason is that if we look at Books-On-Line, it clearly tells us that:


In a future version of SQL Server CONCAT_NULL_YIELDS_NULL will always be ON and any applications that explicitly set the option to OFF will generate an error. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.


The method that I consider to be the correct way in addressing the NULL concatenation problem is to use the ISNULL or COALESCE function as shown in the query below.

USE AdventureWorks2012 ;
GO

SELECT pp.FirstName,
pp.LastName,
pbea.AddressTypeID,
pa.AddressLine1 AS AddressLine1,
pa.AddressLine2 AS AddressLine2,
pa.AddressLine1 + CHAR(10) +
ISNULL(pa.AddressLine2, ”) + CHAR(10) +
psp.StateProvinceCode + ‘ – ‘ +
psp.CountryRegionCode AS PersonAddress
FROM HumanResources.Employee AS hre
INNER JOIN Person.Person AS pp ON hre.BusinessEntityID = pp.BusinessEntityID
INNER JOIN Person.BusinessEntityAddress AS pbea ON hre.BusinessEntityID = pbea.BusinessEntityID
INNER JOIN Person.Address AS pa ON pbea.AddressID = pa.AddressID
INNER JOIN Person.StateProvince AS psp ON pa.StateProvinceID = psp.StateProvinceID;
GO


Further Reading



  • CONCAT_NULL_YIELDS_NULL – [Books On Line Link]

Until we meet next time,


Be courteous. Drive responsibly.