I work with academia frequently, and therefore, get a chance to interact with students and experience the issues they face first hand. I recently had a very interesting experience during one of these visits. I will try to present the experience as a story.
One of the students had developed a system which involved some database interaction. As I was reviewing the system, I noticed an issue with the table design. The design that the student had come up with was something like the following:
USE tempdb GO --Create the table, with fixed length columns CREATE TABLE CityMaster (CityId INT IDENTITY (1,1), CityName CHAR(50), StateName CHAR(50), CountryName CHAR(50) ) GO
(The design above is not exactly what he had, but you get the general idea.)
The problem that I had was with his choice of data types. He had chosen a fixed-length data type – CHAR.
I explained that CHAR, being a fixed-length data type, pads a string with trailing blanks when the data is stored to achieve the fixed-length. Variable length data types (e.g. VARCHAR), on the other hand, do not pad trailing blanks, and therefore, have a variable space requirement.
To explain this with an example, the simplest thing for me to do was to insert some test values in his database. Once the test data was ready, I ran the following query, which uses the DATALENGTH() and LEN() functions. For those who came in late,
- DATALENGTH = Returns the number of bytes used to represent an expression
- This can be used to estimate the storage space that SQL Server has taken up for an expression
- LEN = Returns the number of characters of the specified string expression, excluding trailing blanks
USE tempdb GO --Insert test data INSERT INTO CityMaster (CityName, StateName, CountryName) VALUES ('Ahmedabad','Gujarat', 'India'), ('Mumbai' ,'Maharashtra','India'), ('New Delhi','Delhi' ,'India'), ('Bengaluru','Karnataka' ,'India') GO USE tempdb GO --View the length and the data length for the data in the table SELECT CityMaster.CityId, CityMaster.CityName, LEN(CityMaster.CityName) AS CityNameLength, DATALENGTH(CityMaster.CityName) AS CityNameDataLength, CityMaster.StateName, LEN(CityMaster.StateName) AS StateNameLength, DATALENGTH(CityMaster.StateName) AS StateNameDataLength, CityMaster.CountryName, LEN(CityMaster.CountryName) AS CountryNameLength, DATALENGTH(CityMaster.CountryName) AS CountryNameDataLength FROM CityMaster GO
Using the output of the above query, it became easy for me to explain that because CHAR consumes the full 50 characters, the SQL Server consumes a space of 50 + 50 + 50 = 150 bytes irrespective of the amount of data actually being requested for storage.
As a solution, I recreated the table with the following design:
USE tempdb GO --Create the table, with variable length columns CREATE TABLE CityMaster_v2 (CityId INT IDENTITY (1,1), CityName VARCHAR(50), StateName VARCHAR(50), CountryName VARCHAR(50) ) GO
Again, I inserted the test data and ran the length determination queries.
USE tempdb GO --Insert test data INSERT INTO CityMaster_v2 (CityName, StateName, CountryName) VALUES ('Ahmedabad','Gujarat', 'India'), ('Mumbai' ,'Maharashtra','India'), ('New Delhi','Delhi' ,'India'), ('Bengaluru','Karnataka' ,'India') GO USE tempdb GO --View the length and the data length for the data in the table SELECT CityMaster_v2.CityId, CityMaster_v2.CityName, LEN(CityMaster_v2.CityName) AS CityNameLength, DATALENGTH(CityMaster_v2.CityName) AS CityNameDataLength, CityMaster_v2.StateName, LEN(CityMaster_v2.StateName) AS StateNameLength, DATALENGTH(CityMaster_v2.StateName) AS StateNameDataLength, CityMaster_v2.CountryName, LEN(CityMaster_v2.CountryName) AS CountryNameLength, DATALENGTH(CityMaster_v2.CountryName) AS CountryNameDataLength FROM CityMaster_v2 GO
The results spoke for themselves, and he quickly grasped the concept that I was trying to put forward, i.e. variable length datatypes have varying storage requirements, making them more efficient from a storage perspective.
ALTER TABLE…ALTER COLUMN will not help in releasing space from fixed-length data-types
Because the concept was clear, he proceeded with converting his existing table designs to use variable-length data types using the ALTER TABLE…ALTER COLUMN statement.
USE tempdb GO --Changing CHAR to VARCHAR will NOT release space back once the allocations have been done ALTER TABLE CityMaster ALTER COLUMN CityName VARCHAR(50) ALTER TABLE CityMaster ALTER COLUMN StateName VARCHAR(50) ALTER TABLE CityMaster ALTER COLUMN CountryName VARCHAR(50) GO
However, he found that the ALTER TABLE…ALTER COLUMN statement did not make a difference to the storage requirements.
The reason, I explained, was that once SQL Server has added trailing spaces, it is no longer able to distinguish between an intentionally padded and an automatically padded version of the same string (e.g. ‘BeyondRelational.com ‘ (3 spaces) from ‘BeyondRelational.com’ (no spaces)) and therefore, SQL Server will not be able to release the space allocated to the trailing spaces back.
The only option for him was to drop-and-recreate the tables and then re-insert all his test data.
Now that we were all set with the concept, I pointed him to one of Vinod Kumar’s (blog|twitter) posts: http://blogs.extremeexperts.com/2003/10/10/using-varchars/
There are not one, but 2 very good reasons why I shared this experience with the community.
- The prime reason being that this post might be helpful to students trying to understand and explore databases, and inculcate in them a habit of choosing the correct data-types
- I have seen production code with such issues, such as the incorrect choice of data-types. For us in the industry, this is a capital mistake that should not be made, ever
- CHAR & VARCHAR Datatypes: http://msdn.microsoft.com/en-us/library/ms176089.aspx
- DATALENGTH: http://msdn.microsoft.com/en-us/library/ms173486.aspx
- LEN: http://msdn.microsoft.com/en-us/library/ms190329.aspx
Until we meet next time,