Recently, I encountered an interesting question in one of the forums:
What is logical data integrity?
The person who posted the question had come across the term while reading about SQL Server and databases in general. Because the answer to this question can help clarify one’s understanding of data design concepts, I thought it would also make a very interesting interview question.
Today, I try to describe what data integrity is.
What is data integrity?
Data is a critical part of any business. But, data by itself holds no value. For data to be information of business value, it needs to be valid with respect to the business domain.
A piece of data may be perfectly acceptable from the physical design perspective, but may still be invalid for the domain.
Let’s take an example – a heart rate of 2000 is perfectly acceptable for an integer. That is physical data integrity – the value is valid with respect to the physical design of the database. But, if we are talking about an application that captures and analyzes patient/medical data, a heart rate of 2000 is totally invalid and indicates some sort of logical bug/corruption.
Other examples would be a meeting end date that’s less than the meeting start date or a business/person without a name.
A data point may not be acceptable within the business rules defined for a domain. Similarly, what’s valid as a data point for one domain may be invalid for another domain. Ensuring that your database only accepts valid values with respect to your domain is what I call “logical data integrity”.
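To make the difference concrete, here is a minimal T-SQL sketch (the table and column names are hypothetical) of a value that the physical design happily accepts even though the domain does not:

```sql
-- Hypothetical table: the column type alone cannot catch a bad reading
CREATE TABLE dbo.PatientVitals (
    PatientId INT NOT NULL,
    HeartRate INT NOT NULL
);

-- Physically valid (2000 is a perfectly good INT), yet logically
-- invalid for the domain: no patient has a heart rate of 2000.
INSERT INTO dbo.PatientVitals (PatientId, HeartRate)
VALUES (1, 2000);   -- succeeds, and the database now holds bad data
```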
Types of Data Integrity
Logical data integrity can be enforced in two ways:
Declarative Data Integrity
If data integrity is enforced via the data model (implemented via the Data Definition Language, i.e. DDL), it is declarative data integrity. One would enforce declarative integrity via the elements of the table definition (a consolidated sketch follows this list):
- Appropriate Data-Types
- In our example for the medical domain, it would limit the possibility of corruption if a TINYINT is used to store the heart rate instead of an INT
- Primary Keys
- Prevents the insertion of duplicate data!
- Foreign Keys
- Ensures that all references are known (every foreign key value must match a valid primary key in the referenced table)
- Default, Check, Unique and Not-NULL constraints
- Unique and Not-NULL constraints help maintain uniqueness and avoid insertion of unknown (NULL) data
- Default constraints ensure that, when no value is supplied, a valid default value is used instead of unknown (NULL) data
- Check constraints help ensure that data meets the valid range defined by the business (e.g. a check constraint would help ensure that the meeting end date is greater than or equal to the start date)
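Here is a minimal sketch showing all of these declarative elements working together. The schema and names are hypothetical, but each constraint maps to one of the bullets above:

```sql
CREATE TABLE dbo.Patients (
    PatientId    INT         NOT NULL,
    -- Unique constraint: no two patients share a record number
    RecordNumber VARCHAR(20) NOT NULL UNIQUE,
    -- Primary key: prevents duplicate patient rows
    CONSTRAINT PK_Patients PRIMARY KEY (PatientId)
);

CREATE TABLE dbo.VitalsReadings (
    ReadingId INT       NOT NULL,
    PatientId INT       NOT NULL,
    -- Appropriate data type: TINYINT (0-255) cannot even store 2000
    HeartRate TINYINT   NOT NULL,
    -- Default constraint: supplies a valid value when the column is omitted
    ReadingAt DATETIME2 NOT NULL
        CONSTRAINT DF_VitalsReadings_ReadingAt DEFAULT (SYSDATETIME()),
    CONSTRAINT PK_VitalsReadings PRIMARY KEY (ReadingId),
    -- Check constraint: enforces the valid business range
    CONSTRAINT CK_VitalsReadings_HeartRate CHECK (HeartRate BETWEEN 20 AND 250),
    -- Foreign key: every reading must reference a known patient
    CONSTRAINT FK_VitalsReadings_Patients
        FOREIGN KEY (PatientId) REFERENCES dbo.Patients (PatientId)
);
```

With this in place, an attempt to insert a heart rate of 2000 fails at the database layer, regardless of which application (or which bug) produced the value.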
Procedural Data Integrity
Legacy applications (I have worked on a few that match this description) that were originally developed in the days of flat-file databases often used procedural code to enforce data integrity.
When these were migrated to Microsoft SQL Server, the integrity was enforced via stored procedures and triggers to avoid re-engineering the database structure and changing the application code to match the new structure.
Data integrity enforced via code, i.e. via stored procedures, triggers and/or functions is called procedural data integrity.
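As a sketch of what this looks like, the meeting-date rule from earlier could be enforced procedurally with a trigger (the table and trigger names here are hypothetical):

```sql
-- Hypothetical meetings table with no declarative check on the dates
CREATE TABLE dbo.Meetings (
    MeetingId INT       NOT NULL PRIMARY KEY,
    StartsAt  DATETIME2 NOT NULL,
    EndsAt    DATETIME2 NOT NULL
);
GO

-- Procedural integrity: the trigger rejects meetings that end before
-- they start. Unlike a check constraint, a trigger can be disabled
-- (DISABLE TRIGGER) or contain bugs of its own.
CREATE TRIGGER dbo.TR_Meetings_ValidateDates
ON dbo.Meetings
AFTER INSERT, UPDATE
AS
BEGIN
    IF EXISTS (SELECT 1 FROM inserted WHERE EndsAt < StartsAt)
    BEGIN
        RAISERROR('Meeting end date cannot be earlier than the start date.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END;
```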
My take: Procedural code can be disabled, fail or have bugs. This may allow the application to insert bad/invalid data rather than preventing it.
I believe procedural data integrity is acceptable as long as it is used as a “fail-safe” mechanism. The primary mechanism to ensure logical data integrity should be declarative in nature, in my humble opinion.
The above is my take on logical data integrity. I welcome your thoughts on the subject in the space below.
Until we meet next time,
Be courteous. Drive responsibly.