#0189-SQL Server-Primary Keys without clustered indexes

32 Replies

One of my favourite interview questions at an intermediate level has always been: “Can I have a primary key on a table that does not contribute to the clustered index?”. Of course, it’s not always the same question – there are multiple variants, such as :

Can a table have a primary key that does not participate in a clustered index?
Does designating a column(s) as a primary key automatically create a clustered index on the table based on the key column(s)?
Can a clustered index be created using a column(s) other than the primary key of the table?

The answer that most candidates come up with is that there is a direct dependency between the primary key and the clustered index – the primary key has to be the set of columns on which one wishes to create a clustered index.

Contrary to popular belief, the correct answer is “Yes, a table may have a clustered index defined on a column other than the primary key of the table”. It’s just that most developers are so used to the default design and coding mechanisms that generally the thought of going against the flow may not come to the mind.

Please note that this post is not about the recommended practices for creation of clustered index and primary keys, but is intended to achieve an understanding of the fact that having a primary key does not automatically imply a clustered index in all situations.

Demo

As supporting proof, allow me to create a simple scenario. As part of the demonstration, I would be executing the following steps:

Create a test table without any keys or indexes defined
Define a primary key on the test table, with an explicit NONCLUSTERED keyword added to the definition
Define a CLUSTERED index on the table on a column that’s not the primary key
Check the details of the indexes created using sys.indexes catalog view

USE tempdb
GO

--Safety Check
IF OBJECT_ID('pkTest') IS NOT NULL
    DROP TABLE pkTest
GO

--Create test table definition
CREATE TABLE pkTest (Id INT NOT NULL IDENTITY(1,1),
                     DummyId INT NOT NULL,
                     KeyName VARCHAR(20)
                    )
GO

--Notice the use of NONCLUSTERED 
ALTER TABLE pkTest
    ADD CONSTRAINT pk_pkTest_pkTestId PRIMARY KEY NONCLUSTERED (Id)
GO

--Now create a clustered index on a column other than the Primary Key
CREATE CLUSTERED INDEX idx_pkTest_DummyId ON pkTest (DummyId)
GO

As you can see, we created the Primary Key with a keyword – NONCLUSTERED. This should create a non-clustered index for the primary key and a clustered index for the DummyId column. Let’s run the following query to check:

--Check the types of Indexes that have been created
SELECT si.object_id,
       OBJECT_NAME(si.object_id) AS ObjectName,
       si.name,
       si.index_id,
       si.type,
       si.type_desc,
       si.is_unique,
       si.is_unique_constraint,
       si.is_primary_key
FROM sys.indexes AS si
WHERE si.object_id = OBJECT_ID('pkTest')
GO

Query execution yields the following results:

object_id	ObjectName	name	index_id	type	type_desc	is_unique	is_unique_constraint	is_primary_key
853578079	pkTest	idx_pkTest_DummyId	1	1	CLUSTERED	0	0	0
853578079	pkTest	pk_pkTest_pkTestId	2	2	NONCLUSTERED	1	0	1

Conclusion

A clustered index can be defined on any column of a table as long as the column satisfies the criteria for a clustered index. Having a primary key does not imply that clustered index will exist on the primary key under all situations.

Reference

Clustered Index Design Guidelines (BOL/MSDN): http://msdn.microsoft.com/en-us/library/ms190639(v=sql.105).aspx

Until we meet next time,

Be courteous. Drive responsibly.

32 thoughts on “#0189-SQL Server-Primary Keys without clustered indexes”

sqlcommitted August 23, 2012 at 1:06 AM

Even I ask the same question during interview.

Primary Key and Clustered Index are two different things. Developers get confused because they usually misunderstand the following line “On a table when a Primary Key defined by default SQL Server creates a Unique Clustered Index on those column(s) where primary key is defined”

As we know that we can’t have 2 clustered index on a table, When we create a Primary key where a Clustered index is already exist in that case , SQL Server creates a Unique Non Clustered Index.

Sometime I ask a different version of same topic question? i.e. What will happen if I wanted to create a Primary on a column on which already a clustered index exist?

Thanks & Regards,
Sandip Pani

[1]: http://SQLcommitted.com

LikeLike

Reply ↓
Nakul Vachhrajani August 23, 2012 at 1:41 AM

Thank-you for sharing your views, Sandip. I do like the nice twist to the question. Thank-you for taking the time out and reading this post.

LikeLike

Reply ↓
Madhivanan August 23, 2012 at 6:10 AM

However there is a difference. When you create a table with column with Primary key defined, by default the column which is defined as Primary key will have clustered index. Now see what happens when you create this table

CREATE TABLE pkTest (Id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
DummyId INT NOT NULL,
KeyName VARCHAR(20)
)

Column Id is automatically associated with a clustered index. Your case is valid only when you alter the table to have primary key constraint

LikeLike

Reply ↓
sqlcommitted August 23, 2012 at 6:31 AM

@Madhivanan,
You are right,What you mentioned here is the default behavior.

Kindly refer the **Conclusion** section, Here It is mentioned that “Having a primary key does not imply that clustered index will exist on the primary key under all situations”.

Thanks & Regards,
Sandip Pani
[link text][1]

[1]: http://SQLcommitted.com

LikeLike

Reply ↓
marc_jellinek@hotmail.com August 23, 2012 at 8:45 AM

@Madhivanan try this:

CREATE TABLE pkTest
(
[id] [int],
[DummyId] [int],
[KeyName] [varchar](20),
CONSTRAINT [PK__pkTest__Id] PRIMARY KEY NONCLUSTERED ( [id] )
)

There is no need for an ALTER TABLE, you can create a non-clustered primary key directly in the CREATE TABLE statement.

LikeLike

Reply ↓
marc_jellinek@hotmail.com August 23, 2012 at 8:45 AM

Here’s the real question: Why have a clustered index on a column or columns OTHER than the Primary Key?

LikeLike

Reply ↓
marc_jellinek@hotmail.com August 23, 2012 at 2:05 PM

> A clustered index can be defined on
> any column of a table as long as the
> column satisfies the criteria for a
> clustered index

What exactly are the critieria for clustered indexes?

LikeLike

Reply ↓
vmvadivel August 23, 2012 at 9:40 PM

I have posted similar stuff in 2006 @ http://vadivel.blogspot.in/2006/03/primary-keys-without-clustered-index.html …

LikeLike

Reply ↓
Nakul Vachhrajani August 24, 2012 at 12:40 PM

Thank-you for sharing @Vadivel!

LikeLike

Reply ↓
Nakul Vachhrajani August 25, 2012 at 9:20 AM

@Marc: Criteria for clustered indexes can be found in the references section of my post. The link is repeated here for your kind reference: [http://msdn.microsoft.com/en-us/library/ms190639(v=sql.105).aspx][1]

[1]: http://msdn.microsoft.com/en-us/library/ms190639(v=sql.105).aspx

LikeLike

Reply ↓
marc_jellinek@hotmail.com August 25, 2012 at 9:40 AM

@Nakul, there are no “criteria” listed, only recommendations or situations where a clustered index should be considered.

LikeLike

Reply ↓
chojrak11 August 27, 2012 at 2:15 AM

@Marc

> Here’s the real question: Why have a
> clustered index on a column or columns
> OTHER than the Primary Key?

Well, for example, because primary key is multicolumn and large, so you risk inefficient INSERTs, many page splits, and having large, inefficient clustering key in every nonclustering index on the table. Acquaint yourself with the concept of [surrogate keys][1].

[1]: http://en.wikipedia.org/wiki/Surrogate_key

LikeLike

Reply ↓
marc_jellinek@hotmail.com August 27, 2012 at 9:00 AM

@chojrak: After working with SQL Server for the past 18 years, including 5 years at Microsoft, I’m very familiar with the concepts (and worst practices) around primary keys, candidate keys and surrogate keys. I’ve spent a good portion of my career improving the performance of people who blindly create an [int] IDENTITY(1, 1) based prirmary key on every table

If your primary key is clustered, the risks you list are valid. If the primary key is nonclustered, none of the risks you list are valid.

Create your clustered index to satisfy your most common sorting requirements. Create your primary key to satisfy your most relevant uniqueness constraints. If they are the same column (or columns), use a clusterer primary key. If they are not, the most common sorting requirement with the least volitile data should be the clustered index.

LikeLike

Reply ↓
chojrak11 August 27, 2012 at 9:50 AM

@Marc

thus you just answered your very question. Offtopic – if possible, I’d like to know more about problems with those “[int] IDENTITY(1, 1) based prirmary key on every table”. I have to confess that, although not blindly, I use them from time to time, and until now successfully 🙂

LikeLike

Reply ↓
marc_jellinek@hotmail.com August 27, 2012 at 10:50 AM

@Chojrak11: my question was rhetorical… a question designed to get the audience to think. Here are some more rhetorical questions:

Problems with primary keys based on [int] IDENTITY(1,1)

Lets’s say that you have a customer list, which is most often sorted by name.

Ask yourself:

A) What does a sorting operation on [CustomerName] look like when the primary key is clustered and based on [CustomerId]?
B) What does a sorting operation on [CustomerName] look like when the primary key is nonclustered and based on [CustomerID] and there is a clustered index on [CustomerName]?

Both A and B will work. Which will work faster?

Which will work faster when retrieving records based on a customer name (a lookup)?

Now let’s say that you have a link table that links Vendors to the Products they sell. Products can be provided by multiple Vendors and Vendors sell multiple Products.

CREATE TABLE [Products]
(
[ProductId] [int] IDENTITY (1, 1),
[Description] [varchar](max)
)

CREATE TABLE [Vendors]
(
[VendorId] [int] IDENTITY (1,1),
[Name] [varchar](100)
)

CREATE TABLE [lnk_Vendors_Products]
(
[recordId] [int] IDENTITY(1,1),
[fkVendorId] [int],
[fkProductId] [int],
[StartDate] [date], — date vendor starting offering product
[EndDate] [date] — date vendor stopped offering product
)

Is it going to be best to create a clustered primary key on [lnk_Vendors_Products].[recordId] with a uniqueness constraint on ([fkVendorId], [fkProductId]) or just have a clustered primary key directly on ([fkVendorId], [fkProductId])?

In order to keep key proliferation to a minimum (any children of [lnk_Vendors_Products], should reference the primary key [recordId]. But the clustered index should go on ([fkVendorId], [fkProductId]). This allows children to reference only the [recordId], but allow for fast retrieval based on [fkVendorId].

Note: if you are searching for Products more often than Vendors, the clustered index should go on [fkProductId], [fkVendorId]; rather than the other way around.

LikeLike

Reply ↓
Nakul Vachhrajani August 27, 2012 at 12:30 PM

@Marc Jellinek: Thank-you very much for the questions. They do get me to think hard and think more – which is always exciting. Also, thank-you very much for sharing some of the nuances around key design.

Regarding your note:

> Note: if you are searching for
> Products more often than Vendors, the
> clustered index should go on
> [fkProductId], [fkVendorId]; rather
> than the other way around.

I am sure you remember, but for the benefit of others I would like to refer a previous post here which would highlight one of the many reasons behind the note: [SQL Server – T-SQL – Best Practice – Composite Primary Keys – Performance (and accuracy) impact][1]

This has been a very, very interesting discussion. Thank-you, all.

[1]: http://beyondrelational.com/modules/2/blogs/77/posts/11379/sql-server-t-sql-best-practice-composite-primary-keys-performance-and-accuracy-impact.aspx

LikeLike

Reply ↓
marc_jellinek@hotmail.com August 27, 2012 at 1:20 PM

@Nakul, I’m not sure I understand the reference to the blog post about composite primary keys. For that matter, I don’t understand what best practice it recommends (other than referencing ALL columns in a primary key when making a join)

LikeLike

Reply ↓
Nakul Vachhrajani August 27, 2012 at 2:10 PM

@Marc: Yes, referencing ALL columns of the composite key is definitely a take-away and the other (I agree, it’s not so obvious) point was that switching the order of the columns does have an impact on the way SQL Server would use the index defined on the key.

LikeLike

Reply ↓
chojrak11 August 28, 2012 at 1:50 AM

@Marc thank you for detailed answer. A lot of additional rhetorical questions in it 🙂

So, this is an additional thing to consider while designing a table. In case of clustered keys on most often sorted columns that aren’t ever-increasing on INSERT, we gain performance on /some/ queries, but one must be sure that amount of INSERTs (and associated splits) will not kill overall performance and not cause locks on high concurrency. Some performance can be gained by properly selecting fill factor, but then we increasing # of I/Os for reads. That said such a key may possibly be better for larger tables that are mostly read and sometimes updated, but for smaller transactional tables heavily INSERTed/DELETEd I’d use small, monotonic clustering key. But as always in SQL world, it’d be best to try and measure.

I’ll do my homework and analyze this subject matter thoroughly.

Thanks again, I’d upvote you more times, but it doesn’t let me.

LikeLike

Reply ↓
a.diniz August 28, 2012 at 4:05 AM

Returning to the orignal theme of “default behaviour” for a moment… Adding a primary key to a table which is already clustered creates a new non-clustered
index by default.

— The NONCLUSTERED keyword is optional in the DDL statement:
ALTER TABLE pkTest
ADD CONSTRAINT pk_pkTest_pkTestId PRIMARY KEY /* NONCLUSTERED */ (Id)
GO

This is true **even when the primary key is added to the clustering key**:

— A redundant nonclustered index is created on the clustering key, DummyId:
ALTER TABLE pkTest
ADD CONSTRAINT pk_pkTest_pkTestId PRIMARY KEY /* NONCLUSTERED */ (DummyId)
GO

As you’d expect, you can’t “promote” an already-existing clustering key to a primary key without dropping it first. eg:

ALTER TABLE pkTest
ADD CONSTRAINT pk_pkTest_pkTestId PRIMARY KEY CLUSTERED (DummyId)
GO

Fails with error:

Msg 1902, Level 16, State 3, Line 3
Cannot create more than one clustered index on table ‘pkTest’. Drop the existing clustered index ‘idx_pkTest_DummyId’ before creating another.
Msg 1750, Level 16, State 0, Line 3
Could not create constraint. See previous errors.

LikeLike

Reply ↓
sqlcommitted August 28, 2012 at 5:15 AM

@a.dimiz

Again the fundamental is same. See my first comment.

Primary Key and Clustered Index are two different thing.

When we define primary key on a table It goes through 2 stages. One is constraint creation and then Index creation. So When you define a primary key on a column where already Clustered index is exist it will create a non clustered index on the same column.

Think of a composite key. If you define a clustered index on two columns and then You are defining Primary key in one of the column , It has to create a index on that column.

Thanks,
Sandip Pani

[1]: http://SQLcommitted.com

LikeLike

Reply ↓
a.diniz August 28, 2012 at 11:50 AM

@sqlcommitted: I was expanding on Nakul’s post about default behaviour, in particular implicit CLUSTERED/NONCLUSTERED primary key constraint.

LikeLike

Reply ↓
a.diniz August 29, 2012 at 3:50 AM

Another spin on the interview question:

A database contains an empty table called SomeClusteredData and defines the field, ClusteredKey int IDENTITY.
What type of index is created when you execute the following batch:

ALTER TABLE SomeClusteredData
ADD CONSTRAINT SomeClusteredDataPKClusteredKey PRIMARY KEY (ClusteredKey);
GO

The answer: it depends. If you don’t have appropriate rights on the table, an error is thrown and no index is created! Assuming you have rights and a clustered index already exists on SomeClusteredData at the time the batch is run, a new **unqiue, non-clustered index** is created with the key, ClusteredKey. Conversely, the batch will create a **clustered index** with clustering key, ClusteredKey, if SomeClusteredData is a heap when run by someone with appropriate access.

This question tests several things:

1) the candidates’ knowledge of the default behaviour when creating a primary key constraint.

2) the candidates’ appreciation of object-naming conventions. Object names have no relation to the structure of the object; implying a structure with the object name is bad practice (except in one or two exceptional circumstances).

3) the candidates’ attention to detail and ability to ‘think outside the box’. By considering security, they demonstrate broad vision; they question the scenario that most would take as given.

When interviewing candidates, I believe it’s equally important to test soft skills. Deterministic questions provide a basis for assessing technical knowledge. Formulating slighlty open or ambiguous questions – with the occaisional typo – allows candidates to demonstrate their softer skills (like attention to detail) rather than just making claim to them.

LikeLike

Reply ↓
Michael John August 29, 2012 at 10:00 AM

As stated perviously, primary keys and clustered indexes are two distinct thigns. One has nothing to do with the other. A primary key uniquely identifies a row, a clustered index orders the data.

To state this:
Here’s the real question: Why have a clustered index on a column or columns OTHER than the Primary Key?

indicates no real understanding of relational database design, the SQL database engine, or the query optimizer.

There are many reasons, far too many to list in a forum posting, where the primary key should not be the clustered index.

As for the criteria for creating a clustered index, the stated article is only a GUIDE to point you in the right direction on choosing the proper field(s) in a table for an index. The rules are any column, or set of columns, can be part of an index.

LikeLike

Reply ↓
sqlcommitted August 29, 2012 at 12:45 PM

Agree with Michael

LikeLike

Reply ↓
a.diniz August 30, 2012 at 2:25 AM

I agree with Michael

If candidates demonstrate they don’t know the difference between a primary key and an index (let alone why you might omit the primary key from a clustered key) the question of default behaviour is moot. Unless I was looking to train a junior, I’d politely show them the door.

@Michael: I know I’m a stickler for detail but:

> The rules are any column, or set of columns, can be part of an index.

Is not accurate. Columns of type text, ntext, and image cannot form part of an index. Further restrictions apply to the index’s key (<=900 bytes, up to 16 columns, non-LOB types etc…)

LikeLike

Reply ↓
Michael John August 30, 2012 at 7:05 AM

I stand corrected!!! Thanks! And, to make matters worse, I attempted to created an index with included columns about an hour before I posted this. One of the columns was a binary type!!! Short term memory loss, I guess!!!

LikeLike

Reply ↓
marc_jellinek@hotmail.com August 30, 2012 at 7:55 AM

Why was this reposted, without the comments or discussion?

LikeLike

Reply ↓
Nakul Vachhrajani August 30, 2012 at 9:35 AM

@Marc: I am not sure what went wrong, however, will ask the webmaster to check the logs. I can see most of the comments back now.

LikeLike

Reply ↓
webmaster August 30, 2012 at 10:55 AM

Here is what happened:

Originally, we had comments paging by 10 records. Then we saw that many posts are having more than 10 comments and then it becomes bit un-easy as a reader to look at the latest comments. So we increased the page size to 20.

However, the email notification was assuming to be 10 rows per page and was taking you to page 3, which is empty. So the paging parameter on the query string (when you open it from the email directly) was having the problem. If you open the post directly, you could see all comments.

We just fixed it and currently monitoring it.

LikeLike

Reply ↓
marc_jellinek@hotmail.com August 30, 2012 at 11:15 AM

Thanks

LikeLike

Reply ↓
Guenter August 31, 2012 at 2:10 AM

In Data Wareouse situations, especially when you’re partitioning your cube on date, a clustered index that only includes the date can be very efficient, because it does not produce too much index pages (fast first row) and the cube will always read the whole data for one day, so including more columns does not speed up things. We do this as a default on all OLAP fact tables.

In this case, you might want to include a primary unique index to enforce data integrity (e.g. the list of all surrogate keys in the data) or help while searching the data for reference entries.

LikeLike

Reply ↓