Unique/Non-Unique Clustered Indexes

In the last blog post I have talked about unique/non-unique clustered indexes on a heap table. A table without a clustered index is called a heap table in SQL Server. When you define a clustered index on such a table, the table data gets structured and is therefore referred as clustered table. In this blog post I want to talk about the differences in unique and non-unique clustered indexes, and what are the storage impacts between those 2 types of clustered indexes.

As a prerequisite I assume that you have a basic understanding of clustered indexes, and that you know the difference between heap and clustered tables, and how your data pages are structured when a clustered index is defined on a table.

Let’s start by looking on a unique clustered index. With SQL Server you have several possibilities to define a unique clustered index. The first way – the easy one – is to define a PRIMARY KEY constraint on a column. SQL Server enforces this PRIMARY KEY constraint through the creation of a unique clustered index on that table and that column. The another option is to create a unique clustered index through the CREATE CLUSTERED INDEX statement – but when you don’t specify the UNIQUE property, SQL Server will create a non-unique clustered index by default for you! The following code fragment creates the Customers table that you already know from the previous blog posting, but this time we create a PRIMARY KEY constraint on the column CustomerID. Therefore SQL Server creates a unique clustered index on that table and sorts the data pages in the leaf level according the values in the column CustomerID.

-- Create a table with 393 length + 7 bytes overhead = 400 bytes
-- Therefore 20 records can be stored on one page (8.096 / 400) = 20,24
CREATE TABLE Customers
(
   CustomerID INT NOT NULL PRIMARY KEY IDENTITY(1, 1),
   CustomerName CHAR(100) NOT NULL,
   CustomerAddress CHAR(100) NOT NULL,
   Comments CHAR(189) NOT NULL
)
GO

-- Insert 80.000 records
DECLARE @i INT = 1
WHILE (@i <= 80000)
BEGIN
   INSERT INTO Customers VALUES
   (
      ‘CustomerName’ + CAST(@i AS CHAR),
      ‘CustomerAddress’ + CAST(@i AS CHAR),
      ‘Comments’ + CAST(@i AS CHAR)
   )

   SET @i += 1
END
GO

After we have identified the index root page (through the use of the DBCC IND command), we can dump out that page with the DBCC PAGE command. In my case the index root page is 775:

DBCC PAGE(UniqueClusteredIndexStructure, 1, 775, 3)
GO

As you can see from the following figure each index record contains the clustered key, in this case the value of the column CustomerID.

image

When you examine the byte by byte representation of a clustered index record, you can see that SQL Server uses here the following bytes:

  • 1 byte: Status Bits
  • n bytes: Clustered Key – in this case 4 bytes
  • 4 bytes: PageID
  • 2 bytes: FileID

As you can see the length of the clustered key has a direct relationship of the length of an index record. This mean as smaller your clustered key is, the more index record can be put onto an index page, and therefore your clustered index will be much more compact and will perform faster and are easier to maintain. When you walk down your clustered index you will see that all intermediate levels have the same storage format as described above. There are no differences on each level, expect the index leaf level, because this level contains your actual logically ordered data pages.

Let's have now a look onto non-unique clustered indexes in SQL Server and how they differ from unique clustered indexes. To demonstrate this kind of indexes, I have just recreated the Customers table and created a non-unique clustered index on that table through the CREATE CLUSTERED INDEX statement:

-- Create a table with 393 length + 7 bytes overhead = 400 bytes
-- Therefore 20 records can be stored on one page (8.096 / 400) = 20,24
CREATE TABLE Customers
( 
   CustomerID INT NOT NULL,
   CustomerName CHAR(100) NOT NULL,
   CustomerAddress CHAR(100) NOT NULL,
   Comments CHAR(181) NOT NULL
)
GO

-- Create a non unique clustered index
CREATE CLUSTERED INDEX idx_Customers_CustomerID
ON Customers(CustomerID)
GO

Finally I have inserted 80.000 records, where the column CustomerID (the clustered key) is not unique anymore:

-- Insert 80.000 records
DECLARE @i INT = 1
WHILE (@i <= 20000)
BEGIN
   INSERT INTO Customers VALUES
   (
      @i,
      ‘CustomerName’ + CAST(@i AS CHAR),
      ‘CustomerAddress’ + CAST(@i AS CHAR),
      ‘Comments’ + CAST(@i AS CHAR)
   )
   INSERT INTO Customers VALUES
   (
      @i,
      ‘CustomerName’ + CAST(@i AS CHAR),
      ‘CustomerAddress’ + CAST(@i AS CHAR),
      ‘Comments’ + CAST(@i AS CHAR)
   )
   INSERT INTO Customers VALUES
   (
      @i,
      ‘CustomerName’ + CAST(@i AS CHAR),
      ‘CustomerAddress’ + CAST(@i AS CHAR),
      ‘Comments’ + CAST(@i AS CHAR)
   )
 
   INSERT INTO Customers VALUES
   (
      @i,
      ‘CustomerName’ + CAST(@i AS CHAR),
      ‘CustomerAddress’ + CAST(@i AS CHAR),
      ‘Comments’ + CAST(@i AS CHAR)
   )

SET @i += 1
END
GO

When you now dump out the root index page of the non-unique clustered index, you get the following result:

image

As you can see, SQL Server returns here an additional column named UNIQUIFIER (key). This column is used by SQL Server to make a non-unique clustered key unique. Behind the scenes it is a 4 byte long integer value starting at 0. E.g. when you have 2 customers with the ID 1380 the first record gets the uniquifier value 0 and the second one gets the uniquifier value of 1. But SQL Server only stores the uniquifier in the navigation structure of an index (all levels above the leaf level), when the uniquifier is not equal to 0. SQL Server only includes uniquifier values of 0 in the navigation structure of a non-unique clustered index, which means that the navigation structure will never store the uniquifier physically. The only place where the uniquifier is stored in a non-unique clustered index is on the data pages, where the actual data records are stored. The following figure shows a data page dump of our clustered index, where you can also see the stored uniquifier.

image

So the only difference between a unique and non-unique clustered index is on the data pages, because when using a non-unique clustered index, SQL Server will use the 4 byte long uniquifier to make them unique, which is a small storage overhead that you have to keep in mind, when working with non-unique clustered indexes. You can download the T-SQL script for this posting here.

In the next posting we will work out the differences between unique/non-unique non-clustered indexes defined on unique clustered indexes. Stay tuned 🙂

-Klaus

Do you want to master PostgreSQL like an expert?

PostgreSQL for the SQL Server Professional

Live Training on April 2 – 3 for only EUR 1490 incl. 20% VAT until February 15, afterwards EUR 1790 incl. 20% VAT