Actual Number of Rows are not always accurate

Today’s blog posting is a quite interesting one, because I want to show you a concrete example where in an Actual Execution Plan the Actual Number of Rows are WRONG! Yes, you have read correctly: I’m talking here about the Actual Number of Rows, and not about the Estimated Number of Rows, which are always somehow wrong, because they are only estimated during the Cardinality Estimation.

Counting rows in the wrong way

A few weeks ago I had worked with a customer, and they had a really interesting phenomenon: SQL Server has returned a lot of rows, but when they looked into the Execution Plan, the Actual Number of Rows were a lot higher. You don’t trust me? Look at the following Actual Execution Plan within SQL Server Management Studio:

Counting rows in the wrong way...

SQL Server returned here within SQL Server Management Studio 110561 rows, but the Actual Number of Rows within the Actual Execution Plan are much higher: 118968!

Wow, this is a quite interesting behavior. And I don’t think that this behavior is just by-design. The problem lies in a Filtered Non-Clustered Columnstore Index that was introduced with SQL Server 2016. You can create a Filtered Non-Clustered Columnstore Index in an OLTP workload to support so-called Realtime Data Analytics Scenarios. Instead of creating a dedicated Data Warehouse database you perform your Analytics workload directly on your OLTP tables, which can be also changed concurrently.

This means now that a Non-Clustered Columnstore Index is also changeable since SQL Server 2016. And in that behavior lies the problem: when SQL Server calculates and returns you the Actual Number of Rows, the deleted rows from the Non-Clustered Columnstore Index are also considered. And therefore the Actual Number of Rows in the Execution Plan are just higher as the real row count that is returned from the query.

The following T-SQL code shows a simple scenario with which you can reproduce this scenario.

-- Create a table copy
SELECT * INTO Sales.SalesOrderDetail2 FROM Sales.SalesOrderDetail
GO

-- Create a Non-Clustered ColumnStore Index for the "cold" data partition
CREATE NONCLUSTERED COLUMNSTORE INDEX idx_ColdData ON Sales.SalesOrderDetail2(ProductID)
WHERE ModifiedDate < '20140601'
GO

-- Let's delete some rows
DELETE FROM Sales.SalesorderDetail2
WHERE ModifiedDate >= '20140501' AND ModifiedDate < '20140601'
GO

-- These rows are not logically deleted
SELECT * FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('Sales.SalesOrderDetail2')
GO

-- Uses the Non-Clustered ColumnStore Index to access the data
SELECT ProductID FROM Sales.SalesOrderDetail2
WHERE ModifiedDate < '20140601'
GO

-- Clean up
DROP TABLE Sales.SalesOrderDetail2
GO

Summary

Never, ever trust anyone – especially not a piece of software! I find this behavior quite funny, because until I encountered this specific scenario I always thought that the Actual Number of Rows are somehow calculated on-the-fly during the Query Execution. But it seems (at least in combination with a Non-Clustered Columnstore Index) that this is not really always the case.

Thanks for your time,

-Klaus

3 thoughts on “Actual Number of Rows are not always accurate”

  1. Sunil Agarwal

    When rows are deleted from NCCI, they transition into a ‘delete buffer’, an internal structure. Please try running ‘alter index idx_ColdData ON Sales.SalesOrderDetail2 reorganize’. This should transition rows to ‘delete bitmap’. Typically, this happens in background after certain threshold number of rows. Please check if you see the ‘actual’ number of rows as expected

  2. Sunil Agarwal

    The rows in the delete buffer get filtered outside of columnstore scan. Please dbcc traceon(8666, -1) to see this hidden anti-semijoin

Leave a Comment

Your email address will not be published. Required fields are marked *

Do you want to master PostgreSQL like an expert?

PostgreSQL for the SQL Server Professional

Live Training on April 2 – 3 for only EUR 1490 incl. 20% VAT until February 15, afterwards EUR 1790 incl. 20% VAT