Disclaimer: This is work in progress intended to consolidate information from various sources. It considers only on-premises SQL Server; for other platforms, please refer to the documentation.
Last updated: 15-Feb-2024
[SQL Server] columnstore indexes (CI)
- {def} a technology for storing, retrieving and managing data by using a columnar data format (aka columnstore)
- stores compressed data on a per-column rather than a per-row basis [5]
- {benefit} designed for analytics and data warehousing workloads
- data warehousing
- {scenario} store fact tables and large dimension tables
- ⇐ tend to require full table scans rather than table seeks
- analytics workloads
- {scenario} [SQL Server 2016 SP1] can be used for real-time analytics on operational databases
- ⇐ an updatable nonclustered columnstore index can be created on a rowstore table
- {benefit} performance increase
- can achieve up to 100x better performance [4]
- offers an order of magnitude better performance than a rowstore index
- {feature} uses batch mode execution
- improves query performance typically by two to four times
- provides high performance gains for analytic queries that scan large amounts of data, especially on large tables (>1 million rows)
- {benefit} reduces significantly the data warehouse storage costs
- {feature} data compression
- ⇒ provides high compression rates, typically by 10 times
- ⇒ reduces total I/O from the physical media
- ⇐ queries often select only a few columns from a table
- minimizes or eliminates system I/O bottlenecks
- reduces significantly the memory footprint
- ⇒ query performance can improve
- because SQL Server can perform more query and data operations in memory
- {benefit} built in memory
- ⇒ sufficient memory must be available
- {benefit} part of the database engine
- no special hardware is needed
- {concept} columnstore
- {def} data structure logically organized as a table with rows and columns, and physically stored in a column-wise data format
- stores values from the same domain, which commonly have similar values
- when a query references a column, only that column is fetched from disk [3]
- ⇐ the columns not requested are skipped
- ⇒ they are not loaded into memory
- when a query is executed, the rows must be reconstructed
- ⇒ row reconstruction takes some time and uses some CPU and memory resources [3]
- [SQL Server 2016] columnstore index on rowstore tables
- the columnstore is updated when data changes in the rowstore table
- both indexes work against the same data
- {concept} rowstore
- {def} data that's logically organized as a table with rows and columns, and physically stored in a row-wise data format
- ⇐ the traditional way to store relational table data
- refers to a table where the underlying data storage format is either
- a heap
- a clustered index
- a memory-optimized table
- {concept} rowstore index
- performs best on queries that seek into the data, when searching for a particular value, or for queries on a small range of values
- ⇒ appropriate for transactional workloads
- because they tend to require mostly table seeks instead of table scans
- {concept} rowgroup
- {def} a group of rows that are compressed into columnstore format at the same time
- {constraint} has a maximum number of rows per rowgroup, which is 1,048,576 (2^20) rows
- contains one column segment for every column in the table
- can have more than one delta rowgroup that form the deltastore
- e.g. when multiple threads create columnstore indexes using parallel execution plans [5]
- ⇐ each thread will work with its own subset of data, creating separate rowgroups [5]
- [partitions] each table partition has its own set of rowgroups [5]
- ⇐ too many partitions may prevent workloads from benefiting from a CCI [11]
- ⇐ data aren’t pushed into a compressed columnstore segment until the rowgroup limit is reached
- {event} rowgroup is compressed
- marked as read-only [16]
- a compressed rowgroup is considered fragmented when either
- the number of rows is less than the rowgroup limit and the trim_reason is other than DICTIONARY_SIZE
- ⇐ when the maximum dictionary size was reached, nothing can be done to increase the number of rows [15]
- it has nonzero deleted rows that exceed a minimum threshold [15]
- {event} all data from rowgroup deleted
- transitions from COMPRESSED into TOMBSTONE state
- later removed by the tuple-mover background process
- {event} rows in the columnstore indexes can be moved to different locations
- the row-ids in the nonclustered indexes aren’t updated
- ⇐ the mappings between old and new row locations are stored in an internal structure (aka mapping index)
- {event} rowgroup build
- all column data are combined on a per-rowgroup basis, encoded, and compressed [5]
- the rows within a rowgroup can be rearranged if that helps to achieve a better compression rate [5]
- {feature} data compression
- the table is sliced into rowgroups, and each rowgroup is compressed in a column-wise manner
- the number of rows in a rowgroup must be
- large enough to improve compression rates
- small enough to benefit from in-memory operations
- having too many small rowgroups decreases the columnstore index’s quality
- uses its own compression mechanism
- ⇒ row or page compression cannot be used on it [3]
- [SQL Server 2016] page compression has been removed
- ⇐ in some cases, page compression disallowed the creation of columnstore indexes with a very large number of columns [5]
- {feature} compression delay
- computed when a delta rowgroup is closed [7]
- keeps the ‘active’ rows in the delta rowgroup and transitions them to a compressed rowgroup only after a specified delay [7]
- ⇐ reduces the overall maintenance overhead of NCCI [7]
- ⇒ leads to a larger number of delta rowgroups [7]
- {best practice} if the workload is primarily inserting data and querying it, the default COMPRESSION_DELAY of 0 is the recommended option [7]
- {best practice} [OLTP workload] if > 10% of rows are marked deleted in recently compressed rowgroups, then consider a value that accommodates the behavior [7]
- via: CREATE NONCLUSTERED COLUMNSTORE INDEX ... WITH (COMPRESSION_DELAY = 150)
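A sketch of the compression-delay option above (table, index, and column names are illustrative):

```sql
-- Hypothetical hot table: keep rows in the delta rowgroup for ~150 minutes
-- before compression, so recent updates/deletes don't hit compressed rowgroups.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_Orders
ON dbo.Orders (OrderId, OrderDate, Amount)
WITH (COMPRESSION_DELAY = 150);  -- minutes

-- The delay can be changed later without recreating the index:
ALTER INDEX ncci_Orders ON dbo.Orders
SET (COMPRESSION_DELAY = 60);
```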
- {feature} data encoding
- all values in the data are replaced with 64-bit integers using one of two encoding algorithms
- {concept} dictionary encoding
- stores distinct values from the data in a separate structure (aka dictionary)
- every value in a dictionary has a unique ID assigned [5]
- the ID is used for replacement
- {concept} global dictionary
- shared across all segments that belong to the same index partition [5]
- {concept} local dictionary
- created for individual segments using values that are not present in the global dictionary
- {concept} value-based encoding
- mainly used for numeric and integer data types that do not have enough duplicated values [5]
- ⇐ dictionary encoding would be inefficient [5]
- converts integer and numeric values to a smaller range of 64-bit integers in two steps
- {step} [numeric data types] values are converted to integers using the minimum positive exponent (aka magnitude) that allows the conversion [5]
- {goal} convert all numeric values to integers [5]
- [integer data types] the smallest negative exponent is chosen that can be applied to all values without losing their precision [5]
- {goal} reduce the interval between the minimum and maximum values stored in the segment [5]
- {step} the minimum value (aka base value) in the segment is identified and subtracted from all other values [5]
- ⇒ makes the minimum value in the segment number 0 [5]
- after encoding, the data are compressed and stored as a LOB allocation unit
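The two encoding steps can be illustrated with made-up values (not actual engine output):

```sql
-- Illustration (hypothetical values): value-based encoding of a numeric(9,2) segment
--   raw values:                       1.24,  1.59,  2.01
--   step 1 - scale by 10^2, the minimum exponent making all values integers:
--                                     124,   159,   201
--   step 2 - subtract the base value (the minimum, 124):
--                                     0,     35,    77
-- The segment stores the small integers plus metadata (exponent, base value),
-- from which the original values can be reconstructed.
```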
- {concept} column segment
- {def} a column of data from within the rowgroup
- is compressed together and stored on physical media
- SQL Server loads an entire segment into memory when it needs to access its data
- {concept} segment metadata
- stores metadata about each segment
- e.g. minimum and maximum values
- ⇐ segments that do not have the required data are skipped [5]
- {concept} deltastore
- {def} all of the delta rowgroups of a columnstore index
- its operations are handled behind the scenes
- can be in either of two states
- {state} open (aka open deltastore)
- accepts new rows and allows modifications and deletions of data
- {state} closed (aka closed deltastore)
- a deltastore is closed when it reaches its rowgroup limit
- {concept} delta rowgroup
- {def} a clustered B-tree index that's used only with columnstore indexes
- improves columnstore compression and performance by storing rows until the number of rows reaches the rowgroup limit; they are then moved into the columnstore
- {event} reaches the maximum number of rows
- transitions from an ‘open’ to a ‘closed’ state
- a closed rowgroup is compressed by the tuple-mover and stored into the columnstore as a COMPRESSED rowgroup
- {event} compressed
- the existing delta rowgroup transitions into TOMBSTONE state, to be removed later by the tuple-mover when there is no reference to it
- {concept} tuple-mover
- background process that checks for closed rowgroups
- if it finds a closed rowgroup, it compresses the delta rowgroup and stores it into the columnstore as a COMPRESSED rowgroup
- {concept} clustered columnstore index (CCI)
- is the primary storage for the entire table
- {characteristic} updatable
- has two structures that support data modifications
- ⇐ both use the B-tree format to store data [5]
- ⇐ created on demand [5]
- delete bitmap
- indicates which rows were deleted from a table
- upon deletion, the row continues to be stored in the rowgroup
- during query execution, SQL Server checks the delete bitmap and excludes deleted rows from the processing [5]
- deltastore
- includes newly inserted rows
- updating a row triggers the deletion of the existing row and the insertion of a new version of the row into a deltastore
- ⇒ the update does not change the row data in place
- ⇒ the old version of the row is recorded in the delete bitmap
- [partitions] each partition can have a single delete bitmap and multiple deltastores
- ⇐ this makes each partition self-contained and independent from other partitions
- ⇒ allows performing a partition switch on tables that have clustered columnstore indexes defined [5]
- {feature} supports minimal logging for batch sizes >= the rowgroup’s limit [12]
- [SQL Server 2017] supports non-persisted computed columns in clustered columnstore indexes [2]
- stores some data temporarily in a clustered index (aka deltastore) and a B-tree list of IDs for deleted rows
- ⇐ {benefit} reduces fragmentation of the column segments and improves performance
- combines query results from both the columnstore and the deltastore to return the correct query results
- [partitions] too many partitions can hurt the performance of a clustered columnstore index [11]
- {concept} nonclustered columnstore index (NCCI)
- {def} a secondary index that's created on a rowstore table
- is defined on one or more columns of the table and has an optional condition that filters the rows
- designed to be used for workloads involving a mix of transactional and analytics workloads
- functions the same as a clustered columnstore index
- ⇐ has the same performance optimizations (incl. batch mode operators)
- {exception} doesn’t support persisted computed columns
- can’t be created on a columnstore index that has a computed column [2]
- however, behaves differently between the various versions of SQL Server
- [SQL Server 2012|2014] {restriction} read-only
- contains a copy of part or all of the rows and columns in the underlying table
- includes a row-id, which is either the address of
- a row in a heap table
- a clustered index key value
- includes all columns from the clustered index even when not explicitly defined in the CREATE statement
- the unspecified columns will not be available in the sys.index_columns view
- [SQL Server 2016] multiple nonclustered rowstore indexes can be created on a columnstore index and perform efficient table seeks on the underlying columnstore
- ⇒ once created, makes it possible to drop one or more B-tree nonclustered indexes
- enables real-time operational analytics where the OLTP workload uses the underlying clustered index while analytics run concurrently on the columnstore index
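A filtered NCCI for the real-time operational analytics scenario might look like this (table, index, and filter are illustrative):

```sql
-- Hypothetical OLTP table: analytics mostly touch completed orders, so the
-- filter keeps hot, frequently updated rows out of the columnstore.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_Orders_Analytics
ON dbo.Orders (OrderDate, CustomerId, Quantity, Amount)
WHERE Status = 'Completed';
```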
- {concept} batch mode execution (aka vector-based execution, vectorized execution)
- {def} query processing method used to process multiple rows together in groups of rows, or batches, rather than one row at a time
- SQL Server can push a predicate to the columnstore index scan operator, preventing unnecessary rows from being loaded into the batch [5]
- queries can process up to 900 rows together
- enables efficient query execution (by a 3-4x factor) [4]
- ⇐ the size of the batches varies to fit into the CPU cache
- ⇒ reduces the number of times that the CPU needs to request external data from memory or other components [5]
- improves the performance of aggregations, which can be calculated on a per-batch rather than a per-row basis [5]
- tries to minimize the copying of data between operators by creating and maintaining a special bitmap that indicates if a row is still valid in the batch [5]
- ⇐ subsequent operators will ignore the non-valid rows
- every operator has a queue of work items (batches) to process [5]
- worker threads from a shared pool pick items from queues and process them while migrating from operator to operator [5]
- is closely integrated with, and optimized around, the columnstore storage format
- columnstore indexes use batch mode execution
- ⇐ improves query performance typically by two to four times
- {concept} tuple mover
- single-threaded process that works in the background, preserving system resources
- converts closed deltastores to rowgroups that store data in a column-based storage format [5]
- can be disabled via trace flag 634
- ⇐ the conversion of closed deltastores to rowgroups can be forced by reorganizing an index [5]
- runs in parallel using multiple threads
- decreases significantly the conversion time at a cost of extra CPU load and memory usage [5]
- via: ALTER INDEX REORGANIZE command
- doesn’t prevent other sessions from inserting new data into a table [5]
- deletions and data modifications would be blocked for the duration of the operation [5]
- {recommendation} consider forcing index reorganization manually to reduce execution, and therefore locking, time [5]
- considered fragmented if it has
- multiple delta rowgroups
- deleted rows
- requires maintenance like that of regular B-tree indexes [5]
- {issue} partially populated rowgroups
- {issue} overhead of deltastore and delete bitmap scans during query execution
- rebuilding the columnstore index addresses these issues
- the strategy depends on the volatility of the data and the ETL processes implemented in the system [5]
- {recommendation} rebuild indexes when a table has a considerable volume of deleted rows and/or a large number of partially populated rowgroups [5]
- {recommendation} rebuild partition(s) that still have a large number of rows in open deltastores after the ETL process has completed, especially if the ETL process does not use a bulk insert API [5]
- creating/dropping/disabling/rebuilding functions like any other index
- columnstore statistics
- a statistics object is created at the time of columnstore index creation; however, it is neither populated nor updated afterward [5]
- ⇐ SQL Server relies on segment information, B-tree indexes (when available), and column-level statistics when deciding if a columnstore index needs to be used [5]
- it is beneficial to create missing column-level statistics on the columns that participate in a columnstore index and are used in query predicates and as join keys [5]
- ⇐ statistics rarely update automatically on very large tables [5]
- ⇒ statistics must be updated ‘manually’
- [SQL Server 2019] included in the schema-only clone of a database functionality [8]
- enables performance troubleshooting without the need to manually capture the statistics information
- [SQL Server 2019] columnstore indexes have been added to sp_estimate_data_compression_savings
- ⇐ the COLUMNSTORE and COLUMNSTORE_ARCHIVE options allow estimating the space savings if either of these indexes is used on a table [8]
- [in-memory tables]
- {limitation} a columnstore index must include all the columns and can’t have a filtered condition [2]
- {limitation} queries on columnstore indexes run only in InterOP mode, and not in the in-memory native mode [2]
- {operation} designing columnstore indexes
- {best practice} understand as much as possible the data’s characteristics
- {best practice} identify the workload’s characteristics
- {operation} create a clustered columnstore index
- via: CREATE CLUSTERED COLUMNSTORE INDEX command
- no columns need to be specified in the statement
- ⇐ the index will include all table columns
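A minimal sketch of the command (table and index names are illustrative):

```sql
-- A CCI always covers all columns, so no column list is given.
CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales
ON dbo.FactSales;

-- Converting an existing rowstore clustered index to a CCI instead;
-- the index name must match the existing clustered index being replaced.
CREATE CLUSTERED COLUMNSTORE INDEX pk_FactSales
ON dbo.FactSales
WITH (DROP_EXISTING = ON);
```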
- {operation} index rebuilding
- forces SQL Server to remove deleted rows physically from the index and to merge the deltastores’ and rowgroups’ data [5]
- all column segments are recreated with rowgroups fully populated [5]
- [<SQL Server 2019] offline operation
- [SQL Server 2019 Enterprise] online operation
- ⇒ higher availability
- ⇐ pausing and resuming create and rebuild operations are not supported [11]
- very resource-intensive process
- holds a schema modification (Sch-M) lock on the table
- ⇒ prevents other sessions from accessing it [5]
- ⇐ the overhead can be mitigated by using table/index partitioning
- ⇒ indexes can be rebuilt on a partition basis for those partitions with volatile data [5]
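A partition-level rebuild might look like this (index/table names and the partition number are illustrative):

```sql
-- Rebuild only the partition that received volatile data.
ALTER INDEX cci_FactSales ON dbo.FactSales
REBUILD PARTITION = 12;

-- [SQL Server 2019 Enterprise] the rebuild can run online:
ALTER INDEX cci_FactSales ON dbo.FactSales
REBUILD PARTITION = 12 WITH (ONLINE = ON);
```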
- {operation} index reorganization
- [<SQL Server 2019] a reorganize operation is required to merge smaller COMPRESSED rowgroups, following an internal threshold policy that determines how to remove deleted rows and combine the compressed rowgroups
- [SQL Server 2019] a background merge task also works to merge COMPRESSED rowgroups from which a large number of rows has been deleted
- ⇐ after merging smaller rowgroups, the index quality should be improved
- the tuple-mover is helped by a background merge task that automatically compresses smaller OPEN delta rowgroups that have existed for some time as determined by an internal threshold, or merges COMPRESSED rowgroups from which a large number of rows has been deleted
- via: ALTER INDEX REORGANIZE command
- [SQL Server 2016] performs additional defragmentation
- removes deleted rows from rowgroups in which 10% or more of the rows have been logically deleted [5]
- merges closed rowgroups together, keeping the total number of rows less than or equal to the rowgroup’s limit [5]
- ⇐ both processes can be done together [5]
- [SQL Server 2014] the only action performed is compressing and moving the data from closed deltastores to rowgroups [5]
- ⇐ delete bitmap and open deltastores stay intact [5]
- via: ALTER INDEX REORGANIZE
- uses all available system resources while it is running [5]
- ⇒ speeds up the execution process
- ⇒ reduces the time during which other sessions cannot modify or delete data in a table [5]
- close and compress all open rowgroups
- via: ALTER INDEX REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON)
- rowgroups aren’t merged during this operation [5]
- {operation} estimate compression savings
- [SQL Server 2019] COLUMNSTORE and COLUMNSTORE_ARCHIVE added
- allows estimating the space savings if either of these indexes is used on a table [8]
- {limitation} not available in all editions
- via: sp_estimate_data_compression_savings
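A sketch of the estimation call (schema and table names are illustrative):

```sql
-- Estimate the savings of compressing dbo.FactSales with a columnstore index;
-- @data_compression also accepts 'COLUMNSTORE_ARCHIVE' in SQL Server 2019+.
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'FactSales',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = 'COLUMNSTORE';
```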
- {operation} [bulk loads] when the number of rows is less than the deltastore’s limit, all the rows go directly to the deltastore
- [large bulk load] most of the rows go directly to the columnstore without passing through the deltastore
- some rows at the end of the bulk load might be too few in number to meet the minimum size of a rowgroup
- ⇒ the final rows go to the deltastore instead of the columnstore
- bulk insert operations provide the number of rows in the batch as part of the API call [5]
- best results are achieved by choosing a batch size that is divisible by the rowgroup’s limit [5]
- ⇐ guarantees that every batch produces one or several fully populated rowgroups [5]
- ⇒ reduces the total number of rowgroups in a table [5]
- ⇒ improves query performance
- ⇐ the batch size shouldn’t exceed the rowgroup’s limit [5]
- rowgroups can still be created on the fly, in a manner similar to a bulk insert, when the size of the insert batch is close to or exceeds the rowgroup’s limit [5]
- {operation} [non-bulk operations] trickle inserts go directly to a deltastore
- {feature} parallel inserts
- [SQL Server 2016] requires the following conditions for parallel insert on a CCI [6]
- must specify TABLOCK
- no NCI on the clustered columnstore index
- no identity column
- database compatibility level is set to 130
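Under those conditions, a staging-to-fact load can run in parallel (table names are illustrative):

```sql
-- TABLOCK enables parallel insert into the CCI, assuming the target has no
-- nonclustered indexes, no identity column, and compatibility level 130+.
INSERT INTO dbo.FactSales WITH (TABLOCK)
SELECT OrderDate, CustomerId, Quantity, Amount
FROM dbo.StagingSales;
```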
- {recommendation} minimize the use of string columns in fact tables [5]
- string data use more space
- their encoding involves additional overhead during batch mode execution [5]
- queries with predicates on string columns may have less efficient execution plans that also require significantly larger memory grants as compared to their non-string counterparts [5]
- {limitation} [SQL Server 2012|2014] string predicates are not pushed down toward the lowest operators in execution plans
- {recommendation} add another dimension table and replace the string value in the fact table with a synthetic, integer-based ID key that references the new table [5]
- {operation} upgrading to SQL Server 2016
- make sure that queries against the tables with columnstore indexes can utilize parallelism if the database compatibility level is less than 130 [5]
- {feature} [SQL Server 2019] automated columnstore index maintenance [8]
- {improvement} [SQL Server 2019] better columnstore metadata memory management
- {improvement} [SQL Server 2019] low-memory load path for columnstore tables
- {improvement} [SQL Server 2019] improved performance for bulk loading to columnstore indexes
- {improvement} [SQL Server 2019] server startup process has been made faster for databases that use in-memory columnstore tables for HTAP
- {feature} DMVs
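The rowgroup states and fragmentation signals discussed above can be inspected via a DMV; a sketch (the table name is illustrative):

```sql
-- Rowgroup health check (SQL Server 2016+): shows state, deleted rows, and
-- why a rowgroup was trimmed below the 1,048,576-row limit.
SELECT OBJECT_NAME(object_id) AS table_name,
       row_group_id,
       state_desc,            -- OPEN / CLOSED / COMPRESSED / TOMBSTONE
       total_rows,
       deleted_rows,
       trim_reason_desc
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('dbo.FactSales')
ORDER BY row_group_id;
```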
References:
[1] SQL Docs (2020) Columnstore indexes: Overview [link]
[2] Microsoft Learn (2024) SQL: What's new in columnstore indexes [link]
[3] Dejan Sarka et al (2012) Exam 70-463: Implementing a Data Warehouse with Microsoft SQL Server 2012 (Training Kit)
[4] SQL Docs (2019) Columnstore indexes - Query performance [link]
[5] Dmitri Korotkevitch (2016) Pro SQL Server Internals 2nd Ed.
[6] Microsoft Learn (2016) Columnstore Index: Parallel load into clustered columnstore index from staging table [link]
[7] Microsoft Learn (2016) Columnstore Index Defragmentation using REORGANIZE Command [link]
[8] Microsoft (2018) Microsoft SQL Server 2019: Technical white paper [link]
Acronyms:
CCI - clustered columnstore index
CI - columnstore index
DBCC - Database Console Commands
DMV - Dynamic Management View
ETL - Extract, Transform, Load
HTAP - Hybrid Transactional/Analytical Processing
LOB - Large Object
NCCI - nonclustered columnstore index
NCI - nonclustered index
OLTP - On-Line Transaction Processing
SP - Service Pack