16 February 2025

💠🛠️🗒️SQL Server: Columnstore Indexes [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources. It considers only on-premises SQL Server; for other platforms please refer to the documentation.

Last updated: 15-Feb-2025

[SQL Server] columnstore indexes (CI)

  • {def} a technology for storing, retrieving and managing data by using a columnar data format (aka columnstore)
    • store compressed data on a per-column rather than a per-row basis [5]
  • {benefit} designed for analytics and data warehousing workloads
    • data warehousing
      • {scenario} store fact tables and large dimension tables
        • ⇐ tend to require full table scans rather than table seeks
    • analytics workloads
      • {scenario} [SQL Server 2016 SP1] can be used for real-time analytics on operational databases
        • ⇐ an updatable nonclustered columnstore index can be created on a rowstore table
  • {benefit} performance increase 
    • can achieve up to 100x better performance [4]
    • offers an order of magnitude better performance than a rowstore index
      • {feature} uses batch mode execution
        • improves query performance typically by two to four times
      • have high performance gains for analytic queries that scan large amounts of data, especially on large tables (>1 million rows) 
  • {benefit} reduces significantly the data warehouse storage costs
    • {feature} data compression
      • ⇒ provides high compression rates, typically by 10 times
        • ⇒ reduces total I/O from the physical media
          • ⇐ queries often select only a few columns from a table
          • minimizes or eliminates system I/O bottlenecks
        • reduces significantly the memory footprint
          • ⇒ query performance can improve 
            • because SQL Server can perform more query and data operations in memory
  • {benefit} built in memory
    • ⇒ sufficient memory must be available 
  • {benefit} part of the database engine
    • no special hardware is needed
  • {concept} columnstore 
    • {def} data structure logically organized as a table with rows and columns, and physically stored in a column-wise data format
      • stores values from the same domain which commonly have similar values
        • when a query references a column, then only that column is fetched from disk [3]
          • ⇐ the columns not requested are skipped 
            • ⇒ they are not loaded into memory 
        • when a query is executed, the rows must be reconstructed
          • ⇒ row reconstruction takes some time and uses some CPU and memory resources [3]
        • [SQL Server 2016] columnstore index on rowstore tables
          • columnstore is updated when data changes in the rowstore table
        • both indexes work against the same data
  • {concept} rowstore
    • {def} data that's logically organized as a table with rows and columns, and physically stored in a row-wise data format
      • ⇐ the traditional way to store relational table data
      • refers to a table where the underlying data storage format is either
        • a heap
        • a clustered index
        • a memory-optimized table
  • {concept} rowstore index
    • performs best on queries that seek into the data, when searching for a particular value, or for queries on a small range of values
      • ⇒ appropriate for transactional workloads 
        • because they tend to require mostly table seeks instead of table scans
  • {concept} rowgroup
    • {def} a group of rows that are compressed into columnstore format at the same time
      • {constraint} has a maximum number of rows per rowgroup: 1,048,576 (= 2^20) rows
      • contains one column segment for every column in the table
      • a table can have more than one delta rowgroup; together they form the deltastore
        • e.g. when multiple threads create columnstore indexes using parallel execution plans [5]
          • ⇐ each thread will work with its own subset of data, creating separate rowgroups [5]
        • [partitions] each table partition has its own set of row groups [5]
          • ⇐  too many partitions may prevent workloads from benefiting from a CCI [11]
            • ⇐ data aren’t pushed into a compressed columnstore segment until the rowgroup limit is reached
    • {event} rowgroup is compressed
      • marked as read-only [16]
      • a compressed rowgroup is considered as fragmented when either 
        • row number < rowgroup limit but dictionary size reached the maximum
          • nothing can be done to increase the number of rows [15]
          • the trim_reason is other than DICTIONARY_SIZE
        • it has nonzero deleted rows that exceeds a minimum threshold [15]
    • {event} all data from rowgroup deleted 
      • transitions from COMPRESSED into TOMBSTONE state
      • later removed by the tuple-mover background process
    • {event} rows in the columnstore indexes can be moved to different locations
      • row-id in the nonclustered indexes aren’t updated 
        • ⇐ the mappings between old and new row locations are stored in an internal structure (aka mapping index) 
    • {event} rowgroup build
      • all column data are combined on a per-row group basis, encoded and compressed [5]
        • the rows within a row group can be rearranged if that helps to achieve a better compression rate [5]
  • {feature} data compression
    • the table is sliced into rowgroups, and each rowgroup is compressed in a column-wise manner
      • the number of rows in the rowgroup must be 
        • large enough to improve compression rates
        • small enough to benefit from in-memory operations
          • having too many small rowgroups decreases columnstore index’s quality
    • uses its own compression mechanism 
      • ⇒ row or page compression cannot be used on it [3]
      • [SQL Server 2016] the page compression of delta stores has been removed [5]
        • ⇐ in some cases, page compression disallowed the creation of columnstore indexes with a very large number of columns [5]
  • {feature} compression delay
    • computed when a delta rowgroup is closed [7]
    • keeps the ‘active’ rows in delta rowgroup and only transition these rows to compressed rowgroup after a specified delay [7]
      • ⇐ reduces the overall maintenance overhead of NCCI [7]
      • ⇒ leads to a larger number of delta rowgroups [7]
    • {best practice} if the workload is primarily inserting data and querying it, the default COMPRESSION_DELAY of 0 is the recommended option [7]
    • {best practice} [OLTP workload] if > 10% rows are marked deleted in recently compressed rowgroups, then consider a value that accommodates the behavior [7]
      • via: CREATE NONCLUSTERED COLUMNSTORE INDEX … WITH (COMPRESSION_DELAY = 150)
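As a sketch, the compression delay can be set at index creation or changed on an existing index (table, index, and column names below are illustrative):

```sql
-- create an NCCI that keeps 'hot' rows in the delta store for ~150 minutes
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Orders
ON dbo.Orders (OrderDate, CustomerID, Amount)
WITH (COMPRESSION_DELAY = 150);

-- the delay can also be changed afterwards
ALTER INDEX NCCI_Orders ON dbo.Orders
SET (COMPRESSION_DELAY = 150);
```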
  • {feature} data encoding
    • all values in the data are replaced with 64-bit integers using one of two encoding algorithms
    • {concept} dictionary encoding
      • stores distinct values from the data in a separate structure (aka dictionary)
        • every value in a dictionary has a unique ID assigned [5]
          • the ID is used for replacement
    • {concept} global dictionary
      • shared across all segments that belong to the same index partition [5]
    • {concept} local dictionary
      • created for individual segments using values that are not present in the global dictionary
    • {concept} value-based encoding
      • mainly used for numeric and integer data types that do not have enough duplicated values [5]
        • dictionary encoding would be inefficient [5]
      • converts integer and numeric values to a smaller range of 64-bit integers in 2 steps
        • {step} [numeric data types] are converted to integers using the minimum positive exponent (aka magnitude that allows this conversion) [5]
          • {goal} convert all numeric values to integers [5]
          • [integer data types] the smallest negative exponent is chosen that can be applied to all values without losing their precision [5]
            • {goal} reduce the interval between the minimum and maximum values stored in the segment [5]
        • {step} the minimum value (aka base value) in the segment is identified and subtracted from all other values [5]
          • ⇒ makes the minimum value in the segment number 0 [5]
      • after encoding the data are compressed and stored as a LOB allocation unit
  • {concept} column segment 
    • {def} a column of data from within the rowgroup
    • is compressed together and stored on physical media
    • SQL Server loads an entire segment to memory when it needs to access its data
  • {concept} segment metadata 
    • store metadata about each segment 
      • e.g. minimum and maximum values
      • ⇐ segments that do not have the required data are skipped [5]
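The segment metadata used for segment elimination (e.g. the minimum and maximum data IDs per segment) can be inspected via the sys.column_store_segments catalog view; the table name below is illustrative:

```sql
-- per-segment metadata for a columnstore index
SELECT p.object_id, s.column_id, s.segment_id,
       s.row_count, s.min_data_id, s.max_data_id, s.on_disk_size
FROM sys.column_store_segments s
JOIN sys.partitions p ON s.partition_id = p.partition_id
WHERE p.object_id = OBJECT_ID('dbo.FactSales');
```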
  • {concept} deltastore
    • {def} all of the delta rowgroups of a columnstore index
    • its operations are handled behind the scenes
      • can be in either of two states
        • {state} open (aka open delta store) 
          • accepts new rows and allow modifications and deletions of data
        • {state} closed (aka closed delta store)
          • a delta store is closed when it reaches its rowgroup limit
  • {concept} delta rowgroup 
    • {def} a clustered B-tree index that's used only with columnstore indexes
    • improves columnstore compression and performance by storing rows until their number reaches the rowgroup limit, after which they are moved into the columnstore
    • {event} reaches the maximum number of rows
      • it transitions from an ‘open’ to ‘closed’ state
      • a closed rowgroup is compressed by the tuple-mover and stored into the columnstore as COMPRESSED rowgroup
    • {event} compressed
      • the existing delta rowgroup transitions into TOMBSTONE state to be removed later by the tuple-mover when there is no reference to it
  • {concept} tuple-mover 
    • background process that checks for closed rowgroups
      • if it finds a closed rowgroup, it compresses the delta rowgroup and stores it into the columnstore as a COMPRESSED rowgroup
  • {concept} clustered columnstore index (CCI) 
    • is the primary storage for the entire table
    • {characteristic} updatable
      • has two structures that support data modifications
        • ⇐ both use the B-Tree format to store data [5]
        • ⇐ created on demand [5]
        • delete bitmap
          • indicates which rows were deleted from a table
          • upon deletion the row continues to be stored into the rowgroup
          • during query execution SQL Server checks the delete bitmap and excludes deleted rows from the processing [5]
        • delta store
          • includes newly inserted rows
          • updating a row triggers the deletion of the existing row and insertion of a new version of a row to a delta store
            • ⇒ the update does not change the existing row data in the rowgroup
            • ⇒ the old version of the row is marked as deleted in the delete bitmap
        • [partitions] each partition can have a single delete bitmap and multiple delta stores
          • ⇐ this makes each partition self-contained and independent from other partitions
            • ⇒ allows performing a partition switch on tables that have clustered columnstore indexes defined [5]
    • {feature} supports minimal logging for batch sizes >= rowgroup’s limit [12]
    • [SQL Server 2017] supports non-persisted computed columns in clustered columnstore indexes [2]
    • store some data temporarily into a clustered index (aka deltastore) and a btree list of IDs for deleted rows
      • ⇐ {benefit} reduces fragmentation of the column segments and improves performance
      • combines query results from both the columnstore and the deltastore to return the correct query results
    • [partitions] too many partitions can hurt the performance of a clustered columnstore index [11]
  • {concept} nonclustered columnstore index (NCCI)
    • {def} a secondary index that's created on a rowstore table
      • is defined as one or more columns of the table and has an optional condition that filters the rows
      • designed to be used for workloads involving a mix of transactional and analytics workloads
      • functions the same as a clustered columnstore index
        • ⇐ has same performance optimizations (incl. batchmode operators)
        • {exception} doesn’t support persisted computed columns
          • can’t be created on a columnstore index that has a computed column [2]
        • however behave differently between the various versions of SQL Server
          • [SQL Server 2012|2014] {restriction} readonly
      • contains a copy of part or all of the rows and columns in the underlying table
        • includes a row-id, which is either the address of
          • a row in a heap table 
          • a clustered index key value
            • includes all columns from the clustered index even when not explicitly defined in the CREATE statement
              • the columns not explicitly specified will not be visible in the sys.index_columns view
      • [SQL Server 2016] multiple nonclustered rowstore indexes can be created on a columnstore index and perform efficient table seeks on the underlying columnstore
        • ⇒ once created, makes it possible to drop one or more btree nonclustered indexes
      • enables real-time operational analytics where the OLTP workload uses the underlying clustered index while analytics run concurrently on the columnstore index
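A minimal sketch of a filtered NCCI supporting real-time operational analytics; all names and the status code are illustrative assumptions:

```sql
-- analytics queries scan only completed orders;
-- the OLTP 'hot' rows stay out of the columnstore
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Orders_Completed
ON dbo.Orders (OrderDate, ProductID, Quantity, Amount)
WHERE OrderStatus = 5;  -- hypothetical 'completed' status code
```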
  • {concept} batch mode execution (aka vector-based execution, vectorized execution)
    • {def} query processing method used to process multiple rows together in groups of rows, or batches, rather than one row at a time
      • SQL Server can push a predicate to the columnstore index scan operator, preventing unnecessary rows from being loaded into the batch [5]
      • queries can process up to 900 rows together
        • enables efficient query execution (by a 3-4x factor) [4]
        • ⇐ the size of the batches varies to fit into the CPU cache
        • ⇒ reduces the number of times that the CPU needs to request external data from memory or other components [5]
      • improves the performance of aggregations, which can be calculated on a per-batch rather than a per-row basis [5]
      • tries to minimize the copy of data between operators by creating and maintaining a special bitmap that indicates if a row is still valid in the batch [5]
        • ⇐ subsequent operators will ignore the non-valid rows
        • every operator has a queue of work items (batches) to process [5]
        • worker threads from a shared pool pick items from queues and process them while migrating from operator to operator [5]
    • is closely integrated with, and optimized around, the columnstore storage format.
      • columnstore indexes use batch mode execution
        • ⇐ improves query performance typically by two to four times
  • {concept} tuple mover
    • single-threaded process that works in the background, preserving system resources
      • runs every five minutes
    • converts closed delta stores to row groups that store data in a column-based storage format [5]
      • can be disabled via trace flag 634 
      • ⇐ the conversion of closed delta stores to row groups can be forced by reorganizing an index [5]
        • runs in parallel using multiple threads
          • decreases significantly conversion time at a cost of extra CPU load and memory usage [5]
        • via: ALTER INDEX REORGANIZE command
    • it doesn’t prevent other sessions from inserting new data into a table [5]
    • deletions and data modifications would be blocked for the duration of the operation [5]
      • {recommendation} consider forcing index reorganization manually to reduce execution, and therefore locking, time [5]
    • [columnstore index] considered fragmented if it has
      • multiple delta rowgroups
      • deleted rows
    • [columnstore indexes] require maintenance like that of regular B-Tree indexes [5]
      • {issue} partially populated row groups
      • {issue} overhead of delta store and delete bitmap scans during query execution
      • rebuilding the columnstore index addresses the issues
      • the strategy depends on the volatility of the data and the ETL processes implemented in the system [5]
        • {recommendation} rebuild indexes when a table has a considerable volume of deleted rows and/or a large number of partially populated rowgroups [5]
        • {recommendation} rebuild partition(s) that still have a large number of rows in open delta stores after the ETL process has completed, especially if the ETL process does not use a bulk insert API [5]
      • creating/dropping/disabling/rebuilding functions like any other index
    • columnstore statistics 
      • a statistics object is created at the time of columnstore index creation; however, it is neither populated nor updated afterward [5]
        • ⇐ SQL Server relies on segment information, B-Tree indexes (when available), and column-level statistics when deciding if a columnstore index needs to be used [5]
        • it is beneficial to create missing column-level statistics on the columns that participate in a columnstore index and are used in query predicates and as join keys [5]
          • ⇐ statistics rarely update automatically on very large tables [5]
            • ⇒ statistics must be updated ‘manually’
      • [SQL Server 2019] included into the schema-only clone of a database functionality [8]
        • enables performance troubleshooting without the need to manually capture the statistics information
        • via: DBCC CLONEDATABASE
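The schema-only clone mentioned above can be produced as follows; the database names are illustrative:

```sql
-- clone schema and statistics, but no user data
DBCC CLONEDATABASE (SalesDB, SalesDB_Clone);
```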
    • [in-memory tables] 
      • {limitation} a columnstore index must include all the columns and can’t have a filtered condition [2]
      • {limitation} queries on columnstore indexes run only in InterOP mode, and not in the in-memory native mode [2]
    • {operation} designing columnstore indexes
      • {best practice} understand as much as possible data’s characteristics
      • {best practice} identify workload’s characteristics
    • {operation} create a clustered columnstore index
      • via CREATE CLUSTERED COLUMNSTORE INDEX command
      • not needed to specify any columns in the statement
        • ⇐ the index will include all table columns
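A minimal sketch of the statement (table and index names are illustrative):

```sql
-- the CCI becomes the primary storage for the table;
-- no column list is specified, all columns are included
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
ON dbo.FactSales;
```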
    • {operation} index rebuilding 
      • forces SQL Server to remove deleted rows physically from the index and to merge the delta stores’ and row groups’ data [5]
        • all column segments are recreated with row groups fully populated [5]
      • [<SQL Server 2019] offline operation
      • [SQL Server 2019 Enterprise] online operation
        • ⇒ higher availability 
        • ⇐ pausing and resuming create and rebuild operations are not supported [11]
      • very resource intensive process
      • holds a schema modification (Sch-M) lock on the table
        • ⇒ prevents other sessions from accessing it [5]
        • ⇐ the overhead can be mitigated by using table/index partitioning
          • ⇒ indexes will be rebuild on a partition basis for those partition with volatile data [5]
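The partition-level rebuild can be sketched as follows; names and partition number are illustrative, and the ONLINE option requires SQL Server 2019 Enterprise:

```sql
-- rebuild only the volatile partition instead of the whole index
ALTER INDEX CCI_FactSales ON dbo.FactSales
REBUILD PARTITION = 3;

-- [SQL Server 2019 Enterprise] the rebuild can be performed online
ALTER INDEX CCI_FactSales ON dbo.FactSales
REBUILD PARTITION = 3 WITH (ONLINE = ON);
```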
    • {operation} index reorganization 
      • [<SQL Server 2019] a reorganize operation is required to merge smaller COMPRESSED rowgroups, following an internal threshold policy that determines how to remove deleted rows and combine the compressed rowgroups
      • [SQL Server 2019] a background merge task also works to merge COMPRESSED rowgroups from where a large number of rows has been deleted
        • ⇐ after merging smaller rowgroups, the index quality should be improved.
        • the tuple-mover is helped by a background merge task that automatically compresses smaller OPEN delta rowgroups that have existed for some time as determined by an internal threshold, or merges COMPRESSED rowgroups from where a large number of rows has been deleted
        • via: ALTER INDEX REORGANIZE command
      • [SQL Server 2016] performs additional defragmentation
        • removes deleted rows from row groups that have 10 or more percent of the rows logically deleted [5]
        • merges closed row groups together, keeping the total number of rows less than or equal than rowgroup’s limit [5]
        • ⇐ both processes can be done together [5]
      • [SQL Server 2014] the only action performed is compressing and moving the data from closed delta stores to rowgroups [5] 
        • ⇐ delete bitmap and open delta stores stay intact [5]
      • via: ALTER INDEX REORGANIZE
        • uses all available system resources while it is running [5]
          • ⇒ speeds up the execution process 
          • reduce the time during which other sessions cannot modify or delete data in a table [5]
        • close and compress all open row groups
          • via: ALTER INDEX REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON)
          • row groups aren’t merged during this operation [5]
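The reorganize variants above, as a sketch (names are illustrative):

```sql
-- standard reorganize
ALTER INDEX CCI_FactSales ON dbo.FactSales REORGANIZE;

-- additionally close and compress all open row groups
ALTER INDEX CCI_FactSales ON dbo.FactSales
REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON);
```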
    • {operation} estimate compression savings
      • [SQL Server 2019] COLUMNSTORE and COLUMNSTORE_ARCHIVE added
        • allows estimating the space savings if either of these indexes are used on a table [8]
        • {limitation} not available in all editions 
      • via: sp_estimate_data_compression_savings 
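A hedged sketch of the estimation call (schema/table names are illustrative):

```sql
-- estimate the savings of converting the table to archival columnstore compression
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'FactSales',
    @index_id         = NULL,
    @partition_number = NULL,
    @data_compression = 'COLUMNSTORE_ARCHIVE';
```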
    • {operation} [bulk loads] when the number of rows is less than the minimum bulk-load threshold (102,400 rows), all the rows go directly to the deltastore
      • [large bulk load] most of the rows go directly to the columnstore without passing through the deltastore
        • some rows at the end of the bulk load might be too few in number to meet the minimum size of a rowgroup
          • ⇒ the final rows go to the deltastore instead of the columnstore
      • bulk insert operations provide the number of rows in the batch as part of the API call [5]
        • best results are achieved by choosing a batch size that is divisible by rowgroup’s limit [5]
          • ⇐ guarantees that every batch produces one or several fully populated row groups [5]
            • ⇒ reduce the total number of row groups in a table [5]
            • ⇒ improves query performance
          • ⇐ the batch size shouldn’t exceed rowgroup’s limit [5]
            • row groups can still be created on the fly, in a manner similar to a bulk insert, when the size of the insert batch is close to or exceeds the rowgroup limit [5]
  • {operation} [non-bulk operations] trickle inserts go directly to a delta store
  • {feature} parallel inserts
    • [SQL Server 2016] requires following conditions for parallel insert on CCI [6]
      • must specify TABLOCK
      • no NCI on the clustered columnstore index
      • no identity column
      • database compatibility is set to 130
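Under the conditions listed above, a parallel insert into a CCI can be sketched as (table names are illustrative):

```sql
-- TABLOCK enables a parallel insert into the CCI
-- (requires compatibility level 130, no NCI, no identity column)
INSERT INTO dbo.FactSales WITH (TABLOCK)
SELECT *
FROM Staging.FactSales;
```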
  • {recommendation} minimize the use of string columns in facts tables [5]
    • string data use more space
    • their encoding involves additional overhead during batch mode execution [5]
    • queries with predicates on string columns may have less efficient execution plans that also require significantly larger memory grants as compared to their non-string counterparts [5]
  • {recommendation} [SQL Server 2012|2014] do not push string predicates down toward the lowest operators in execution plans.
  • {recommendation} add another dimension table and replace the string value in the facts table with a synthetic, integer-based ID key that references a new table [5]
  • {operation} upgrading to SQL Server 2016
    • make sure that queries against the tables with columnstore indexes can utilize parallelism if the database compatibility level is less than 130 [5]
  • {feature} [SQL Server 2019] automated columnstore index maintenance [8]
  • {improvement} [SQL Server 2019] better columnstore metadata memory management
  • {improvement} [SQL Server 2019] low-memory load path for columnstore tables
  • {improvement} [SQL Server 2019] improved performance for bulk loading to columnstore indexes
  • {improvement} [SQL Server 2019] server startup process has been made faster for databases that use in-memory columnstore tables for HTAP
  • {feature} DMVs
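For example, rowgroup health (state, trim reason, deleted rows) can be inspected via the sys.dm_db_column_store_row_group_physical_stats DMV; the table name is illustrative:

```sql
-- per-rowgroup physical statistics for a columnstore index
SELECT OBJECT_NAME(object_id) AS table_name, row_group_id,
       state_desc, total_rows, deleted_rows, trim_reason_desc
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('dbo.FactSales');
```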

References:
[1] SQL Docs (2020) Columnstore indexes: Overview [link]
[2] Microsoft Learn (2024) SQL: What's new in columnstore indexes  [link]
[3] Dejan Sarka et al (2012) Exam 70-463: Implementing a Data Warehouse with Microsoft SQL Server 2012 (Training Kit)
[4] SQL Docs (2019) Columnstore indexes - Query performance [link]
[5] Dmitri Korotkevitch (2016) Pro SQL Server Internals 2nd Ed.
[6] Microsoft Learn (2016) Columnstore Index: Parallel load into clustered columnstore index from staging table [link]
[7] Microsoft Learn (2016) Columnstore Index Defragmentation using REORGANIZE Command [link]
[8] Microsoft (2018) Microsoft SQL Server 2019: Technical white paper [link]

Acronyms:
CCI - clustered columnstore index
CI - columnstore index
DBCC - Database Console Commands
DMV - Dynamic Management View
ETL - Extract, Transform, Load
HTAP - Hybrid Transactional/Analytical Processing 
LOB - Large Object
NCCI - nonclustered columnstore index
OLTP - On-Line Transaction Processing
SP - Service Pack

15 February 2025

🧭Business Intelligence: Perspectives (Part XXVII: A Tale of Two Cities II)

Business Intelligence Series
There’s a saying that applies to many contexts ranging from software engineering to data analysis and visualization related solutions: "fools rush in where angels fear to tread" [1]. Much earlier, an adage attributed to Confucius provides a similar perspective: "do not try to rush things; ignore matters of minor advantage". Ignoring this advice, there's a drive in rapid prototyping to jump in with both feet forward without checking first how solid the ground is, often even without having adequate experience in the field. That’s understandable to some degree – people want to see progress and value fast, without building a foundation or getting an understanding of what’s happening, respectively possible, often ignoring the full extent of the problems.

A prototype helps to bring the requirements closer to what’s intended to achieve, though, as the practice often shows, the gap between the initial steps and the final solution requires many iterations, sometimes even too many for making a solution cost-effective. There’s almost always a tradeoff between costs and quality, respectively time and scope. Sooner or later, one must compromise somewhere in between even if the solution is not optimal. The fuzzier the requirements and what’s achievable with a set of data, the harder it gets to find the sweet spot.

Even if people understand the steps, constraints and further aspects of a process relatively easily, making sense of the data generated by it, respectively using the respective data to optimize the process can take a considerable effort. There’s a chain of tradeoffs and constraints that apply to a certain situation in each context, that makes it challenging to always find optimal solutions. Moreover, optimal local solutions don’t necessarily provide the optimum effect when one looks at the broader context of the problems. Further on, even if one brought a process under control, it doesn’t necessarily mean that the process works efficiently.

This is the broader context in which data analysis and visualization topics need to be placed to build useful solutions, to make a sensible difference in one’s job. Especially when the data and processes look numb, one needs to find the perspectives that lead to useful information, respectively knowledge. It’s not realistic to expect to find new insight in any set of data. As experience often proves, insight is rarer than finding gold nuggets. Probably, the most important aspect in gold mining is to know where to look, though it also requires luck, research, the proper use of tools, effort, and probably much more.

One of the problems in working with data is that usually data is analyzed and visualized in aggregates at different levels, often without identifying and depicting the factors that determine why data take certain shapes. Even if a well-suited set of dimensions is defined for data analysis, data are usually still considered in aggregate. Having the possibility to change between aggregates and details is quintessential for data’s understanding, or at least for getting an understanding of what's happening in the various processes. 

There is one aspect of data modeling, respectively analysis and visualization that’s typically ignored in BI initiatives – process-wise there is usually data which is not available and approximating the respective values to some degree is often far from the optimal solution. Of course, there’s often a tradeoff between effort and value, though the actual value can be quantified only when gathering enough data for a thorough first analysis. It may also happen that the only benefit is getting a deeper understanding of certain aspects of the processes, respectively business. Occasionally, this price may look high, though searching for cost-effective solutions is part of the job!

Previous Post  <<||>> Next Post

References:
[1] Alexander Pope (cca. 1711) An Essay on Criticism

14 February 2025

🏭🧊🗒️Microsoft Fabric: Partitions in Lakehouses [Notes]

 Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 14-Feb-2025

[Microsoft Fabric] Partitions

  • {def} a data organization technique used to split a large dataset into smaller, more manageable nonoverlapping subsets (aka partitions, shards)
    • a partition is defined based on one or more fields
    • each partition contains a subset of the data
    • each partition can be stored and processed independently
  • {goal} improve performance, scalability, and manageability of large data tables
  • {benefit} allows to split large tables into smaller, manageable partitions based on specific criteria [2]
    • e.g., date ranges, regions, categories, entities
  • {benefit} allows to improve queries' performance as they can target specific partitions [2]
    • reduces the amount of data scanned [2]
    • improves queries' performance [2]
  • {benefit} allows for more efficient data loading [2]
  • {benefit} facilitates the management of big tables [2]
    • maintenance tasks can be performed on individual partitions  [2]
    • obsolete data partitions can be removed with no overhead, adding new partitions on a need basis [2]
  • applies to 
    • backups
    • indexing
      • allows optimizing query performance for specific subsets of data
    • statistics
  • performance can be affected by
    • the choice of partition columns for a delta table [1]
    • the number and size of partitions of the partition column [1]
    • a column with high cardinality (mostly or entirely made of unique values) results in a large number of partitions [1]
      • ⇐ negatively impacts performance of the metadata discovery scan for changes [1]
      • {recommendation} if the cardinality of a column is high, choose another column for partitioning [1]
    • the size of each partition can also affect performance
      • {recommendation} use a column that would result in a partition of at least (or close to) 1 GB [1]
      • {recommendation} follow the best practices for delta tables maintenance [1]
      • a large volume of small-sized parquet files increases the time it takes to sync the changes [1]
          • ⇐ a large number of parquet files in a delta table results from [1]
          • over-partitioning
            • partitions with high number of unique values [1]
            • {recommendation} choose a partition column that 
              • doesn't have a high cardinality [1]
              • results in individual partition size of at least 1 GB [1]
          • many small files
            • batch and streaming data ingestion rates might result in small files
              • depends on the frequency and size of changes being written to a lakehouse [1]
            • {recommendation} implement regular lakehouse table maintenance [1] 
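As an illustration of partitioning a lakehouse Delta table at creation time, e.g. via Spark SQL in a notebook; the table and column names below are hypothetical:

```sql
-- hypothetical example: create a Delta table partitioned by Year
-- a low-cardinality, date-derived partition column keeps the number of
-- partitions small and the individual partitions reasonably large
CREATE TABLE sales_orders (
    OrderID BIGINT,
    CustomerID BIGINT,
    OrderDate DATE,
    Amount DECIMAL(18,2),
    Year INT
)
USING DELTA
PARTITIONED BY (Year);
```

Partitioning on OrderDate itself would create one partition per day and quickly lead to over-partitioning; deriving a coarser column such as Year (or Year plus Month) is usually the better choice.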

Previous Post <<||>> Next Post

References:
[1] Microsoft Learn (2024) Microsoft Fabric: SQL analytics endpoint performance considerations [link]
[2] Kenneth A Omorodion (2024) Partitioning Data in Microsoft Fabric to Improve Performance [link]
[3] Microsoft Learn (2024) Microsoft Fabric: Loading Fabric Lakehouse Tables with partitions [link]

Resources
[R1] Microsoft Learn (2024) Microsoft Fabric: Load data to Lakehouse using partition in a Data pipeline [link]

13 February 2025

🧊💠🗒️Data Warehousing: Table Partitioning in SQL Server [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes.

Last updated: 13-Feb-2025

[Data Warehousing] Table Partitioning

  • {def} the spreading of data across multiple tables based on a set of rules to balance large amounts of data across disks or nodes
    • data is distributed based on a function that defines a range of values for each partition [2] 
      • the table is partitioned by applying the partition scheme to the values in a specified column [2]
  • {operation} partition creation
    • [large partitioned table]
      • two auxiliary nonindexed empty tables with the same structure, including constraints and data compression options, should be created [4]
        • first table: create a check constraint that guarantees that all data from the table fits exactly into one empty partition of the fact table
          • the constraint must be created on the partitioning column [4]
          • a columnstore index can be created on the fact table, as long as it is aligned with the table [4]
          • after <table 2> is truncated, <table 1> is prepared to accept the next partition from the fact table for the next minimally logged deletion [4]
        • second table: for minimally logged deletions of large portions of data, a partition from the fact table can be switched to the empty table version without the check constraint [4]
          • then the table can be truncated
      • for minimally logged inserts, new data should be bulk inserted into the auxiliary table that has the check constraint [4]
        • INSERT operation can be minimally logged because the table is empty [4]
        • create a columnstore index on this auxiliary table, using the same structure as the columnstore index on your fact table [4]
        • switch data from this auxiliary table to a partition of your fact table [4]
        • drop the columnstore index on the auxiliary table, and change the check constraint to guarantee that all of the data for the next load can be switched to the next empty partition of the fact table [4]
        • the second auxiliary table is prepared for new bulk loads again [4]
  • {operation} [Query Optimizer] partition elimination 
    • process in which SQL Server accesses only those partitions needed to satisfy query filters [4]
  • {operation} partition switching 
    • {definition} process that switches a block of data from one table or partition to another table or partition [4]
    • types of switches
      • reassign all data from a nonpartitioned table to an empty existing partition of a partitioned table [4]
      • switch a partition of one partitioned table to a partition of another partitioned table [4]
      • reassign all data from a partition of a partitioned table to an existing empty nonpartitioned table [4]
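The three switch types can be sketched in T-SQL; the object names and partition numbers below are hypothetical:

```sql
-- 1) nonpartitioned (staging) table -> empty partition of a partitioned table
ALTER TABLE dbo.StageSales SWITCH TO dbo.FactSales PARTITION 12;

-- 2) partition of one partitioned table -> partition of another partitioned table
ALTER TABLE dbo.FactSales SWITCH PARTITION 1 TO dbo.FactSalesArchive PARTITION 1;

-- 3) partition of a partitioned table -> empty nonpartitioned table
ALTER TABLE dbo.FactSales SWITCH PARTITION 1 TO dbo.StageSalesOld;
```

Because a switch only updates metadata, it completes almost instantly, provided both sides have identical structures and reside on the same filegroup.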
  • {benefit} improves query performance [1]
    • by partitioning a table across filegroups [1]
      • specific ranges of data can be placed on different disk spindles [1]
        • can improve I/O performance [1]
          • ⇐ if the disk storage is already configured as a RAID 10 or RAID 5 array, this usually has little benefit [1]
      • using a mix of fast solid state storage for recent, frequently accessed data, and mechanical disks for older, less queried rows [1]
        • use partitioning to balance disk performance against storage costs [1]
      • biggest performance gain from partitioning in a data warehouse is realized when queries return a range of rows that are filtered on the partitioning key [1]
        • the query optimizer can eliminate partitions that are not within the filter range [1]
          • dramatically reduce the number of rows that need to be read [1]
    • reduces contention [3]
      • can reduce the number of rows included in a table scan [3]
  • {benefit} more granular manageability [1]
    • some maintenance operations can be performed at partition level instead of on the whole table [1]
      • e.g. indexes can be created and rebuilt on a per-partition basis [1]
      • e.g. compression can be applied to individual partitions [1]
      • e.g. by mapping partitions to filegroups, partitions can be backed up and restored independently [1]
        • enables to back up older data once and then configure the backed up partitions as read-only [1]
        • future backups can be limited to the partitions that contain new or updated data [1]
  • {benefit} improved data load performance
    • enables loading many rows very quickly by switching a staging table with a partition
      • can dramatically reduce the time taken by ETL data loads [1]
        • with the right planning, it can be achieved with minimal requirements to drop or rebuild indexes [1]
  • {best practice} partition large fact tables
    • tables of around 50 GB or more
    • ⇐ in general, fact tables benefit from partitioning more than dimension tables [1]
  • {best practice} partition on an incrementing date key [1]
    • assures that the most recent data are in the last partition and the earliest data are in the first partition [1]
  • {best practice} design the partition scheme for ETL and manageability [1]
    • the query performance gains realized by partitioning are small compared to the manageability and data load performance benefits [1]
      • ideally partitions should reflect the ETL load frequency
        • because this simplifies the load process [1]
        • merge partitions periodically to reduce the overall number of partitions (for example, at the start of each year) [1]
          • e.g. merge the monthly partitions for the previous year into a single partition for the whole year [1]
  • {best practice} maintain an empty partition at the start and end of the table [1]
    • simplifies the loading of new rows [1]
    • when new records need to be added, split the empty partition 
      • ⇐ to create two empty partitions
    • then switch the staged data with the first empty partition [1]
      • ⇐ loads the data into the table and leaves the second empty partition you created at the end of the table, ready for the next load [1]
    • a similar technique can be used to archive or delete obsolete data at the beginning of the table [1]
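A minimal sliding-window sketch based on the steps above (object names, filegroup and boundary values are hypothetical; the partition function is assumed to be RANGE RIGHT on a date column):

```sql
-- split the empty partition at the end of the table,
-- creating two empty partitions ready for the next load
ALTER PARTITION SCHEME psSales NEXT USED FG_Sales;
ALTER PARTITION FUNCTION pfSales() SPLIT RANGE ('2025-03-01');

-- switch the staged data into the first of the two empty partitions
ALTER TABLE dbo.StageSales
SWITCH TO dbo.FactSales PARTITION $PARTITION.pfSales('2025-02-01');

-- archive obsolete data at the beginning of the table,
-- then merge away the now-empty boundary
ALTER TABLE dbo.FactSales SWITCH PARTITION 1 TO dbo.StageSalesOld;
ALTER PARTITION FUNCTION pfSales() MERGE RANGE ('2024-01-01');
```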
  • {best practice} choose the proper granularity
    • it should be aligned to the business requirements [2]
  • {best practice} create at least one filegroup in addition to the primary one
    • set it as the default filegroup
      • data tables are thus separated from system tables [2]
    • creating dedicated filegroups for extremely large fact tables [2]
      • place the fact tables on their own logical disks [2]
    • use a file and a filegroup separate from the fact and dimension tables [2]
      • {exception} staging tables that will be switched with partitions to perform fast loads [2]
        • staging tables must be created on the same filegroup as the partition with which they will be switched [2]
  • {def} partition scheme 
    • a scheme that maps partitions to filegroups
  • {def} partition function 
    • object that maps rows to partitions by using values from specific columns (aka  partitioning columns)
    • performs logical mapping
  • {def} aligned index 
    • index built on the same partition scheme as its base table [4]
      • if all indexes are aligned with their base table, switching a partition is a metadata operation only [4]
        • ⇒ it’s very fast [4]
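The three definitions can be tied together in a short T-SQL sketch (hypothetical names; a monthly RANGE RIGHT function mapped to the primary filegroup):

```sql
-- partition function: logical mapping of values to partitions
CREATE PARTITION FUNCTION pfSales (date)
AS RANGE RIGHT FOR VALUES ('2025-01-01', '2025-02-01', '2025-03-01');

-- partition scheme: maps the partitions to filegroups
CREATE PARTITION SCHEME psSales
AS PARTITION pfSales ALL TO ([PRIMARY]);

-- table partitioned on the SaleDate column; indexes created on the same
-- scheme are aligned, keeping partition switches metadata-only
CREATE TABLE dbo.FactSales (
    SaleDate date NOT NULL,
    Amount money NOT NULL
) ON psSales (SaleDate);
```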
Previous Post <<||>> Next Post

References:
[1] 20467A - Designing Business Intelligence Solutions with Microsoft SQL Server 2012
[2] 20463C - Implementing a Data Warehouse with Microsoft SQL Server
[3] 10777A - Implementing a Data Warehouse with Microsoft SQL Server 2012
[4] Dejan Sarka et al (2012) Exam 70-463: Implementing a Data Warehouse with Microsoft SQL Server 2012 (Training Kit)
[5] Microsoft Learn (2009) How to Implement an Automatic Sliding Window in a Partitioned Table on SQL Server 2005 [link]

🏭💠🗒️Microsoft Fabric: SQL Analytics Endpoint [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 12-Feb-2025

[Microsoft Fabric] SQL Analytics Endpoint

  • {def} a service that listens actively for requests providing a SQL-based experience for lakehouse Delta tables [1]
    • enables to query data in the lakehouse using T-SQL language and TDS protocol [1]
    • created by default for each lakehouse and mirrored database provisioned in a workspace
      • every lakehouse has one SQL analytics endpoint [1]
    • manages the automatically generated tables so the workspace users can't modify them [1]
  • {feature} a background process is responsible for 
    • scanning lakehouse for changes [1]
      • automatically generates a table in the appropriate schema for every Delta table in the Lakehouse
        • tables are created with a minor delay [1]
        • data is refreshed automatically
          • the amount of time it takes to refresh the table is related to how optimized the Delta tables are [1]
        • {alternative} one can manually force a refresh of the automatic metadata scanning in the Fabric portal [1]
    • keeping SQL analytics endpoint up-to-date for all the changes committed to lakehouses in a workspace [1]
      • {characteristic} transparently managed
        • ⇐ the sync process is transparently managed by Microsoft Fabric platform [1]
        • when a change is detected in the lakehouse
          • a background process updates metadata [1]
          • the SQL analytics endpoint reflects the changes committed to lakehouse tables [1]
      • {characteristic} low latency
        • under normal operating conditions, the lag between a lakehouse and SQL analytics endpoint is less than one minute [1]
        • the actual length of time can vary from a few seconds to minutes [1]
      • the metadata synchronization is automatically triggered when the SQL Endpoint is opened [3]
        • ensures the SQL Analytics Endpoint remains up to date without the need for manual syncs [3]
          •  ⇐ provides a seamless experience [3]
  • {feature} can be enriched by adding database objects
    • schemas
    • views
    • procedures
    • other database objects
  • {feature} automatic metadata discovery
    • tracks changes committed to lakehouses [1]
      • is a single instance per Fabric workspace [1]
      • {issue} increased latency for changes to sync between lakehouse and SQL analytics endpoint; may be due to a large number of lakehouses in one workspace [1]
        • {resolution} migrate each lakehouse to a separate workspace [1]
          • allows automatic metadata discovery to scale [1]
      • {issue} changes committed to a lakehouse are not visible in the associated SQL analytics endpoint
        • e.g. a table newly created in the lakehouse [1]
        • e.g. many committed rows may not be visible in the endpoint [1]
        • ⇐ may be impacted by ETL processing that generates large volumes of changes
        • {recommendation} initiate an on-demand metadata sync, triggered from the SQL query editor Refresh ribbon option [1]
          • forces an on-demand metadata sync, rather than waiting on the background metadata sync to finish [1]
      • {issue} if there's no maintenance scheduled for the parquet files, this can result in read overhead and this impacts time it takes to sync changes to SQL analytics endpoint [1]
        • {recommendation} schedule regular lakehouse table maintenance operations [1]
      • {limitation} not all Delta features are understood by the automatic sync process [1]
  • {feature} endpoint reprovisioning
    • if the initial provisioning attempt fails, users have the option to try again [2]
      • ⇐ without the need to create an entirely new lakehouse [2]
      • {benefit} empowers users to self-mitigate provisioning issues in convenient way in the UI avoiding the need for complete lakehouse re-creation [2]

References:
[1] Microsoft Learn (2024) Microsoft Fabric: SQL analytics endpoint performance considerations [link]
[2] Microsoft Learn (2024) Microsoft Fabric: What is the SQL analytics endpoint for a lakehouse? [link]
[3] Microsoft Learn (2024) Microsoft Fabric: What’s new in the Fabric SQL Analytics Endpoint? [link]

09 February 2025

🏭🗒️Microsoft Fabric: Data Pipelines [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 9-Feb-2025

[Microsoft Fabric] Data pipeline

  • {def} a logical sequence of activities that orchestrate a process and perform together a task [1]
    • usually by extracting data from one or more sources and loading it into a destination; 
      • ⇐ often transforming it along the way [1]
      • ⇐ allows to manage the activities as a set instead of each one individually [2]
      • ⇐ used to automate ETL processes that ingest transactional data from operational data stores into an analytical data store [1]
      • e.g. lakehouse or data warehouse 
  • {concept} activity
    • {def} an executable task in a pipeline
      • a flow of activities can be defined by connecting them in a sequence [1]
      • its outcome (success, failure, or completion) can be used to direct the flow to the next activity in the sequence [1]
    • {type} data movement activities
      • copies data from a source data store to a sink data store [2]
    • {type} data transformation activities
      • encapsulate data transfer operations
        • incl. simple Copy Data activities that extract data from a source and load it to a destination
        • incl. complex Data Flow activities that encapsulate dataflows (Gen2) that apply transformations to the data as it is transferred
        • incl. notebook activities to run a Spark notebook
        • incl. stored procedure activities to run SQL code
        • incl. delete data activities to delete existing data
    • {type} control flow activities
        • used to 
          • implement loops
          • implement conditional branching
          • manage variables
          • manage parameter values
          • enable to implement complex pipeline logic to orchestrate data ingestion and transformation flow  [1]
        • can be parameterized
          • ⇐ enabling specific values to be provided each time a pipeline is run [1]
        • when executed, a run is initiated (aka data pipeline run)
          • runs can be initiated on-demand or scheduled to start at a specific frequency
          • use the unique run ID to review run details to confirm they completed successfully and investigate the specific settings used for each execution [1]
      • {benefit} increases pipelines’ reusability
    • {concept} pipeline template
      • predefined pipeline that can be used and customized as required
    • {concept} data pipeline run
      • occurs when a data pipeline is executed
      • the activities in the data pipeline are executed to completion [3] 
      • can be triggered one of two ways
        • on-demand
        • on a schedule
          • the scheduled pipeline will be able to run based on the time and frequency set [3]

    Previous Post <<||>> Next Post

    References:
    [1] Microsoft Learn (2023) Use Data Factory pipelines in Microsoft Fabric [link]
    [2] Microsoft Learn (2024) Microsoft Fabric: Activity overview [link]
    [3] Microsoft Learn (2024) Microsoft Fabric Concept: Data pipeline Runs [link]

    Resources
    [R1] Metadata Driven Pipelines for Microsoft Fabric (link)
    [R2] 

    🌌🏭KQL Reloaded: First Steps (Part XI: Window Functions)

    Window functions are one of the powerful features available in RDBMS, as they allow operating across several rows of a result set and returning a result for each row. The good news is that KQL supports several window functions, which allow addressing several scenarios. However, the support is limited and needs improvement.

    One of the basic scenarios that takes advantage of window functions is the creation of a running sum or of a (dense) rank across a whole dataset. For this purpose, in Kusto one can use row_cumsum for the running sum, respectively row_number, row_rank_dense and row_rank_min for ranking. In addition, one can refer to the values of a field from the previous (prev) and next (next) records.

    // rank window functions within dataset (not partitioned)
    NewSales
    | where ((DateKey > todatetime("20240101")) and (DateKey < todatetime("20240201")))
    | where CustomerKey in (12617, 12618)
    | project CustomerKey, DateKey, ProductName, TotalCost
    | sort by CustomerKey asc, DateKey asc 
    | extend Rank1 = row_rank_dense(DateKey)
        , Rank2 = row_number(0)
        , Rank3 = row_rank_min(DateKey)
        , Sum = row_cumsum(TotalCost)
        , NextDate = next(DateKey)
        , PrevDate = prev(DateKey)

    Often, it's needed to operate only inside of a partition and not across the whole dataset. Some of the functions provide additional parameters for this:

    // rank window functions within partitions
    NewSales
    | where ((DateKey > todatetime("20240101")) and (DateKey < todatetime("20240201")))
    | where CustomerKey in (12617, 12618)
    | project CustomerKey, DateKey, ProductName, TotalCost
    | sort by CustomerKey asc, DateKey asc 
    | extend RowRank1 = row_rank_dense(DateKey, prev(CustomerKey) != CustomerKey)
        , RowRank2 = row_number(0, prev(CustomerKey) != CustomerKey)
        , Sum = row_cumsum(TotalCost, prev(CustomerKey) != CustomerKey)
        , NextDate = iif(CustomerKey == next(CustomerKey), next(DateKey), datetime(null))
        , PrevDate = iif(CustomerKey == prev(CustomerKey), prev(DateKey),  datetime(null))
    

    In addition, the partitions can be defined explicitly via the partition operator:

    // creating explicit partitions
    NewSales
    | where ((DateKey > todatetime("20240101")) and (DateKey < todatetime("20240201")))
    | where CustomerKey in (12617, 12618)
    | project CustomerKey, DateKey, ProductName, TotalCost
    | partition by CustomerKey
    (
        order by DateKey asc
        | extend prev_cost = prev(TotalCost, 1)
    )
    | order by CustomerKey asc, DateKey asc
    | extend DifferenceCost = TotalCost - prev_cost
    

    It will be interesting to see whether Microsoft plans to introduce further window functions in KQL to bridge the gap. Experience proved that such functions are quite useful in data analytics, with developers sometimes going to great lengths to achieve the respective behavior (see visual calcs in Power BI). For example, SQL Server provides several types of such functions, though it took Microsoft more than a few versions to make this progress. Moreover, developers can introduce their own functions, even if this involves .NET programming. 

    Happy coding!

    Previous Post <<||>> Next Post

    References:
    [1] Microsoft Learn (2024) Kusto: Window functions overview [link]

    🏭🗒️Microsoft Fabric: Kusto Query Language (KQL) [Notes]

    Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

    Last updated: 9-Feb-2025

    [Microsoft Fabric] Kusto Query Language (KQL)

    • {def} a read-only request to process data and return results [1]
      • designed for data exploration and summarization [1]
        • very similar to SQL
          • the explain command can be used to transform SQL into KQL code
            • ⇐ not all the SQL syntax can be translated
      • statements are sequenced and executed in the order of their arrangement
        • funnel like processing where data is piped from one operator to the next
          • data is filtered, rearranged or summarized at each step and then fed into the following step
          • statements are sequenced by a pipe (|)
      • returns data in a tabular or graph format
      • designed and developed to take advantage of cloud computing through clustering and scaling compute [2]
        • ideal engine to power fast, real-time dashboards
      • case-sensitive in general 
      • named after the undersea pioneer Jacques Cousteau [2]
      • operation sequence
        • filter data
        • aggregate data
        • order data
        • modify column output
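The operator sequence above can be illustrated with a short query; NewSales and its columns are reused from the examples elsewhere in these notes and are assumptions:

```kql
// filter → aggregate → order → shape the column output
NewSales
| where DateKey > todatetime("20240101")
| summarize TotalSales = sum(TotalCost) by CustomerKey
| order by TotalSales desc
| project CustomerKey, TotalSales
```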
    • supports standard data types 
      • string
        • a sequence of zero or more Unicode characters
        • characters are encoded in UTF-8.
      • int 
        • 32-bit whole-number integer
      • long
        • signed 64-bit whole-number integer
      • real (aka double)
        • 64-bit decimal-based number 
        • provides high precision with decimal points
      • decimal
        • a 128-bit decimal number
        • provides the highest precision of decimal points
        • {recommendation} if precision is not needed, use the real type instead [2]
      • bool 
        • a boolean value that can be true (1), false (0), or null
      • datetime 
        • represents a date in the UTC zone
      • timespan 
        • represents a time interval
          •  days, hours, minutes, seconds, milliseconds, microseconds, tick
        • if no time frame is specified, it will default to day
      • dynamic
        • a special data type that can take 
          • any value from the other data types
          • arrays
          • a {name = value} property bag
      • guid
        • a 128-bit globally unique value
    • statement types
      • tabular expression statement
      • let statement
        • used to 
          • set variable names equal to an expression
          • create views
        • ⇐ used mostly to 
          • help break complex expressions into multiple parts, each represented by a variable
          • set constants outside the query to aid in readability
      • set statement
        • used to set the query duration
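A minimal sketch of the let statement, using the hypothetical NewSales table from the other examples in these notes:

```kql
// constants and a reusable 'view' defined via let statements
let startDate = datetime(2024-01-01);
let bigOrders = view () { NewSales | where TotalCost > 1000 };
bigOrders
| where DateKey >= startDate
| summarize Orders = count() by CustomerKey
```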
    • {tool} Microsoft Sentinel 
      • {def} a cloud native SIEM and SOAR that provides cyberthreat detection, investigation, response, and proactive hunting, with a bird's-eye view across your enterprise [3]
    • {tool} Kusto Explorer
      • {def} user-friendly interface to query and analyze data with KQL [4]
    • {tool} Azure Data Studio 
      • {def} lightweight, cross-platform data management and development tool for data professionals [5]

    References:
    [1] Microsoft (2024) Real-time Analytics: End-to-End Workshop
    [2] Mark Morowczynski et al (2024) The Definitive Guide to KQL: Using Kusto Query Language for Operations, Defending, and Threat Hunting
    [3] Microsoft Learn (2024) Azure: What is Microsoft Sentinel [link]
    [4] Microsoft Learn (2024) Kusto: Kusto.Explorer installation and user interface [link]
    [5] Microsoft Learn (2024) SQL: What is Azure Data Studio? [link]

    Acronyms:
    KQL - Kusto Query Language
    SIEM - security information and event management
    SOAR - security orchestration, automation, and response
    SQL - Structured Query Language
    UTC - Universal Time Coordinated

    🏭🗒️Microsoft Fabric: Sharding [Notes]

    Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

    Last updated: 9-Feb-2024

     [Microsoft Fabric] Data Partitioning (aka Sharding)

    • {definition} "a process where small chunks of the database are isolated and can be updated independently of other shards" [2]
    • allows a logical database to be partitioned across multiple physical servers [1]
      • each partition is referred to as a shard
      • the largest tables are partitioned across multiple database servers [1]
        • when operating on a record, the application must determine which shard will contain the data and then send the SQL to the appropriate server [1]
      • partitioning is based on a Key Value
        • e.g. such as a user ID
    • proven technique for achieving data processing on a massive scale [1]
      • solution used at the largest websites 
        • e.g. Facebook, Twitter
        • usually associated with rapid growth
          •  ⇒ the approach needs to be dynamic [1]
      • the only way to scale a relational database to massive web use [1]
        • together with caching and replication [1]
    • {drawback} involves significant operational complexities and compromises [1]
      • the application must contain logic that understands the location of any particular piece of data and the logic to route requests to the correct shard [1]
      • requests that can only be satisfied by accessing more than one shard thus need complex coding as well, whereas on a nonsharded database a single SQL statement might suffice.
    • {drawback} high operational costs [1]
    • {drawback} application complexity
      • it’s up to the application code to route SQL requests to the correct shard [1]
        • ⇒ a dynamic routing layer must be implemented
          • ⇐ most massive websites are adding shards as they grow [1]
          • layer required to maintain Memcached object copies and to differentiate between the master database and read-only replicas [1]
    • {drawback} crippled SQL
      • [sharded database] it is not possible to issue a SQL statement that operates across shards [1]
        • ⇒ usually SQL statements are limited to row-level access [1]
        • ⇒ only programmers can query the database as a whole [1]
        • joins across shards cannot be implemented, nor can aggregate GROUP BY operations [1]
    • {drawback} loss of transactional integrity
      • ACID transactions against multiple shards are not possible and/or not practical [1]
        • ⇐ {exception} there are database systems that support 2PC
          • involves considerable troubleshooting as conflicts and bottlenecks can occur [1]
    • {drawback} operational complexity
      • load balancing across shards becomes extremely problematic
        • adding new shards requires a complex rebalancing of data [1]
        • changing the database schema requires a rolling operation across all the shards [1]
          • ⇒ can lead to transitory inconsistencies in the schema [1]
      • a sharded database entails a huge amount of operational effort and administrator skill [1]
    • {concept} CAP (Consistency, Availability, and Partition) theorem 
      • in a distributed database system, one can have at most two of the three properties: consistency, availability, partition tolerance [1]
      • consistency
        • every user of the database has an identical view of the data at any given instant [1]
      • availability
        • in the event of a failure, the database remains operational [1]
      • partition tolerance
        • the database can maintain operations in the event of the network’s failing between two segments of the distributed system [1]
    • {concept} partitioning
      • {def} core pattern of building scalable services by dividing state (data) and compute into smaller accessible units to improve scalability and performance [5]
        • ⇐ determines that a particular service partition is responsible for a portion of the complete state of the service.
          • a partition is a set of replicas
      • {type} [stateless services] a logical unit that contains one or more instances of a service [5]
        • partitioning a stateless service is a very rare scenario
        • scalability and availability are normally achieved by adding more instances
        • {subtype} externally persisted state
          • persists its state externally [5]
            • e.g. databases in Azure SQL Database
        • {subtype} computation-only services
          • services that do not manage any persistent state, e.g. a calculator or image thumbnailing [5]
      • {type} scalable stateful services
        • partition state (data)
        • a partition of a stateful service as a scale unit that is highly reliable through replicas that are distributed and balanced across the nodes in a cluster
        • the state must be accessed and stored
          • ⇒ bound by
            • network bandwidth limits
            • system memory limits
            • disk storage limits
          • {scenario} run into resource constraints in a running cluster
            • {recommendation} scale out the cluster to accommodate the new requirements [4]
    • {concept} [Service Fabric] distributed systems platform used to build hyper-scalable, reliable and easily managed applications for the cloud [6]
      • ⇐ addresses the significant challenges in developing and managing cloud applications
      • places the partitions on different nodes [5]
        • allows partitions to grow to a node's resource limit
          • ⇐ partitions are rebalances across nodes [5]
        • {benefit} ensures the continued efficient use of hardware resources [5]
      • {default} makes sure that there is about the same number of primary and secondary replicas on each node
        • ⇒ otherwise some nodes would hold replicas that serve more traffic and others that serve less traffic [5]
        • hot and cold spots may appear in a cluster
          • ⇐ this should preferably be avoided
          • {recommendation} partition the state so is evenly distributed across all partitions [5]
          • {recommendation} report load from each of the replicas for the service [5]
      • provides the capability to report load consumed by services [5]
        • e.g.  amount of memory, number of records
        • detects which partitions serve higher loads than others [5]
          • ⇐ based on the metrics reported
        • rebalances the cluster by moving replicas to more suitable nodes, so that overall no node is overloaded  [5]
        • ⇐ it's not always possible to know how much data will be in a given partition
          • {recommendation} adopt a partitioning strategy that spreads the data evenly across the partitions [5]
            • {benefit} prevents situations described in the voting example [5]
          • {recommendation} report load
            • {benefit} helps smooth out temporary differences in access or load over time [5]
      • {recommendation} choose an optimal number of partitions to begin with
        • ⇐ there's nothing that prevents from starting out with a higher number of partitions than anticipated [5]
          • ⇐ assuming the maximum number of partitions is a valid approach [5]
        • ⇒ one may end up needing more partitions than initially considered [5]
          • ⇐ {constraint} the partition count can't be changed after the fact [5]
            •  ⇒ apply more advanced partition approaches
              • e.g. creating a new service instance of the same service type
              • e.g. implement client-side logic that routes the requests to the correct service instance

    References:
    [1] Guy Harrison (2015) Next Generation Databases: NoSQL, NewSQL, and Big Data
    [2] DAMA International (2017) "The DAMA Guide to the Data Management Body of Knowledge" 2nd Ed
    [3] Microsoft Fabric (2024) External data sharing in Microsoft Fabric [link]
    [4] Microsoft Fabric (2024) Data sharding policy [link]
    [5] Microsoft Learn (2024) Partition Service Fabric reliable services [link]
    [6] MSDN (2015) Microsoft Azure - Azure Service Fabric and the Microservices Architecture [link]

    Acronyms:
    ACID - atomicity, consistency, isolation, durability
    2PC - Two Phase Commit
    CAP - Consistency, Availability, Partition tolerance

    08 February 2025

    🌌🏭KQL Reloaded: First Steps (Part X: Translating SQL to KQL - Correlated Subqueries)

    In SQL Server and other RDBMS databases there are many scenarios in which one needs to retrieve information from a fact table filtered via a dimension table, without actually requiring any columns from the dimension table.

    Correlated Subquery via EXISTS

    Before considering the main example, let's start with a simple subquery:

    // subquery in SQL
    --
    explain
    SELECT CustomerKey
    , ProductKey
    , ProductName
    FROM NewSales 
    WHERE ProductKey IN (
        SELECT DISTINCT ProductKey
        FROM Products 
        WHERE ProductSubcategoryName = 'MP4&MP3'
        ) 
    
    // subquery in KQL
    NewSales
    | where ProductKey in (
        (Products
        | where (ProductSubcategoryName == "MP4&MP3")
        | project ProductKey
        | distinct *))
    | project CustomerKey, ProductKey, ProductName
    

    Of course, ProductKey is unique by design, though there can be dimension tables, fact tables or subqueries in which the value is not unique.

    Now let's consider the correlated subquery pattern, which should provide the same outcome as above, though in RDBMS there are scenarios in which it provides better performance, especially when the number of values returned by the subquery is high.

    // correlated subquery in SQL
    --
    explain
    SELECT CustomerKey
    , ProductKey
    , ProductName
    FROM NewSales 
    WHERE EXISTS (
        SELECT Products.ProductKey
        FROM Products 
        WHERE NewSales.ProductKey = Products.ProductKey)
    

    Unfortunately, trying to translate the code via explain leads to the following error, which confirms that the syntax is not supported in KQL (see [1]):

    "Error: Reference to missing column 'NewSales.ProductKey'"

    Fortunately, in this case one can use the first version of the query instead.
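    Alternatively, the EXISTS semantics can be expressed directly in KQL via a left semijoin. The following is a hand-written sketch (not generated via explain), assuming the same NewSales and Products tables as above:

    // EXISTS emulated via a left semijoin
    NewSales
    | join kind=leftsemi (
        Products
        | distinct ProductKey
      ) on ProductKey
    | project CustomerKey, ProductKey, ProductName
    

    A leftsemi join returns the rows from the left table that have a match in the right table, without bringing over any columns from the right table, which mirrors the EXISTS semantics.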

    Correlated Subquery via CROSS APPLY

    Before creating the main query, let's look at the inner query and check whether it gets correctly translated to KQL:

    // subquery logic
    --
    explain
    SELECT sum(TotalCost) TotalCost 
    FROM NewSales 
    WHERE DateKey > '20240101' and DateKey < '20240201'
    
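    Following the translation pattern of the other queries above, the KQL equivalent should look roughly like this (a sketch; the exact output of explain may differ):

    // subquery logic in KQL
    NewSales
    | where DateKey > todatetime("20240101") and DateKey < todatetime("20240201")
    | summarize TotalCost = sum(TotalCost)
    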

    Now, let's bring the logic within the CROSS APPLY:

    // correlated subquery in SQL
    --
    explain
    SELECT ProductKey
    , ProductName
    , TotalCost
    FROM Products
        CROSS APPLY (
            SELECT sum(TotalCost) TotalCost 
            FROM NewSales 
            WHERE DateKey > '20240101' and DateKey < '20240201'
              AND Products.ProductKey = NewSales.ProductKey
        ) DAT
    

    Running the above code leads to the following error:

    "Sql node of type 'Microsoft.SqlServer.TransactSql.ScriptDom.UnqualifiedJoin' is not implemented"

    Unfortunately, many SQL queries are written following this pattern, especially when an OUTER APPLY is used, thus retrieving all the records from the dimension table. 

    In this case one can rewrite the query via a LEFT JOIN:

    // correlated subquery in SQL
    --
    explain
    SELECT PRD.ProductKey
    , PRD.ProductName
    , SAL.TotalCost
    FROM Products PRD
        LEFT JOIN (
            SELECT ProductKey
            , sum(TotalCost) TotalCost 
            FROM NewSales 
            WHERE DateKey > '20240101' and DateKey < '20240201'
            GROUP BY ProductKey
        ) SAL
          ON PRD.ProductKey = SAL.ProductKey
    
    // direct translation of the query
    Products
    | join kind=leftouter 
        (NewSales
            | where ((DateKey > todatetime("20240101")) and (DateKey < todatetime("20240201")))
            | summarize TotalCost=sum(TotalCost) by ProductKey
            | project ProductKey, TotalCost
        ) on ($left.ProductKey == $right.ProductKey)
    | project ProductKey, ProductName, TotalCost
    //| where isnull(TotalCost)
    //| summarize record_number = count()
    
    
    // query after restructuring
    NewSales
    | where ((DateKey > todatetime("20240101")) and (DateKey < todatetime("20240201")))
    | summarize TotalCost=sum(TotalCost) by ProductKey
    | join kind=rightouter
        (
            Products
            | project ProductKey, ProductName
        ) on ($left.ProductKey == $right.ProductKey)
    | project ProductKey, ProductName, TotalCost
    //| where isnull(TotalCost)
    //| summarize record_number = count()
    

    During transformations it's important to check whether the number of records changes between the various versions of the query (including against the most general version, in which no filtering constraints are applied).

    Especially when SQL solutions are planned to be migrated to KQL, it's important to know which query patterns can be used in KQL. 

    Happy coding!


    References:
    [1] GitHub (2024) Module Ex-01 - Advanced KQL [link]
