13 February 2025

🧊💠🗒️Data Warehousing: Table Partitioning in SQL Server [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes.

Last updated: 13-Feb-2025

[Data Warehousing] Table Partitioning

  • {defthe spreading of data across multiple tables based on a set of rules to balance large amounts of data across disks or nodes
    • data is distributed based on a function that defines a range of values for each partition [2] 
      • the table is partitioned by applying the partition scheme to the values in a specified column [2]
  • {operation} partition creation
    • [large partitioned table]
      • should be created two auxiliary nonindexed empty tables with the same structure, including constraints and data compression options [4]
        • first table:  create a check constraint that guarantees that all data from the table fits exactly with one empty partition of the fact table
          • the constraint must be created on the partitioning column [4]
          • a columnstore index can be created on the fact table, as long as it is aligned with the table [4]
          • after truncation of <table 2> the <table 1> is prepared to accept the next partition from your fact table for the next minimally logged deletion [4]
        • second table: for minimally logged deletions of large portions of data, a partition from the fact table can be switched to the empty table version without the check constraint [4]
          • then the table can be truncated
      • for minimally logged inserts, new data to the second auxiliary table should be bulk inserted in the auxiliary that has the check constraint [4]
        • INSERT operation can be minimally logged because the table is empty [4]
        • create a columnstore index on this auxiliary table, using the same structure as the columnstore index on your fact table [4]
        • switch data from this auxiliary table to a partition of your fact table [4]
        • drop the columnstore index on the auxiliary table, and change the check constraint to guarantee that all of the data for the next load can be switched to the next empty partition of the fact table [4]
        • the second auxiliary table is prepared for new bulk loads again [4]
  • {operation} [Query Optimizer] partition elimination 
    • process in which SQL Server accesses only those partitions needed to satisfy query filters [4]
  • {operation} partition switching 
    • {definition} process that switches a block of data from one table or partition to another table or partition [4]
    • types of switches
      • reassign all data from a nonpartitioned table to an empty existing partition of a partitioned table [4]
      • switch a partition of one partitioned table to a partition of another partitioned table [4]
      • reassign all data from a partition of a partitioned table to an existing empty nonpartitioned table [4]
  • {benefit} improves query performance [1]
    • by partitioning a table across filegroups [1]
      • specific ranges of data can be placed on different disk spindles [1]
        • can improve I/O performance [1]
          • ⇐ the disk storage is already configured as a RAID 10 or RAID 5 array [1]
            • ⇒ this usually has little benefit [1]
      • using a mix of fast solid state storage for recent, frequently accessed data, and mechanical disks for older, less queried rows [1]
        • use partitioning to balance disk performance against storage costs [1]
      • biggest performance gain from partitioning in a data warehouse is realized when queries return a range of rows that are filtered on the partitioning key [1]
        • the query optimizer can eliminate partitions that are not within the filter range [1]
          • dramatically reduce the number of rows that need to be read [1]
    • reduces contention [3]
      • can reduce the number of rows included in a table scan [3]
  • {benefit} more granular manageability [1]
    • some maintenance operations can be performed at partition level instead of on the whole table [1]
      • e.g. indexes can be created and rebuilt on a per-partition basis [1]
      • e.g. compression can be applied to individual partitions [1]
      • e.g. by mapping partitions to filegroups, partitions can be backed up and restored independently [1]
        • enables to back up older data once and then configure the backed up partitions as read-only [1]
        • future backups can be limited to the partitions that contain new or updated data [1]
  • {benefit} improved data load performance
    • enables loading many rows very quickly by switching a staging table with a partition
      • can dramatically reduce the time taken by ETL data loads [1]
        • with the right planning, it can be achieved with minimal requirements to drop or rebuild indexes [1]
  • {best practice} partition large fact tables
    • tables of around 50 GB or more
    • ⇐ in general, fact tables benefit from partitioning more than dimension tables [1]
  • {best practice} partition on an incrementing date key [1]
    • assures that the most recent data are in the last partition and the earliest data are in the first partition [1]
  • {best practice} design the partition scheme for ETL and manageability [1]
    • the query performance gains realized by partitioning are small compared to the manageability and data load performance benefits [1]
      • ideally partitions should reflect the ETL load frequency
        • because this simplifies the load process [1]
        • merge partitions periodically to reduce the overall number of partitions (for example, at the start of each year [1]
      • could merge the monthly partitions for the previous year into a single partition for the whole year [1]
  • {best practice} maintain an empty partition at the start and end of the table [1]
    • simplifies the loading of new rows [1]
    • when new records need to be added, split the empty partition 
      • ⇐ to create two empty partitions)
    • then switch the staged data with the first empty partition [1]
      • ⇐ loads the data into the table and leaves the second empty partition you created at the end of the table, ready for the next load [1]
    • a similar technique can be used to archive or delete obsolete data at the beginning of the table [1]
  • {best practice} chose the proper granularity
    • it should be aligned to the business requirements [2]
  • {best practice} create at least one filegroup in addition to the primary one
    • set it as the default filegroup
      • data tables are thus separated from system tables [2]
    • creating dedicated filegroups for extremely large fact tables [2]
      • place the fact tables on their own logical disks [2]
    • use a file and a filegroup separate from the fact and dimension tables [2]
      • {exception} staging tables that will be switched with partitions to perform fast loads [2]
        • staging tables must be created on the same filegroup as the partition with which they will be switched [2]
  • {def} partition scheme 
    • a scheme that maps partitions to filegroups
  • {def} partition function 
    • object that maps rows to partitions by using values from specific columns (aka  partitioning columns)
    • performs logical mapping
  • {def} aligned index 
    • index built on the same partition scheme as its base table [4]
      • if all indexes are aligned with their base table, switching a partition is a metadata operation only [4]
        • ⇒ it’s very fast [4]
Previous Post <<||>> Next Post

References:
[1] 20467A - Designing Business Intelligence Solutions with Microsoft SQL Server 2012
[2] 20463C - Implementing a Data Warehouse with Microsoft SQL Server
[3] 10777A - Implementing a Data Warehouse with Microsoft SQL Server 2012
[4] Dejan Sarka et al (2012) Exam 70-463: Implementing a Data Warehouse with Microsoft SQL Server 2012 (Training Kit)
[5] Microsoft Learn (2009) How to Implement an Automatic Sliding Window in a Partitioned Table on SQL Server 2005 [link]

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.