SQL Troubles: 🏭🗒️Microsoft Fabric: Sharding [Notes]

09 February 2025

🏭🗒️Microsoft Fabric: Sharding [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!

Last updated: 9-Feb-2024

[Microsoft Fabric] Data Partitioning (aka Sharding)

{definition} "a process where small chunks of the database are isolated and can be updated independently of other shards" [2]
allows a logical database to be partitioned across multiple physical servers [1]

each partition is referred to as a shard
the largest tables are partitioned across multiple database servers [1]

when operating on a record, the application must determine which shard will contain the data and then send the SQL to the appropriate server [1]

partitioning is based on a Key Value

e.g. such as a user ID

proven technique for achieving data processing on a massive scale [1]

solution used at the largest websites

e.g. Facebook, Twitter
usually associated with rapid growth

⇒ the approach needs to be dynamic [1]

the only way to scale a relational database to massive web use [1]

together with caching and replication [1]

{drawback} involves significant operational complexities and compromises [1]

the application must contain logic that understands the location of any particular piece of data and the logic to route requests to the correct shard [1]
requests that can only be satisfied by accessing more than one shard thus need complex coding as well, whereas on a nonsharded database a single SQL statement might suffice.

{drawback} high operational costs [1]
{drawback} application complexity

it’s up to the application code to route SQL requests to the correct shard [1]

⇒ a dynamic routing layer must be implemented

⇐ most massive websites are adding shards as they grow [1]
layer required to maintain Memcached object copies and to differentiate between the master database and read-only replicas [1]

{drawback} crippled SQL

[sharded database] it is not possible to issue a SQL statement that operates across shards [1]

⇒ usually SQL statements are limited to row-level access [1]
⇒ only programmers can query the database as a whole [1]
joins across shards cannot be implemented, nor can aggregate GROUP BY operations [1]

{drawback} loss of transactional integrity

ACID transactions against multiple shards are not possible and/or not practical [1]

⇐ {exception} there are database systems that support 2PC

involves considerable troubleshooting as conflicts and bottlenecks can occur [1]

{drawback} operational complexity.

load balancing across shards becomes extremely problematic

adding new shards requires a complex rebalancing of data [1]
changing the database schema requires a rolling operation across all the shards [1]

⇒ can lead to transitory inconsistencies in the schema [1]

a sharded database entails a huge amount of operational effort and administrator skill [1]

{concept} CAP (Consistency, Availability, and Partition) theorem

in a distributed database system, one can have at most only two of CAP tolerance [1]
consistency

every user of the database has an identical view of the data at any given instant [1]

availability

in the event of a failure, the database remains operational [1]

partition tolerance

the database can maintain operations in the event of the network’s failing between two segments of the distributed system [1]

{concept} partitioning

{def} core pattern of building scalable services by dividing state (data) and compute into smaller accessible units to improve scalability and performance [5]

⇐ determines that a particular service partition is responsible for a portion of the complete state of the service.

a partition is a set of replicas)

{type} [stateless services] a logical unit that contains one or more instances of a service [5]

partitioning a stateless service is a very rare scenario
scalability and availability are normally achieved by adding more instances
{subtype} externally persisted state

persists its state externally [5]

e.g. databases in Azure SQL Database

{subtype} computation-only services

service that do not manage any persistent state e.g. calculator or image thumbnailing [5]

{type} scalable stateful services

partition state (data)
a partition of a stateful service as a scale unit that is highly reliable through replicas that are distributed and balanced across the nodes in a cluster
the state must be accessed and stored

⇒ bound by

network bandwidth limits
system memory limits
disk storage limits

{scenario} run into resource constraints in a running cluster

{recommendation} scale out the cluster to accommodate the new requirements [4]

{concept}distributed systems platform used to build hyper-scalable, reliable and easily managed applications for the cloud [6]

⇐ addresses the significant challenges in developing and managing cloud applications
places the partitions on different nodes [5]

allows partitions to grow to a node's resource limit

⇐ partitions are rebalances across nodes [5]

{benefit} ensures the continued efficient use of hardware resources [5]

{default} makes sure that there is about the same number of primary and secondary replicas on each node

⇒ nodes that hold replicas can serve more traffic and others that serve less traffic [5]
hot and cold spots may appear in a cluster

⇐ it should be preferably avoided

{recommendation} partition the state so is evenly distributed across all partitions [5]
{recommendation} report load from each of the replicas for the service [5]

provides the capability to report load consumed by services [5]

e.g. amount of memory, number of records
detects which partitions server higher loads than others [5]

⇐ based on the metrics reported

rebalances the cluster by moving replicas to more suitable nodes, so that overall no node is overloaded [5]
⇐ it's not always possible to know how much data will be in a given partition

{recommendation} adopt a partitioning strategy that spreads the data evenly across the partitions [5]

{benefit} prevents situations described in the voting example [5]

{recommendation} report load

{benefit} helps smooth out temporary differences in access or load over time [5]

{recommendation} choose an optimal number of partitions to begin with

⇐ there's nothing that prevents from starting out with a higher number of partitions than anticipated [5]

⇐ assuming the maximum number of partitions is a valid approach [5]

⇒ one may end up needing more partitions than initially considered [5]

⇐ {constraint} the partition count can't be changed after the fact [5]

⇒ apply more advanced partition approaches

e.g. creating a new service instance of the same service type
e.g. implement client-side logic that routes the requests to the correct service instance

Previous Post <<||>> Next Post

References:

[1] Guy Harrison (2015) Next Generation Databases: NoSQL, NewSQL, and Big Data

[2] DAMA International (2017) "The DAMA Guide to the Data Management Body of Knowledge" 2nd Ed

[3] Microsoft Fabric (2024) External data sharing in Microsoft Fabric [link]
[4] Microsoft Fabric (2024) Data sharding policy [link]
[5] Microsoft Fabric (2024) Partition Service Fabric reliable services [link]
[6] MSDN (2015) Microsoft Azure - Azure Service Fabric and the Microservices Architecture [link]

Resources:

[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:

ACID - atomicity, consistency, isolation, durability

2PC - Two Phase Commit

CAP - Consistency, Availability, Partition

SQL Troubles

Pages

09 February 2025

🏭🗒️Microsoft Fabric: Sharding [Notes]

No comments:

About Me