10 November 2024

🏭🗒️Microsoft Fabric: Data Mesh [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 23-May-2024

[Microsoft Fabric] Data Mesh
  • {definition} a type of decentralized data architecture that organizes data based on different business domains [2]
    • a centrally managed network of decentralized data products
  • {concept} landing zone
    • typically a subscription that needs to be governed by a common policy [7]
      • {downside} creating one landing zone for every project can lead to too many landing zones to manage
        • {alternative} landing zones based on a business domain [7] 
    • resources must be managed so that each team has access only to its own resources [7]
      • shared resources might be needed, with separate management and common access for all [7]
    • need to be linked together into a mesh
      • via peer-to-peer networks
  • {concept} connectivity hub
  • {feature} resource group
    • {definition} a container that holds related resources for an Azure solution 
    • can be associated with a data product
      • when the data product becomes obsolete, the resource group can be deleted [7]
  • {feature} subscription
    • {definition} a logical unit of Azure services that are linked to an Azure account
    • can serve as a landing zone governed by a policy [7]
  • {feature} tenant (aka Microsoft Fabric tenant, MF tenant)
    • a single instance of Fabric for an organization that is aligned with a Microsoft Entra ID
    • can contain any number of workspaces
  • {feature} workspaces
    • {definition} a collection of items that brings together different functionality in a single environment designed for collaboration
    • associated with a domain [3]
  • {feature} domains
    • {definition} a way of logically grouping together data in an organization that is relevant to a particular area or field [1]
    • some tenant-level settings for managing and governing data can be delegated to the domain level [2]
  • {feature} subdomains
    • a way of fine-tuning the logical grouping of data under a domain [1]
    • subdivisions of a domain
  • {concept} deployment template
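
The containment relationships noted above (tenant → domains → subdomains → workspaces) can be sketched as a small data model. This is only an illustration for learning purposes, not a Fabric API: the class names, the `domain_of` helper and the sample names (`contoso`, `sales-emea`, etc.) are hypothetical, and attaching workspaces directly to subdomains is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Workspace:
    """A collection of items designed for collaboration (hypothetical model)."""
    name: str
    items: List[str] = field(default_factory=list)


@dataclass
class Domain:
    """A logical grouping of data; subdomains are modeled as nested domains."""
    name: str
    workspaces: List[Workspace] = field(default_factory=list)
    subdomains: List["Domain"] = field(default_factory=list)

    def all_workspaces(self) -> List[Workspace]:
        # Workspaces of this domain plus, recursively, those of its subdomains.
        found = list(self.workspaces)
        for sub in self.subdomains:
            found.extend(sub.all_workspaces())
        return found


@dataclass
class Tenant:
    """A single Fabric instance for an organization (hypothetical model)."""
    name: str
    domains: List[Domain] = field(default_factory=list)

    def domain_of(self, workspace_name: str) -> Optional[str]:
        # Return the top-level domain that contains the workspace, if any.
        for domain in self.domains:
            if any(ws.name == workspace_name for ws in domain.all_workspaces()):
                return domain.name
        return None


sales = Domain(
    "Sales",
    workspaces=[Workspace("sales-analytics")],
    subdomains=[Domain("Sales EMEA", workspaces=[Workspace("sales-emea")])],
)
finance = Domain("Finance", workspaces=[Workspace("finance-reporting")])
tenant = Tenant("contoso", domains=[sales, finance])

print(tenant.domain_of("sales-emea"))  # Sales
```

Modeling subdomains as nested domains keeps the lookup recursive and mirrors the note that subdomains are subdivisions of a domain.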

References
[1] Microsoft Learn: Fabric (2023) Fabric domains (link)
[2] Establishing Data Mesh architectural pattern with Domains and OneLake on Microsoft Fabric, by Maheswaran Arunachalam (link)
[3] Data mesh: A perspective on using Azure Synapse Analytics to build data products, by Amanjeet Singh (link)
[4] Zhamak Dehghani (2021) Data Mesh: Delivering Data-Driven Value at Scale
[5] Marthe Mengen (2024) How do you set up your Data Mesh in Microsoft Fabric? (link)
[6] Administering Microsoft Fabric - Considering Data Products vs Domains vs Workspaces, by Paul Andrew (link)
[7] Aniruddha Deswandikar (2024) Engineering Data Mesh in Azure Cloud

Resources:
[R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

🏭🗒️Microsoft Fabric: Warehouse [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 11-Mar-2024

Warehouse vs SQL analytics endpoint in Microsoft Fabric [3]

[Microsoft Fabric] Warehouse

  • {def} highly available relational data warehouse that can be used to store and query data in the Lakehouse
    • supports the full transactional T-SQL capabilities 
    • modernized version of the traditional data warehouse
    • unifies capabilities from Synapse Dedicated and Serverless SQL Pools
  • resources are managed elastically to provide the best possible performance
    • ⇒ no need to think about indexing or distribution
    • a new parser improves CSV file ingestion time
    • metadata is cached in addition to data
    • compute resources are assigned within milliseconds
    • multi-TB result sets are streamed to the client
  • leverages a distributed query processing engine
    • provides workloads with a natural isolation boundary [3]
      • true isolation is achieved by separating workloads with different characteristics, ensuring that ETL jobs never interfere with ad hoc analytics and reporting workloads [3]
  • {operation} data ingestion
    • involves moving data from source systems into the data warehouse [2]
      • the data becomes available for analysis [1]
    • via Pipelines, Dataflows, cross-database querying, COPY INTO command
    • no need to copy data from the lakehouse to the data warehouse [1]
      • one can query data in the lakehouse directly from the data warehouse using cross-database querying [1]
  • {operation} data storage
    • involves storing the data in a format that is optimized for analytics [2]
  • {operation} data processing
    • involves transforming the data into a format that is ready for consumption by analytical tools [1]
  • {operation} data analysis and delivery
    • involves analyzing the data to gain insights and delivering those insights to the business [1]
  • {operation} designing a warehouse (aka warehouse design)
    • standard warehouse design can be used
  • {operation} sharing a warehouse (aka warehouse sharing)
    • a way to provide users read access to the warehouse for downstream consumption
      • via SQL, Spark, or Power BI
    • the level of permissions can be customized to provide the appropriate level of access
  • {feature} mirroring 
    • provides a modern way of accessing and ingesting data continuously and seamlessly from any database or data warehouse into the Data Warehousing experience in Fabric
      • any database can be accessed and managed centrally from within Fabric without having to switch database clients
      • data is replicated in a reliable way in real-time and lands as Delta tables for consumption in any Fabric workload
  • {feature} v-order
    • write-time optimization to the Parquet file format that enables lightning-fast reads under the MF compute engine [5]
  • {feature} caching
    • stores frequently accessed data and metadata in a faster storage layer [6]
  • {feature} snapshots
    • {def} read-only representation of a warehouse at a specific point in time [7]
  • {feature} automatic purging
    • routinely and systematically eliminates expired data [8]
    • {benefit} proactively helps maintain an efficient and cost-effective data infrastructure [8]
      • via garbage collection
        • background process that periodically identifies and cleans up [8]
          • all the data and log files of dropped tables
          • aborted transactions
          • temporary tables
          • expired files
        • executes every 24 hours, when the warehouse is active [8] 
        • ensures the data warehouse remains optimized and efficient [8]
    • {goal} storage cost optimization
    • {goal} minimize maintenance overhead
    • {goal} adhering to data retention regulations
  • {concept} SQL analytics endpoint 
    • a warehouse that is automatically generated from a Lakehouse in Microsoft Fabric [3]
  • {concept} virtual warehouse
    • can contain data from virtually any source by using shortcuts [3]
  • {concept} cross database querying 
    • allows multiple data sources to be leveraged quickly and seamlessly for fast insights, with zero data duplication [3]
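
Two of the ingestion options noted above, the COPY INTO command and cross-database querying, can be illustrated by composing the corresponding T-SQL statements. This is a minimal sketch under stated assumptions: the helper functions, the table, lakehouse and column names, and the placeholder storage URL are all hypothetical, and the statements are only printed, not executed against a warehouse.

```python
def copy_into_statement(table: str, source_url: str, first_row: int = 2) -> str:
    """Compose a COPY INTO statement that loads a CSV file into a warehouse table.

    Illustration only: identifiers are interpolated as-is, so use trusted values.
    """
    return (
        f"COPY INTO {table}\n"
        f"FROM '{source_url}'\n"
        f"WITH (FILE_TYPE = 'CSV', FIRSTROW = {first_row});"
    )


def cross_database_query(warehouse_table: str, lakehouse: str,
                         lakehouse_table: str, key: str) -> str:
    """Compose a query that joins a warehouse table with a lakehouse table.

    The lakehouse table is addressed with three-part naming
    (database.schema.table), so no data has to be copied beforehand.
    """
    return (
        f"SELECT w.*, l.*\n"
        f"FROM {warehouse_table} AS w\n"
        f"JOIN {lakehouse}.dbo.{lakehouse_table} AS l\n"
        f"  ON w.{key} = l.{key};"
    )


# Hypothetical names and a placeholder URL, for illustration only.
print(copy_into_statement(
    "dbo.Sales",
    "https://<storage-account>.blob.core.windows.net/<container>/sales.csv",
))
print(cross_database_query("dbo.Sales", "SalesLakehouse", "Customers", "CustomerId"))
```

The first statement skips the header row of the CSV file via FIRSTROW = 2; the second reads the lakehouse table in place, which is the "no need to copy data" case from the notes.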

    References:
    [1] Microsoft Learn (2023) Fabric: Get started with data warehouses in Microsoft Fabric (link)
    [2] Microsoft Learn (2023) Fabric: Microsoft Fabric decision guide: choose a data store (link)
    [3] Microsoft Learn (2024) Fabric: What is data warehousing in Microsoft Fabric? (link)
    [4] Microsoft Learn (2023) Fabric: Better together: the lakehouse and warehouse (link)
    [5] Microsoft Learn (2024) Fabric: Understand V-Order for Microsoft Fabric Warehouse [link]
    [6] Microsoft Learn (2024) Fabric: Caching in Fabric data warehousing [link]
    [7] Microsoft Learn (2024) Fabric: Warehouse Snapshots in Microsoft Fabric (Preview) [link]
    [8] Microsoft Fabric Updates Blog (2025) Intelligent Data Cleanup: Smart Purging for Smarter Data Warehouses [link]

    Resources:
    [R1] Microsoft Learn (2023) Fabric: Data warehousing documentation in Microsoft Fabric (link)
    [R2] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]
    [R3] Microsoft Learn (2025) Fabric: Share your data and manage permissions [link]

    Acronyms:
    ETL - Extract, Transform, Load
    MF - Microsoft Fabric