10 November 2024

🏭🗒️Microsoft Fabric: Data Mesh [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 23-May-2024

[Microsoft Fabric] Data Mesh
  • {definition} a type of decentralized data architecture that organizes data based on different business domains [2]
    •   a centrally managed network of decentralized data products
  • {concept} landing zone
    • typically a subscription that needs to be governed by a common policy [7]
      • {downside} creating one landing zone for every project can lead to too many landing zones to manage
        • {alternative} landing zones based on a business domain [7] 
    •  resources must be managed efficiently in a way that each team is given access to only their resources [7]
      •   shared resources might be need with separate management and common access to all [7]
    • need to be linked together into a mesh
      • via peer-to-peer networks
  • {concept} connectivity hub
  • {feature} resource group
    • {definition} a container that holds related resources for an Azure solution 
    • can be associated with a data product
      • when the data product becomes obsolete, the resource group can be deleted [7]
  • {feature} subscription
    • {definition} a logical unit of Azure services that are linked to an Azure account
    • can be associated as a landing zone governed by a policy [7]
  • {feature} tenant (aka Microsoft Fabric tenantMF tenant)
    • a single instance of Fabric for an organization that is aligned with a Microsoft Entra ID
    • can contain any number of workspaces
  • {feature} workspaces
    • {definition} a collection of items that brings together different functionality in a single environment designed for collaboration
    • associated with a domain [3]
  • {feature} domains
    • {definition} a way of logically grouping together data in an organization that is relevant to a particular area or field [1]
    • some tenant-level settings for managing and governing data can be delegated to the domain level [2]
  • {feature} subdomains
    • a way for fine tuning the logical grouping data under a domain [1]
    • subdivisions of a domain
  • {concept} deployment template

References
[1] Microsoft Learn: Fabric (2023) Fabric domains (link)
[2] Establishing Data Mesh architectural pattern with Domains and OneLake on Microsoft Fabric, by Maheswaran Arunachalam (link
[3] Data mesh: A perspective on using Azure Synapse Analytics to build data products, by Amanjeet Singh (link)
[4] Zhamak Dehghani (2021) Data Mesh: Delivering Data-Driven Value at Scale
[5] Marthe Mengen (2024) How do you set up your Data Mesh in Microsoft Fabric? (link)
[6] Administering Microsoft Fabric - Considering Data Products vs Domains vs Workspaces, by Paul Andrew (link)
[7] Aniruddha Deswandikar (2024) Engineering Data Mesh in Azure Cloud

🏭🗒️Microsoft Fabric: Data Warehouse [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 11-Mar-2024

Warehouse vs SQL analytics endpoint in Microsoft Fabric
Warehouse vs SQL analytics endpoint in Microsoft Fabric [3]

[Microsoft Fabric] Data Warehouse

  • highly available relational data warehouse that can be used to store and query data in the Lakehouse
    • supports the full transactional T-SQL capabilities 
    • modernized version of the traditional data warehouse
  • unifies capabilities from Synapse Dedicated and Serverless SQL Pools
  • modernized with key improvements
  • resources are managed elastically to provide the best possible performance
    • ⇒ no need to think about indexing or distribution
    • a new parser gives enhanced CSV file ingestion time
    • metadata is now cached in addition to data
    • improved assignment of compute resources to milliseconds
    • multi-TB result sets are streamed to the client
  • leverages a distributed query processing engine
    • provides with workloads that have a natural isolation boundary [3]
      • true isolation is achieved by separating workloads with different characteristics, ensuring that ETL jobs never interfere with their ad hoc analytics and reporting workloads [3]
  • {operation} data ingestion
    • involves moving data from source systems into the data warehouse [2]
      • the data becomes available for analysis [1]
    • via Pipelines, Dataflows, cross-database querying, COPY INTO command
    • no need to copy data from the lakehouse to the data warehouse [1]
      • one can query data in the lakehouse directly from the data warehouse using cross-database querying [1]
  • {operation} data storage
    • involves storing the data in a format that is optimized for analytics [2]
  • {operation} data processing
    • involves transforming the data into a format that is ready for consumption by analytical tools [1]
  • {operation} data analysis and delivery
    • involves analyzing the data to gain insights and delivering those insights to the business [1]
  • {operation} designing a warehouse (aka warehouse design)
    • standard warehouse design can be used
  • {operation} sharing a warehouse (aka warehouse sharing)
    • a way to provide users read access to the warehouse for downstream consumption
      • via SQL, Spark, or Power BI
    • the level of permissions can be customized to provide the appropriate level of access
  • {feature} mirroring 
    • provides a modern way of accessing and ingesting data continuously and seamlessly from any database or data warehouse into the Data Warehousing experience in Fabric
      • any database can be accessed and managed centrally from within Fabric without having to switch database clients
      • data is replicated in a reliable way in real-time and lands as Delta tables for consumption in any Fabric workload
  • {concept}SQL analytics endpoint 
    • a warehouse that is automatically generated from a Lakehouse in Microsoft Fabric [3]
  • {concept}virtual warehouse
    • can containing data from virtually any source by using shortcuts [3]
  • {concept} cross database querying 
    • enables to quickly and seamlessly leverage multiple data sources for fast insights and with zero data duplication [3]
References:
[1] Microsoft Learn: Fabric (2023) Get started with data warehouses in Microsoft Fabric (link
[2] Microsoft Learn: Fabric (2023) Microsoft Fabric decision guide: choose a data store (link)
[3] Microsoft Learn: Fabric (2024) What is data warehousing in Microsoft Fabric? (link)
[4] Microsoft Learn: Fabric (2023) Better together: the lakehouse and warehouse (link)

Resources:
[1] Microsoft Learn: Fabric (2023) Data warehousing documentation in Microsoft Fabric (link)


Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.