
22 January 2025

🏭🗒️Microsoft Fabric: Folders [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 22-Jan-2025

[Microsoft Fabric] Folders

  • {def} organizational units inside a workspace that enable users to efficiently organize and manage artifacts in the workspace [1]
  • identified by its name
    • {constraint} must be unique in a folder or at the root level of the workspace
    • {constraint} can’t include certain special characters [1]
      • C0 and C1 control codes [1]
      • leading or trailing spaces [1]
      • characters: ~"#.&*:<>?/{|} [1]
    • {constraint} can’t have system-reserved names
      • e.g. $recycle.bin, recycled, recycler.
    • {constraint} its length can't exceed 255 characters
  • {operation} create folder
    • can be created in
      • an existing folder (aka nested subfolder) [1]
        • {restriction} a maximum of 10 levels of nested subfolders can be created [1]
        • up to 10 folders can be created in the root folder [1]
        • {benefit} provide a hierarchical structure for organizing and managing items [1]
      • the root level of the workspace
  • {operation} move folder
  • {operation} rename folder
    • the same naming rules apply as for folder creation [1]
  • {operation} delete folder
    • {restriction} currently only empty folders can be deleted [1]
      • {recommendation} make sure the folder is empty [1]
  • {operation} create item in folder
    • {restriction} certain items can’t be created in a folder
      • Dataflows Gen2
      • streaming semantic models
      • streaming dataflows
    • ⇐ items created from the home page or the Create hub are created at the root level of the workspace [1]
  • {operation} move item(s) between folders [1]
  • {operation} publish to folder [1]
    • Power BI reports can be published to specific folders
      • {restriction} report names must be unique throughout an entire workspace, regardless of their location [1]
        • when publishing a report to a workspace that has another report with the same name in a different folder, the report will publish to the location of the already existing report [1]
  • {limitation} folders may not be supported by certain features
    • e.g. Git
  • {recommendation} use folders to organize workspaces [1]
  • {permissions}
    • inherit the permissions of the workspace where they're located [1] [2]
    • workspace admins, members, and contributors can create, modify, and delete folders in the workspace [1]
    • viewers can only view folder hierarchy and navigate in the workspace [1]
  • [deployment pipelines] when deploying items in folders to a different stage, the folder hierarchy is applied automatically [2]
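The folder naming constraints listed above lend themselves to a client-side check before creating or renaming a folder. The following Python sketch is illustrative only: `validate_folder_name` and its error messages are hypothetical (not part of any Fabric API), the rules are taken from the notes above, and comparing reserved and sibling names case-insensitively is an assumption rather than documented behavior.

```python
# Hypothetical pre-check for Fabric folder names, based on the rules in the notes.
# Not a Fabric API; case-insensitive comparisons are an ASSUMPTION.

FORBIDDEN_CHARS = set('~"#.&*:<>?/{|}')  # characters listed as disallowed [1]
RESERVED_NAMES = {"$recycle.bin", "recycled", "recycler"}  # examples from [1]
MAX_NAME_LENGTH = 255


def is_control_code(ch: str) -> bool:
    """True for C0 (U+0000..U+001F) and C1 (U+0080..U+009F) control codes."""
    code = ord(ch)
    return code <= 0x1F or 0x80 <= code <= 0x9F


def validate_folder_name(name: str, sibling_names: set[str]) -> list[str]:
    """Return the list of violated rules; an empty list means the name looks valid.

    sibling_names: names already used in the same parent folder (or at the root).
    """
    errors = []
    if not name:
        errors.append("name is empty")
    if name != name.strip(" "):
        errors.append("leading or trailing spaces")
    if any(is_control_code(ch) for ch in name):
        errors.append("contains C0/C1 control codes")
    bad = sorted({ch for ch in name if ch in FORBIDDEN_CHARS})
    if bad:
        errors.append("forbidden characters: " + "".join(bad))
    if name.lower() in RESERVED_NAMES:  # ASSUMPTION: case-insensitive match
        errors.append("system-reserved name")
    if len(name) > MAX_NAME_LENGTH:
        errors.append(f"longer than {MAX_NAME_LENGTH} characters")
    # ASSUMPTION: uniqueness within the parent is checked case-insensitively
    if name.lower() in {s.lower() for s in sibling_names}:
        errors.append("name already used at this level")
    return errors


siblings = {"Reports", "Staging"}
print(validate_folder_name("Sales 2025", siblings))  # []
print(validate_folder_name(" Sales#1", siblings))    # ['leading or trailing spaces', 'forbidden characters: #']
print(validate_folder_name("reports", siblings))     # ['name already used at this level']
```

Returning a list of violations instead of raising on the first one makes it easier to report all problems with a proposed name at once; the service itself remains the authority on what it accepts.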


References:
[1] Microsoft Fabric (2024) Create folders in workspaces [link]
[2] Microsoft Fabric (2024) The deployment pipelines process [link]
[3] Microsoft Fabric Updates Blog (2025) Define security on folders within a shortcut using OneLake data access roles [link]
[4] Microsoft Fabric Updates Blog (2025) Announcing the General Availability of Folder in Workspace [link]
[5] Microsoft Fabric Updates Blog (2025) Announcing Folder in Workspace in Public Preview [link]
[6] Microsoft Fabric Updates Blog (2025) Getting the size of OneLake data items or folders [link]

10 March 2024

🏭📑Microsoft Fabric: Medallion Architecture [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 10-Mar-2024

Medallion Architecture in Microsoft Fabric [1]


Medallion architecture
  • a recommended data design pattern used to organize data in a lakehouse logically [2]
    • compatible with the concept of data mesh
  • {goal} incrementally and progressively improve the structure and quality of data as it progresses through each stage [1]
    • brings structure and efficiency to a lakehouse environment [2]
    • ensures that data is reliable and consistent as it goes through various checks and changes [2]
    • complements other data organization methods, rather than replacing them [2]
  • consists of three distinct layers (or zones)
    • {layer} bronze (aka raw zone)
      • stores source data in its original format [1]
      • the data in this layer is typically append-only and immutable [1]
      • {recommendation} store the data in its original format, or use Parquet or Delta Lake [1]
      • {recommendation} create a shortcut in the bronze zone instead of copying the data across [1]
        • works with OneLake, ADLS Gen2, Amazon S3, Google Cloud Storage
      • {operation} ingest data
        • {characteristic} maintains the raw state of the data source [3]
        • {characteristic} is appended incrementally and grows over time [3]
        • {characteristic} can be any combination of streaming and batch transactions [3]
        • ⇒ retaining the full, unprocessed history
          • ⇒ provides the ability to recreate any state of a given data system [3]
        • additional metadata may be added to data on ingest
          • e.g. source file names, the time the data was processed
          • {goal} enhanced discoverability [3]
          • {goal} description of the state of the source dataset [3]
          • {goal} optimized performance in downstream applications [3]
    • {layer} silver (aka enriched zone)
      • stores data sourced from the bronze layer
      • the raw data has been 
        • cleansed
        • standardized
        • structured as tables (rows and columns)
        • integrated with other data to provide an enterprise view of all business entities
      • {recommendation} use Delta tables 
        • provide extra capabilities and performance enhancements [1]
          • {default} every engine in Fabric writes data in the Delta format and uses V-Order write-time optimization to the Parquet file format [1]
      • {operation} validate and deduplicate data
      • for any data pipeline, the silver layer may contain more than one table [3]
    • {layer} gold (aka curated zone)
      • stores data sourced from the silver layer [1]
      • the data is refined to meet specific downstream business and analytics requirements [1]
      • tables typically conform to star schema design
        • supports the development of data models that are optimized for performance and usability [1]
      • use lakehouses (one for each zone), a data warehouse, or a combination of both
        • the decision should be based on the preferences and expertise of your team
        • different analytic engines can be used [1]
    • ⇐ schemas and tables within each layer can take on a variety of forms and degrees of normalization [3]
      • depends on the frequency and nature of data updates and the downstream use cases for the data [3]
  • {pattern} create each zone as a lakehouse
    • business users access data by using the SQL analytics endpoint [1]
  • {pattern} create the bronze and silver zones as lakehouses, and the gold zone as data warehouse
    • business users access data by using the data warehouse endpoint [1]
  • {pattern} create all lakehouses in a single Fabric workspace
    • {recommendation} create each lakehouse in its own workspace [1]
    • provides more control and better governance at the zone level [1]
  • {concept} data transformation 
    • involves altering the structure or content of data to meet specific requirements [2] 
      • e.g. via Dataflows Gen2 or notebooks
  • {concept} data orchestration 
    • refers to the coordination and management of multiple data-related processes, ensuring they work together to achieve a desired outcome [2]
      • via data pipelines
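To make the bronze → silver → gold flow more concrete, here is a minimal plain-Python sketch. It does not use any Fabric API: in Fabric the same steps would typically run in notebooks or Dataflows Gen2 and write Delta tables. The record fields (`order_id`, `customer`, `amount`), the metadata column names, the validation rules, and the "latest ingested row wins" deduplication policy are all hypothetical choices for illustration; the gold step is a simple aggregate standing in for a star-schema model.

```python
from datetime import datetime, timezone

# Plain-Python illustration of the medallion layers (no Fabric API involved).
# Field names and rules below are HYPOTHETICAL.
Record = dict


def ingest_bronze(bronze: list[Record], raw_rows: list[Record], source_file: str) -> None:
    """Bronze: append raw rows unchanged (append-only), plus ingestion metadata."""
    ingested_at = datetime.now(timezone.utc).isoformat()
    for row in raw_rows:
        bronze.append({**row, "_source_file": source_file, "_ingested_at": ingested_at})


def build_silver(bronze: list[Record]) -> list[Record]:
    """Silver: validate, standardize and deduplicate bronze rows by order_id."""
    latest: dict[str, Record] = {}
    for row in bronze:
        order_id = str(row.get("order_id", "")).strip()
        customer = str(row.get("customer", "")).strip()
        try:
            amount = float(row.get("amount"))
        except (TypeError, ValueError):
            continue  # validation: skip rows whose amount is missing or not numeric
        if not order_id or not customer:
            continue  # validation: skip rows missing required fields
        # standardization + deduplication: a later-ingested row for the same order_id wins
        latest[order_id] = {"order_id": order_id, "customer": customer.title(), "amount": amount}
    return list(latest.values())


def build_gold(silver: list[Record]) -> dict[str, float]:
    """Gold: aggregate silver rows into a reporting-ready total per customer."""
    totals: dict[str, float] = {}
    for row in silver:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


bronze: list[Record] = []
ingest_bronze(bronze, [
    {"order_id": "1", "customer": "contoso", "amount": "10"},
    {"order_id": "2", "customer": "fabrikam", "amount": "5.5"},
    {"order_id": "x", "customer": "", "amount": "1"},  # invalid: no customer
], source_file="orders_2024_03_01.csv")
ingest_bronze(bronze, [
    {"order_id": "1", "customer": "contoso", "amount": "12"},  # corrected duplicate
], source_file="orders_2024_03_02.csv")

silver = build_silver(bronze)
print(len(bronze), len(silver))  # 4 2
print(build_gold(silver))        # {'Contoso': 12.0, 'Fabrikam': 5.5}
```

Because bronze stays append-only, silver and gold can be rebuilt at any time from the full history, which mirrors the "recreate any state" property noted for the bronze layer.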


References:
[1] Microsoft Learn: Fabric (2023) Implement medallion lakehouse architecture in Microsoft Fabric (link)
[2] Microsoft Learn: Fabric (2023) Organize a Fabric lakehouse using medallion architecture design (link)
[3] Microsoft Learn: Azure (2023) What is the medallion lakehouse architecture? (link)

Resources:
[R1] Serverless.SQL (2023) Data Loading Options With Fabric Workspaces, by Andy Cutler (link)
[R2] Microsoft Learn: Fabric (2023) Lakehouse end-to-end scenario: overview and architecture (link)

Acronyms:
ADLS - Azure Data Lake Storage Gen2


About Me

Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.