12 March 2024

🏭🗒️Microsoft Fabric: OneLake (Notes)

Disclaimer: This is work in progress intended to consolidate information from various sources. 
Last updated: 12-Mar-2024

Microsoft Fabric & OneLake
Microsoft Fabric & OneLake

OneLake

  • a single, unified, logical data lake for the whole organization [2]
    • designed to be the single place for all an organization's analytics data [2]
    • provides a single, integrated environment for data professionals and the business to collaborate on data projects [1]
    • stores all data in a single open format [1]
    • its data is governed by default
    • combines storage locations across different regions and clouds into a single logical lake, without moving or duplicating data
      • similar to how Office applications are prewired to use OneDrive
      • saves time by eliminating the need to move and copy data 
  • comes automatically with every Microsoft Fabric tenant [2]
    • automatically provisions with no extra resources to set up or manage [2]
    • used as native store without needing any extra configuration [1
  • accessible by all analytics engines in the platform [1]
    • all the compute workloads in Fabric are preconfigured to work with OneLake
      • compute engines have their own security models (aka compute-specific security) 
        • always enforced when accessing data using that engine [3]
        • the conditions may not apply to users in certain Fabric roles when they access OneLake directly [3]
  • built on top of ADLS  [1]
    • supports the same ADLS Gen2 APIs and SDKs to be compatible with existing ADLS Gen2 applications [2]
    • inherits its hierarchical structure
    • provides a single-pane-of-glass file-system namespace that spans across users, regions and even clouds
  • data can be stored in any format
    • incl. Delta, Parquet, CSV, JSON
    • data can be addressed in OneLake as if it's one big ADLS storage account for the entire organization [2]
  • uses a layered security model built around the organizational structure of experiences within MF [3]
    • derived from Microsoft Entra authentication [3]
    • compatible with user identities, service principals, and managed identities [3]
    • using Microsoft Entra ID and Fabric components, one can build out robust security mechanisms across OneLake, ensuring that you keep your data safe while also reducing copies and minimizing complexity [3]
  • hierarchical in nature 
    • {benefit} simplifies management across the organization
    • its data is divided into manageable containers for easy handling
    • can have one or more capacities associated with it
      • different items consume different capacity at a certain time
      • offered through Fabric SKU and Trials
  • {component} OneCopy
    • allows to read data from a single copy, without moving or duplicating data [1]
  • {concept} Fabric tenant
    • a dedicated space for organizations to create, store, and manage Fabric items.
      • there's often a single instance of Fabric for an organization, and it's aligned with Microsoft Entra ID [1]
        • ⇒ one OneLake per tenant
      • maps to the root of OneLake and is at the top level of the hierarchy [1]
    • can contain any number of workspaces [2]
  • {concept} capacity
    • a dedicated set of resources that is available at a given time to be used [1]
    • defines the ability of a resource to perform an activity or to produce output [1]
  • {concept} domain
    • a way of logically grouping together workspaces in an organization that is relevant to a particular area or field [1]
    • can have multiple [subdomains]
      • {concept} subdomain
        • a way for fine tuning the logical grouping of the data
  • {concept} workspace 
    • a collection of Fabric items that brings together different functionality in a single tenant [1]
      • different data items appear as folders within those containers [2]
      • always lives directly under the OneLake namespace [4]
      • {concept} data item
        • a subtype of item that allows data to be stored within it using OneLake [4]
        • all Fabric data items store their data automatically in OneLake in Delta Parquet format [2]
      • {concept} Fabric item
        • a set of capabilities bundled together into a single component [4] 
        • can have permissions configured separately from the workspace roles [3]
        • permissions can be set by sharing an item or by managing the permissions of an item [3]
    • acts as a container that leverages capacity for the work that is executed [1]
      • provides controls for who can access the items in it [1]
        • security can be managed through Fabric workspace roles
      • enable different parts of the organization to distribute ownership and access policies [2]
      • part of a capacity that is tied to a specific region and is billed separately [2]
      • the primary security boundary for data within OneLake [3]
    • represents a single domain or project area where teams can collaborate on data [3]
  • [encryption] encrypted at rest by default using Microsoft-managed key [3]
    • the keys are rotated appropriately per compliance requirements [3]
    • data is encrypted and decrypted transparently using 256-bit AES encryption, one of the strongest block ciphers available, and it is FIPS 140-2 compliant [3]
    • {limitation} encryption at rest using customer-managed key is currently not supported [3]
  • {general guidance} write access
    • users must be part of a workspace role that grants write access [4] 
    • rule applies to all data items, so scope workspaces to a single team of data engineers [4] 
  • {general guidance}Lake access: 
    • users must be part of the Admin, Member, or Contributor workspace roles, or share the item with ReadAll access [4] 
  • {general guidance} general data access 
    • any user with Viewer permissions can access data through the warehouses, semantic models, or the SQL analytics endpoint for the Lakehouse [4] 
  • {general guidance} object level security:
    • give users access to a warehouse or lakehouse SQL analytics endpoint through the Viewer role and use SQL DENY statements to restrict access to certain tables [4]
  • {feature|preview} trusted workspace access
    • allows to securely access firewall-enabled Storage accounts by creating OneLake shortcuts to Storage accounts, and then use the shortcuts in the Fabric items [5]
    • based on [workspace identity]
    • {benefit} provides secure seamless access to firewall-enabled Storage accounts from OneLake shortcuts in Fabric workspaces, without the need to open the Storage account to public access [5]
    • {limitation} available for workspaces in Fabric capacities F64 or higher
  • {concept} workspace identity
    • a unique identity that can be associated with workspaces that are in Fabric capacities
    • enables OneLake shortcuts in Fabric to access Storage accounts that have [resource instance rules] configured
    • {operation} creating a workspace identity
      • Fabric creates a service principal in Microsoft Entra ID to represent the identity [5]
  • {concept} resource instance rules
    • a way to grant access to specific resources based on the workspace identity or managed identity [5] 
    • {operation} create resource instance rules 
      • created by deploying an ARM template with the resource instance rule details [5]
Acronyms:
ADLS - Azure Data Lake Storage
AES - Advanced Encryption Standard 
ARM - Azure Resource Manager
FIPS - Federal Information Processing Standard
SKU - Stock Keeping Units

References:
[1] Microsoft Learn (2023) Administer Microsoft Fabric (link)
[2] Microsoft Learn (2023) OneLake, the OneDrive for data (link)
[3] Microsoft Learn (2023) OneLake security (link)
[4] Microsoft Learn (2023) Get started securing your data in OneLake (link}
[5] Microsoft Fabric Updates Blog (2024) Introducing Trusted Workspace Access for OneLake Shortcuts, by Meenal Srivastva (link)

Resources:
[1] 


No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.