Disclaimer: This is a work in progress intended to consolidate information from various sources.
Last updated: 12-Mar-2024
Microsoft Fabric & OneLake
OneLake
- a single, unified, logical data lake for the whole organization [2]
- designed to be the single place for all an organization's analytics data [2]
- provides a single, integrated environment for data professionals and the business to collaborate on data projects [1]
- stores all data in a single open format [1]
- its data is governed by default
- combines storage locations across different regions and clouds into a single logical lake, without moving or duplicating data
- similar to how Office applications are prewired to use OneDrive
- saves time by eliminating the need to move and copy data
- comes automatically with every Microsoft Fabric tenant [2]
- automatically provisions with no extra resources to set up or manage [2]
- used as the native store without needing any extra configuration [1]
- accessible by all analytics engines in the platform [1]
- all the compute workloads in Fabric are preconfigured to work with OneLake
- compute engines have their own security models (aka compute-specific security)
- always enforced when accessing data using that engine [3]
- the conditions may not apply to users in certain Fabric roles when they access OneLake directly [3]
- built on top of ADLS [1]
- supports the same ADLS Gen2 APIs and SDKs to be compatible with existing ADLS Gen2 applications [2]
- inherits its hierarchical structure
- provides a single-pane-of-glass file-system namespace that spans across users, regions and even clouds
- data can be stored in any format
- incl. Delta, Parquet, CSV, JSON
- data can be addressed in OneLake as if it's one big ADLS storage account for the entire organization [2]
- uses a layered security model built around the organizational structure of experiences within Microsoft Fabric [3]
- derived from Microsoft Entra authentication [3]
- compatible with user identities, service principals, and managed identities [3]
- Microsoft Entra ID and Fabric components can be combined to build robust security mechanisms across OneLake, keeping data safe while reducing copies and minimizing complexity [3]
- hierarchical in nature
- {benefit} simplifies management across the organization
- its data is divided into manageable containers for easy handling
- can have one or more capacities associated with it
- different items consume different amounts of capacity at any given time
- offered through Fabric SKU and Trials
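
Because OneLake supports the ADLS Gen2 APIs, an item's data can be addressed with the URI shapes that ADLS Gen2 tools expect, with the workspace taking the position a storage container would normally hold. The sketch below builds such paths in Python. The endpoint name is an assumption based on Microsoft's OneLake documentation and should be verified there; the workspace, item, and file names are hypothetical.

```python
from urllib.parse import quote

# Assumption: OneLake's DFS endpoint as given in Microsoft's documentation.
# Verify it against the current docs before relying on it.
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_abfss_uri(workspace: str, item: str, item_type: str, path: str = "") -> str:
    """Build an ADLS Gen2-style abfss URI; the workspace fills the container slot."""
    relative = path.strip("/")
    suffix = f"/{relative}" if relative else ""
    return f"abfss://{workspace}@{ONELAKE_HOST}/{item}.{item_type}{suffix}"


def onelake_https_url(workspace: str, item: str, item_type: str, path: str = "") -> str:
    """Build the equivalent https URL, percent-encoding each path segment."""
    segments = [workspace, f"{item}.{item_type}"]
    segments += [part for part in path.split("/") if part]
    return f"https://{ONELAKE_HOST}/" + "/".join(quote(s) for s in segments)


# Hypothetical names, used only to show the resulting shape.
print(onelake_abfss_uri("SalesWorkspace", "SalesLakehouse", "Lakehouse", "Files/raw/orders.csv"))
print(onelake_https_url("Sales Workspace", "SalesLakehouse", "Lakehouse", "Files/raw/orders.csv"))
```

The functions only format strings; authentication and the actual read or write would still go through an ADLS Gen2-compatible client with Microsoft Entra credentials.
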
- {component} OneCopy
- allows data to be read from a single copy, without moving or duplicating it [1]
- {concept} Fabric tenant
- a dedicated space for organizations to create, store, and manage Fabric items.
- there's often a single instance of Fabric for an organization, and it's aligned with Microsoft Entra ID [1]
- ⇒ one OneLake per tenant
- maps to the root of OneLake and is at the top level of the hierarchy [1]
- can contain any number of workspaces [2]
- {concept} capacity
- a dedicated set of resources that is available at a given time to be used [1]
- defines the ability of a resource to perform an activity or to produce output [1]
- {concept} domain
- a way of logically grouping together workspaces in an organization that are relevant to a particular area or field [1]
- can have multiple [subdomains]
- {concept} subdomain
- a way of fine-tuning the logical grouping of the data
- {concept} workspace
- a collection of Fabric items that brings together different functionality in a single tenant [1]
- different data items appear as folders within those containers [2]
- always lives directly under the OneLake namespace [4]
- {concept} data item
- a subtype of item that allows data to be stored within it using OneLake [4]
- all Fabric data items store their data automatically in OneLake in Delta Parquet format [2]
- {concept} Fabric item
- a set of capabilities bundled together into a single component [4]
- can have permissions configured separately from the workspace roles [3]
- permissions can be set by sharing an item or by managing the permissions of an item [3]
- the workspace acts as a container that leverages capacity for the work that is executed [1]
- provides controls for who can access the items in it [1]
- security can be managed through Fabric workspace roles
- enable different parts of the organization to distribute ownership and access policies [2]
- part of a capacity that is tied to a specific region and is billed separately [2]
- the primary security boundary for data within OneLake [3]
- represents a single domain or project area where teams can collaborate on data [3]
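
The tenant, workspace, and item concepts above form a containment hierarchy, which a few dataclasses can make concrete. This is only an illustrative model, not a Fabric API: the class names, fields, and sample values are hypothetical, and the `<item>.<type>` folder naming is an assumption based on how OneLake paths are usually written.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    name: str
    item_type: str  # e.g. "Lakehouse" or "Warehouse"


@dataclass
class Workspace:
    name: str
    capacity_region: str  # a workspace is part of a capacity tied to one region
    items: List[Item] = field(default_factory=list)


@dataclass
class Tenant:
    name: str  # maps to the root of OneLake
    workspaces: List[Workspace] = field(default_factory=list)

    def namespace(self) -> List[str]:
        """List every data item as a folder path below the OneLake root."""
        return [
            f"{ws.name}/{item.name}.{item.item_type}"
            for ws in self.workspaces
            for item in ws.items
        ]


# Hypothetical tenant with two workspaces in different regions.
tenant = Tenant(
    name="contoso",
    workspaces=[
        Workspace("Sales", "westeurope", [Item("Orders", "Lakehouse")]),
        Workspace("Finance", "eastus", [Item("Ledger", "Warehouse"), Item("Budget", "Lakehouse")]),
    ],
)
print(tenant.namespace())
```

The single flat list returned by `namespace()` mirrors the idea of one logical lake: workspaces may sit in different regions, yet their items are all addressed under the same root.
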
- [encryption] encrypted at rest by default using a Microsoft-managed key [3]
- the keys are rotated appropriately per compliance requirements [3]
- data is encrypted and decrypted transparently using 256-bit AES encryption, one of the strongest block ciphers available, and it is FIPS 140-2 compliant [3]
- {limitation} encryption at rest using a customer-managed key is currently not supported [3]
- {general guidance} write access
- users must be part of a workspace role that grants write access [4]
- rule applies to all data items, so scope workspaces to a single team of data engineers [4]
- {general guidance} lake access
- users must be part of the Admin, Member, or Contributor workspace roles, or share the item with ReadAll access [4]
- {general guidance} general data access
- any user with Viewer permissions can access data through the warehouses, semantic models, or the SQL analytics endpoint for the Lakehouse [4]
- {general guidance} object-level security
- give users access to a warehouse or lakehouse SQL analytics endpoint through the Viewer role and use SQL DENY statements to restrict access to certain tables [4]
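
The guidance above can be read as a small decision table plus a SQL statement. The sketch below encodes it in Python under stated assumptions: the helper names are hypothetical, the set of write-capable roles (Admin, Member, Contributor) is taken from the lake access rule, and the generated T-SQL follows the standard `DENY` syntax rather than anything quoted from the sources.

```python
from typing import Optional

# Assumption: these are the workspace roles that grant write access.
WRITE_ROLES = {"Admin", "Member", "Contributor"}
ALL_ROLES = WRITE_ROLES | {"Viewer"}


def can_write(role: Optional[str]) -> bool:
    """Write access requires a workspace role that grants it."""
    return role in WRITE_ROLES


def can_read_lake(role: Optional[str], has_read_all: bool = False) -> bool:
    """Lake access: a write-capable role, or the item shared with ReadAll."""
    return role in WRITE_ROLES or has_read_all


def can_query_sql(role: Optional[str]) -> bool:
    """Any workspace role, including Viewer, can reach data through SQL."""
    return role in ALL_ROLES


def deny_select_statement(schema: str, table: str, principal: str) -> str:
    """Format a T-SQL DENY that hides one table from one principal."""
    def quote(name: str) -> str:
        # Bracket-quote identifiers, doubling any closing bracket.
        return "[" + name.replace("]", "]]") + "]"

    return f"DENY SELECT ON OBJECT::{quote(schema)}.{quote(table)} TO {quote(principal)};"


# Hypothetical user and table names.
print(can_read_lake("Viewer"), can_read_lake("Viewer", has_read_all=True))
print(deny_select_statement("dbo", "Salaries", "analyst@contoso.com"))
```

A Viewer therefore reaches data only through the SQL analytics endpoint, warehouses, or semantic models, which is what makes `DENY` statements effective as an object-level restriction for that role.
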
- {feature|preview} trusted workspace access
- allows secure access to firewall-enabled Storage accounts by creating OneLake shortcuts to them and then using the shortcuts in Fabric items [5]
- based on [workspace identity]
- {benefit} provides secure, seamless access to firewall-enabled Storage accounts from OneLake shortcuts in Fabric workspaces, without the need to open the Storage account to public access [5]
- {limitation} available for workspaces in Fabric capacities F64 or higher
- {concept} workspace identity
- a unique identity that can be associated with workspaces that are in Fabric capacities
- enables OneLake shortcuts in Fabric to access Storage accounts that have [resource instance rules] configured
- {operation} creating a workspace identity
- Fabric creates a service principal in Microsoft Entra ID to represent the identity [5]
- {concept} resource instance rules
- a way to grant access to specific resources based on the workspace identity or managed identity [5]
- {operation} create resource instance rules
- created by deploying an ARM template with the resource instance rule details [5]
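
As a rough illustration of what such a rule carries, the sketch below assembles the `networkAcls.resourceAccessRules` fragment of a Storage account ARM template in Python. It rests on several assumptions that should be checked against Microsoft's trusted workspace access documentation: the all-zero subscription ID and the `Fabric` resource group in the workspace resource ID are recalled from that documentation rather than stated in the sources here, and the tenant and workspace IDs are hypothetical placeholders.

```python
import json
from typing import Dict, List

# Assumptions: placeholder subscription and resource group used to form the
# resource ID of a Fabric workspace. Verify both before deploying anything.
FABRIC_SUBSCRIPTION = "00000000-0000-0000-0000-000000000000"
FABRIC_RESOURCE_GROUP = "Fabric"


def workspace_resource_id(workspace_id: str) -> str:
    """Form the resource ID that identifies one Fabric workspace."""
    return (
        f"/subscriptions/{FABRIC_SUBSCRIPTION}"
        f"/resourcegroups/{FABRIC_RESOURCE_GROUP}"
        f"/providers/Microsoft.Fabric/workspaces/{workspace_id}"
    )


def resource_access_rules(tenant_id: str, workspace_ids: List[str]) -> Dict:
    """Build the networkAcls fragment with one rule per workspace."""
    return {
        "networkAcls": {
            "resourceAccessRules": [
                {"tenantId": tenant_id, "resourceId": workspace_resource_id(ws)}
                for ws in workspace_ids
            ]
        }
    }


# Hypothetical IDs; real values are GUIDs from Microsoft Entra ID and Fabric.
fragment = resource_access_rules("<tenant-guid>", ["<workspace-guid>"])
print(json.dumps(fragment, indent=2))
```

The fragment would sit under the `properties` of the Storage account resource in the template; the rest of the template (API version, account name, default firewall action) is omitted here.
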
Acronyms:
ADLS - Azure Data Lake Storage
AES - Advanced Encryption Standard
ARM - Azure Resource Manager
FIPS - Federal Information Processing Standard
SKU - Stock Keeping Unit
References:
[1] Microsoft Learn (2023) Administer Microsoft Fabric (link)
[2] Microsoft Learn (2023) OneLake, the OneDrive for data (link)
[3] Microsoft Learn (2023) OneLake security (link)
[4] Microsoft Learn (2023) Get started securing your data in OneLake (link)
[5] Microsoft Fabric Updates Blog (2024) Introducing Trusted Workspace Access for OneLake Shortcuts, by Meenal Srivastva (link)
Resources:
[1]