
08 December 2024

🏭🗒️Microsoft Fabric: Shortcuts [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 29-May-2025

[Microsoft Fabric] Shortcut

  • {def} object that points to another internal or external storage location [1] and that can be used for data access
    • serves as virtual pointer to data stored in other locations [6]
    • {goal} unifies existing data without copying or moving it [2]
      • ⇒ data can be used multiple times without being duplicated [2]
      • {benefit} helps to eliminate edge copies of data [1]
      • {benefit} reduces process latency associated with data copies and staging [1]
    • is a mechanism that allows unifying data across domains, clouds, and accounts through a namespace [1]
      • ⇒ allows creating a single virtual data lake for the entire enterprise [1]
      • ⇐ available in all Fabric experiences [1]
      • ⇐ behave like symbolic links [1]
    • independent object from the target [1]
    • appear as folder [1]
    • can be used by workloads or services that have access to OneLake [1]
    • transparent to any service accessing data through the OneLake API [1]
    • can point to 
      • OneLake locations
      • ADLS Gen2 storage accounts
      • Amazon S3 storage accounts
      • Dataverse
      • on-premises or network-restricted locations via the on-premises data gateway (OPDG)
  • {capability} shortcuts can be created to consolidate data across artifacts or workspaces, without changing the data's ownership [2]
  • {capability} data can be composed throughout OneLake without any data movement [2]
  • {capability} allows instant linking of data that already exists in Azure and in other clouds, without any data duplication or movement [2]
    • ⇐ makes OneLake the first multi-cloud data lake [2]
  • {capability} provides support for industry standard APIs
    • ⇒ OneLake data can be directly accessed via shortcuts by any application or service [2]
  • {operation} creating a shortcut
    • can be created in 
      • lakehouses
      • KQL databases
        • ⇐ shortcuts are recognized as external tables [1]
    • can be created via 
      • Fabric UI 
      • REST API
    • can be created across items [1]
      • the item types don't need to match [1]
        • e.g. create a shortcut in a lakehouse that points to data in a data warehouse [1]
    • [lakehouse] tables folder
      • represents the managed portion of the lakehouse 
        • shortcuts can be created only at the top level [1]
          • ⇒ shortcuts aren't supported in other subdirectories [1]
        • if the shortcut's target contains data in the Delta/Parquet format, the lakehouse automatically synchronizes the metadata and recognizes the folder as a table [1]
    • [lakehouse] files folder
      • represents the unmanaged portion of the lakehouse [1]
      • there are no restrictions on where shortcuts can be created [1]
        • ⇒ can be created at any level of the folder hierarchy [1]
        • ⇐ table discovery doesn't happen in the Files folder [1]
    • [lakehouse] all shortcuts are accessed in a delegated mode when querying through the SQL analytics endpoint [5]
      • the delegated identity is the Fabric user that owns the lakehouse [5]
        • {default} the owner is the user that created the lakehouse and SQL analytics endpoint [5]
          •  ⇐ can be changed in select cases 
          • the current owner is displayed in the Owner column in Fabric when viewing the item in the workspace item list
        • ⇒ the querying user is able to read from shortcut tables if the owner has access to the underlying data; the access of the user executing the query to that data is not checked [5]
          • ⇐ the querying user only needs access to select from the shortcut table [5]
      •  {feature} OneLake data access roles 
        • {enabled} access to a shortcut is determined by whether the SQL analytics endpoint owner has access to see the target lakehouse and read the table through a OneLake data access role [5]
        • {disabled} shortcut access is determined by whether the SQL analytics endpoint owner has the Read and ReadAll permission on the target path [5]
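
The REST API option noted above can be sketched as follows. This is a minimal sketch that only builds the request for a OneLake-to-OneLake shortcut and does not send it. The base URL, the route, and the body fields ("path", "name", "target.oneLake") reflect my understanding of the Fabric "Create Shortcut" endpoint and should be verified against the current REST reference. The function name and all IDs are hypothetical placeholders, and authentication (a bearer token) is deliberately left out.

```python
import json

# Assumed base URL of the Fabric REST API; verify against the official reference.
FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_onelake_shortcut_request(workspace_id, item_id, path, name,
                                   target_workspace_id, target_item_id,
                                   target_path):
    """Return (url, body) for a POST that would create a OneLake shortcut.

    workspace_id / item_id identify the item that will contain the shortcut
    (e.g. a lakehouse); the target_* values identify the data it points to.
    """
    url = f"{FABRIC_API}/workspaces/{workspace_id}/items/{item_id}/shortcuts"
    body = {
        "path": path,   # folder in which the shortcut is created, e.g. "Files/landing"
        "name": name,   # name of the shortcut, shown as a folder
        "target": {
            "oneLake": {
                "workspaceId": target_workspace_id,
                "itemId": target_item_id,
                "path": target_path,
            }
        },
    }
    return url, body


# Placeholder identifiers, for illustration only.
url, body = build_onelake_shortcut_request(
    workspace_id="<workspace-id>",
    item_id="<lakehouse-id>",
    path="Files/landing",
    name="SalesOrders",
    target_workspace_id="<source-workspace-id>",
    target_item_id="<source-item-id>",
    target_path="Tables/SalesOrders",
)
print("POST", url)
print(json.dumps(body, indent=2))
```

Sending the request would additionally require an HTTP client and an Authorization header with a valid Microsoft Entra token; both are outside the scope of this sketch.
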
  • {operation} renaming a shortcut
  • {operation} moving a shortcut
  • {operation} deleting a shortcut 
    • doesn't affect the target [1]
      • ⇐ only the shortcut object is deleted [1]
    • shortcuts don't perform cascading deletes [1]
    • moving, renaming, or deleting a target path can break the shortcut [1]
  • {operation} delete file/folder
    • a file or folder within a shortcut can be deleted when the permissions in the shortcut target allow it [1]
  • {permissions} users must have permissions in the target location to read the data [1]
    • when a user accesses data through a shortcut to another OneLake location, the identity of the calling user is used to authorize access to the data in the target path of the shortcut [1] (aka passthrough auth model [6])
      • ensures that any user accessing the shortcut is only able to see whatever they have access to in the target [6]
      • the security from the target ‘flows across’ the shortcut to restrict access in the source lakehouse [6]
      • OneLake to OneLake shortcuts support only passthrough mode [6]
        • ensures that the source system retains full control over its data [6]
          • ⇐ there’s no need to replicate or redefine access controls for the shortcut [6]
          • {benefit} reduces administrative overhead since security policies only need to be maintained in one place [6]
          • {constraint} security cannot be modified directly from the downstream item [6]
            • any changes to access permissions must be made at the source location [6]
            • the source remains the single point of truth for access control [6]
              • ⇐ ensures consistency
              • ⇐ minimizes the risk of misconfiguration [6]
    • {type} delegated auth mode
      • shortcuts access data by using some intermediate credential
        • e.g. another user or an account key
        • allow for permission management to be separated or ‘delegated’ to another team or downstream user to manage [6]
          • always break the flow of security from one system to another [6]
          • all delegated shortcuts in OneLake can have OneLake security roles defined for them [6]
        • all shortcuts from OneLake to external systems are delegated [6]
          • e.g. AWS S3 or Google Cloud Storage
          • allows users to connect to the external system without being given direct access [6]
          • OneLake security can then be configured on the shortcut to limit what data in the external system can be accessed [6]
    • when accessing shortcuts through Power BI semantic models or T-SQL, the calling user’s identity is not passed through to the shortcut target [1]
      •  the calling item owner’s identity is passed instead, delegating access to the calling user [1]
    • OneLake manages all permissions and credentials
  • {type} OneLake to OneLake shortcuts
    • ideal for ensuring the hub retains control over sensitive or regulated data [6]
      • each downstream team 
        • can then only consume the data they are allowed to [6]
        • has the freedom to create its own reports or combine the hub data with other data that they own [6]
    • {concept} hub-and-spoke model
      • allows managing data access across multiple teams or departments [6]
      • {component} hub
        • the central data repository where core datasets are stored [6]
        • security policies are meticulously defined to ensure robust control [6]
      • {component} spokes
        • individual teams or departments access the hub’s data through shortcuts [6]
      • {advantage} enables centralized governance while allowing decentralized consumption and use of data [6]
      • can be leveraged in various ways to create efficient and secure data architectures [6]
  • {type} delegated shortcuts
    • allow securely sharing and centralizing data across clouds, without copying it [6]
      • the data that already exists in various cloud storage accounts is consolidated in OneLake through the use of delegated shortcuts [6]
      • a new lakehouse is created as the consolidation point [6]
      • each external data source is connected via a delegated shortcut [6]
        • the admin can define OneLake security roles to govern access
        • granularity: row, column, schemas or shortcuts [6]
      • ⇒ no user will have direct access to the external data ⇐  they will be limited to only what the admin allows through OneLake security [6]
      • ⇐ once the data is consolidated, it can be combined with the hub-and-spoke model to create a composite architecture that keeps both upstream and downstream data safe [6]
  • {feature} shortcut caching 
    • {def} mechanism used to reduce egress costs associated with cross-cloud data access [1]
      • when files are read through an external shortcut, the files are stored in a cache for the Fabric workspace [1]
        • subsequent read requests are served from cache rather than the remote storage provider [1]
        • cached files have a retention period of 24 hours
        • each time the file is accessed the retention period is reset [1]
        • if the file in remote storage provider is more recent than the file in the cache, the request is served from remote storage provider and the updated file will be stored in cache [1]
        • if a file hasn’t been accessed for more than 24 hours, it is purged from the cache [1]
    • {restriction} individual files greater than 1 GB in size are not cached [1]
    • {restriction} only GCS, S3 and S3 compatible shortcuts are supported [1]
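
The caching rules above can be illustrated with a small toy model. This is not how Fabric implements the cache; it only replays the rules from the notes (24-hour retention reset on each access, refresh when the remote file is newer, no caching above 1 GB). The class and function names are hypothetical, and treating 1 GB as 1024**3 bytes is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RETENTION = timedelta(hours=24)   # retention period from the notes
MAX_CACHED_BYTES = 1024 ** 3      # "1 GB"; the exact byte boundary is an assumption


@dataclass
class CacheEntry:
    cached_version: datetime      # last-modified time of the cached copy
    last_access: datetime         # drives the retention window


class ShortcutCacheModel:
    """Toy model of the shortcut cache rules; not Fabric's implementation."""

    def __init__(self):
        self.entries = {}

    def read(self, path, size_bytes, remote_modified, now):
        """Return "cache" or "remote" depending on where the read is served from."""
        entry = self.entries.get(path)
        # Purge entries that were not accessed within the retention period.
        if entry is not None and now - entry.last_access > RETENTION:
            del self.entries[path]
            entry = None
        # Serve from cache only if the cached copy is not older than the remote file.
        if entry is not None and remote_modified <= entry.cached_version:
            entry.last_access = now   # each access resets the retention period
            return "cache"
        # Otherwise the read goes to the remote provider; cache the file if small enough.
        if size_bytes <= MAX_CACHED_BYTES:
            self.entries[path] = CacheEntry(remote_modified, now)
        else:
            self.entries.pop(path, None)
        return "remote"


start = datetime(2025, 1, 1)
cache = ShortcutCacheModel()
print(cache.read("sales.parquet", 10_000, start, start))                       # remote
print(cache.read("sales.parquet", 10_000, start, start + timedelta(hours=1)))  # cache
```
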
  • {feature} query acceleration
    • caches data as it lands in OneLake, providing performance comparable to ingesting data in Eventhouse [4]
  • {limitation} maximum number of shortcuts [1] 
    • per Fabric item: 100,000
    • in a single OneLake path: 10
    • direct shortcuts to shortcut links: 5
  • {limitation} ADLS and S3 shortcut target paths can't contain any reserved characters from RFC 3986 section 2.2 [1]
  • {limitation} shortcut names, parent paths, and target paths can't contain "%" or "+" characters [1]
  • {limitation} shortcuts don't support non-Latin characters [1]
  • {limitation} Copy Blob API is not supported for ADLS or S3 shortcuts [1]
  • {limitation} copy function doesn't work on shortcuts that directly point to ADLS containers
    • {recommended} create ADLS shortcuts to a directory that is at least one level below a container [1]
  • {limitation} additional shortcuts can't be created inside ADLS or S3 shortcuts [1]
  • {limitation} lineage for shortcuts to Data Warehouses and Semantic Models is not currently available [1]
  • {limitation} it may take up to a minute for the Table API to recognize new shortcuts [1]
  • shortcuts introduce unique considerations when it comes to security [6]
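
The character-related limitations listed above can be checked before a shortcut is created. The following is a minimal sketch with hypothetical helper names. It assumes that "/" remains allowed in target paths as the path separator, although RFC 3986 section 2.2 lists it as a reserved character. The non-Latin character limitation is not covered, because the notes do not specify the exact character range.

```python
# Reserved characters per RFC 3986 section 2.2 (gen-delims and sub-delims).
RFC3986_RESERVED = set(":/?#[]@!$&'()*+,;=")
# Characters not allowed in shortcut names, parent paths, and target paths.
FORBIDDEN_EVERYWHERE = set("%+")


def shortcut_name_problems(name):
    """Return the sorted list of forbidden characters found in a shortcut name."""
    return sorted(set(name) & FORBIDDEN_EVERYWHERE)


def external_target_path_problems(path):
    """Return the sorted list of problematic characters in an ADLS/S3 target path.

    Assumption: "/" is treated as the path separator and therefore tolerated.
    """
    reserved_hits = (set(path) & RFC3986_RESERVED) - {"/"}
    return sorted(reserved_hits | (set(path) & FORBIDDEN_EVERYWHERE))


print(shortcut_name_problems("sales_2024"))              # []
print(shortcut_name_problems("100%+growth"))             # ['%', '+']
print(external_target_path_problems("/raw/sales?v=1"))   # ['=', '?']
```

An empty list means that none of the checked characters were found; it does not guarantee that the service accepts the name or path.
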

References:
[1] Microsoft Learn (2024) Fabric: OneLake shortcuts [link]
[2] Microsoft Learn (2024) Fabric Analyst in a Day [course notes]
[3] Microsoft Learn (2024) Use OneLake shortcuts to access data across capacities: Even when the producing capacity is paused! [link]
[4] Microsoft Learn (2024) Fabric: Query acceleration for OneLake shortcuts - overview (preview) [link]
[5] Microsoft Learn (2024) Microsoft Fabric: How to secure a lakehouse for Data Warehousing teams [link]
[6] Microsoft Fabric Update Blog (2025) Understanding OneLake Security with Shortcuts [link]

Acronyms:
ADLS - Azure Data Lake Storage
API - Application Programming Interface
AWS - Amazon Web Services
GCS - Google Cloud Storage
KQL - Kusto Query Language
OPDG - on-premises data gateway
