
12 March 2024

Microsoft Fabric: OneLake (Notes)

Disclaimer: This is a work in progress intended to consolidate information from various sources.
Last updated: 12-Mar-2024

Microsoft Fabric & OneLake

OneLake

  • a single, unified, logical data lake for the whole organization [2]
    • designed to be the single place for all an organization's analytics data [2]
    • provides a single, integrated environment for data professionals and the business to collaborate on data projects [1]
    • stores all data in a single open format [1]
    • its data is governed by default
    • combines storage locations across different regions and clouds into a single logical lake, without moving or duplicating data
      • similar to how Office applications are prewired to use OneDrive
      • saves time by eliminating the need to move and copy data 
  • comes automatically with every Microsoft Fabric tenant [2]
    • automatically provisions with no extra resources to set up or manage [2]
    • used as the native store without needing any extra configuration [1]
  • accessible by all analytics engines in the platform [1]
    • all the compute workloads in Fabric are preconfigured to work with OneLake
      • compute engines have their own security models (aka compute-specific security) 
        • always enforced when accessing data using that engine [3]
        • these security conditions may not apply to users in certain Fabric roles when they access OneLake directly [3]
  • built on top of ADLS Gen2 [1]
    • supports the same ADLS Gen2 APIs and SDKs to be compatible with existing ADLS Gen2 applications [2] (see the Python sketch after this list)
    • inherits its hierarchical structure
    • provides a single-pane-of-glass file-system namespace that spans across users, regions and even clouds
  • data can be stored in any format
    • incl. Delta, Parquet, CSV, JSON
    • data can be addressed in OneLake as if it's one big ADLS storage account for the entire organization [2]
  • uses a layered security model built around the organizational structure of experiences within Microsoft Fabric [3]
    • derived from Microsoft Entra authentication [3]
    • compatible with user identities, service principals, and managed identities [3]
    • using Microsoft Entra ID and Fabric components, one can build robust security mechanisms across OneLake, keeping data safe while also reducing copies and minimizing complexity [3]
  • hierarchical in nature 
    • {benefit} simplifies management across the organization
    • its data is divided into manageable containers for easy handling
    • can have one or more capacities associated with it
      • different items consume different amounts of capacity at a given time
      • offered through Fabric SKU and Trials
  • {component} OneCopy
    • allows reading data from a single copy, without moving or duplicating it [1]
  • {concept} Fabric tenant
    • a dedicated space for organizations to create, store, and manage Fabric items.
      • there's often a single instance of Fabric for an organization, and it's aligned with Microsoft Entra ID [1]
        • ⇒ one OneLake per tenant
      • maps to the root of OneLake and is at the top level of the hierarchy [1]
    • can contain any number of workspaces [2]
  • {concept} capacity
    • a dedicated set of resources that is available at a given time to be used [1]
    • defines the ability of a resource to perform an activity or to produce output [1]
  • {concept} domain
    • a way of logically grouping together workspaces in an organization that is relevant to a particular area or field [1]
    • can have multiple [subdomains]
      • {concept} subdomain
        • a way of fine-tuning the logical grouping of the data
  • {concept} workspace 
    • a collection of Fabric items that brings together different functionality in a single tenant [1]
      • different data items appear as folders within those containers [2]
      • always lives directly under the OneLake namespace [4]
      • {concept} data item
        • a subtype of item that allows data to be stored within it using OneLake [4]
        • all Fabric data items store their data automatically in OneLake in Delta Parquet format [2]
      • {concept} Fabric item
        • a set of capabilities bundled together into a single component [4] 
        • can have permissions configured separately from the workspace roles [3]
        • permissions can be set by sharing an item or by managing the permissions of an item [3]
    • acts as a container that leverages capacity for the work that is executed [1]
      • provides controls for who can access the items in it [1]
        • security can be managed through Fabric workspace roles
      • enables different parts of the organization to distribute ownership and access policies [2]
      • is part of a capacity that is tied to a specific region and billed separately [2]
      • is the primary security boundary for data within OneLake [3]
    • represents a single domain or project area where teams can collaborate on data [3]
  • [encryption] encrypted at rest by default using Microsoft-managed key [3]
    • the keys are rotated appropriately per compliance requirements [3]
    • data is encrypted and decrypted transparently using 256-bit AES encryption, one of the strongest block ciphers available, and it is FIPS 140-2 compliant [3]
    • {limitation} encryption at rest using customer-managed key is currently not supported [3]
  • {general guidance} write access
    • users must be part of a workspace role that grants write access [4] 
    • this rule applies to all data items, so scope workspaces to a single team of data engineers [4]
  • {general guidance} lake access:
    • users must be part of the Admin, Member, or Contributor workspace roles, or have the item shared with them with ReadAll access [4]
  • {general guidance} general data access 
    • any user with Viewer permissions can access data through the warehouses, semantic models, or the SQL analytics endpoint for the Lakehouse [4] 
  • {general guidance} object-level security:
    • give users access to a warehouse or lakehouse SQL analytics endpoint through the Viewer role and use SQL DENY statements to restrict access to certain tables [4] (see the pyodbc sketch after this list)
  • {feature|preview} trusted workspace access
    • allows securely accessing firewall-enabled Storage accounts by creating OneLake shortcuts to them, and then using the shortcuts in Fabric items [5]
    • based on [workspace identity]
    • {benefit} provides secure seamless access to firewall-enabled Storage accounts from OneLake shortcuts in Fabric workspaces, without the need to open the Storage account to public access [5]
    • {limitation} available for workspaces in Fabric capacities F64 or higher
  • {concept} workspace identity
    • a unique identity that can be associated with workspaces that are in Fabric capacities
    • enables OneLake shortcuts in Fabric to access Storage accounts that have [resource instance rules] configured
    • {operation} creating a workspace identity
      • Fabric creates a service principal in Microsoft Entra ID to represent the identity [5]
  • {concept} resource instance rules
    • a way to grant access to specific resources based on the workspace identity or managed identity [5] 
    • {operation} create resource instance rules 
      • created by deploying an ARM template with the resource instance rule details [5]
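
Because OneLake exposes the same ADLS Gen2 endpoints, existing tooling can address it as if it were one big storage account for the whole tenant. Below is a minimal sketch, assuming the azure-identity and azure-storage-file-datalake Python packages and hypothetical workspace and lakehouse names (MyWorkspace, MyLakehouse); the OneLake DFS endpoint takes the place of the storage-account URL, the workspace plays the role of the container (file system), and data items appear as top-level folders.

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# OneLake exposes a single ADLS Gen2 (DFS) endpoint for the whole tenant;
# workspaces map to file systems and items appear as top-level folders.
ONELAKE_URL = "https://onelake.dfs.fabric.microsoft.com"

credential = DefaultAzureCredential()  # Microsoft Entra ID based authentication
service = DataLakeServiceClient(account_url=ONELAKE_URL, credential=credential)

# The workspace name acts as the file system (container) name.
file_system = service.get_file_system_client("MyWorkspace")

# List the files stored under a lakehouse item (hypothetical names);
# data items show up as folders such as "MyLakehouse.Lakehouse/Files".
for path in file_system.get_paths(path="MyLakehouse.Lakehouse/Files"):
    print(path.name)

In Spark, the same data can be addressed with an abfss URI of the form abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<item>.<item type>/<path>.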
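
As an illustration of the object-level security guidance above, the sketch below issues a DENY statement against the SQL analytics endpoint from Python. The endpoint name, database, table and role are hypothetical placeholders (copy the real endpoint from the item's SQL connection string); it assumes the pyodbc package and the ODBC Driver 18 for SQL Server with interactive Microsoft Entra authentication.

import pyodbc

# Connection string placeholders: the server, database, table and role names
# below are made up for illustration.
conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<sql-analytics-endpoint>.datawarehouse.fabric.microsoft.com;"
    "Database=MyLakehouse;"
    "Authentication=ActiveDirectoryInteractive;"
    "Encrypt=yes;"
)

with pyodbc.connect(conn_str) as conn:
    cursor = conn.cursor()
    # Users keep Viewer access to the endpoint, while reads on a sensitive
    # table are explicitly denied for a given database role (or user).
    cursor.execute("DENY SELECT ON dbo.SensitiveTable TO [SalesAnalysts];")
    conn.commit()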
Acronyms:
ADLS - Azure Data Lake Storage
AES - Advanced Encryption Standard 
ARM - Azure Resource Manager
FIPS - Federal Information Processing Standard
SKU - Stock Keeping Unit

References:
[1] Microsoft Learn (2023) Administer Microsoft Fabric (link)
[2] Microsoft Learn (2023) OneLake, the OneDrive for data (link)
[3] Microsoft Learn (2023) OneLake security (link)
[4] Microsoft Learn (2023) Get started securing your data in OneLake (link)
[5] Microsoft Fabric Updates Blog (2024) Introducing Trusted Workspace Access for OneLake Shortcuts, by Meenal Srivastva (link)



13 February 2024

Business Intelligence: A One Man Show II (In the Cusps of Complexity)

Business Intelligence Series

Today I watched on YouTube the Power BI Tips episode "One Person to Do Everything", which I had missed last week. Its main topic is based on Christopher Laubenthal's article "Why one person can't do everything in the data space". The author's arguments rest on an analogy between the various data areas and a college's functional structure. Reading the article, I must say that a poorly chosen analogy only makes a messy topic messier!

One of the most confusing things is that there are so many data-related, context-dependent roles with considerable overlap that it becomes more and more difficult to understand what they cover. The author considers the roles of Data Architect, Data Engineer, Database Administrator (DBA), Data Analyst, Information Designer and Data Scientist. However, for every aspect of a data architecture there are also developers on the database (backend) and reporting (frontend) side. Moreover, there are other data professionals on the management side for the various knowledge areas of Data Management: Data Governance, Data Strategy, Data Security, Data Operations, etc. There are also roles at the border between the business and the technical side, like Data Stewards, Business Analysts, Data Citizens, etc.

There are two main aspects here. From a historical perspective, many of these roles appeared when a new set of requirements or a new layer was added to the architecture. First, it was perhaps the DBA, whose primary task was to administer the database. Being a keeper of the data with some knowledge of the data entities, he/she could easily export data for the various reporting needs. In time such activities were taken over by a second category of data professionals. Then the data were moved to Decision Support Systems and later to Data Warehouses and Data Lakes/Lakehouses, an evolution that required other professionals to address the challenges of each layer. Every activity performed on the data requires a certain type of knowledge that can, in the end, result in a new role designation.

The second perspective results from the management of data and the knowledge areas associated with it. While in small organizations with one or two systems in place one doesn't need to talk about Data Operations, in big organizations, where a data center or something similar may be in place, Data Operations can easily become a topic of its own, requiring a management structure for its "effective and efficient" handling. The same can happen in the other knowledge areas and their interaction with the business. There's an inherent tendency to answer complexity with complexity, which in the long term can work to the detriment of any business. In extremis, organizations tend to have a whole team in each area, which can further increase the overall complexity by a small to not-so-small magnitude.

Fortunately, one of the benefits of technological advancement is that much of the complexity can be moved somewhere else, and these are the areas where the cloud brings the most advantages. Parts of the architecture, or all of it, can be deployed into the cloud and managed by cloud providers and third parties on an on-demand basis at stable costs. Moreover, with the increasing maturity and integration of the various layers, the impact of the various roles in the overall picture is reduced considerably, as areas like governance, security or operations are built in as services, thus requiring fewer resources.

With Microsoft Fabric, all the data needed for reporting becomes, in theory, easily available in OneLake. Unfortunately, there is another type of complexity that gets dumped on other professionals' shoulders, and these aspects need to be considered further.


Resources:
[1] Christopher Laubenthal (2024) "Why One Person Can’t Do Everything In Data" (link)
[2] Power BI tips (2024) Ep.292: One Person to Do Everything (link)


