
26 February 2025

💎🤖SQL Reloaded: Copilot Stories (Part II: Create a Schema for Customer Addresses)

Since the first post on a related topic, Copilot has become available in Microsoft 365, which opens the door for prompt engineering experiments. A prompt is a "natural language instruction that tells a large language model (LLM) to perform a task" [1], a process also known as instruction tuning. Prompt engineering is thus "the process of creating and refining the prompt used by the model" [1].

For example, one can use prompt engineering to learn how to perform certain tasks in a programming language like SQL: how to define a table, write a query, create other database objects, etc. In a first experiment I created a set of prompts for generating the objects needed to store data and perform CRUD (Create, Read, Update, Delete) operations on them. The experiment focused on Customer Addresses, Countries and Regions. Let's look at the prompts and their outputs. The prompts are shown in the order in which they were tried, so feel free to experiment!

Before Copilot, I used Excel macros or stored procedures to generate similar code, which allowed me to reduce the development time considerably.
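
For comparison, such code can also be generated from the database metadata itself. Below is a minimal, illustrative T-SQL sketch (my own, not Copilot's output) that builds the SET clause of an UPDATE statement for the dbo.Customers table defined later in this post:

-- build the SET clause of an UPDATE statement from table metadata
SELECT STRING_AGG(CONCAT(COLUMN_NAME, ' = @', COLUMN_NAME), ', ') AS set_clause
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo'
  AND TABLE_NAME = 'Customers'
  AND COLUMN_NAME <> 'CustomerID'; -- exclude the key column used in the WHERE clause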

Prompt:
Give me the definition of a SQL Server stored procedure for updating the Customers from the dbo schema based on input parameters. 

Output:

CREATE PROCEDURE UpdateCustomer
    @CustomerID INT,
    @CustomerName NVARCHAR(100),
    @ContactName NVARCHAR(100),
    @Country NVARCHAR(50)
AS
BEGIN
    UPDATE dbo.Customers
    SET 
        CustomerName = @CustomerName,
        ContactName = @ContactName,
        Country = @Country
    WHERE 
        CustomerID = @CustomerID;
END;

Prompt:
Give me the definition of a SQL Server stored procedure for inserting Customers in the dbo schema based on input parameters.

Output:

CREATE PROCEDURE InsertCustomer
    @CustomerName NVARCHAR(100),
    @ContactName NVARCHAR(100),
    @Country NVARCHAR(50)
AS
BEGIN
    INSERT INTO dbo.Customers (CustomerName, ContactName, Country)
    VALUES (@CustomerName, @ContactName, @Country);
END;

Prompt:
Give me the definition of a SQL Server stored procedure for deleting a Customers from the dbo schema based on Customer Id.

Output:

CREATE PROCEDURE DeleteCustomer
    @CustomerID INT
AS
BEGIN
    DELETE FROM dbo.Customers
    WHERE CustomerID = @CustomerID;
END;

Prompt:
Give me the definition of the SQL Server table used for the above examples.

Output:

CREATE TABLE dbo.Customers (
    CustomerID INT IDENTITY(1,1) PRIMARY KEY,
    CustomerName NVARCHAR(100) NOT NULL,
    ContactName NVARCHAR(100),
    Country NVARCHAR(50)
);

Prompt:
Extend the definition of the previous table with Address information. 

Output:

CREATE TABLE dbo.Customers (
    CustomerID INT IDENTITY(1,1) PRIMARY KEY,
    CustomerName NVARCHAR(100) NOT NULL,
    ContactName NVARCHAR(100),
    Country NVARCHAR(50),
    AddressLine1 NVARCHAR(200),
    AddressLine2 NVARCHAR(200),
    City NVARCHAR(100),
    State NVARCHAR(100),
    PostalCode NVARCHAR(20)
);

Prompt:
Remove the Address information from the above table into a separate Addresses table with the Customer Id as foreign key.

Output:

-- Modified Customers Table:
CREATE TABLE dbo.Customers (
    CustomerID INT IDENTITY(1,1) PRIMARY KEY,
    CustomerName NVARCHAR(100) NOT NULL,
    ContactName NVARCHAR(100),
    Country NVARCHAR(50)
);

-- New Addresses Table:
CREATE TABLE dbo.Addresses (
    AddressID INT IDENTITY(1,1) PRIMARY KEY,
    CustomerID INT,
    AddressLine1 NVARCHAR(200),
    AddressLine2 NVARCHAR(200),
    City NVARCHAR(100),
    State NVARCHAR(100),
    PostalCode NVARCHAR(20),
    FOREIGN KEY (CustomerID) REFERENCES dbo.Customers(CustomerID)
);

Prompt:
Create a Countries table that can be referenced from the Addresses table with Country ISO codes.

Output:

--Create the Countries Table:
CREATE TABLE dbo.Countries (
    CountryCode CHAR(2) PRIMARY KEY, -- ISO 3166-1 alpha-2 code
    CountryName NVARCHAR(100) NOT NULL
);

--Modify the Addresses Table:
CREATE TABLE dbo.Addresses (
    AddressID INT IDENTITY(1,1) PRIMARY KEY,
    CustomerID INT,
    AddressLine1 NVARCHAR(200),
    AddressLine2 NVARCHAR(200),
    City NVARCHAR(100),
    State NVARCHAR(100),
    PostalCode NVARCHAR(20),
    CountryCode CHAR(2),
    FOREIGN KEY (CustomerID) REFERENCES dbo.Customers(CustomerID),
    FOREIGN KEY (CountryCode) REFERENCES dbo.Countries(CountryCode)
);

Prompt:
Extend the Countries table with ISO 3 characters code and ISO Number.

Output:

CREATE TABLE dbo.Countries (
    CountryCode CHAR(2) PRIMARY KEY, -- ISO 3166-1 alpha-2 code
    CountryName NVARCHAR(100) NOT NULL,
    ISO3Code CHAR(3), -- ISO 3166-1 alpha-3 code
    ISONumericCode INT -- ISO 3166-1 numeric code
);

Prompt:
Create a Regions table with Region code and description that allows to group the countries in territories. A country can be assigned to multiple regions.

Output:

-- Create the Regions Table:
CREATE TABLE dbo.Regions (
    RegionCode CHAR(3) PRIMARY KEY,
    RegionDescription NVARCHAR(100) NOT NULL
);

-- Create the Countries Table (including the previous columns):
CREATE TABLE dbo.Countries (
    CountryCode CHAR(2) PRIMARY KEY, -- ISO 3166-1 alpha-2 code
    CountryName NVARCHAR(100) NOT NULL,
    ISO3Code CHAR(3), -- ISO 3166-1 alpha-3 code
    ISONumericCode INT -- ISO 3166-1 numeric code
);

-- Create the CountryRegions Junction Table:
CREATE TABLE dbo.CountryRegions (
    CountryCode CHAR(2),
    RegionCode CHAR(3),
    PRIMARY KEY (CountryCode, RegionCode),
    FOREIGN KEY (CountryCode) REFERENCES dbo.Countries(CountryCode),
    FOREIGN KEY (RegionCode) REFERENCES dbo.Regions(RegionCode)
);

Prompt:
Create a view based on Customers that includes all the above tables. 

Output:

CREATE VIEW CustomerDetails AS
SELECT 
    c.CustomerID,
    c.CustomerName,
    c.ContactName,
    a.AddressLine1,
    a.AddressLine2,
    a.City,
    a.State,
    a.PostalCode,
    co.CountryCode,
    co.CountryName,
    co.ISO3Code,
    co.ISONumericCode,
    r.RegionCode,
    r.RegionDescription
FROM 
    dbo.Customers c
LEFT JOIN 
    dbo.Addresses a ON c.CustomerID = a.CustomerID
LEFT JOIN 
    dbo.Countries co ON a.CountryCode = co.CountryCode
LEFT JOIN 
    dbo.CountryRegions cr ON co.CountryCode = cr.CountryCode
LEFT JOIN 
    dbo.Regions r ON cr.RegionCode = r.RegionCode;

Prompt:
In the above view use 3 letter encoding for table's name. 

Output:

CREATE VIEW CustomerDetails AS
SELECT 
    cus.CustomerID,
    cus.CustomerName,
    cus.ContactName,
    add.AddressLine1,
    add.AddressLine2,
    add.City,
    add.State,
    add.PostalCode,
    cou.CountryCode,
    cou.CountryName,
    cou.ISO3Code,
    cou.ISONumericCode,
    reg.RegionCode,
    reg.RegionDescription
FROM 
    dbo.Customers cus
LEFT JOIN 
    dbo.Addresses add ON cus.CustomerID = add.CustomerID
LEFT JOIN 
    dbo.Countries cou ON add.CountryCode = cou.CountryCode
LEFT JOIN 
    dbo.CountryRegions cr ON cou.CountryCode = cr.CountryCode
LEFT JOIN 
    dbo.Regions reg ON cr.RegionCode = reg.RegionCode;

Unfortunately, the definition uses the reserved word "add" as the alias for the Addresses table, hence the following prompt:

Prompt:
Modify the above view to use 3-letter aliases that aren't as reserved words. 

Output:

CREATE VIEW dbo.CustomerDetails AS
SELECT 
    cus.CustomerID, -- Customer ID
    cus.CustomerName, -- Customer name
    cus.ContactName, -- Contact name
    adr.AddressLine1, -- First line of the address
    adr.AddressLine2, -- Second line of the address
    adr.City, -- City of the address
    adr.State, -- State or region of the address
    adr.PostalCode, -- Postal or ZIP code of the address
    cou.CountryCode, -- Country code
    cou.CountryName, -- Country name
    cou.ISO3Code, -- ISO 3166-1 alpha-3 code
    cou.ISONumericCode, -- ISO 3166-1 numeric code
    reg.RegionCode, -- Region code
    reg.RegionDescription -- Region description
FROM 
    dbo.Customers cus
LEFT JOIN 
    dbo.Addresses adr ON cus.CustomerID = adr.CustomerID
LEFT JOIN 
    dbo.Countries cou ON adr.CountryCode = cou.CountryCode
LEFT JOIN 
    dbo.CountryRegions cr ON cou.CountryCode = cr.CountryCode
LEFT JOIN 
    dbo.Regions reg ON cr.RegionCode = reg.RegionCode;

Prompt:

Modify the schema of the above objects to include the schema name.

Prompt:
Change the definition of the above tables and test whether the tables already exists.
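
No output is shown for this prompt; for illustration only, a typical pattern the prompt aims at is an existence check before creation. A minimal T-SQL sketch (my own, not Copilot's output):

-- create the table only if it doesn't exist yet
IF OBJECT_ID(N'dbo.Customers', N'U') IS NULL
BEGIN
    CREATE TABLE dbo.Customers (
        CustomerID INT IDENTITY(1,1) PRIMARY KEY,
        CustomerName NVARCHAR(100) NOT NULL,
        ContactName NVARCHAR(100),
        Country NVARCHAR(50)
    );
END;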

Prompt:
Annotate the definition of the above objects with a comment.

Prompt:
Generate the scripts for inserting 10 customers with different addresses in the above tables.

Prompt:
Aggregate the update and insert commands into the same stored procedure.

Output:

CREATE PROCEDURE dbo.UpsertCustomer
    @CustomerID INT = NULL, -- Optional parameter for insert
    @CustomerName NVARCHAR(100),
    @ContactName NVARCHAR(100),
    @Country NVARCHAR(50)
AS
BEGIN
    IF EXISTS (SELECT 1 FROM dbo.Customers WHERE CustomerID = @CustomerID)
    BEGIN
        -- Update existing customer
        UPDATE dbo.Customers
        SET 
            CustomerName = @CustomerName,
            ContactName = @ContactName,
            Country = @Country
        WHERE 
            CustomerID = @CustomerID;
    END
    ELSE
    BEGIN
        -- Insert new customer
        INSERT INTO dbo.Customers (CustomerName, ContactName, Country)
        VALUES (@CustomerName, @ContactName, @Country);
    END
END;

In the past I created similar code to consolidate the insert and update logic within the same stored procedure. This reduces the number of objects created in a database and minimizes the impact of some other issues (e.g. code duplication for validation).
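
For comparison, the same consolidation can be expressed with a MERGE statement. A minimal sketch (my own variant, not Copilot's output):

CREATE PROCEDURE dbo.MergeCustomer
    @CustomerID INT = NULL -- optional parameter for insert
    , @CustomerName NVARCHAR(100)
    , @ContactName NVARCHAR(100)
    , @Country NVARCHAR(50)
AS
BEGIN
    -- insert or update the customer in a single statement
    MERGE dbo.Customers AS TGT
    USING (SELECT @CustomerID AS CustomerID) AS SRC
        ON TGT.CustomerID = SRC.CustomerID
    WHEN MATCHED THEN
        UPDATE SET CustomerName = @CustomerName
            , ContactName = @ContactName
            , Country = @Country
    WHEN NOT MATCHED THEN
        INSERT (CustomerName, ContactName, Country)
        VALUES (@CustomerName, @ContactName, @Country);
END;

MERGE has its own caveats (e.g. behavior under concurrent updates), so the IF EXISTS variant remains a perfectly valid choice.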

I also prefer placing the comma at the beginning of each line where it applies (and not at the end). This can be addressed with the following prompt:

Prompt: 
Rewrite the above database objects by putting the comma in front of each line where it applies.

Prompt:
Modify the view so each join condition is written on its own line.

Prompt:
Modify the view with create or alter command.

Prompt:
Modify the view to check for null values.

Comments:
1) The scripts could also be used in a SQL database, though one must pay attention to the order in which the steps must be run (see the outline after these comments).
2) All these are basic steps, so a natural question is: how far can we go? One can generate further objects, and knowledge about the data model seems to be available to Copilot, though real data models are much more complex than this! There's probably a limit beyond which Copilot becomes inefficient as long as the data model is not available explicitly. SQL Server Copilot would probably help to overcome such limitations, though the feature is still in limited preview.
3) I wonder whether, given a set of best practices, Copilot would be able to apply them and create the CRUD objects accordingly. More experiments are needed, though.
4) It will be interesting to observe how far the prompts created by other people lead to similar or different outcomes. (In the past I observed that nontechnical prompts generated different outcomes.)
5) The fact that I was able to change some formatting (e.g. comma before each line) for all objects with just a prompt frankly made my day! I can't stress enough how many hours of work the unformatted code required in the past! It will be interesting to check whether this can also be applied to a complex database schema.
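
As mentioned in comment 1), because of the foreign key references the objects must be created in dependency order. A minimal outline based on the objects above:

-- suggested deployment order (referenced tables first)
-- 1. dbo.Countries
-- 2. dbo.Regions
-- 3. dbo.CountryRegions (references Countries and Regions)
-- 4. dbo.Customers
-- 5. dbo.Addresses (references Customers and Countries)
-- 6. dbo.CustomerDetails view (references all of the above)
-- 7. the stored procedures (e.g. dbo.UpsertCustomer, DeleteCustomer)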

Happy coding!

Previous Post <<||>> Next Post

References:
[1] Microsoft 365 (2024) Overview of prompts [link]

Resources:
[R1] Microsoft 365 (2025) Microsoft 365 Copilot release notes [link]
[R2] Microsoft 365 (2025) What’s new in Microsoft 365 Copilot | Jan 2025 [link]
[R3] Microsoft 365 (2025) Copilot is now included in Microsoft 365 Personal and Family [link]

Acronyms:
LLM - large language model
CRUD - Create, Read, Update, Delete

13 February 2025

🏭💠🗒️Microsoft Fabric: SQL Analytics Endpoint [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 12-Feb-2025

[Microsoft Fabric] SQL Analytics Endpoint

  • {def} a service that listens actively for requests providing a SQL-based experience for lakehouse Delta tables [1]
    • enables to query data in the lakehouse using T-SQL language and TDS protocol [1]
    • created by default for each lakehouse and mirrored database provisioned in a workspace
      • every lakehouse has one SQL analytics endpoint [1]
    • manages the automatically generated tables so the workspace users can't modify them [1]
  • {feature} a background process is responsible for 
    • scanning the lakehouse for changes [1]
      • automatically generates a table in the appropriate schema for every Delta table in the Lakehouse
        • tables are created with a minor delay [1]
        • data is refreshed automatically
          • the amount of time it takes to refresh the table is related to how optimized the Delta tables are [1]
        • {alternative} one can manually force a refresh of the automatic metadata scanning in the Fabric portal [1]
    • keeping SQL analytics endpoint up-to-date for all the changes committed to lakehouses in a workspace [1]
      • {characteristic} transparently managed
        • ⇐ the sync process is transparently managed by Microsoft Fabric platform [1]
        • when a change is detected in the lakehouse
          • a background process updates metadata [1]
          • the SQL analytics endpoint reflects the changes committed to lakehouse tables [1]
      • {characteristic} low latency
        • under normal operating conditions, the lag between a lakehouse and SQL analytics endpoint is less than one minute [1]
        • the actual length of time can vary from a few seconds to minutes [1]
      • the metadata synchronization is automatically triggered when the SQL Endpoint is opened [3]
        • ensures the SQL Analytics Endpoint remains up to date without the need for manual syncs [3]
          •  ⇐ provides a seamless experience [3]
  • {feature} can be enriched by adding database objects (see the T-SQL sketch after these notes)
    • schemas
    • views
    • procedures
    • other database objects
  • {feature} automatic metadata discovery
    • tracks changes committed to lakehouses [1]
      • is a single instance per Fabric workspace [1]
      • {issue} increased latency for changes to sync between lakehouses and the SQL analytics endpoint; it can be caused by a large number of lakehouses in one workspace [1]
        • {resolution} migrate each lakehouse to a separate workspace [1]
          • allows automatic metadata discovery to scale [1]
      • {issue} changes committed to a lakehouse are not visible in the associated SQL analytics endpoint
        • e.g. a table newly created in the lakehouse is not visible in the endpoint [1]
        • e.g. a large number of committed rows may not be visible in the endpoint [1]
        • ⇐ may be impacted by ETL processing that generates large volumes of changes
        • {recommendation} initiate an on-demand metadata sync, triggered from the SQL query editor Refresh ribbon option [1]
          • forces an on-demand metadata sync, rather than waiting on the background metadata sync to finish [1]
      • {issue} if no maintenance is scheduled for the parquet files, this can result in read overhead, which impacts the time it takes to sync changes to the SQL analytics endpoint [1]
        • {recommendation} schedule regular lakehouse table maintenance operations [1]
      • {limitation} not all Delta features are understood by the automatic sync process [1]
  • {feature} endpoint reprovisioning
    • if the initial provisioning attempt fails, users have the option to try again [2]
      • ⇐ without the need to create an entirely new lakehouse [2]
      • {benefit} empowers users to self-mitigate provisioning issues in a convenient way in the UI, avoiding the need for complete lakehouse re-creation [2]
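
To illustrate the enrichment with database objects mentioned in the notes, here is a minimal T-SQL sketch that could be run against the SQL analytics endpoint; dbo.SalesOrders and its columns are hypothetical names, not taken from the documentation:

-- add a view on top of an automatically generated lakehouse table (hypothetical names)
CREATE VIEW dbo.vw_SalesOrdersSummary
AS
SELECT CustomerID
, COUNT(*) AS OrderCount
, SUM(OrderAmount) AS TotalAmount
FROM dbo.SalesOrders
GROUP BY CustomerID;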

References:
[1] Microsoft Learn (2024) Microsoft Fabric: SQL analytics endpoint performance considerations [link]
[2] Microsoft Learn (2024) Microsoft Fabric: What is the SQL analytics endpoint for a lakehouse? [link]
[3] Microsoft Learn (2024) Microsoft Fabric: What’s new in the Fabric SQL Analytics Endpoint? [link]

09 February 2025

🏭🗒️Microsoft Fabric: Data Pipelines [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 9-Feb-2025

[Microsoft Fabric] Data pipeline

  • {def} a logical sequence of activities that together orchestrate a process and perform a task [1]
    • usually by extracting data from one or more sources and loading it into a destination; 
      • ⇐ often transforming it along the way [1]
      • ⇐ allows managing the activities as a set instead of each one individually [2]
      • ⇐ used to automate ETL processes that ingest transactional data from operational data stores into an analytical data store [1]
      • e.g. lakehouse or data warehouse 
  • {concept} activity
    • {def} an executable task in a pipeline
      • a flow of activities can be defined by connecting them in a sequence [1]
      • its outcome (success, failure, or completion) can be used to direct the flow to the next activity in the sequence [1]
    • {type} data movement activities
      • copies data from a source data store to a sink data store [2]
    • {type} data transformation activities
      • encapsulate data transfer operations
        • incl. simple Copy Data activities that extract data from a source and load it to a destination
        • incl. complex Data Flow activities that encapsulate dataflows (Gen2) that apply transformations to the data as it is transferred
        • incl. notebook activities to run a Spark notebook
        • incl. stored procedure activities to run SQL code
        • incl. delete data activities to delete existing data
    • {type} control flow activities
        • used to 
          • implement loops
          • implement conditional branching
          • manage variables
          • manage parameter values
          • implement complex pipeline logic to orchestrate data ingestion and transformation flows [1]
        • can be parameterized
          • ⇐ enables providing specific values to be used each time a pipeline is run [1]
        • when executed, a run is initiated (aka data pipeline run) [1]
          • runs can be initiated on-demand or scheduled to start at a specific frequency
          • use the unique run ID to review run details to confirm they completed successfully and investigate the specific settings used for each execution [1]
      • {benefit} increases pipelines’ reusability
    • {concept} pipeline template
      • a predefined pipeline that can be used and customized as required
    • {concept} data pipeline run
      • occurs when a data pipeline is executed
      • the activities in the data pipeline are executed to completion [3] 
      • can be triggered one of two ways
        • on-demand
        • on a schedule
          • the scheduled pipeline will be able to run based on the time and frequency set [3]

    Previous Post <<||>> Next Post

    References:
    [1] Microsoft Learn (2023) Use Data Factory pipelines in Microsoft Fabric [link]
    [2] Microsoft Learn (2024) Microsoft Fabric: Activity overview [link]
    [3] Microsoft Learn (2024) Microsoft Fabric Concept: Data pipeline Runs [link]

    Resources
    [R1] Metadata Driven Pipelines for Microsoft Fabric (link)

    06 February 2025

    🌌🏭KQL Reloaded: First Steps (Part V: Database Metadata)

    When working with a new data repository, one of the first things to do is to look at the database's metadata, when available, and try to get a bird's-eye view of what's available: how big the database is in terms of size, tables and user-defined objects, how the schema was defined, how the data are stored, eventually how often backups are taken, which users have access and to what, etc.

    So, after creating some queries in KQL and figuring out how things work, I tried to check what metadata are available, how they can be accessed, etc. The goal is not to provide a full list of the available metadata, but to understand what information is available, in what format, and how easy it is to extract the important metadata.

    The first set of metadata is related to the database:

    // get database metadata
    .show databases (ContosoSales)
    
    // get database metadata (multiple databases)
    .show databases (ContosoSales, Samples)
    
    // get database schema metadata
    .show databases (ContosoSales) schema
    
    // get database schema metadata (multiple databases) 
    .show databases (ContosoSales, Samples) schema
    
    // get database schema violations metadata
    .show database ContosoSales schema violations
    
    // get database entities metadata
    .show databases entities with (showObfuscatedStrings=true)
    | where DatabaseName == "ContosoSales"
    
    // get a database's table entities
    .show databases entities with (resolveFunctionsSchema=true)
    | where DatabaseName == "ContosoSales" and EntityType == "Table"
    //| summarize count () //get the number of tables
    
    // get a function's details
    .show databases entities with (resolveFunctionsSchema=true)
    | where DatabaseName == "ContosoSales" 
        and EntityType == "Function" 
        and EntityName == "SalesWithParams"
    
    // get external tables metadata
    .show external tables
    
    // get materialized views metadata
    .show materialized-views
    
    // get query results metadata
    .show stored_query_results
    
    // get entities groups metadata
    .show entity_groups
    

    Then, it's useful to look at the database objects. 

    // get all tables 
    .show tables 
    //| count

    // get tables metadata
    .show tables (Customers, NewSales)
    
    // get tables schema
    .show table Customers cslschema
    
    // get schema as json
    .show table Customers schema as json
    
    // get table size: Customers
    Customers
    | extend sizeEstimateOfColumn = estimate_data_size(*)
    | summarize totalSize_MB=round(sum(sizeEstimateOfColumn)/1024.00/1024.00,2)
    

    Unfortunately, the public environment has restrictions concerning the creation of objects, while for some of the available features one needs to create objects in order to query the corresponding metadata.

    Furthermore, it would be interesting to understand who has access to the various repositories, what policies were defined, and so on. 

    // get principal roles
    .show database ContosoSales principal roles
    
    // get principal roles for table
    .show table Customers principal roles
    
    // get principal roles for function:
    .show function SalesWithParams principal roles
    
    // get retention policies
    .show table Customers policy retention
    
    // get sharding policies
    .show table Customers policy sharding
    

    There are many more objects one can explore. It makes sense to document the features, respectively the objects used for the various purposes.

    In addition, one should check also the best practices available for the data repository (see [2]).

    Happy coding!

    Previous Post <<||>> Next Post

    References:
    [1] Microsoft Learn (2024) Management commands overview [link]
    [2] Microsoft Learn (2024) Kusto: Best practices for schema management [link]

    22 January 2025

    🏭🗒️Microsoft Fabric: Clone Tables in Warehouses [Notes]

    Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

    Last updated: 22-Jan-2025

    [Microsoft Fabric] Zero-copy Clone

    • {def} a replica of an existing OneLake table created by copying the existing table's metadata and referencing its data files [1]
      • the metadata is copied while the underlying data of the table stored as parquet files is not copied [1]
      • its creation is like creating a delta table [1]
      • DML/DDL changes
        • on the source are not reflected in the clone table [1]
        • on the clone are not reflected in the source [1]
      • can be created within or across schemas in a warehouse [1]
      • created based on either:
        • current point-in-time
          • based on the present state of the table [1]
        • previous point-in-time
          • based on a point-in-time up to seven days in the past
            • the table clone contains the data as it appeared at a desired past point in time
            • all CRUD operations are retained for seven calendar days
          • created with a timestamp based on UTC
    • {characteristic} autonomous existence
      • the original source and the clones can be deleted without any constraints [1]
      • once a clone is created, it remains in existence until deleted by the user [1]
    • {characteristic} inherits 
      • object-level SQL security from the source table of the clone [1]
        • DENY permission can be set on the table clone if desired [1]
          • the workspace roles provide read access by default [1]
      • all attributes that exist at the source table, whether the clone was created within the same schema or across different schemas in a warehouse [1]
      • the primary and unique key constraints defined in the source table [1]
    • a read-only delta log is created for every table clone that is created within the Warehouse [1]
    • {benefit} facilitates development and testing processes 
      • by creating copies of tables in lower environments [1]
    • {benefit} provides consistent reporting and zero-copy duplication of data for analytical workloads and ML modeling and testing [1]
    • {benefit} provides the capability of data recovery in the event of a failed release or data corruption by retaining the previous state of data [1]
    • {benefit} helps create historical reports that reflect the state of data as it existed as of a specific point-in-time in the past [1]
    • {limitation} table clones across warehouses in a workspace are not currently supported [1]
    • {limitation} table clones across workspaces are not currently supported [1]
    • {limitation} clone table is not supported on the SQL analytics endpoint of the Lakehouse [1]
    • {limitation} clone of a warehouse or schema is currently not supported [1]
    • {limitation} a table clone for a point in time older than the seven-day retention period cannot be created [1]
    • {limitation} cloned tables do not currently inherit row-level security or dynamic data masking [1]
    • {limitation} changes to the table schema prevent a clone from being created prior to the table schema change [1]
    • {best practice} create the clone tables in dedicated schema(s)
    • [syntax] CREATE TABLE <schema.clone_table_name> AS CLONE OF <schema.table_name>
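
    For illustration, a minimal sketch based on the above syntax; dbo.DimCustomer is a hypothetical table name, and the AT clause for a point-in-time clone follows the documented syntax (see [4]):

    -- clone based on the current point-in-time (hypothetical table)
    CREATE TABLE dbo.DimCustomer_clone AS CLONE OF dbo.DimCustomer;

    -- clone as of a past point-in-time (UTC), within the seven-day window
    CREATE TABLE dbo.DimCustomer_20250115 AS CLONE OF dbo.DimCustomer AT '2025-01-15T10:00:00';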

    Previous Post  <<||>> Next Post

    References:
    [1] Microsoft Learn (2023) Clone table in Microsoft Fabric [link]
    [2] Microsoft Learn (2024) Tutorial: Clone tables in the Fabric portal [link]
    [3] Microsoft Learn (2024) Tutorial: Clone a table with T-SQL in a Warehouse [link]
    [4] Microsoft Learn (2024) SQL: CREATE TABLE AS CLONE OF [link]

    17 January 2025

    💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part VIII: Permissions) [new feature]

    Data-based solutions usually target a set of users who (ideally) have restricted permissions to the functionality. Therefore, as part of the process, several personas are defined that target different use cases and for which the permissions must be restricted accordingly.

    In the simplest scenario the user must have access to the underlying objects for querying the data. Supposing that an Entra User has already been created, the respective user must also be given access in the Fabric database (see [1], [2]). From the database's main menu follow the path to assign read permissions:
    Security >> Manage SQL Security >> (select role: db_datareader)

    Manage SQL Security

    Manage access >> Add >> (search for User)

    Manage access

    (select user) >> Share database >> (select additional permissions) >> Save

    Manage additional permissions

    The easiest way to test whether the permissions work, before building the functionality, is to log in via SQL Server Management Studio (SSMS) and check the access using Microsoft Entra MFA. Ideally, one should have a User's credentials that can be used only for testing purposes. After the above setup, the new User was able to access the data.

    A second User can be created for testing with the maximum permissions allowed on the SQL database side, which is useful for troubleshooting. Alternatively, one can use a single User for testing and assign or remove the permissions as needed by the test scenario.

    It's a good idea to try to understand what's happening in the background. For example, the expectation was that a SQL user would also be created for the Entra User created above, which doesn't seem to be the case, at least with the functionality currently available.

    Before diving deeper, it's useful to retrieve the User's details:

    -- retrieve current user
    SELECT SUser_Name() sys_user_name
    , User_Id() user_id 
    , USER_NAME() user_name
    , current_user [current_user]
    , user [user]; 
    
    Output:
    sys_user_name user_id user_name current_user user
    JamesClavell@[domain].onmicrosoft.com 0 JamesClavell@[domain].onmicrosoft.com JamesClavell@[domain].onmicrosoft.com JamesClavell@[domain].onmicrosoft.com

    Retrieving the current User is especially useful when testing functionality in parallel with different Users. Strangely, the User's ID is 0 when only read permissions were assigned. However, a valid User identifier is assigned, for example, when the db_datawriter role is added to the User. Removing the db_datawriter role afterwards keeps, as expected, the User's ID. For troubleshooting purposes, at least with the current functionality, it might be a good idea to create the Users with a valid User ID (e.g. by temporarily assigning the db_datawriter role, as sketched below).
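
    A minimal sketch for temporarily assigning and then removing the role (the User name below is the placeholder used throughout this post):

    -- temporarily assign the db_datawriter role so that a valid User ID is generated
    ALTER ROLE db_datawriter ADD MEMBER [JamesClavell@[domain].onmicrosoft.com];
    
    -- remove the role afterwards; the User keeps its User ID
    ALTER ROLE db_datawriter DROP MEMBER [JamesClavell@[domain].onmicrosoft.com];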

    The next step is to look at the Users with access to the database:

    -- database access 
    SELECT USR.uid
    , USR.name
    --, USR.sid 
    , USR.hasdbaccess 
    , USR.islogin
    , USR.issqluser
    --, USR.createdate 
    --, USR.updatedate 
    FROM sys.sysusers USR
    WHERE USR.hasdbaccess = 1
      AND USR.islogin = 1
    ORDER BY uid
    
    Output:
    uid name hasdbaccess islogin issqluser
    1 dbo 1 1 1
    6 CharlesDickens@[...].onmicrosoft.com 1 1 0
    7 TestUser 1 1 1
    9 JamesClavell@[...].onmicrosoft.com 1 1 0

    For testing purposes, besides the standard dbo user and the two Entra-based Users, a SQL user was also created and granted access to the SalesLT schema (see the initial post):

    -- create the user
    CREATE USER TestUser WITHOUT LOGIN;
    
    -- assign access to SalesLT schema 
    GRANT SELECT ON SCHEMA::SalesLT TO TestUser;
      
    -- test impersonation (run together)
    EXECUTE AS USER = 'TestUser';
    
    SELECT * FROM SalesLT.Customer;
    
    REVERT; 

    Notes:
    1) Strangely, even if access was given explicitly only to the SalesLT schema, the TestUser User also has access to sys.sysusers and other DMVs. That also holds for the access over SSMS.
    2) For the User created above there are no records in the sys.user_token and sys.login_token DMVs (queried below), in contrast with the user(s) created for administering the SQL database.
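
    The two DMVs can be queried directly to check the current security context:

    -- retrieve the security tokens for the current user and login
    SELECT * FROM sys.user_token;
    
    SELECT * FROM sys.login_token;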

    Let's look at the permissions granted explicitly:

    -- permissions granted explicitly
    SELECT DPR.principal_id
    , DPR.name
    , DPR.type_desc
    , DPR.authentication_type_desc
    , DPE.state_desc
    , DPE.permission_name
    FROM sys.database_principals DPR
         JOIN sys.database_permissions DPE
    	   ON DPR.principal_id = DPE.grantee_principal_id
    WHERE DPR.principal_id != 0 -- removing the public user
    ORDER BY DPR.principal_id
    , DPE.permission_name;
    
    Result:
    principal_id name type_desc authentication_type_desc state_desc permission_name
    1 dbo SQL_USER INSTANCE GRANT CONNECT
    6 CharlesDickens@[...].onmicrosoft.com EXTERNAL_USER EXTERNAL GRANT AUTHENTICATE
    6 CharlesDickens@[...].onmicrosoft.com EXTERNAL_USER EXTERNAL GRANT CONNECT
    7 TestUser SQL_USER NONE GRANT CONNECT
    7 TestUser SQL_USER NONE GRANT SELECT
    9 JamesClavell@[...].onmicrosoft.com EXTERNAL_USER EXTERNAL GRANT CONNECT

    During troubleshooting it might be useful to check the current user's permissions at the various levels via sys.fn_my_permissions:

    -- retrieve database-scoped permissions for current user
    SELECT *
    FROM sys.fn_my_permissions(NULL, 'Database');
    
    -- retrieve schema-scoped permissions for current user
    SELECT *
    FROM sys.fn_my_permissions('SalesLT', 'Schema');
    
    -- retrieve object-scoped permissions for current user
    SELECT *
    FROM sys.fn_my_permissions('SalesLT.Customer', 'Object')
    WHERE permission_name = 'SELECT';
    

    Notes:
    1) See also [1] and [4] concerning the limitations that apply to managing permissions in SQL databases.

    Happy coding!

    Previous Post <<||>> Next Post

    References:
    [1] Microsoft Learn (2024) Microsoft Fabric: Share your SQL database and manage permissions [link]
    [2] Microsoft Learn (2024) Microsoft Fabric: Share data and manage access to your SQL database in Microsoft Fabric  [link]
    [3] Microsoft Learn (2024) Authorization in SQL database in Microsoft Fabric [link]
    [4] Microsoft Learn (2024) Authentication in SQL database in Microsoft Fabric [link]

    [5] Microsoft Learn (2025) Manage access for SQL databases in Microsoft Fabric with workspace roles and item permissions [link]

