Showing posts with label masking. Show all posts
Showing posts with label masking. Show all posts

25 March 2025

🏭🗒️Microsoft Fabric: Security in Warehouse [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 25-Mar-2024

[Microsoft Fabric] Security in Warehouse
  • {def} suite of technologies aimed at safeguarding sensitive information in Fabric [1]
    • leverages SQL engine’s security features [1]
      • allows for security mechanism at the warehouse level [1]
      • ⇐ the warehouse and SQL analytics endpoint items also allow for the defining of native SQL security [4]
        • the permissions configured only apply to the queries executed against the respective surfaces [4]
      • the access to OneLake data is controlled separately through OneLake data access roles [4]
        • {recommendation} to ensure users with SQL specific permissions don't see data they don't have SQL access to, don't include those users in a OneLake data access role [4]
    • supports a range of data protection features that enable administrators to shield sensitive data from unauthorized access [1]
      • ⇐ across warehouses and SQL analytics endpoints without necessitating changes to applications [1]
    • {type} object-level security (OLS)
      • permissions governing DML operations [1]
        • applies to tables and views
        • ⇐ when denied, the user will be prevented from performing the respective operation
        • SELECT
          • allows users to view the data within the object [1]
        • INSERT
          • allows users to insert data in the object [1]
        • UPDATE
          • allows users to update data within the object [1]
        • DELETE
          • allows users to delete the data within the object [1]
      • permissions can be granted, revoked or denied on database objects [1]
        •  tables and views
        • GRANT
          • permission is granted to user or role [1]
        • DENY
          • permissions is denied to user or role [1]
        • REVOKE
          • permissions is revoked to user or role [1]
        • ALTER
          • grants the user the ability to change the definition of the object [1]
        • CONTROL
          • grants the user all rights to the object [1]
      • {principle} least privilege
        • users and applications should only be given the permissions needed in order for them to complete the task
    • {type} column-level security (CLS)
      • allows to restrict column access to sensitive data [1]
        • provides granular control over who can access specific pieces of data [1]
          •  enhances the overall security of the data warehouse [1]
      • steps
        • identify the sensitive columns [1]
        • define access roles [1]
        • assign roles to users [1]
        • implement access control [1]
          • restrict access to ta column based on user's role [1]
    • {type} row-level security (RLS)
      • provides granular control over access to rows in a table based on group membership or execution context [1]
        • using WHERE clause filters [1]
      • works by associating a function (aka security predicate) with a table [1]
        • defined to return true or false based on certain conditions [1]
          • ⇐ typically involving the values of one or more columns in the table [1]
          • when a user attempts to access data in the table, the security predicate function is invoked [1]
            • if the function returns true, the row is accessible to the user; otherwise, the row doesn't show up in the query results [1]
        • the predicate can be as simple/complex as required [1]
        • the process is transparent to the user and is enforced automatically by SQL Server
          • ⇐ ensures consistent application of security rules [1]
      • implemented in two main steps:
        • filter predicates 
          • an inline table-valued function that filters the results based on the predicate defined [1]
        • security policy
          • invokes an inline table-valued function to protect access to the rows in a table [1]
            • because access control is configured and applied at the warehouse level, application changes are minimal - if any [1]
            • users can directly have access to the tables and can query their own data [1]
      • {recommendation} create a separate schema for predicate functions, and security policies [1]
      • {recommendation} avoid type conversions in predicate functions [1]
      • {recommendation} to maximize performance, avoid using excessive table joins and recursion in predicate functions [1]
    • {type} dynamic data masking (DDM) 
      • allows to limits data exposure to nonprivileged users by obscuring sensitive data [1]
        • e.g. email addresses 
      • {benefit} enhance the security and manageability of the data [1]
      • {feature} real-time masking
        • when querying sensitive data, DDM applies dynamic masking to it in real time [1]
          • the actual data is never exposed to unauthorized users, thus enhancing the security of your data [1]
        • straightforward to implement [1]
        • doesn’t require complex coding, making it accessible for users of all skill levels [1]
        • {benefit} the data in the database isn’t changed when DDM is applied
          •   the actual data remains intact and secure, while nonprivileged users only see a masked version of the data [1]
      • {operation} define masking rule
        • set up at column level [1]
        • offers a suite of features [1]
          • comprehensive and partial masking capabilities [1]
          • supports several masking types
            • help prevent unauthorized viewing of sensitive data [1]
              • by enabling administrators to specify how much sensitive data to reveal [1]
                •   minimal effect on the application layer [1]
            • applied to query results, so the data in the database isn't changed 
              •   allows many applications to mask sensitive data without modifying existing queries  [1]
          • random masking function designed for numeric data [1]
        • {risk} unprivileged users with query permissions can infer the actual data since the data isn’t physically obfuscated [1]
      • {recommendation} DDM should be used as part of a comprehensive data security strategy [1]
        • should include
          • the proper management of object-level security with SQL granular permissions [1]
          • adherence to the principle of minimal required permissions [1]
    • {concept} Dynamic SQL 
      • allows T-SQL statements to be generated within a stored procedure or a query itself [1]
        • executed via sp_executesql stored procedure
      • {risk} SQL injection attacks
        • use  QUOTENAME to sanitize inputs [1]
  • write access to a warehouse or SQL analytics endpoint
    • {approach} granted through the Fabric workspace roles
      • the role automatically translates to a corresponding role in SQL that grants equivalent write access [4]
      • {recommendation} if a user needs write access to all warehouses and endpoints, assign the user to a workspace role [4]
        • use the Contributor role unless the user needs to assign other users to workspace roles [4]
      • {recommendation} grant direct access through SQL permissions if the user only needs to write to specific warehouses or endpoints [4]
    • {approach} grant read access to the SQL engine, and grant custom SQL permissions to write to some or all the data [4]
  • write access to a warehouse or SQL analytics endpoint
    • {approach} grant read access through the ReadData permission, granted as part of the Fabric workspace roles [4]
      •  ReadData permission maps the user to a SQL role that gives SELECT permissions on all tables in the warehouse or lakehouse
        • helpful if the user needs to see all or most of the data in the lakehouse or warehouse [4]
        • any SQL DENY permissions set on a particular lakehouse or warehouse still apply and limit access to tables [4]
        • row and column level security can be set on tables to restrict access at a granular level [4]
    • {approach} grant read access to the SQL engine, and grant custom SQL permissions to read to some or all the data [4]
    • if the user needs access only to a specific lakehouse or warehouse, the share feature provides access to only the shared item [4]
      • during the share, users can choose to give only Read permission or Read + ReadData 
        • granting Read permission allows the user to connect to the warehouse or SQL analytics endpoint but gives no table access [4]
        • granting users the ReadData permissions gives them full read access to all tables in the warehouse or SQL analytics endpoint
      • ⇐ additional SQL security can be configured to grant or deny access to specific tables [4]

    References:
    [1] Microsoft Learn (2024) Secure a Microsoft Fabric data warehouse [link]
    [2] Data Mozart (2025) Lock Up! Understanding Data Access Options in Microsoft Fabric, by Nikola Ilic [link]
    [3] Microsoft Learn (2024) Security in Microsoft Fabric [link]
    [4] Microsoft Learn (2024) Microsoft Fabric: How to secure a lakehouse for Data Warehousing teams [link]

    Resources:
    [R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]
    [R2] Microsoft Learn (2025) Fabric: Security for data warehousing in Microsoft Fabric [link]
    [R3] Microsoft Learn (2025) Fabric: Share your data and manage permissions [link]

    Acronyms:
    CLS - Column-Level Security
    DDM - Dynamic Data Masking
    DML - Data Manipulation Language 
    MF - Microsoft Fabric
    OLS - Object-Level Security
    RLS - Row-Level Security
    SQL - Structured Query Language

    06 March 2025

    💎🤖SQL Reloaded: Copilot Stories (Part IX: Working with the Stuff Function)

    One of the underused string functions in SQL Server is Stuff, which returns part of a character, binary, text, or image expression. This post evaluates the answers provided by Microsoft 365 Copilot in the use of the respective function.

    Disclaimer: The below code was generated by Copilot via prompts and is used for demonstration purposes!

    Prompt:
    Give me 10 scenarios with examples in which the Stuff function can be used in SQL Server.

    Output:

    /* Example 1: Inserting a Substring */
    
    -- Insert a substring into a specific position within a string
    SELECT STUFF('Hello World', 6, 0, 'SQL ') AS Result; -- Result: 'Hello SQL World'
    
    /* Example 2: Replacing a Substring */
    
    -- Replace a part of a string with another substring.
    SELECT STUFF('Hello World', 7, 5, 'SQL Server') AS Result; -- Result: 'Hello SQL Server'
    
    /* Example 3: Removing Characters */
    
    -- Remove a specific number of characters from a string
    SELECT STUFF('Hello World', 6, 5, '') AS Result; -- Result: 'Hello'
    
    /* Example 4: Masking Sensitive Data */
    
    -- Mask a credit card number
    SELECT STUFF('1234-5678-9012-3456', 6, 9, '****-****') AS MaskedCard; -- Result: '1234-****-****-3456'
    
    /* Example 5: Formatting Phone Numbers */
    
    -- insert formatting characters into a phone number.
    SELECT STUFF(STUFF('1234567890', 4, 0, '-'), 8, 0, '-') AS FormattedPhone; -- Result: '123-456-7890'
    
    /* Example 6: Creating Initials */
    
    -- extract initials from a full name.
    SELECT STUFF(STUFF('John Doe', 2, 3, '.'), 5, 3, '.') AS Initials; -- Result: 'J.D.'
    
    /* Example 7: Dynamic SQL Generation */
    
    -- modify parts of a dynamic SQL query.
    DECLARE @sql NVARCHAR(MAX) = 'SELECT * FROM Table WHERE Column = Value';
    SELECT STUFF(@sql, CHARINDEX('Value', @sql), LEN('Value'), 'NewValue') AS ModifiedSQL; -- Result: 'SELECT * FROM Table WHERE Column = NewValue'
    
    /* Example 8: URL Manipulation */
    
    -- Insert or replace parts of a URL.
    SELECT STUFF('http://example.com/page', 8, 0, 'www.') AS ModifiedURL; -- Result: 'http://www.example.com/page'
    
    /* Example 9: Handling CSV Data */
    
    -- Modify parts of a CSV string
    SELECT STUFF('1,2,3,4,5', 3, 1, '10') AS ModifiedCSV; -- Result: '1,10,3,4,5'
    
    /* Example 10: Custom String Formatting */ 
    
    -- Insert custom formatting into a string
    SELECT STUFF('20250305', 5, 0, '-') AS FormattedDate; -- Result: '2025-0305'
    

    The first 6 examples are typical usages of the Stuff function while the other might raise the eyebrows of some developers.

    Example 1: The example is not entirely correct (counting the characters starts at 1).

    Example 2: In exchange, this example is correct.

    Example 3: The example is unfortunately incorrect (same issue like in the first example). The technique can be useful when only one contiguous region is to be replaced once. Otherwise, one should use the Replace function. 

    Example 4: Starting with SQL Server 2016 one can use dynamic masking for similar purposes, which should provide more flexibility in many cases. 

    Example 5: Formatting phone number can be more complex than this because there can be different formats, country codes or any other special formats.

    Example 6: Names can be quite complex, each part having different size.

    Example 7: The example might work in certain cases, though it's enough for a field to contain the keyword "value" for the whole logic to generate errors. 

    Example 8: The example is valid, though web addresses can use also other transfer protocols (ftp, ftps, sftp, scp, etc.).

    Example 9: A more reliable method is splitting the string in parts and replacing the value from the nth position then inserting a text when the length of the values is not known.

    Example 10: Even if the technique is valid, the example isn't entirely valid even if it might be met in practice. See 20th example below for a better approach. 

    Prompt:
    Give me 10 more scenarios with examples in which the Stuff function can be used in SQL Server.

    Output:

    /* Example 11: Reversing a string */
    
    -- Reverse a string by replacing characters in a specific order.
    SELECT STUFF(STUFF(STUFF('abcde', 1, 1, 'e'), 2, 1, 'd'), 3, 1, 'c') AS ReversedString; -- Result: 'edcba'
    
    /* Example 12: Generating unique identifiers */
    
    -- Modify parts of a GUID to create a custom identifier.
    SELECT STUFF('123e4567-e89b-12d3-a456-426614174000', 1, 8, 'abcdef12') AS CustomGUID;-- Result: 'abcdef12-e89b-12d3-a456-426614174000'
    
    /* Example 13: Combining columns */
    
    -- Combine multiple columns into a single formatted string.
    SELECT STUFF(FirstName + ' ' + LastName, LEN(FirstName) + 1, 0, ', ') AS FullName
    FROM Employees;-- Result: 'John, Doe'
    
    /* Example 14: Trimming strings */
    
    -- Remove leading or trailing characters from a string.
    SELECT STUFF('   Hello World   ', 1, 3, '') AS TrimmedString;-- Result: 'Hello World   '
    
    /* Example 15: Creating abbreviations */
    
    -- Generate abbreviations from a full string.
    SELECT STUFF('North Atlantic Treaty Organization', 6, 22, 'ATO') AS Abbreviation;-- Result: 'NATO'
    
    /* Example 16: Modifying JSON Data */
    
    -- Insert or replace parts of a JSON string 
    SELECT STUFF('{"name":"John","age":30}', 9, 4, 'Doe') AS ModifiedJSON;-- Result: '{"name":"Doe","age":30}'
    
    /* Example 17: Handling XML Data */
    
    -- Modify parts of an XML string.
    SELECT STUFF('<name>John</name>', 7, 4, 'Doe') AS ModifiedXML;-- Result: '<name>Doe</name>'
    
    /* Example 18: Generating random strings */
    
    -- Insert random characters into a string.
    SELECT STUFF('abcdef', 3, 0, CHAR(65 + ABS(CHECKSUM(NEWID())) % 26)) AS RandomString;-- Result: 'abXcdef' (X is a random character)
    
    /* Example 19: Creating custom error messages  */
    
    -- Insert dynamic values into error messages.
    DECLARE @ErrorMessage NVARCHAR(100) = 'Error: Value {0} is not valid.';
    SELECT STUFF(@ErrorMessage, CHARINDEX('{0}', @ErrorMessage), 3, '123') AS CustomErrorMessage;-- Result: 'Error: Value 123 is not valid.'
    
    /* Example 20: Formatting dates */
    
    -- Insert separators into a date string.
    SELECT STUFF(STUFF('20250305', 5, 0, '-'), 8, 0, '-') AS FormattedDate; -- Result: '2025-03-05'
    

    Example 11: Copilot got the use partially wrong given that each character except the pivoting character needs to be replaced accordingly. So for 6 characters one needs to replace the Stuff function 5 times!

    Example 12:  Implementing custom GUID is a process more complex than this as one needs to take care of not generating duplicates.

    Example 13: This is an example on how to handle changes dynamically.

    Example 14: One should use the Trim function whenever possible, respectively the combination LTrim and RTrim, if Trim is not available (it was introduced in SQL 2017).

    Example 15: The example is incorrect. One should consider the length of the string in the formula. 

    Example 16:  One must know the values in advance, otherwise the example doesn't hold. Moreover, the same issue like in the first example occurs.

    Example 17:  The example in not dynamic.

    Example 18:  It's an interesting technique for generating "random" characters given that unique values are generated across a dataset.

    Example 19: One can write error messages with multiple placeholders, though the Replace function is simpler to use. 

    Example 20: It's easier to cast the value as datetime and apply the required formatting accordingly. Not testing whether the value is a date can lead to curious results.

    I met some of the usages exemplified above, though I used the Stuff function seldom (see a previous post), when no other functions were available. Frankly, Copilot could prove to be a useful tool for learning SQL or other programming language in similar ways.

    Happy coding!

    Previous Post <<||>> Next Post 

    Related Posts Plugin for WordPress, Blogger...

    About Me

    My photo
    Koeln, NRW, Germany
    IT Professional with more than 25 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.