Showing posts with label data lineage. Show all posts
Showing posts with label data lineage. Show all posts

06 April 2024

🏭🗒️Microsoft Fabric: Data Governance [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 23-May-2024

[Microsoft Fabric] Data Governance

  • {definition}set of capabilities that help organizations to manage, protect, monitor, and improve the discoverability of data, so as to meet data governance (and compliance) requirements and regulations [2]
  • several built-in governance features are available to manage and control the data within Fabric (MF)  [1]
  • {feature} endorsement [aka content endorsement
    • {definition} formal process performed by admins to endorse MF items
    • {benefit} allows admins to designate specific MF items as trusted and approved for use across the organization [1]
      • establishes trust in data assets by promoting and certifying specific MF items [1]
        • users know which assets they can trust and rely on for accurate information [1]
      • endorsed assets are identified with a badge that indicates they have been reviewed and approved [1]
    • {scope} applies to all MF items except dashboards [1]
    • {benefit} helps admin manage the overall growth of items across your environment [1]
  • {feature} promoting [aka content promoting
    • {definition} formal process performed by contributors or admins to promote content
    • promoted content appears with a Promoted badge in the MF portal [1]
      • workspace members with the contributor or admin role can promote content within a workspace [1]
      • MF admin can promote content across the organization [1]
  • {feature} certification [aka content certification]
    • {definition} formal process that involves a review of the content by a designated reviewer and managed by the admin [1]
      • can be customized to meet organization’s needs [1]
      • users can request item certification from an admin [1]
        • via Request certification from the More menu [1]
      • the certified content appears with a Certified badge in the Fabric portal [1]
    • {benefit} allows organizations to label items considered to be quality items [1]
      • an organization can certify items to identify them an as authoritative sources for critical information [1]
        • ⇐ all Fabric items except Power BI dashboards can be certified [1]
    • {benefit} allows to specify certifiers who are experts in the domain [1]
    • domain level settings
      • enable or disable certification of items that belong to the domain [1]
    • provides a URL to documentation that is relevant to certification in the domain [1]
  • {feature} tenant (aka Microsoft Fabric tenant, MF tenant)
    • a single instance of Fabric for an organization that is aligned with a Microsoft Entra ID
    • can contain any number of workspaces
  • {feature} workspaces
    • {definition} a collection of items that brings together different functionality in a single environment designed for collaboration
    • can be assigned to teams or departments based on governance requirements and data boundaries [2]
    • are associated with domains [3]
      • ⇐ {benefit} allows to group data into business domains
      • all the items in the workspace are then associated with the domain, and they receive a domain attribute as part of their metadata [3]
        • ⇐ {benefit} enables a better consumption experience [1]
        • {benefit} enables better discoverability and governance [2]
  • {feature} domains [Notes]
    • {definition} a way of logically grouping together data in an organization that is relevant to a particular area or field [1]
    • allows to group data by business domains
      • ⇒{benefit} allows business domains to manage their data according to their specific regulations, restrictions, and needs [3]
    • {feature} subdomains
      • {definition} a way for fine tuning the logical grouping data under a domain [1]
        • ⇐ subdivisions of a domain
  • {feature} labeling
    • default labeling, label inheritance, and programmatic labeling, 
    • {benefit} help achieve maximal sensitivity label coverage across MF [2]
    • once labeled, data remains protected even when it's exported out of MF via supported export paths [2]
    • [Purview Audit] compliance admins can monitor activities on sensitivity labels
  • {feature|preview} folders
    • {definition} a way of logically grouping MF items
  • {feature|preview} tags
    • {benefit} allow managing Fabric items for enhanced compliance, discoverability, and reuse
  • {feature} scanner API
    • a set of admin REST APIs 
    • {benefit} allows to scan MF items for sensitive data [1]
    • can be used to scan both structured and unstructured data [1]
    • {concept} metadata scanning
      • facilitates governance of data by enabling cataloging and reporting on all the metadata of organization's Fabric items [1]
      • it needs to be set up by Admin before metadata scanning can be run [1]
  • {concept} data lineage
    • {definition} 
    • {benefit} allows to track the flow of data through Fabric [1]
    • {benefit} allows to see where data comes from, how it's transformed, and where it goes [1]
    • {benefit} helps understand the data available in Fabric, and how it's being used [1]
  • {concept} Fabric item (aka MF item)
    • {definition} a set of capabilities within an experience
      • form the building blocks of the Fabric platform
    • {type} data warehouse
    • {type} data pipeline
    • {type} semantic model
    • {type} reports
    • {type} dashboards
    • {type} notebook
    • {type} lakehouse
    • {type} metric set

Acronyms:
API - Application Programming Interface
MF - Microsoft Fabric

Resources:
[1] Microsoft Learn (2023) Administer Microsoft Fabric (link)
[2] Microsoft Learn - Fabric (2024) Governance overview and guidance (link)
[3] Microsoft Learn: Fabric (2023) Fabric domains (link)
[4] Establishing Data Mesh architectural pattern with Domains and OneLake on Microsoft Fabric, by Maheswaran Arunachalam (link

12 April 2017

⛏️Data Management: Data Lineage (Definitions)

 "A mechanism for recording information to determine the source of any piece of data, and the transformations applied to that data using Data Transformation Services (DTS). Data lineage can be tracked at the package and row levels of a table and provides a complete audit trail for information stored in a data warehouse. Data lineage is available only for packages stored in Microsoft Repository." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"This information is used by Data Transformation Services (DTS) when it works in conjunction with Meta Data Services. This information records the history of package execution and data transformations for each piece of data." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"This is also called data provenance. It deals with the origin of data; it is all about documenting where data is, how it has been derived, and how it flows so you can manage and secure it appropriately as it is further processed by applications." (Martin Oberhofer et al, "Enterprise Master Data Management", 2008)

"This provides the functionality to determine where data comes from, how it is transformed, and where it is going. Data lineage metadata traces the lifecycle of information between systems, including the operations that are performed on the data." (Martin Oberhofer et al, "The Art of Enterprise Information Architecture", 2010)

"Data lineage refers to a set of identifiable points that can be used to understand details of data movement and transformation (e.g., transactional source field names, file names, data processing job names, programming rules, target table fields). Lineage describes the movement of data through systems from its origin or provenance to its use in a particular application. Lineage is related to both the data chain and the information life cycle. Most people concerned with the lineage of data want to understand two aspects of it: the data’s origin and the ways in which the data has changed since it was originally created. Change can take place within one system or between systems." (Laura Sebastian-Coleman, "Measuring Data Quality for Ongoing Improvement ", 2012)

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.