24 May 2025

🏭🗒️Microsoft Fabric: Materialized Lake Views (MLV) [Notes] 🆕🗓️

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 27-Jul-2025

-- create schema
CREATE SCHERA IF NOT EXISTS <lakehouse_name>.<schema_name>

-- create a materialized view
CREATE MATERIALIZED VIEW IF NOT EXISTS <lakehouse_name>.<schema_name>.<view_name> 
[(
    CONSTRAINT <constraint_name> CHECK (<constraint>) ON MISMATCH DROP 
)] 
[PARTITIONED BY (col1, col2, ... )] 
[COMMENT “description or comment”] 
[TBLPROPERTIES (“key1”=”val1”, “key2”=”val2”, 
AS 
SELECT ...
FROM ...
-- WHERE ...
--GROUP BY ...

[Microsoft Fabric] Materialized Lake Views (MLVs)

  • {def} persisted, continuously updated view of data [1]
    • {benefit} allows to build declarative data pipelines using SQL, complete with built-in data quality rules and automatic monitoring of data transformations
      • simplifies the implementation of multi-stage Lakehouse processing [1]
        • ⇐ aids in the creation, management, and monitoring of views [3]
        • ⇐ improves transformations through a declarative approach [3]
        • streamline data workflows
        • enable developers to focus on business logic [1]
          • ⇐ not on infrastructural or data quality-related issues [1]
        • the views can be created in a notebook [2]
    • {benefit} allows developers visualize lineage across all entities in lakehouse, view the dependencies, and track its execution progress [3]
      • can have data quality constraints enforced and visualized for every run, showing completion status and conformance to data quality constraints defined in a single view [1]
      • empowers developers to set up complex data pipelines with just a few SQL statements and then handle the rest automatically [1]
        • faster development cycles 
        • trustworthy data
        • quicker insights
  • {goal} process only the new or changed data instead of reprocessing everything each time [1]
    • ⇐  leverages Delta Lake’s CDF under the hood
      • ⇒ it can update just the portions of data that changed rather than recompute the whole view from scratch [1]
  • {operation} creation
    • allows defining transformations at each layer [1]
      • e.g. aggregation, projection, filters
    • allows specifying certain checks that the data must meet [1]
      • incorporate data quality constraints directly into the pipeline definition
    • via CREATE MATERIALIZED LAKE VIEW
      • the SQL syntax is declarative and Fabric figures out how to produce and maintain it [1]
  • {operation} refresh
    • refreshes only when its source has new data [1]
      • if there’s no change, it can skip running entirely (saving time and resources) [1]
    • via REFRESH MATERIALIZED LAKE VIEW [workspace.lakehouse.schema].MLV_Identifier [FULL];
  • {operation} list views from schema [3]
    • via SHOW MATERIALIZED LAKE VIEWS <IN/FROM> Schema_Name;
  • {opetation} retrieve definition
    • via SHOW CREATE MATERIALIZED LAKE VIEW MLV_Identifier;
  • {operstion} update definition
    • via ALTER MATERIALIZED LAKE VIEW MLV_Identifier RENAME TO MLV_Identifier_New;
  • {operstion} drop view
    • via DROP MATERIALIZED LAKE VIEW MLV_Identifier;
    • {warning} dropping or renaming a materialized lake view affects the lineage view and scheduled refresh [3]
    • {recommendation} update the reference in all dependent materialized lake views [3]
  • {operation} schedule view run
    • lets users set how often the MLV should be refreshed based on business needs and lineage execution timing [5]
    • depends on
      • data update frequency: the frequency with which the data is updated [5]
      • query performance requirements: Business requirement to refresh the data in defined frequent intervals [5]
      • system load: optimizing the time to run the lineage without overloading the system [5]
  • {operation} view run history
    • users can access the last 25 runs including lineage and run metadata
      • available from the dropdown for monitoring and troubleshooting
  • {concept} lineage
    • the sequence of MLV that needs to be executed to refresh the MLV once new data is available [5]
  • {feature} automatically generate a visual report that shows trends on data quality constraints 
    • {benefit} allows to easily identify the checks that introduce maximum errors and the associated MLVs for easy troubleshooting [1]
  • {feature} can be combined with Shortcut Transformation feature for CSV ingestion 
    • {benefit} facilitate the building of end-to-end Medallion architectures
  • {feature} dependency graph
    • allows to see the dependencies existing between the various objects [2]
      • ⇐ automatically generated [2]
  • {feature} data quality
    • {benefit} allows to compose precise queries to exclude poor quality data from the source tables [5]
    • [medallion architecture] ensuring data quality is essential at every stage of the architecture [5]
    • maintained by setting constraints when defining the MLVs [5]
    • {action} FAIL
      • stops refreshing an MLV if any constraint is violated [5]
      • {default} halt is at the first instance
        • even without specifying the FAIL keyword [5]
        • takes precedence over DROP
    • {action} DROP
      • processes the MLV and removes records that don't meet the specified constrain [5]
        • provides the count of removed records in the lineage view [5]
    • {constraint} updating data quality constraints after creating an MLV isn't supported [5]
      • ⇐ the MLV must be recreated
    • {constraint} the use of functions and pattern search with operators in constraint condition is restricted [5]
      • e.g. LIKE, regex 
    • {known issue} the creation and refresh of an MLV with a FAIL action in constraint may result in a "delta table not found" error
      • {recommendation} recreate the MLV and avoid using the FAIL action [5] 
      • {feature} data quality report
        • built-in Power BI dashboard that shows several aggregated metrics [2]
    • {feature} monitor hub
      • centralized portal to browse MLV runs in the lakehouse [7]
      • {operation} view runs' status [7]
      • {operation} search and filter the runs [7]
        • based on different criteria
      • {operation} cancel in-progress run [7]
      • {operation} drill down run execution details [7]
    • doesn't support
      • {feature|planned} PySpark [3]
      • {feature|planned} incremental refresh [3]
      • {feature|planned} integration with Data Activator [3]
      • {feature|planned} API [3]
      • {feature|planned} cross-lakehouse lineage and execution [3]
      • {limitation} Spark properties set at the session level aren't applied during scheduled lineage refresh [4]
      • {limitation} creation with delta time-travel [4]
      • {limitation} DML statements [4]
      • {limitation} UDFs in CTAS [4] 
      • {limitation} temporary views can't be used to define MLVs [4]
    •  {recommendation} create all MLVs) for a workspace within a single Lakehouss [8]
      • allows to ensure a consolidated and clear representation in the DAG view [8]

    References:
    [1] Microsoft Fabric Update Blog (2025) Simplifying Medallion Implementation with Materialized Lake Views in Fabric [link|aka]
    [2] Power BI Tips (2025) Microsoft Fabric Notebooks with Materialized Views - Quick Tips [link]
    [3] Microsoft Learn (2025) What are materialized lake views in Microsoft Fabric? [link]
    [4] Microsoft Learn (2025) Materialized lake views Spark SQL reference [link]
    [5] Microsoft Learn (2025) Manage Fabric materialized lake views lineage [link] 
    [6] Microsoft Learn (2025) Data quality in materialized lake views [link]
    [7] Microsoft Learn (2025) Monitor materialized lake views [link
    [8] Microsoft Learn (2025) Development and Monitoring [link]

    Resources:
    [R1] Databricks (2025) Use materialized views in Databricks SQL [link]
    [R2] Microsoft Learn (2025) Implement medallion architecture with materialized lake views [link]
    [R3] Mastering Declarative Data Transformations with Materialized Lake Views [link]

    Acronyms:
    API - Application Programming Interface
    CDF - Change Data Feed
    CTAS - Create Tables As Select
    DML - Data Manipulation Language 
    ETL - Extract, Transfer, Load
    MF - Microsoft Fabric
    MLV - Materialized Lake views
    UDF - User-defined functions

    No comments:

    Related Posts Plugin for WordPress, Blogger...

    About Me

    My photo
    Koeln, NRW, Germany
    IT Professional with more than 25 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.