Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!
Last updated: 9-Jun-2025
-- create schema
CREATE SCHEMA IF NOT EXISTS <lakehouse_name>.<schema_name>

-- create a materialized lake view
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS <lakehouse_name>.<schema_name>.<view_name>
[(
    CONSTRAINT <constraint_name> CHECK (<constraint>) ON MISMATCH DROP
)]
[PARTITIONED BY (col1, col2, ...)]
[COMMENT "description or comment"]
[TBLPROPERTIES ("key1"="val1", "key2"="val2", ...)]
AS
SELECT ...
FROM ...
[WHERE ...]
[GROUP BY ...]
[Microsoft Fabric] Materialized Lake Views (MLVs)
- {def} persisted, continuously updated view of data [1]
- {benefit} allows building declarative data pipelines using SQL, complete with built-in data quality rules and automatic monitoring of data transformations
- simplifies the implementation of multi-stage Lakehouse processing [1]
- ⇐ aids in the creation, management, and monitoring of views [3]
- ⇐ improves transformations through a declarative approach [3]
- streamlines data workflows
- enables developers to focus on business logic [1]
- ⇐ not on infrastructural or data quality-related issues [1]
- the views can be created in a notebook [2]
- {benefit} allows developers to visualize lineage across all entities in the lakehouse, view the dependencies, and track execution progress [3]
- data quality constraints can be enforced and visualized for every run, showing completion status and conformance to the constraints defined in a single view [1]
- empowers developers to set up complex data pipelines with just a few SQL statements, with the rest handled automatically [1]
- faster development cycles
- trustworthy data
- quicker insights
- {goal} process only the new or changed data instead of reprocessing everything each time [1]
- ⇐ leverages Delta Lake's Change Data Feed (CDF) under the hood (illustrated below)
- ⇒ it can update just the portions of data that changed rather than recompute the whole view from scratch [1]
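For context, Change Data Feed is a Delta Lake table feature that records row-level changes. On a plain Delta table it is enabled via a table property; a minimal sketch follows (illustrative only: Fabric manages change tracking for MLVs internally, and the table name is hypothetical):

-- illustrative: enable Change Data Feed on a regular Delta table
-- (not something you set for MLVs; Fabric handles this under the hood)
ALTER TABLE bronze.sales SET TBLPROPERTIES (delta.enableChangeDataFeed = true);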
- {operation} creation
- allows defining transformations at each layer [1]
- e.g. aggregation, projection, filters
- allows specifying certain checks that the data must meet [1]
- incorporate data quality constraints directly into the pipeline definition
- via CREATE MATERIALIZED LAKE VIEW
- the SQL syntax is declarative and Fabric figures out how to produce and maintain it [1]
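A minimal sketch of a bronze-to-silver MLV with a data quality constraint, following the syntax template above; the schema, table, and column names are hypothetical:

-- hypothetical example: rows violating the constraint are dropped on mismatch
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.customers_clean
(
    CONSTRAINT valid_email CHECK (email IS NOT NULL) ON MISMATCH DROP
)
COMMENT "cleaned customer data"
AS
SELECT customer_id,
       TRIM(customer_name) AS customer_name,
       LOWER(email) AS email
FROM bronze.customers;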
- {operation} refresh
- refreshes only when its source has new data [1]
- if there’s no change, it can skip running entirely (saving time and resources) [1]
- via REFRESH MATERIALIZED LAKE VIEW [workspace.lakehouse.schema].MLV_Identifier [FULL];
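For example, forcing a full rebuild of the hypothetical view defined earlier:

-- FULL forces a complete recomputation; without it, the refresh
-- is driven by new data in the source (see {goal} above)
REFRESH MATERIALIZED LAKE VIEW silver.customers_clean FULL;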
- {operation} list views from schema [3]
- via SHOW MATERIALIZED LAKE VIEWS <IN/FROM> Schema_Name;
- {operation} retrieve definition
- via SHOW CREATE MATERIALIZED LAKE VIEW MLV_Identifier;
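A usage sketch for both inspection commands, with hypothetical names:

-- list the materialized lake views in a schema
SHOW MATERIALIZED LAKE VIEWS IN silver;
-- retrieve the definition of a specific view
SHOW CREATE MATERIALIZED LAKE VIEW silver.customers_clean;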
- {operation} rename view
- via ALTER MATERIALIZED LAKE VIEW MLV_Identifier RENAME TO MLV_Identifier_New;
- {operation} drop view
- via DROP MATERIALIZED LAKE VIEW MLV_Identifier;
- {warning} dropping or renaming a materialized lake view affects the lineage view and scheduled refresh [3]
- {recommendation} update the reference in all dependent materialized lake views [3]
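A usage sketch with hypothetical names (update any dependent MLVs afterwards, per the warning above):

-- rename the view
ALTER MATERIALIZED LAKE VIEW silver.customers_clean RENAME TO customers_curated;
-- drop the view
DROP MATERIALIZED LAKE VIEW silver.customers_curated;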
- {operation} schedule view run
- lets users set how often the MLV should be refreshed based on business needs and lineage execution timing [5]
- depends on
- data update frequency: the frequency with which the data is updated [5]
- query performance requirements: the business requirement to refresh the data at defined intervals [5]
- system load: optimizing the time to run the lineage without overloading the system [5]
- {operation} view run history
- users can access the last 25 runs including lineage and run metadata
- available from the dropdown for monitoring and troubleshooting
- {concept} lineage
- the sequence of MLVs that must be executed to refresh the MLV once new data is available [5]
- {feature} automatically generates a visual report that shows trends on data quality constraints
- {benefit} makes it easy to identify the checks that introduce the most errors and the associated MLVs for troubleshooting [1]
- {feature} can be combined with the Shortcut Transformation feature for CSV ingestion
- {benefit} facilitates the building of end-to-end Medallion architectures
- {feature} dependency graph
- shows the dependencies between the various objects [2]
- ⇐ automatically generated [2]
- {feature} data quality report
- built-in Power BI dashboard that shows several aggregated metrics [2]
- doesn't support
- {feature|planned} PySpark [3]
- {feature|planned} incremental refresh [3]
- {feature|planned} integration with Data Activator [3]
- {feature|planned} API [3]
- {feature|planned} cross-lakehouse lineage and execution [3]
- {limitation} Spark properties set at the session level aren't applied during scheduled lineage refresh [4]
- {limitation} creation with Delta time travel [4]
- {limitation} DML statements [4]
- {limitation} UDFs in CTAS [4]
- {limitation} temporary views can't be used to define MLVs [4]
MLV - Materialized Lake View