SQL Troubles: views

24 May 2025

🏭🗒️Microsoft Fabric: Materialized Lake Views (MLV) [Notes] 🆕🗓️

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!

Last updated: 27-Jul-2025

-- create schema
CREATE SCHEMA IF NOT EXISTS <lakehouse_name>.<schema_name>

-- create a materialized view
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS <lakehouse_name>.<schema_name>.<view_name> 
[(
    CONSTRAINT <constraint_name> CHECK (<constraint>) ON MISMATCH DROP 
)]

[PARTITIONED BY (col1, col2, ... )] 
[COMMENT "description or comment"] 
[TBLPROPERTIES ("key1"="val1", "key2"="val2", ... )]

AS 
SELECT ...
FROM ...
-- WHERE ...
--GROUP BY ...

[Microsoft Fabric] Materialized Lake Views (MLVs)

{def} persisted, continuously updated view of data [1]

{benefit} allows building declarative data pipelines using SQL, complete with built-in data quality rules and automatic monitoring of data transformations

simplifies the implementation of multi-stage Lakehouse processing [1]

⇐ aids in the creation, management, and monitoring of views [3]
⇐ improves transformations through a declarative approach [3]
streamline data workflows
enable developers to focus on business logic [1]

⇐ not on infrastructural or data quality-related issues [1]

the views can be created in a notebook [2]

{benefit} allows developers to visualize lineage across all entities in the lakehouse, view the dependencies, and track execution progress [3]

can have data quality constraints enforced and visualized for every run, showing completion status and conformance to data quality constraints defined in a single view [1]
empowers developers to set up complex data pipelines with just a few SQL statements, while the rest is handled automatically [1]

faster development cycles
trustworthy data
quicker insights

{goal} process only the new or changed data instead of reprocessing everything each time [1]

⇐ leverages Delta Lake’s CDF under the hood

⇒ it can update just the portions of data that changed rather than recompute the whole view from scratch [1]

{operation} creation

allows defining transformations at each layer [1]

e.g. aggregation, projection, filters

allows specifying certain checks that the data must meet [1]

incorporate data quality constraints directly into the pipeline definition

via CREATE MATERIALIZED LAKE VIEW

the SQL syntax is declarative and Fabric figures out how to produce and maintain it [1]
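As an illustration of the creation syntax, a minimal sketch of a silver-layer MLV written in a notebook cell could look like the one below. The schemas (bronze, silver), the table, the columns and the constraint name are hypothetical and not taken from the sources; only the statement structure follows the syntax listed in these notes.

```sql
-- hypothetical example: customers with a valid identifier in the silver layer
-- (run from a Spark SQL notebook cell, e.g. one starting with %%sql)
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.customers_cleaned
(
    -- records without an identifier are removed from the view
    CONSTRAINT customer_id_not_null CHECK (CustomerID IS NOT NULL) ON MISMATCH DROP
)
COMMENT 'Customers with a valid identifier'
AS
SELECT CustomerID
, FirstName
, LastName
, EmailAddress
FROM bronze.customers
WHERE EmailAddress IS NOT NULL;
```

The filter in the WHERE clause and the constraint both exclude records, though only the constraint is tracked as a data quality rule.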

{operation} refresh

refreshes only when its source has new data [1]

if there’s no change, it can skip running entirely (saving time and resources) [1]

via REFRESH MATERIALIZED LAKE VIEW [workspace.lakehouse.schema].MLV_Identifier [FULL];
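A short sketch of the two refresh variants, assuming a hypothetical MLV named silver.customers_cleaned (the name is illustrative only):

```sql
-- default refresh of a single MLV (hypothetical name)
REFRESH MATERIALIZED LAKE VIEW silver.customers_cleaned;

-- request a full refresh via the optional FULL keyword
REFRESH MATERIALIZED LAKE VIEW silver.customers_cleaned FULL;
```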

{operation} list views from schema [3]

via SHOW MATERIALIZED LAKE VIEWS <IN/FROM> Schema_Name;

{operation} retrieve definition

via SHOW CREATE MATERIALIZED LAKE VIEW MLV_Identifier;

{operation} update definition

via ALTER MATERIALIZED LAKE VIEW MLV_Identifier RENAME TO MLV_Identifier_New;

{operation} drop view

via DROP MATERIALIZED LAKE VIEW MLV_Identifier;
{warning} dropping or renaming a materialized lake view affects the lineage view and scheduled refresh [3]

{recommendation} update the reference in all dependent materialized lake views [3]
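The listing, definition, rename and drop operations can be sketched together as below. The schema and view names are hypothetical, and whether the new name in RENAME TO needs to be schema-qualified is an assumption that should be checked against the Spark SQL reference [4].

```sql
-- list the MLVs available in a schema (hypothetical schema name)
SHOW MATERIALIZED LAKE VIEWS IN silver;

-- retrieve the statement used to create an MLV
SHOW CREATE MATERIALIZED LAKE VIEW silver.customers_cleaned;

-- rename the MLV (assumption: the new name is qualified with the same schema)
ALTER MATERIALIZED LAKE VIEW silver.customers_cleaned RENAME TO silver.customers_valid;

-- drop the MLV; dependent MLVs must be updated afterwards
DROP MATERIALIZED LAKE VIEW silver.customers_valid;
```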

{operation} schedule view run

lets users set how often the MLV should be refreshed based on business needs and lineage execution timing [5]
depends on

data update frequency: the frequency with which the data is updated [5]
query performance requirements: Business requirement to refresh the data in defined frequent intervals [5]
system load: optimizing the time to run the lineage without overloading the system [5]

{operation} view run history

users can access the last 25 runs including lineage and run metadata

available from the dropdown for monitoring and troubleshooting

{concept} lineage

the sequence of MLVs that need to be executed to refresh an MLV once new data is available [5]

{feature} automatically generate a visual report that shows trends on data quality constraints

{benefit} makes it easy to identify the checks that introduce the most errors and the associated MLVs for troubleshooting [1]

{feature} can be combined with Shortcut Transformation feature for CSV ingestion

{benefit} facilitate the building of end-to-end Medallion architectures

{feature} dependency graph

shows the dependencies existing between the various objects [2]

⇐ automatically generated [2]

{feature} data quality

{benefit} allows composing precise queries to exclude poor quality data from the source tables [5]
[medallion architecture] ensuring data quality is essential at every stage of the architecture [5]
maintained by setting constraints when defining the MLVs [5]
{action} FAIL

stops refreshing an MLV if any constraint is violated [5]
{default} halts at the first violation

even without specifying the FAIL keyword [5]
takes precedence over DROP

{action} DROP

processes the MLV and removes records that don't meet the specified constraint [5]

provides the count of removed records in the lineage view [5]

{constraint} updating data quality constraints after creating an MLV isn't supported [5]

⇐ the MLV must be recreated

{constraint} the use of functions and of pattern-matching operators in the constraint condition is restricted [5]

e.g. LIKE, regex

{known issue} the creation and refresh of an MLV with a FAIL action in a constraint may result in a "delta table not found" error

{recommendation} recreate the MLV and avoid using the FAIL action [5]
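To make the two actions concrete, here is a minimal sketch that combines them in one definition. All object, column and constraint names are hypothetical. Given the known issue above, the FAIL constraint may need to be dropped from the definition in practice.

```sql
-- hypothetical example: order lines with two data quality rules
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.orders_cleaned
(
    -- stops the refresh at the first record without an order identifier
    CONSTRAINT order_id_not_null CHECK (SalesOrderID IS NOT NULL) ON MISMATCH FAIL,
    -- removes the records with non-positive quantities and counts them in the lineage view
    CONSTRAINT positive_quantity CHECK (OrderQty > 0) ON MISMATCH DROP
)
AS
SELECT SalesOrderID
, LineItem
, ProductID
, OrderQty
, LineItemTotal
FROM bronze.orders;
```

Because the constraints can't be changed after creation, adjusting any of the rules means dropping and recreating the MLV.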

{feature} data quality report

built-in Power BI dashboard that shows several aggregated metrics [2]

{feature} monitor hub

centralized portal to browse MLV runs in the lakehouse [7]
{operation} view runs' status [7]
{operation} search and filter the runs [7]

based on different criteria

{operation} cancel in-progress run [7]
{operation} drill down run execution details [7]

doesn't support

{feature|planned} PySpark [3]
{feature|planned} incremental refresh [3]
{feature|planned} integration with Data Activator [3]
{feature|planned} API [3]
{feature|planned} cross-lakehouse lineage and execution [3]
{limitation} Spark properties set at the session level aren't applied during scheduled lineage refresh [4]
{limitation} creation with delta time-travel [4]
{limitation} DML statements [4]
{limitation} UDFs in CTAS [4]
{limitation} temporary views can't be used to define MLVs [4]

References:

[1] Microsoft Fabric Update Blog (2025) Simplifying Medallion Implementation with Materialized Lake Views in Fabric [link|aka]

[2] Power BI Tips (2025) Microsoft Fabric Notebooks with Materialized Views - Quick Tips [link]

[3] Microsoft Learn (2025) What are materialized lake views in Microsoft Fabric? [link]

[4] Microsoft Learn (2025) Materialized lake views Spark SQL reference [link]

[5] Microsoft Learn (2025) Manage Fabric materialized lake views lineage [link]

[6] Microsoft Learn (2025) Data quality in materialized lake views [link]

[7] Microsoft Learn (2025) Monitor materialized lake views [link]

Resources:
[R1] Databricks (2025) Use materialized views in Databricks SQL [link]

[R2] Microsoft Learn (2025) Implement medallion architecture with materialized lake views [link]

Acronyms:

API - Application Programming Interface

CDF - Change Data Feed

CTAS - Create Table As Select

DML - Data Manipulation Language

ETL - Extract, Transform, Load

MF - Microsoft Fabric
MLV - Materialized Lake View

UDF - User-defined functions

24 April 2025

💎🏭SQL Reloaded: Microsoft Fabric's Lakehouses at Work (Part I: Proof-of-Concept)

Introduction

One way to work with the data files existing in an organization is to import them into a lakehouse and build a data model based on them that can be reused in various solutions (incl. Power BI). As a reminder, a lakehouse is a data architecture platform for storing, managing, and analyzing structured and unstructured data in a single location.

The tutorials available on lakehouses are pretty useful for getting an idea of how to start, though probably each seasoned professional has his/her own way of doing things, at least for testing the capabilities before building a proper solution. The target is thus to create the minimum needed for testing the capabilities required by a proof-of-concept solution.

The files used in this post are available on Microsoft's GitHub. Download the files and store them to be easily available for the next steps. The following files were considered for the current post: customers.csv, orders.csv and products.csv.

Create the Lakehouse

It's recommended to have a naming convention in place for the various items created in a workspace, e.g. a combination of item type (lakehouse, warehouse), system type (Prod, UAT, Dev, PoC) and optionally department (e.g. FIN, SCM, HR, etc.). One should try to balance meaning and usefulness. Ideally, one should use two to at most four letters for each part encoded in the name. For example, the following scripts were created in the LH_SCM_PoC lakehouse.

More complex naming conventions can include the system (e.g. D365, CRM, EBS) or the company. The target is to easily identify the systems, independently of how complex the rules are. Given that it can become challenging to rename the schemas afterwards, ideally the naming convention should be available from the early stages.

Create the Schema

A lakehouse comes with a dbo schema available by default, though it's recommended to create meaningful schema(s) as needed. The schemas should ideally reflect the domain of the data (e.g. departments or other key areas) and shouldn't change when the objects are deployed between the different environments. Depending on the case, one should consider creating multiple schemas that reflect the model's key areas. The names should be simple and suggestive.

-- create schema
CREATE SCHEMA Orders

Create a Staging Area

The next step is to create a staging area where the files in scope can be made available and then further loaded into the lakehouse. One needs to compromise between creating a deep hierarchical structure that reflects the business structure and the need to easily identify and manage the files. A hierarchical structure with 1-2 levels could provide the needed compromise, though each additional level tends to increase the complexity.

One should also consider rules for archiving or even deleting the files.

Upload the Files

Microsoft Fabric allows users to upload multiple files together in a single step. Ideally, the files should have proper names for each column, otherwise additional overhead may arise later in the process.

When the files are available in multiple folders in a complex hierarchical structure, a set of shortcuts could help in their management.

Load the Data

A file's data can be loaded into the lakehouse on the fly by providing a valid table name:
Files >> SCM_Orders >> (select file) >> Load to Tables >> new table >> Load file to new table >> (provide information) >> Load

Of course, the table's name must be unique within the schema, and the further properties must match the files' definition.

One should consider loading a couple of tables first, performing a rough validation of the imported data, and only after that importing the remaining tables. This allows identifying the issues that typically lead to reimports of the data (wrong formatting, invalid column names, duplicated files, etc.) or rework.

If the files have different characteristics (e.g. delimiters, number of attributes/records, special data types), one should consider this in the initial scope and have at least one example from each category.

Review the Metadata

Once the files are made available, there's a tendency to start directly with the development without analyzing the data or, equally important, the metadata available. To review the metadata of the newly created tables, one can use the objects from the standard INFORMATION_SCHEMA (see post):

-- retrieve the list of tables
SELECT * 
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'orders'
ORDER BY TABLE_NAME

Further on, one can review columns' definition:

-- retrieve column metadata
SELECT TABLE_CATALOG
, TABLE_SCHEMA
, TABLE_NAME
, COLUMN_NAME
, ORDINAL_POSITION
, DATA_TYPE
, CHARACTER_MAXIMUM_LENGTH
, NUMERIC_PRECISION
, NUMERIC_SCALE
, DATETIME_PRECISION
, CHARACTER_SET_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'orders'
ORDER BY TABLE_NAME
, ORDINAL_POSITION

It's a good idea to save the metadata to a file and use it later for reviews, respectively for metadata management, when no other solution is in place for the same (e.g. Purview). That's useful also for the people with limited or no access to the workspace.

Alternatively, one can use a notebook with the following SQL-based commands:

%%sql

DESCRIBE TABLE LH_SCM_PoC.orders.sales;

DESCRIBE TABLE EXTENDED LH_SCM_PoC.orders.sales;

One can even provide meaningful descriptions for each table and its attributes via scripts like the ones below; however, it might be a good idea to do this in the later phases of the PoC, when the logic becomes stable:

%%sql

-- modify a table's COMMENT
COMMENT ON TABLE LH_SCM_PoC.orders.sales IS 'Customer sales orders';

-- modify columns' COMMENT for an existing table
ALTER TABLE LH_SCM_PoC.orders.sales  
ALTER COLUMN SalesOrderNumber COMMENT 'Sales Order Number';

Data Validation

Before diving into building any business logic, besides identifying the primary and foreign keys and the further attributes used in bringing the data together, it's recommended to get an overview of the data's intrinsic and extrinsic characteristics relevant to the analysis. Some of the rules typically used for studying the quality of data also apply here to some extent, though one needs to prioritize accordingly, otherwise one replicates the effort that's typically part of Data Management initiatives.

In addition, it's important to check how much the identified issues impact the business logic, respectively whether the issues can be corrected to match the expectations. Often, no logic can compensate for major data quality issues, and this can also affect the PoC's results as soon as the outcomes are validated against the expectations!
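As a starting point for such an overview, a single profiling query per table can report completeness and uniqueness at a glance. The sketch below assumes the orders.customers table and the column names used in the queries of this post; the output column names are illustrative.

```sql
-- completeness and uniqueness overview for Customers
SELECT count(*) RecordCount
, count(DISTINCT CustomerID) DistinctCustomerIDs
, count(DISTINCT EmailAddress) DistinctEmailAddresses
, SUM(CASE WHEN EmailAddress IS NULL THEN 1 ELSE 0 END) MissingEmailAddresses
, SUM(CASE WHEN FirstName IS NULL OR LastName IS NULL THEN 1 ELSE 0 END) MissingNames
FROM orders.customers
```

If RecordCount differs from DistinctCustomerIDs, the presumed key isn't unique and the joins based on it will multiply records.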

Data Understanding

Further on, it makes sense to get a high-level understanding of the data by looking at the distribution of values, respectively at the records participating in the joins. Of course, more similar queries can be built, though again, one should try to focus on the most important aspects!

The analysis could for example consider the following points:

/* validation of Products */

-- review duplicated product names (should be 0)
SELECT ProductName
, count(*) RecordCount
FROM orders.products
GROUP BY ProductName
HAVING count(*)>1

-- review most (in)expensive products
SELECT top 100 ProductID
, ProductName
, Category
, ListPrice 
FROM orders.products
ORDER BY ListPrice DESC --ASC

-- review category distribution
SELECT Category
, count(*) RecordCount 
FROM orders.products
GROUP BY Category
ORDER BY RecordCount DESC

-- review price ranges (by number of digits of the list price)
SELECT Len(floor(ListPrice)) RangeCount
, count(*) RecordCount 
FROM orders.products
GROUP BY Len(floor(ListPrice)) 
ORDER BY RangeCount DESC

/* validation of Customers */

-- duplicated email address 
SELECT CST.CustomerID
, CST.FirstName
, CST.LastName 
, CST.EmailAddress 
, DUP.RecordCount
FROM (-- duplicates
	SELECT EmailAddress
	, count(*) RecordCount 
	FROM orders.customers 
	GROUP BY EmailAddress 
	HAVING count(*)>1
	) DUP
	JOIN orders.customers CST
	   ON DUP.EmailAddress = CST.EmailAddress
ORDER BY DUP.RecordCount DESC
, DUP.EmailAddress 

-- duplicated Customer names (not necessarily duplicates)
SELECT CST.CustomerID
, CST.FirstName
, CST.LastName 
, CST.EmailAddress 
, DUP.RecordCount
FROM (-- duplicates
	SELECT FirstName
	, LastName
	, count(*) RecordCount 
	FROM orders.customers 
	GROUP BY FirstName
	, LastName 
	HAVING count(*)>1
	) DUP
	JOIN orders.customers CST
	   ON DUP.FirstName = CST.FirstName
      AND DUP.LastName = CST.LastName
ORDER BY DUP.RecordCount DESC
, DUP.FirstName
, DUP.LastName

/* validation of Orders */

-- review a typical order
SELECT SalesOrderID
, OrderDate
, CustomerID
, LineItem
, ProductID
, OrderQty
, LineItemTotal
FROM orders.orders
WHERE SalesOrderID = 71780
ORDER BY SalesOrderID 
, LineItem

-- review orders' distribution by month
SELECT Year(OrderDate) Year
, Month(OrderDate) Month
, count(*) RecordCount
FROM orders.orders
GROUP BY Year(OrderDate) 
, Month(OrderDate) 
ORDER BY Year
, Month

-- checking for duplicates
SELECT SalesOrderID
, LineItem
, count(*) RecordCount
FROM orders.orders ord 
GROUP BY SalesOrderID
, LineItem
HAVING count(*)>1

-- checking for biggest orders
SELECT SalesOrderID
, count(*) RecordCount
FROM orders.orders ord 
GROUP BY SalesOrderID
HAVING count(*) > 10
ORDER BY RecordCount DESC

-- checking for most purchased products
SELECT ProductID
, count(*) NoRecords
FROM orders.orders ord 
GROUP BY ProductID
HAVING count(*) > 8
ORDER BY NoRecords DESC

-- checking for most active customers
SELECT CustomerID
, count(*) RecordCount
FROM orders.orders ord 
GROUP BY CustomerID
HAVING count(*) > 10
ORDER BY RecordCount DESC

/* join checks */

-- Orders without Product (should be 0)
SELECT count(*) RecordCount
FROM orders.orders ord 
	 LEFT JOIN orders.products prd
	   ON ord.ProductID = prd.ProductID
WHERE prd.ProductID IS NULL

-- Orders without Customer (should be 0)
SELECT count(*) RecordCount
FROM orders.orders ORD 
	 LEFT JOIN orders.customers CST
	   ON ORD.CustomerID = CST.CustomerID
WHERE CST.CustomerID IS NULL

-- Products without Orders (153 records)
SELECT count(*) RecordCount
FROM orders.products prd
	 LEFT JOIN orders.orders ord 
	   ON prd.ProductID = ord.ProductID 
WHERE ord.ProductID IS NULL


-- Customers without Orders (815 records)
SELECT count(*) RecordCount
FROM orders.customers CST
	 LEFT JOIN orders.orders ORD
	   ON ORD.CustomerID = CST.CustomerID
WHERE ORD.CustomerID IS NULL

The more tables are involved, the more complex the validation logic can become. One should focus on the most important aspects.

Building the Logic

Once one has an acceptable understanding of the data entities involved and the relation between them, it's time to build the needed business logic by joining the various tables at the various levels of detail. One can focus on the minimum required, respectively attempt to build a general model that can address a broader set of requirements. For the PoC it's usually recommended to start small by addressing the immediate requirements, though some flexibility might be needed for exploring the data and preparing the logic for a broader set of requirements. Independently of the scope, one should consider a set of validations.

Usually, it makes sense to encapsulate the logic in several views or table-valued functions that reflect the logic for the main purposes and which allow a high degree of reuse (see [1]). Of course, one can use the standard approach for modelling the bronze, silver, respectively the gold layers adopted by many professionals. For a PoC, even if that's not mandatory, it might still be a good idea to make steps in the respective direction.

In this case, dealing with only three tables - a fact table and two dimension tables - there are several perspectives that can be built:

a) all records from fact table + dimension records

The following view provides the lowest level of detail for the fact table, thus allowing one to look at the data from different perspectives, as long as the focus is only on the values used in Sales Orders:

-- create the view
CREATE OR ALTER VIEW orders.vSalesOrders
-- Sales Orders with Product & Customer information
AS
SELECT ORD.SalesOrderID
, ORD.OrderDate
, ORD.CustomerID
, CST.FirstName 
, CST.LastName
, CST.EmailAddress
, ORD.LineItem
, ORD.ProductID
, PRD.ProductName 
, PRD.Category
, ORD.OrderQty
, ORD.LineItemTotal
, PRD.ListPrice 
, ORD.OrderQty * PRD.ListPrice ListPriceTotal
FROM orders.orders ORD 
	 JOIN orders.products PRD
	   ON ORD.ProductID = PRD.ProductID
	 JOIN orders.customers CST
	   ON ORD.CustomerID = CST.CustomerID

-- test the view   
SELECT *
FROM orders.vSalesOrders
WHERE SalesOrderID = 71780

One can use inner joins as long as all the referenced dimension records are available; otherwise, outer joins are needed to keep the orders that have no match.
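When some order lines may reference products or customers that are missing from the dimension tables, the joins used in the view silently drop those lines. A sketch of an outer-join variant that keeps them and flags the gaps could look as follows (the IsProductMissing and IsCustomerMissing columns are illustrative names, not part of the original model):

```sql
-- Sales Orders with optional Product & Customer information
SELECT ORD.SalesOrderID
, ORD.LineItem
, ORD.ProductID
, PRD.ProductName
, ORD.CustomerID
, CST.EmailAddress
, CASE WHEN PRD.ProductID IS NULL THEN 1 ELSE 0 END IsProductMissing
, CASE WHEN CST.CustomerID IS NULL THEN 1 ELSE 0 END IsCustomerMissing
FROM orders.orders ORD
     LEFT JOIN orders.products PRD
       ON ORD.ProductID = PRD.ProductID
     LEFT JOIN orders.customers CST
       ON ORD.CustomerID = CST.CustomerID
```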

b) aggregated data for all dimension combinations

The previous view allows aggregating the data at various levels of detail:

-- Sales volume by Customer & Product
SELECT ORD.EmailAddress
, ORD.ProductName 
, ORD.Category
, SUM(ORD.OrderQty) OrderQty
, SUM(ORD.LineItemTotal) LineItemTotal
FROM orders.vSalesOrders ORD 
WHERE ORD.OrderDate >= '2022-06-01'
  AND ORD.OrderDate < '2022-07-01'
GROUP BY ORD.EmailAddress
, ORD.ProductName 
, ORD.Category
ORDER BY ORD.EmailAddress
, ORD.ProductName

One can comment out the dimensions not needed. The query can be included in a view as well.
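A possible way to wrap the aggregation in a view is sketched below. Since a view can't take parameters, the date filter is replaced by year and month columns that the caller can filter on, and the ORDER BY is left to the calling query. The view name vSalesVolumeByMonth is hypothetical.

```sql
-- create the view (hypothetical name)
CREATE OR ALTER VIEW orders.vSalesVolumeByMonth
-- Sales volume by Month, Customer & Product
AS
SELECT Year(ORD.OrderDate) OrderYear
, Month(ORD.OrderDate) OrderMonth
, ORD.EmailAddress
, ORD.ProductName
, ORD.Category
, SUM(ORD.OrderQty) OrderQty
, SUM(ORD.LineItemTotal) LineItemTotal
FROM orders.vSalesOrders ORD
GROUP BY Year(ORD.OrderDate)
, Month(ORD.OrderDate)
, ORD.EmailAddress
, ORD.ProductName
, ORD.Category
```

The June 2022 figures from the previous query can then be retrieved with WHERE OrderYear = 2022 AND OrderMonth = 6.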

c) all records from each dimension table + aggregated fact records

Sometimes, it's useful to look at the data from a dimension's perspective, though it might be necessary to create such an object for each dimension, as in the examples below. For maximum flexibility, the logic can be included in a table-valued function:

-- create the user-defined function
CREATE OR ALTER FUNCTION orders.tvfProductsSalesVolume(
    @StartDate date NULL,
    @EndDate date NULL
)
RETURNS TABLE
-- Sales volume by Product
AS
RETURN (
SELECT PRD.ProductID
, PRD.ProductName 
, PRD.Category
, ORD.FirstOrderDate
, ORD.LastOrderDate 
, IsNull(ORD.TotalSalesQty, 0) TotalSalesQty 
, IsNull(ORD.TotalSalesValue, 0) TotalSalesValue
, IsNull(ORD.OrderCount, 0) OrderCount
, IsNull(ORD.LineCount, 0) LineCount
FROM orders.products PRD
     OUTER APPLY (
		SELECT Min(ORD.OrderDate) FirstOrderDate
		, Max(ORD.OrderDate) LastOrderDate 
		, SUM(ORD.OrderQty) TotalSalesQty
		, SUM(ORD.LineItemTotal) TotalSalesValue
		, count(DISTINCT SalesOrderID) OrderCount
		, count(*) LineCount
		FROM orders.orders ORD 
		WHERE ORD.ProductID = PRD.ProductID
		  AND ORD.OrderDate >= @StartDate 
		  AND ORD.OrderDate < @EndDate 
	 ) ORD
);

-- test the user-defined function
SELECT *
FROM orders.tvfProductsSalesVolume('2022-06-01','2022-07-01') PRD
WHERE TotalSalesValue <> 0
ORDER BY TotalSalesValue DESC
, LastOrderDate DESC


-- create the user-defined function
CREATE OR ALTER FUNCTION orders.tvfCustomersSalesVolume(
    @StartDate date NULL,
    @EndDate date NULL
)
RETURNS TABLE
-- Sales volume by Customer
AS
RETURN (
SELECT CST.CustomerID
, CST.FirstName 
, CST.LastName
, CST.EmailAddress
, ORD.FirstOrderDate
, ORD.LastOrderDate 
, IsNull(ORD.TotalSalesValue, 0) TotalSalesValue
, IsNull(ORD.OrderCount, 0) OrderCount
, IsNull(ORD.LineCount, 0) LineCount
FROM orders.customers CST
     OUTER APPLY (
		SELECT Min(ORD.OrderDate) FirstOrderDate
		, Max(ORD.OrderDate) LastOrderDate 
		, SUM(ORD.LineItemTotal) TotalSalesValue
		, count(DISTINCT SalesOrderID) OrderCount
		, count(*) LineCount
		FROM orders.orders ORD 
		WHERE ORD.CustomerID = CST.CustomerID
		  AND ORD.OrderDate >= @StartDate 
		  AND ORD.OrderDate < @EndDate 
	 ) ORD
);

-- test the user-defined function
SELECT *
FROM orders.tvfCustomersSalesVolume('2022-06-01','2022-07-01') CST
WHERE TotalSalesValue <> 0
ORDER BY TotalSalesValue DESC
, LastOrderDate DESC

When restructuring the queries in similar ways, there's always a compromise between the various factors: (re)usability, performance or completeness.

Further Comments

The above database objects should allow users to address most of the requirements, though, as usual, there can be also exceptions, especially when the data needs to be aggregated at a different level of detail that requires the query to be structured differently.

The number of perspectives can also increase with the number of fact tables used to model a certain entity (e.g. Sales order headers vs. lines).

In theory, one can also find ways to automate the process of creating database objects, though one must choose the relevant attributes, respectively include logic that makes sense only within a certain perspective.

No matter the data, respectively the systems used as source, expect surprises and test your assumptions! For example, in the file used to create the orders.customers table, there seem to be duplicated entities with the same name and email address. One must clarify how such entities must be handled in data analysis, respectively in data modeling. For example, a person can appear twice because of the roles associated with the name, or there can be other legitimate reasons.

The files in scope of this post are small compared with the files existing in organizations. In many scenarios, the files' size could range from GB to TB and thus require partitioning and various other strategies.

References
[1] sql-troubles (2023) Architecture Part IV: Building a Modern Data Warehouse with Azure Synapse [link]

Resources
[1] Microsoft Learn (2024) Fabric: Lakehouse and Delta Lake tables [link]

22 March 2025

💠🛠️🗒️SQL Server: Indexed Views [Notes]

Disclaimer: This is work in progress based on notes gathered over the years, intended to consolidate information from the various sources. The content needs yet to be reviewed against the current documentation.

Last updated: 22-Mar-2024

[SQL Server 2005] Indexed View

{def} a materialized view

materializes the data from the view queries, storing it in the database in a way similar to tables [6]

⇒ its definition is computed and the resulting data stored just like a table [3]
the view is indexed by creating a unique clustered index on it

the resulting structure is physically identical to a table with a clustered index

⇐ nonclustered indexes also are supported on this structure

can be created on a partitioned table, respectively can be partitioned [1]

{benefit} can improve the performance of some types of queries [3]

e.g. queries that aggregate many rows
⇐ because the view is stored the same way a table with a clustered index is stored [1]
⇐ not well-suited for underlying data that are frequently updated [3]
more expensive to use and maintain than filtered indexes [5]

[query optimizer]

can use it to speed up the query execution [1]

the view doesn't have to be referenced in the query for the optimizer to consider it for a substitution [1]
{downside} DML query performance can degrade significantly [1]

⇐ in some cases, a query plan can't even be produced [1]
when executing UPDATE, DELETE or INSERT on the base table referenced, the indexed views must be updated as well [1]
{recommendation} test DML queries before production use [1]

analyze the query plan and tune/simplify the DML statement [1]

can use the structure to return results more efficiently to the user

contains logic to use this index in either of the cases

the original query text referenced the view explicitly [2]
the user submits a query that uses the same components as the view (in any equivalent order) [2]
⇐ the query processor expands indexed views early in the query pipeline and always uses the same matching code for both cases [2]

the WITH(NOEXPAND) hint tells the query processor not to expand the view definition [2]
also instructs the query processor to perform an index scan of the indexed view rather than expand it into its component parts [5]
any extra rows in the indexed view are reported as 8907 errors [5]
any missing rows are reported as 8908 errors [5]

expose some of the benefits of view materialization while retaining the benefits of global reasoning about query operations [2]
expanded (aka in-lined) before optimization begins

gives the Query Optimizer opportunities to optimize queries globally [2]
makes it difficult for the (query) optimizer to consider plans that perform the view evaluation first, then process the rest of the query [2]

arbitrary tree matching is a computationally complex problem, and the feature set of views is too large to perform this operation efficiently [2]

cases in which it does not match the view

indexed views are inserted into the Memo and evaluated against other plan choices

while they are often the best plan choice, this is not always the case [2]
the Query Optimizer can detect logical contradictions between the view definition and the query that references the view [2]

there are also some cases where the Query Optimizer does not recognize an indexed view even when it would be a good plan choice [2]

often, these cases deal with complex interactions between high-level features within the query processor (e.g. computed column matching, the algorithm to explore join orders) [2]
consider the WITH (NOEXPAND) hint to force the query processor to pick that indexed view [2]

this usually is enough to get the plan to include the indexed view [2]
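A minimal sketch of the hint, assuming a hypothetical indexed view dbo.vOrderTotalsByProduct that stores a sales total and a line count per product (neither the view nor its columns come from the sources):

```sql
-- read directly from the indexed view instead of expanding its definition
SELECT ProductID
, TotalSalesValue
, LineCount
FROM dbo.vOrderTotalsByProduct WITH (NOEXPAND)
WHERE TotalSalesValue > 1000;
```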

indexed view alternatives

are generated and stored in the Memo
are compared using costing equations against other possible plans
partial matches cost the residual operations as well

an indexed-view plan can be generated but not picked when the Query Optimizer considers other plans to have lower costs [2]

maintained as part of the update processing for tables on which the view is based

this makes sure that the view provides a consistent result if it is selected by the Query Optimizer for any query plan [2]
some query operations are incompatible with this design guarantee

restrictions are placed on the set of supported constructs in indexed views to make sure that the view can be created, matched, and updated efficiently [2]

{operation} updating indexed views

the core question behind the restrictions is “Can the query processor compute the necessary changes to the Indexed View clustered and nonclustered indexes without having to recompute the whole indexed view?” [2]

if so, the query processor can perform these changes efficiently as part of the maintenance of the base tables that are referenced in the view [2]

this property is relatively easy for filters, projections (compute scalar), and inner joins on keys [2]
operators that destroy or create data are more difficult to maintain, so often these are restricted from use in indexed views [2]

matching indexed views is supported in cases beyond exact matches of the query text to the view definition [2]

it also supports using an indexed view for inexact matches where the definition of the view is broader than the query submitted by the user [2]

then applies residual filters, projections (columns in the select list), and even aggregates to use the view as a partial precomputation of the query result [2]

{concept} statistics on indexed views

normally statistics aren't needed

because the substitution of the indexed views into the query plan is considered only after all the statistics for the underlying tables and indexes are attached to the query plan [3]
used if the view is directly referenced by the NOEXPAND hint in a FROM clause

an error is generated and the plan is not created if the NOEXPAND hint is used on a view that does not also contain an index [3]

can’t be created by using sp_createstats or updated by using sp_updatestats.
auto update and auto create statistics features work for indexed views

created manually

via CREATE STATISTICS on the indexed view columns
via UPDATE STATISTICS to update column or index statistics on indexed views
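The manual creation and update of statistics can be sketched as below, assuming the same kind of hypothetical indexed view dbo.vOrderTotalsByProduct with a TotalSalesValue column; the statistics name is illustrative as well.

```sql
-- create column statistics on the indexed view (hypothetical objects)
CREATE STATISTICS ST_vOrderTotalsByProduct_TotalSalesValue
ON dbo.vOrderTotalsByProduct (TotalSalesValue);

-- refresh the statistics after larger data changes
UPDATE STATISTICS dbo.vOrderTotalsByProduct ST_vOrderTotalsByProduct_TotalSalesValue;
```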

{operation} creating a view

requires that the underlying object’s schema can’t change
requires WITH SCHEMABINDING option [5]
⇒ must include the two-part names of all referenced tables [5]
⇐ the tables can't be dropped and the columns can't be altered when they participate in a view, unless the view is dropped [5]
⇐ an error is raised [5]
the user must hold

the CREATE VIEW permission in the database [1]
ALTER permission on the schema in which the view is being created [1]
if the base table resides within a different schema, the REFERENCES permission on the table is required as a minimum [1]
if the user creating the index differs from the user who created the view, for the index creation alone the ALTER permission on the view is required [1]

{operation} creating an index on the view

indexes can only be created on views that have the same owner as the referenced table or tables (aka intact ownership chain between the view and the tables) [1]
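A minimal end-to-end sketch of the two operations above. The table, view and index names are hypothetical; the SET statements reflect the fixed SET option values that indexed views require. GO is the batch separator used by tools such as SSMS and sqlcmd, needed because CREATE VIEW must be the first statement in its batch.

```sql
-- SET options required when creating indexed views
SET NUMERIC_ROUNDABORT OFF;
SET ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL, ARITHABORT,
    QUOTED_IDENTIFIER, ANSI_NULLS ON;
GO

-- hypothetical base table; NOT NULL columns keep the SUM expression non-nullable
CREATE TABLE dbo.OrderDetails (
    OrderID   int   NOT NULL,
    ProductID int   NOT NULL,
    Quantity  int   NOT NULL,
    UnitPrice money NOT NULL,
    CONSTRAINT PK_OrderDetails PRIMARY KEY (OrderID, ProductID)
);
GO

-- schema-bound view: two-part names, explicit column list (no SELECT *)
CREATE VIEW dbo.vOrderTotals
WITH SCHEMABINDING
AS
SELECT ProductID,
       SUM(Quantity * UnitPrice) AS TotalAmount,
       COUNT_BIG(*)              AS RowCnt      -- required with GROUP BY
FROM dbo.OrderDetails
GROUP BY ProductID;
GO

-- the first index on a view must be unique and clustered; it materializes the result set
CREATE UNIQUE CLUSTERED INDEX IX_vOrderTotals_ProductID
ON dbo.vOrderTotals (ProductID);
GO

-- further nonclustered indexes can be added afterwards
CREATE NONCLUSTERED INDEX IX_vOrderTotals_TotalAmount
ON dbo.vOrderTotals (TotalAmount);
GO
```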

{operation} dropping a view

causes all indexes on the view to be dropped [1]

⇐ all nonclustered indexes and auto-created statistics on the view are dropped when the clustered index is dropped [1]

{exception} user-created statistics on the view are maintained [1]

nonclustered indexes can be individually dropped [1]
dropping the clustered index on the view

removes the stored result set [1]
the optimizer returns to processing the view like a standard view [1]
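The drop operations can be sketched as below, assuming an indexed view dbo.vOrderTotals with a clustered index IX_vOrderTotals_ProductID and a nonclustered index IX_vOrderTotals_TotalAmount (all hypothetical names).

```sql
-- drop a single nonclustered index; the stored result set stays in place
DROP INDEX IX_vOrderTotals_TotalAmount ON dbo.vOrderTotals;

-- drop the clustered index; the stored result set is removed and
-- the view is processed like a standard view again
DROP INDEX IX_vOrderTotals_ProductID ON dbo.vOrderTotals;

-- drop the view itself (any remaining indexes on it go with it)
DROP VIEW dbo.vOrderTotals;
```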

{operation} disable indexes on tables and views

when a clustered index on a table is disabled, indexes on views associated with the table are also disabled [1]

{option} EXPAND VIEWS

prevents the Database Engine from using indexed views [1]

if any of the listed options are incorrectly set, this option prevents the optimizer from using the indexes on the views [1]
via OPTION (EXPAND VIEWS) hint
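A short illustration of the hint, assuming an indexed view named dbo.vOrderTotals (hypothetical). With the hint in place, the view is expanded to its definition and the query is answered from the base tables.

```sql
-- assumption: dbo.vOrderTotals is an indexed view (hypothetical name)
SELECT ProductID, TotalAmount
FROM dbo.vOrderTotals
OPTION (EXPAND VIEWS);   -- ignore the indexes on the view for this query
```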

{recommendation} when using datetime and smalldatetime string literals in indexed views, explicitly convert the literal to the date type by using a deterministic date format style [1]
{limitation} AVG is not allowed {workaround} use SUM and COUNT_BIG [5]
{limitation} impacted by SET options [1]

{restriction} require fixed values for several SET options [1]
{recommendation} set the ARITHABORT user option to ON server-wide as soon as the first indexed view or index on a computed column is created in any database on the server [1]

{limitation} further requirements apply (see [1])
{limitation} aren't supported on top of temporal queries

⇐ i.e. queries that use the FOR SYSTEM_TIME clause
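A sketch of the AVG workaround and of the date literal recommendation listed above. It assumes a hypothetical table dbo.OrderDetails with NOT NULL columns ProductID, UnitPrice and OrderDate, and that the required SET options are already in effect. Style 112 (yyyymmdd) is one of the deterministic styles for CONVERT.

```sql
-- store SUM and COUNT_BIG instead of AVG
CREATE VIEW dbo.vProductPriceStats
WITH SCHEMABINDING
AS
SELECT ProductID,
       SUM(UnitPrice) AS SumUnitPrice,
       COUNT_BIG(*)   AS RowCnt
FROM dbo.OrderDetails
-- explicit, deterministic conversion of the date literal
WHERE OrderDate >= CONVERT(datetime, '20250101', 112)
GROUP BY ProductID;
GO

CREATE UNIQUE CLUSTERED INDEX IX_vProductPriceStats
ON dbo.vProductPriceStats (ProductID);
GO

-- derive the average at query time from the stored aggregates
SELECT ProductID,
       SumUnitPrice / RowCnt AS AvgUnitPrice
FROM dbo.vProductPriceStats WITH (NOEXPAND);
```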

{scenario} simplifying SQL queries
{scenario} abstracting data models from user models
{scenario} enforcing user security


References:
[1] Microsoft Learn (2024) SQL Server: Create indexed views [link]
[2] Kalen Delaney et al (2009) Microsoft® SQL Server® 2008 Internals
[3] Microsoft Learn (2024) SQL Server: Views [link]
[4] Microsoft Learn (2024) SQL Server: CREATE INDEX (Transact-SQL) [link]

[5] Kalen Delaney et al (2012) Microsoft® SQL Server® 2012 Internals

[6] Dmitri Korotkevitch (2016) Pro SQL Server Internals 2nd Ed.

Resources:
[R1] Microsoft Learn (2024) SQL Server: Optimize index maintenance to improve query performance and reduce resource consumption [link]

Acronyms:
DML - Data Manipulation Language
QO - Query Optimizer

19 March 2025

💠🛠️🗒️SQL Server: Views [Notes]

Last updated: 19-Mar-2024

[SQL Server 2005] View (aka virtual table)

{def} a database object that encapsulates a SQL statement and that can be used as a virtual table in further SQL statements

cannot be executed by itself

⇐ it must be used within a query [15]

doesn't store any data

except indexed views
data is dynamically produced from the underlying table when the view is used [32]

views depend on the underlying tables and act like a filter on the underlying tables [32]

used just like regular tables without incurring additional cost

unless the view is indexed [25]

turning a query into a view

remove the ORDER BY clause
assure there are no name duplicates
assure that each column has a name

projected columns

columns included in the view

view’s column list

renames every output column just as if every column had those alias names in the SELECT statement
a view is more self-documenting if the column names of the view are specified in the SELECT statement and not listed separately in the view [27]

{restriction} sorting is not allowed in a view

unless the view includes a TOP predicate

ORDER BY clause serves only to define which rows qualify for the TOP predicate [15]

the only way to logically guarantee sorted results is to define the ORDER BY clause in the executing query [15]

[SQL Server 2005] had a bug in the Query Optimizer that would enable an ORDER BY in a view using a TOP 100 PERCENT predicate [15]

the behavior was never documented or officially supported [15]

OFFSET FETCH clause
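The sorting restriction can be illustrated with a small sketch, assuming a hypothetical table dbo.ProductSales with the columns ProductID and TotalAmount. Inside the view, ORDER BY only decides which rows TOP keeps; the order of the final result comes from the ORDER BY of the executing query.

```sql
CREATE VIEW dbo.vTop10Products
AS
SELECT TOP (10) ProductID, TotalAmount
FROM dbo.ProductSales
ORDER BY TotalAmount DESC;   -- defines which 10 rows qualify, not the output order
GO

-- the only logically guaranteed ordering is the one requested here
SELECT ProductID, TotalAmount
FROM dbo.vTop10Products
ORDER BY TotalAmount DESC;
```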

{restriction} parameters can’t be passed to a view [100]

{alternative} use an inline table-valued function

{restriction} cannot reference a variable inside the SELECT statement [100]
{restriction} cannot create a table, whether permanent or temporary

⇒ cannot use the SELECT/INTO syntax in a view

{restriction} can reference only permanent tables

⇒ cannot reference a temporary table [100]
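For the parameter restriction noted above, an inline table-valued function behaves like a parameterized view. A minimal sketch, assuming a hypothetical table dbo.Orders with the columns OrderID, CustomerID, OrderDate and TotalDue:

```sql
CREATE FUNCTION dbo.fnOrdersByCustomer (@CustomerID int)
RETURNS TABLE
AS
RETURN
(
    -- a single SELECT, as in a view, but filtered by the parameter
    SELECT OrderID, OrderDate, TotalDue
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID
);
GO

-- used in the FROM clause just like a view
SELECT OrderID, OrderDate, TotalDue
FROM dbo.fnOrdersByCustomer(42);
```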

{benefit} present the correct fields to the user
{benefit} enforce security

by specifying

only needed columns

projects a predefined set of columns [15]
hides sensitive, irrelevant, or confusing columns [15]
should be used in parallel with SQL Server–enforced security [15]

only needed records

by allowing users access to the view without the need to give access to the used tables

grant users read permission from only the views, and restrict access to the physical tables [15]
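A sketch of the security pattern, assuming a hypothetical table dbo.Employees that contains a sensitive Salary column and a hypothetical database role ReportingUsers. It relies on the view and the table having the same owner, so that permissions on the view are sufficient.

```sql
-- the view projects only the non-sensitive columns and the needed rows
CREATE VIEW dbo.vEmployeeDirectory
AS
SELECT EmployeeID, FirstName, LastName, Department
FROM dbo.Employees
WHERE IsActive = 1;          -- Salary is deliberately left out
GO

CREATE ROLE ReportingUsers;
GO

-- read access is granted on the view only; no permission is granted on dbo.Employees
GRANT SELECT ON dbo.vEmployeeDirectory TO ReportingUsers;
```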

{benefit} maintainability
{benefit} provides a level of abstraction

hides the complexity of the underlying data structures
encapsulates (business) logic
denormalize or flatten complex joins
can consolidate data across databases/servers
can be used as single version of truth

{benefit} allow changing data in the base tables
{downside} layers of nested views add needless overhead to understanding the views
{downside} single-purpose views quickly become obsolete and clutter the database [15]
{downside} complex views are perceived as having poor performance [15]
{best practice} use generic/standard naming conventions
{best practice} use aliases for cryptic/recurring column names
{best practice} consider only the requested columns
{best practice} group specific-purpose views under their own schema
{best practice} avoid hardcoding values
{best practice} use views for column-level security together with SQL Server–enforced security
{best practice} limit views to ad-hoc queries and reports

for extensibility and control [15]
⇐ performance isn’t the reason [15]

{poor practices} create views for single-purpose queries (aka one time requests)
{operation} create a view
{operation} drop a view
{operation} alter a view
{operation} select data
{operation} update data

unless the view is a simple single table view, it’s difficult to update the underlying data through the view [15]

{type} inline views

exist only during the execution of a query [32]
simplify the development of a one-time query [32]

allows creating queries in steps

enables troubleshooting

can replace inline UDFs
alternatives

inline UDFs
temporary tables
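An inline view is simply a query in the FROM clause (also called a derived table). A minimal sketch, assuming a hypothetical table dbo.Orders with a CustomerID column; the inner query can be run and checked on its own before the outer step is added, which is what makes stepwise development and troubleshooting easier.

```sql
SELECT t.CustomerID, t.OrderCount
FROM (
    -- step 1: the inline view, exists only for the duration of this query
    SELECT CustomerID, COUNT(*) AS OrderCount
    FROM dbo.Orders
    GROUP BY CustomerID
) AS t
-- step 2: filter on the result of step 1
WHERE t.OrderCount > 5;
```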

{type} indexed views

materialize the data, storing the results of the view in a clustered index on disk [15]
similar to a covering index

but with greater control

can include data from multiple data sources [15]
no need to include the clustered index keys [15]

designing an indexed view is more like designing an indexing structure than creating a view [15]

can cause deadlocks when two or more of the participating tables are updated, inserted into, or deleted from by two or more sessions in parallel, such that the sessions block each other [29]

{type} compatibility views

allow accessing a subset of the SQL Server 2000 system tables

don’t contain any metadata related to features added after

views have the same names as many of the system tables in previous versions, as well as the same column names

⇒ any code that uses the SQL Server 2000 system tables won’t break [16]
there’s no guarantee that the results returned are exactly the same as the ones from the corresponding tables in SQL Server 2000 [16]

accessible from any database
hidden in the resource database

e.g. sysobjects, sysindexes, sysusers, sysdatabases

{type} [SQL Server 2005] catalog views

general interface to the persisted system metadata
built on an inheritance model

⇒ no need to redefine internally sets of attributes common to many objects

available over sys schema

must be included in object’s reference

some of the names are easy to remember because they are similar to the SQL Server 2000 system table names [16]
the columns displayed are very different from the columns in the compatibility views
some metadata appears only in the master database

keeps track of system-wide data (e.g. databases and logins)
other metadata is available in every database (e.g. objects and permissions)
metadata appearing only in the msdb database isn’t available through catalog views but is still available in system tables, in the schema dbo (e.g. backup and restore, replication, Database Maintenance Plans, Integration Services, log shipping, and SQL Server Agent)
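For comparison, the two queries below list the views of the current database, first through catalog views and then through a compatibility view. Both run against system objects only, so no user objects are assumed.

```sql
-- catalog views: always referenced through the sys schema
SELECT s.name AS SchemaName,
       v.name AS ViewName,
       v.create_date
FROM sys.views AS v
     JOIN sys.schemas AS s
       ON s.schema_id = v.schema_id
ORDER BY s.name, v.name;

-- compatibility view: same name and columns as the SQL Server 2000 system table
SELECT name, id, xtype
FROM sys.sysobjects
WHERE xtype = 'V';
```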

{type} partitioned views

allow the data in a large table to be split into smaller member tables

the data is partitioned between the member tables based on ranges of data values in one of the columns [4]
the data ranges for each member table are defined in a CHECK constraint specified on the partitioning column [4]
a view that uses UNION ALL to combine selects of all the member tables into a single result set is then defined [4]
when SELECT statements referencing the view specify a search condition on the partition column, the query optimizer uses the CHECK constraint definitions to determine which member table contains the rows [4]
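The steps above can be sketched with two member tables split by year. All names are hypothetical; the CHECK constraint on the partitioning column (SaleYear) is what lets the optimizer skip member tables that cannot contain the requested rows.

```sql
-- member tables: same structure, disjoint ranges on the partitioning column
CREATE TABLE dbo.Sales2024 (
    SaleID   int   NOT NULL,
    SaleYear int   NOT NULL CHECK (SaleYear = 2024),
    Amount   money NOT NULL,
    CONSTRAINT PK_Sales2024 PRIMARY KEY (SaleID, SaleYear)
);

CREATE TABLE dbo.Sales2025 (
    SaleID   int   NOT NULL,
    SaleYear int   NOT NULL CHECK (SaleYear = 2025),
    Amount   money NOT NULL,
    CONSTRAINT PK_Sales2025 PRIMARY KEY (SaleID, SaleYear)
);
GO

-- the partitioned view combines the member tables into a single result set
CREATE VIEW dbo.vSales
AS
SELECT SaleID, SaleYear, Amount FROM dbo.Sales2024
UNION ALL
SELECT SaleID, SaleYear, Amount FROM dbo.Sales2025;
GO

-- search condition on the partitioning column: only dbo.Sales2025 needs to be read
SELECT SaleID, Amount
FROM dbo.vSales
WHERE SaleYear = 2025;
```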

{type} distributed partition views (DPV)

local partitioned views

a single table is horizontally split into multiple tables, usually all have the same structure [30]

cross database partitioned views

tables are split among different databases on the same server instance

distributed (across server or instance) partitioned views

tables participating in the view reside in different databases which reside on different servers or different instances

{type} nested views

views referred by other views [15]
can lead to an abstraction layer with nested views several layers deep

too difficult to diagnose and maintain [15]

{type} updatable view

view that allows updating the underlying tables

only one table may be updated
if the view includes joins, then the UPDATE statement that references the view must change columns in only one table [15]

typically not a recommended solution for application design
WITH CHECK OPTION causes the WHERE clause of the view to check the data being inserted or updated through the view in addition to the data being retrieved [15]

it makes the WHERE clause a two-way restriction [15]

⇒ can protect the data from undesired inserts and updates [15]

⇒ useful when the view should limit inserts and updates with the same restrictions applied to the WHERE clause [15]
when CHECK OPTION isn’t used, records inserted through the view that don’t match the WHERE constraints will disappear (aka disappearing rows) [15]
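A minimal sketch of the two-way restriction, using a hypothetical table dbo.Customers and a view that exposes only active customers.

```sql
CREATE TABLE dbo.Customers (
    CustomerID int           NOT NULL PRIMARY KEY,
    Name       nvarchar(100) NOT NULL,
    IsActive   bit           NOT NULL
);
GO

CREATE VIEW dbo.vActiveCustomers
AS
SELECT CustomerID, Name, IsActive
FROM dbo.Customers
WHERE IsActive = 1
WITH CHECK OPTION;   -- the WHERE clause now also applies to inserts and updates
GO

-- succeeds: the row satisfies the view's WHERE clause
INSERT INTO dbo.vActiveCustomers (CustomerID, Name, IsActive)
VALUES (1, N'Contoso', 1);

-- fails: the row would not be visible through the view;
-- without WITH CHECK OPTION it would be inserted and then "disappear" from the view
INSERT INTO dbo.vActiveCustomers (CustomerID, Name, IsActive)
VALUES (2, N'Fabrikam', 0);
```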

{type} non-updatable views

views that don’t allow updating the underlying tables
{workaround} build an INSTEAD OF trigger that inspects the modified data and then performs a legal UPDATE operation based on that data [15]
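A sketch of the INSTEAD OF workaround, assuming hypothetical tables dbo.Orders (OrderID, CustomerID, Status) and dbo.Customers (CustomerID, Email). An UPDATE through a join view may normally change columns of only one table; the trigger intercepts the statement and performs one legal UPDATE per base table instead.

```sql
CREATE VIEW dbo.vCustomerOrders
AS
SELECT o.OrderID, o.Status, c.CustomerID, c.Email
FROM dbo.Orders AS o
     JOIN dbo.Customers AS c
       ON c.CustomerID = o.CustomerID;
GO

CREATE TRIGGER dbo.trg_vCustomerOrders_Update
ON dbo.vCustomerOrders
INSTEAD OF UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- the inserted pseudo-table holds the new values of the affected view rows
    UPDATE o
    SET o.Status = i.Status
    FROM dbo.Orders AS o
         JOIN inserted AS i
           ON i.OrderID = o.OrderID;

    UPDATE c
    SET c.Email = i.Email
    FROM dbo.Customers AS c
         JOIN inserted AS i
           ON i.CustomerID = c.CustomerID;
END;
GO

-- touches columns from both base tables; handled by the trigger
UPDATE dbo.vCustomerOrders
SET Status = N'Shipped',
    Email  = N'orders@example.com'
WHERE OrderID = 1;
```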

{type} horizontally positioned views

used to enforce row-level security with the help of WITH CHECK OPTION

{downside} has a high maintenance cost [15]
{alternative} row-level security can be designed using user-access tables and stored procedures [15]

{type} schema-bound views

the SELECT statement must include the schema name for any referenced objects [15]

SELECT * (all columns) is not permitted [15]

{type} subscription views

a view used to export Master Data Services data to subscribing systems


References:
[4] Microsoft (2013) SQL Server 2000 Documentation

[15] Adam Jorgensen et al (2012) Microsoft® SQL Server® 2012 Bible

[16] Bob Beauchemin et al (2012) Microsoft SQL Server 2012 Internals
[25] Basit A Masood-Al-Farooq et al (2014) SQL Server 2014 Development Essentials: Design, implement, and deliver a successful database solution with Microsoft SQL Server 2014

[30] Kevin Cox (2007) Distributed Partitioned Views / Federated Databases: Lessons Learned

[32] Sikha S Bagui & Richard W Earp (2006) Learning SQL on SQL Server 2005

[100] Itzik Ben-Gan et al (2012) Exam 70-461: Querying Microsoft SQL Server 201

Acronyms:
DPV - Distributed Partition Views
UDF - User-Defined Function
