SQL Troubles: objects

Showing posts with label objects. Show all posts

01 January 2025

💎🏭SQL Reloaded: SQL Server Metadata (Part I: Indexes Overview)

There are scenarios in which it's useful to gather information about the available indexes and their definition as a primary step for troubleshooting or index maintenance. Moreover, it's useful to take a baseline of the defined indexes and update it accordingly when indexes change. A minimum of information can be gathered in Excel files or similar repositories, though for the same can be used also metadata tools, at least when they're easy to use and the associated costs aren't neglectable.

Usually, there are two levels at which the information is needed - at index, respectively at column level. Sometimes it's useful to have an independent query for each level of detail, though in data warehouses and similar use cases it's useful to provide a model on top of the metadata, at least to improve query's maintainability.

The first view encapsulates the logic needed to export the data and it's based on the sys.indexes, sys.objects, sys.schemas and sys.database_principals system objects.

-- create schema for metadata
CREATE SCHEMA meta;

-- clean after
DROP VIEW IF EXISTS meta.vIndexes

-- create views
CREATE OR ALTER VIEW meta.vIndexes
AS
-- sys metadata - indexes
SELECT OBJ.schema_id
, IND.object_id 
, IND.index_id 
, SCH.name schema_name
, OBJ.name table_name
, IND.name index_name
, IND.type_desc index_type
, IND.is_primary_key
, IND.is_unique_constraint
, IND.fill_factor
, IND.has_filter
, IND.auto_created
, IND.is_unique is_unique_index
, DBP.type principal_type
, DBP.type_desc principal_type_desc
--, INC.*
FROM sys.indexes IND WITH (NOLOCK)
     JOIN sys.objects OBJ WITH (NOLOCK)
       ON IND.object_id = OBJ.object_id
          JOIN sys.schemas SCH WITH (NOLOCK)
            ON OBJ.schema_id = SCH.schema_id
               JOIN sys.database_principals DBP
                 ON SCH.principal_id = DBP.principal_id 
WHERE DBP.type IN ('S', 'E')

Upon case, the needed information can be exported via a query like the one below:

-- sys metadata - index columns
SELECt IND.db_name
, IND.schema_name
, IND.table_name
, IND.index_name
, IND.index_type
, IND.principal_type
FROM meta.vIndexes IND
WHERE IND.schema_name = 'SalesLT'
  AND IND.table_name IN ('Address', 'Customer')
ORDER BY IND.table_name
, IND.index_name

Output:

db_name	schema_name	table_name	index_name	index_type	principal_type
AdventureWorks01...	SalesLT	Address	AK_Address_rowguid	NONCLUSTERED	S
AdventureWorks01...	SalesLT	Address	IX_Address_AddressLine1_AddressLine2_City_StateProvince_PostalCode_CountryRegion	NONCLUSTERED	S
AdventureWorks01...	SalesLT	Address	IX_Address_StateProvince	NONCLUSTERED	S
AdventureWorks01...	SalesLT	Address	PK_Address_AddressID	CLUSTERED	S
AdventureWorks01...	SalesLT	Customer	AK_Customer_rowguid	NONCLUSTERED	S
AdventureWorks01...	SalesLT	Customer	IX_Customer_EmailAddress	NONCLUSTERED	S
AdventureWorks01...	SalesLT	Customer	PK_Customer_CustomerID	CLUSTERED	S

Similarly, on top of the above view can be built a similar object that provides also the column-related information by adding the sys.index_columns and sys.columns system object to the logic:

-- clean after
DROP VIEW IF EXISTS meta.vIndexColumns

-- create 
CREATE OR ALTER VIEW meta.vIndexColumns
AS
-- sys metadata - index columns
SELECT IND.db_id
, IND.schema_id
, INC.object_id 
, INC.index_id 
, INC.index_column_id
, INC.column_id
, IND.db_name
, IND.schema_name
, IND.table_name
, IND.index_name
, COL.name column_name
, INC.key_ordinal
, INC.partition_ordinal
, IND.index_type
, IND.is_primary_key
, IND.is_unique_constraint
, IND.fill_factor
, IND.has_filter
, IND.auto_created
, IND.is_unique_index
, INC.is_descending_key
, INC.is_included_column
, IND.principal_type
, IND.principal_type_desc
FROM sys.index_columns INC
     JOIN sys.columns COL WITH (NOLOCK)
       ON INC.object_id = COL.object_id 
      AND INC.column_id = COL.column_id 
     JOIN meta.vIndexes IND WITH (NOLOCK)
       ON INC.object_id = IND.object_id
      AND INC.index_id = IND.index_id

And, there's an example of the query based on this view:

-- sys metadata - index columns
SELECt INC.schema_name
, INC.table_name
, INC.index_name
, INC.column_name
, INC.key_ordinal
, INC.index_type
, INC.principal_type
FROM meta.vIndexColumns INC
WHERE INC.schema_name = 'SalesLT'
  AND INC.table_name IN ('Address', 'Customer')
ORDER BY INC.table_name
, INC.index_name
, INC.key_ordinal

Output:

schema_name	table_name	index_name	column_name	key_ordinal	index_type	principal_type
SalesLT	Address	AK_Address_rowguid	rowguid	1	NONCLUSTERED	S
SalesLT	Address	IX_Address_AddressLine1_AddressLine2_City_StateProvince_PostalCode_CountryRegion	AddressLine1	1	NONCLUSTERED	S
SalesLT	Address	IX_Address_AddressLine1_AddressLine2_City_StateProvince_PostalCode_CountryRegion	AddressLine2	2	NONCLUSTERED	S
SalesLT	Address	IX_Address_AddressLine1_AddressLine2_City_StateProvince_PostalCode_CountryRegion	City	3	NONCLUSTERED	S
SalesLT	Address	IX_Address_AddressLine1_AddressLine2_City_StateProvince_PostalCode_CountryRegion	StateProvince	4	NONCLUSTERED	S
SalesLT	Address	IX_Address_AddressLine1_AddressLine2_City_StateProvince_PostalCode_CountryRegion	PostalCode	5	NONCLUSTERED	S
SalesLT	Address	IX_Address_AddressLine1_AddressLine2_City_StateProvince_PostalCode_CountryRegion	CountryRegion	6	NONCLUSTERED	S
SalesLT	Address	IX_Address_StateProvince	StateProvince	1	NONCLUSTERED	S
SalesLT	Address	PK_Address_AddressID	AddressID	1	CLUSTERED	S
SalesLT	Customer	AK_Customer_rowguid	rowguid	1	NONCLUSTERED	S
SalesLT	Customer	IX_Customer_EmailAddress	EmailAddress	1	NONCLUSTERED	S
SalesLT	Customer	PK_Customer_CustomerID	CustomerID	1	CLUSTERED	S

Notes:
1) As a DBA it's useful to take a baseline of the indexes defined and reevaluate their usefulness over time. These are one of the checks that should be done when one becomes responsible for the administration of a database server, independently of the vendor.
2) The definitions of the views can be extended as needed, though one should try to keep the overall complexity to a minimum.
3) There are voices against the use of NOLOCK. Feel free to change the objects accordingly!
4) It's useful to work in a dedicated schema (e.g. meta) and have a different naming convention that deviates slightly from one defined by Microsoft. This should make sure that no confusion with the system objects exists.
5) Azure SQL takes over many of the responsibilities for index maintenance. Even if indexes are managed automatically by the system, a baseline is still needed, at least to evaluate functionality's performance, respectively the changes that occurred in the environment.
6) Except attributes which were added for specific functionality over time, the queries should work at least starting with SQL Server 2005 forward.
7) See also the notes on clustered vs nonclustered indexes.

Happy coding!

Previous Post <<||>> Next Post

06 February 2024

💎🏭SQL Reloaded: Microsoft Fabric's Delta Tables in Action - Data Objects Metadata

There seem to be four main sources for learning about the functionality available in the Delta Lake especially in what concerns the SQL dialect used by the Spark SQL: Databricks [1], Delta Lake [2], Azure Databricks [3], respectively the Data Engineering documentation in Microsoft Fabric [4] and the afferent certification material. Unfortunately, the latter focuses more on PySpark. So, until Microsoft addresses the gap, one can consult the other sources, check what's working and built thus the required knowledge for handling the various tasks.

First of all, it's important to understand which the data objects available in Microsoft Fabric are. Based on [5] I could identify the following hierarchy:

According to the same source [5] the metastore contains all of the metadata that defines data objects in the lakehouse, while the catalog is the highest abstraction in the lakehouse. The database, called also a schema, keeps its standard definition - a collection of data objects, such as tables or views (aka relations) and functions. The tables, views and functions keep their standard definitions. Except the metastore, these are also the (securable) objects on which permissions can be set.

One can explore the structure in Spark SQL by using the SHOW command:

-- explore the data objects from the lakehouse
SHOW CATALOGS;

SHOW DATABASES;

SHOW SCHEMAS;

SHOW CATALOGS;

SHOW VIEWS;

SHOW TABLES;

SHOW FUNCTIONS;

Moreover, one can list only the objects from an object from the parent (e.g. the tables existing in a database):

-- all tables from a database
SHOW TABLES FROM Testing;

-- all tables from a database matching a pattern
SHOW TABLES FROM Testing LIKE 'Asset*';

-- all tables from a database matching multiple patterns
SHOW TABLES FROM Testing LIKE 'Asset*|cit*';

Notes:
1) Same syntax applies for views and functions, respectively for the other objects in respect to their parents (from the hierarchy).
2) Instead of FROM one can use the IN keyword, though I'm not sure what's the difference given that they seem to return same results.
3) For databases and tables one can use also the SQL Server to export the related metadata. Unfortunately, it's not the case for views and functions because when the respective objects are created in Spark SQL, they aren't visible over the SQL Endpoint, and vice versa.

4) Given that there are multiple environments that deal with delta tables, to minimize the confusion that might result from their use, it makes sense to use database as term instead of schema.

References:
[1] Databricks (2024) SQL language reference (link)
[2] Delta Lake (2023) Delta Lake documentation (link)
[3] Microsoft Learn (2023) Azure Databricks documentation (link)
[4] Microsoft Learn (2023) Data Engineering documentation in Microsoft Fabric (link)
[5] Databricks (2023) Data objects in the Databricks lakehouse (link)

13 June 2009

🛢DBMS: Conceptual Data Model (Definitions)

"Represents the overall logical structure of a database, which is independent of any software or data storage structure. A Conceptual model often contains data objects not yet implemented in the physical databases. It gives a formal representation of the data needed to run an enterprise or a business activity." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"An organization of data that describes the relationships among the primitive data elements. For example, in the ER model, it is a diagram of the entities, their relationships, and their attributes." (Sam Lightstone et al, "Physical Database Design: The Database Professional’s Guide to Exploiting Indexes, Views, Storage, and More", 2007)

"An abstract model or representation of data for a particular domain, business enterprise, field of study, etc., independent of any specific software or information system. Usually expressed in terms of entities and relationships." (J P Getty Trust, "Introduction to Metadata" 2nd Ed., 2008)

"An organization of data that describes the relationships among the primitive data elements. For example, in the ER model, it is a diagram of the entities, their relationships, and their attributes." (Toby J Teorey, ", Database Modeling and Design" 4th Ed, 2010)

"In the ANSI four-schema architecture, this is a description of a portion of an enterprise in terms of the fundamental things of significant interest to it. They are fundamental in that most things seen by business owners are examples of these. The model is constructed in a rigorous manner, being fully normalized, eschewing many-to-many relationships and is expressed in terms of binary relationships only." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"A data model that is presented at a high level of abstraction, hiding the underlying details, and making it easier for people to comprehend. A conceptual model should reflect the phenomena in the users' world being modeled as directly as possible, as close to the way the users think. For example, many-to-many relationships are common in conceptual models." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A high-level data model identifying major entities and relationships, not fully attributed and therefore not necessarily normalized." (Craig S Mullins, "Database Administration: The Complete Guide to DBA Practices and Procedures" 2nd Ed., 2012)

"A high-level data modeling that specifies an abstract map of concepts and their relationships. SQL’s inherent hierarchical data modeling does this easily and naturally." (Michael M David & Lee Fesperman, "Advanced SQL Dynamic Data Modeling and Hierarchical Processing", 2013)

"A detailed model that captures the overall structure of organizational data that is independent of any database management system or other implementation considerations." (Jeffrey A. Hoffer et al, "Modern Systems Analysis and Design" 7th Ed., 2014)

"The output of the conceptual design process. The conceptual model provides a global view of an entire database. Describes the main data objects, avoiding details." (Carlos Coronel & Steven Morris, "Database Systems: Design, Implementation, & Management" Ed. 11, 2014)

"A model of a database expressed at platform independent level; describes entities from a considered domain, properties of these entities, and the relationships between entities." (Iwona Dubielewicz et al, "Quality-Driven Database System Development within MDA Approach", 2015)

"The most abstract form of data model. It includes the important entities and the relationships among them and contains only major attributes." (Besma Khalfi et al, "Enhanced F-Perceptory Approach for Dealing with Geographic Data Imprecision from the Conceptual Modeling to the Fuzzy Geographical Database Building", 2017)

12 March 2009

🛢DBMS: Object (Definitions)

"Conceptually a container that holds data in the database. Objects have various properties including an access control list." (Owen Williams, "MCSE TestPrep: SQL Server 6.5 Design and Implementation", 1998)

"One of the components of a database: a table, index, trigger, view, key, constraint, default, rule, user-defined data type, or stored procedure. Also called a database object. In COM programming, an object has properties and methods and exposes interfaces; for example, the SQL-DMO is a hierarchy of COM objects." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"An active data value that has characteristics and properties." (Greg Perry, "Sams Teach Yourself Beginning Programming in 24 Hours 2nd Ed.", 2001)

"In databases, one of the components of a database: a table, index, trigger, view, key, constraint, default, rule, user-defined data type, or stored procedure." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"An instance of an item of interest to the data model." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"In databases, one of the components of a database: a table, index, trigger, view, key, constraint, default, rule, user-defined data type, or stored procedure. In object-oriented programming, an instance of a class." (Jim Joseph, "Microsoft SQL Server 2008 Reporting Services Unleashed", 2009)

"A set of instructions, generated at application compilation time, that is created and managed by a DBMS. The access plan predetermines the way an application’s query will access the database at run time." (Carlos Coronel & Steven Morris, "Database Systems: Design, Implementation, & Management" Ed. 11, 2014)

"A database component in a database. Can also refer to the database itself." (Technet)

"A database object in a relational database is a data structure used to either store or reference data." (Techopedia) [source]

"An object in the database that can be manipulated with SQL. Schema objects such as tables and indexes reside in schemas. Nonschema objects such as directories and roles do not reside in schemas." (Oracle)

"An object that exists in an installation of a database system, such as an instance, a database, a database partition group, a buffer pool, a table, or an index." (IBM)

SQL Troubles

Pages