Showing posts with label catalog. Show all posts
Showing posts with label catalog. Show all posts

06 February 2024

SQL Reloaded: Microsoft Fabric's Delta Tables in Action - Data Objects Metadata

There seem to be four main sources for learning about the functionality available in the Delta Lake especially in what concerns the SQL dialect used by the Spark SQL: Databricks [1], Delta Lake [2], Azure Databricks [3], respectively the Data Engineering documentation in Microsoft Fabric [4] and the afferent certification material. Unfortunately, the latter focuses more on PySpark. So, until Microsoft addresses the gap, one can consult the other sources, check what's working and built thus the required knowledge for handling the various tasks. 

First of all, it's important to understand which the data objects available in Microsoft Fabric are. Based on [5] I could identify the following hierarchy:


According to the same source [5] the metastore contains all of the metadata that defines data objects in the lakehouse, while the catalog is the highest abstraction in the lakehouse. The database, called also a schema, keeps its standard definition - a collection of data objects, such as tables or views (aka relations) and functions. The tables, views and functions keep their standard definitions. Except the metastore, these are also the (securable) objects on which permissions can be set. 

One can explore the structure in Spark SQL by using the SHOW command:

-- explore the data objects from the lakehouse
SHOW CATALOGS;

SHOW DATABASES;

SHOW SCHEMAS;

SHOW CATALOGS;

SHOW VIEWS;

SHOW TABLES;

SHOW FUNCTIONS;

Moreover, one can list only the objects from an object from the parent (e.g. the tables existing in a database):
 
-- all tables from a database
SHOW TABLES FROM Testing;

-- all tables from a database matching a pattern
SHOW TABLES FROM Testing LIKE 'Asset*';

-- all tables from a database matching multiple patterns
SHOW TABLES FROM Testing LIKE 'Asset*|cit*';

Notes:
1) Same syntax applies for views and functions, respectively for the other objects in respect to their parents (from the hierarchy). 
2) Instead of FROM one can use the IN keyword, though I'm not sure what's the difference given that they seem to return same results.
3) For databases and tables one can use also the SQL Server to export the related metadata. Unfortunately, it's not the case for views and functions because when the respective objects are created in Spark SQL, they aren't visible over the SQL Endpoint, and vice versa.

4) Given that there are multiple environments that deal with delta tables, to minimize the confusion that might result from their use, it makes sense to use database as term instead of schema. 

References:
[1] Databricks (2024) SQL language reference (link)
[2] Delta Lake (2023) Delta Lake documentation (link)
[3] Microsoft Learn (2023) Azure Databricks documentation (link)
[4] Microsoft Learn (2023) Data Engineering documentation in Microsoft Fabric (link)
[5] Databricks (2023) Data objects in the Databricks lakehouse (link)

29 January 2017

Data Management: Data Dictionary (Definitions)

"The system tables that contain descriptions of the database objects and how they are structured." (Karen Paulsell et al, "Sybase SQL Server: Performance and Tuning Guide", 1996)

"A set of system tables stored in a catalog. A data dictionary includes definitions of database structures and related information, such as permissions." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"Software in which metadata is stored, manipulated and defined – a data dictionary is normally associated with a tool used to support software engineering." (Keith Gordon, "Principles of Data Management", 2007)

"A list of descriptions of data items to help developers stay on the same track." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"The place where information about data that exists in the organization is stored. This should include both technical and business details about each data element." (Laura Reeves, "A Manager's Guide to Data Warehousing", 2009)

"Data dictionary are mini database management systems that manages metadata. It is a repository of information about a database that documents data elements of a database. The data dictionary is an integral part of the database management systems and stores metadata or information about the database, attribute names and definitions for each table in the database." (Vijay K Pallaw, "Database Management Systems" 2nd Ed., 2010)

"In the days of mainframe computers, this was a listing of record layouts, describing each field in each type of file." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"Software coupled with a data store for managing data definitions." (Craig S Mullins, "Database Administration", 2012)

"A database containing data about all the databases in a database system. Data dictionaries store all the various schema and file specifications and their locations. They also contain information about which programs use which data and which users are interested in which reports." (SQL Server 2012 Glossary, "Microsoft", 2012)

"A reference by which a team can understand what data assets they have, how those assets were created, what they mean, and where to find them." (Evan Stubbs, "Delivering Business Analytics: Practical Guidelines for Best Practice", 2013)

"A repository of the metadata useful to the corporation" (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"A comprehensive record of business and technical definitions of the elements within a dataset. Also referred to as a business glossary." (Jonathan Ferrar et al, "The Power of People", 2017)

"A database containing data about all the databases in a database system. Data dictionaries store all the various schema and file specifications and their locations." (BAAN)

"A read-only collection of database tables and views containing reference information about the database, its structures, and its users." (Oracle)

"A set of system tables, stored in a catalog, that includes definitions of database structures and related information, such as permissions." (Microsoft Technet)

"A set of tables that keep track of the structure of both the database and the inventory of database objects." (IBM)

"A specialized type of database containing metadata; a repository of information describing the characteristics of data used to design, monitor, document, protect, and control data in information systems and databases; an application system supporting the definition and management of database metadata." (TOGAF)

"Metadata that keeps track of database objects such as tables, indexes, and table columns." (MySQL)

30 March 2011

SQL Server: Full-Text Catalog (Definitions)

"A catalog that stores a database's full-text index." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"The full-text catalog stores all of the full-text indexes for tables within a database." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"A Full-Text catalog is a special storage space used to house Full-Text indexes. By default, all Full-Text indexes are housed in a single catalog." (Thomas Moore, "EXAM CRAM™ 2: Designing and Implementing Databases with SQL Server 2000 Enterprise Edition", 2005)

"A special storage space used to house Full-Text indexes. By default, all Full-Text indexes are housed in a single catalog." (Thomas Moore, "MCTS 70-431: Implementing and Maintaining Microsoft SQL Server 2005", 2006)

"A full-text catalog is a logical grouping of SQL Server full-text indexes for management purposes." (Michael Coles, "Pro T-SQL 2008 Programmer's Guide", 2008)

"A collection of full-text index components and other files that are organized in a specific directory structure and contain the data that is needed to perform queries." (Microsoft, "SQL Server 2012 Glossary", 2012)

"A full-text catalog is a logical grouping of SQL Server full-text indexes for management purposes." (Jay Natarajan et al, "Pro T-SQL 2012 Programmer's Guide" 3rd Ed., 2012)

"A logical grouping of SQL Server full-text indexes for management purposes." (Miguel Cebollero et al, "Pro T-SQL Programmer’s Guide" 4th Ed., 2015)

20 February 2009

DBMS: Database Catalog (Definitions)

"Tables provided in the Model database." (Owen Williams, "MCSE TestPrep: SQL Server 6.5 Design and Implementation", 1998)

"The set of system tables in a database that describes the database contents." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A database catalog contains the definition of all the objects in the database, as well as the definition of the database." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"This is an object in a database that is used to organize full-text indexes." (Joseph L Jorden & Dandy Weyn, "MCTS Microsoft SQL Server 2005: Implementation and Maintenance Study Guide - Exam 70-431", 2006)

"A directory storing metadata." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"A group of schemas, usually composed of all schemas handled by a single DBMS." (Jan L Harrington, "SQL Clearly Explained 3rd Ed. ", 2010)

"The part of a database that contains the definition of all the objects in the database, as well as the definition of the database." (Microsoft, "SQL Server 2012 Glossary", 2012)

"Structured list made up of comparable objects, used as a reference." (Gilbert Raymond & Philippe Desfray, "Modeling Enterprise Architecture with TOGAF", 2014)

"A collection of tables and views that contains descriptions of objects such as tables, views, and indexes." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.