Showing posts with label objects. Show all posts
Showing posts with label objects. Show all posts

06 February 2024

SQL Reloaded: Microsoft Fabric's Delta Tables in Action - Data Objects Metadata

There seem to be four main sources for learning about the functionality available in the Delta Lake especially in what concerns the SQL dialect used by the Spark SQL: Databricks [1], Delta Lake [2], Azure Databricks [3], respectively the Data Engineering documentation in Microsoft Fabric [4] and the afferent certification material. Unfortunately, the latter focuses more on PySpark. So, until Microsoft addresses the gap, one can consult the other sources, check what's working and built thus the required knowledge for handling the various tasks. 

First of all, it's important to understand which the data objects available in Microsoft Fabric are. Based on [5] I could identify the following hierarchy:


According to the same source [5] the metastore contains all of the metadata that defines data objects in the lakehouse, while the catalog is the highest abstraction in the lakehouse. The database, called also a schema, keeps its standard definition - a collection of data objects, such as tables or views (aka relations) and functions. The tables, views and functions keep their standard definitions. Except the metastore, these are also the (securable) objects on which permissions can be set. 

One can explore the structure in Spark SQL by using the SHOW command:

-- explore the data objects from the lakehouse
SHOW CATALOGS;

SHOW DATABASES;

SHOW SCHEMAS;

SHOW CATALOGS;

SHOW VIEWS;

SHOW TABLES;

SHOW FUNCTIONS;

Moreover, one can list only the objects from an object from the parent (e.g. the tables existing in a database):
 
-- all tables from a database
SHOW TABLES FROM Testing;

-- all tables from a database matching a pattern
SHOW TABLES FROM Testing LIKE 'Asset*';

-- all tables from a database matching multiple patterns
SHOW TABLES FROM Testing LIKE 'Asset*|cit*';

Notes:
1) Same syntax applies for views and functions, respectively for the other objects in respect to their parents (from the hierarchy). 
2) Instead of FROM one can use the IN keyword, though I'm not sure what's the difference given that they seem to return same results.
3) For databases and tables one can use also the SQL Server to export the related metadata. Unfortunately, it's not the case for views and functions because when the respective objects are created in Spark SQL, they aren't visible over the SQL Endpoint, and vice versa.

4) Given that there are multiple environments that deal with delta tables, to minimize the confusion that might result from their use, it makes sense to use database as term instead of schema. 

References:
[1] Databricks (2024) SQL language reference (link)
[2] Delta Lake (2023) Delta Lake documentation (link)
[3] Microsoft Learn (2023) Azure Databricks documentation (link)
[4] Microsoft Learn (2023) Data Engineering documentation in Microsoft Fabric (link)
[5] Databricks (2023) Data objects in the Databricks lakehouse (link)

13 June 2009

DBMS: Conceptual Data Model (Definitions)

"Represents the overall logical structure of a database, which is independent of any software or data storage structure. A Conceptual model often contains data objects not yet implemented in the physical databases. It gives a formal representation of the data needed to run an enterprise or a business activity." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"An organization of data that describes the relationships among the primitive data elements. For example, in the ER model, it is a diagram of the entities, their relationships, and their attributes." (Sam Lightstone et al, "Physical Database Design: The Database Professional’s Guide to Exploiting Indexes, Views, Storage, and More", 2007)

"An abstract model or representation of data for a particular domain, business enterprise, field of study, etc., independent of any specific software or information system. Usually expressed in terms of entities and relationships." (J P Getty Trust, "Introduction to Metadata" 2nd Ed., 2008)

"An organization of data that describes the relationships among the primitive data elements. For example, in the ER model, it is a diagram of the entities, their relationships, and their attributes." (Toby J Teorey, ", Database Modeling and Design" 4th Ed, 2010)

"In the ANSI four-schema architecture, this is a description of a portion of an enterprise in terms of the fundamental things of significant interest to it. They are fundamental in that most things seen by business owners are examples of these. The model is constructed in a rigorous manner, being fully normalized, eschewing many-to-many relationships and is expressed in terms of binary relationships only." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"A data model that is presented at a high level of abstraction, hiding the underlying details, and making it easier for people to comprehend. A conceptual model should reflect the phenomena in the users' world being modeled as directly as possible, as close to the way the users think. For example, many-to-many relationships are common in conceptual models." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A high-level data model identifying major entities and relationships, not fully attributed and therefore not necessarily normalized." (Craig S Mullins, "Database Administration: The Complete Guide to DBA Practices and Procedures" 2nd Ed., 2012)

"A high-level data modeling that specifies an abstract map of concepts and their relationships. SQL’s inherent hierarchical data modeling does this easily and naturally." (Michael M David & Lee Fesperman, "Advanced SQL Dynamic Data Modeling and Hierarchical Processing", 2013)

"A detailed model that captures the overall structure of organizational data that is independent of any database management system or other implementation considerations." (Jeffrey A. Hoffer et al, "Modern Systems Analysis and Design" 7th Ed., 2014)

"The output of the conceptual design process. The conceptual model provides a global view of an entire database. Describes the main data objects, avoiding details." (Carlos Coronel & Steven Morris, "Database Systems: Design, Implementation, & Management"  Ed. 11, 2014)

"A model of a database expressed at platform independent level; describes entities from a considered domain, properties of these entities, and the relationships between entities." (Iwona Dubielewicz et al, "Quality-Driven Database System Development within MDA Approach", 2015)

"The most abstract form of data model. It includes the important entities and the relationships among them and contains only major attributes." (Besma Khalfi et al, "Enhanced F-Perceptory Approach for Dealing with Geographic Data Imprecision from the Conceptual Modeling to the Fuzzy Geographical Database Building", 2017)

12 March 2009

DBMS: Object (Definitions)

"Conceptually a container that holds data in the database. Objects have various properties including an access control list." (Owen Williams, "MCSE TestPrep: SQL Server 6.5 Design and Implementation", 1998)

"One of the components of a database: a table, index, trigger, view, key, constraint, default, rule, user-defined data type, or stored procedure. Also called a database object. In COM programming, an object has properties and methods and exposes interfaces; for example, the SQL-DMO is a hierarchy of COM objects." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"An active data value that has characteristics and properties." (Greg Perry, "Sams Teach Yourself Beginning Programming in 24 Hours 2nd Ed.", 2001)

"In databases, one of the components of a database: a table, index, trigger, view, key, constraint, default, rule, user-defined data type, or stored procedure." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"An instance of an item of interest to the data model." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"In databases, one of the components of a database: a table, index, trigger, view, key, constraint, default, rule, user-defined data type, or stored procedure. In object-oriented programming, an instance of a class." (Jim Joseph, "Microsoft SQL Server 2008 Reporting Services Unleashed", 2009)

"A set of instructions, generated at application compilation time, that is created and managed by a DBMS. The access plan predetermines the way an application’s query will access the database at run time." (Carlos Coronel & Steven Morris, "Database Systems: Design, Implementation, & Management" Ed. 11, 2014)

"A database component in a database. Can also refer to the database itself." (Technet)

"A database object in a relational database is a data structure used to either store or reference data." (Techopedia) [source]

"An object in the database that can be manipulated with SQL. Schema objects such as tables and indexes reside in schemas. Nonschema objects such as directories and roles do not reside in schemas." (Oracle)

"An object that exists in an installation of a database system, such as an instance, a database, a database partition group, a buffer pool, a table, or an index." (IBM)

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.