SQL Troubles: 💎🏭SQL Reloaded: Microsoft Fabric's Delta Tables in Action

04 February 2024

💎🏭SQL Reloaded: Microsoft Fabric's Delta Tables in Action - CRUD Operations I (Basic Operations)

When building a data-related solution, the most basic functionality for a SQL developer is the CRUD (create, read, update, delete) operations. That's why I decided to test the feature on Lakehouse delta tables, for a first demonstrative post on Microsoft Fabric.

Unfortunately, at least for the moment, the SQL Endpoint allows only read access to the delta tables, which frankly for an SQL developer is an important limitation. There's however the possibility of running SQL over Spark SQL, which supports a basic dialect of SQL (similar to HiveSQL, though it might lack some features). To run the code, you'll need to create a notebook and you can set the language to Spark SQL for the whole notebook. Make sure, that you have a Lakehouse assigned in the notebook. Of course, you'll need access to Microsoft Fabric.

Create a cell for each of the following blocks of code. To make sure that the code works, it might be a good idea to run the cells individually!

-- drop the test table (if exists already)
DROP TABLE IF EXISTS Assets;

--create the test table
CREATE TABLE Assets(
 Id int NOT NULL,
 CreationDate timestamp NOT NULL,
 Vendor string NOT NULL,
 Asset string NOT NULL,
 Model string NOT NULL,
 Owner string NOT NULL,
 Tag string NOT NULL,
 Quantity decimal(13, 2) NOT NULL
) 
USING DELTA;

-- insert test data
INSERT INTO Assets
VALUES (1, '2024-03-01T00:00:00Z', 'IBM', 'Laptop 1','Model 1','Owner 1','XX0001','1');

-- insert more test records 
INSERT INTO Assets
VALUES ('2', '2024-02-01T00:00:00Z','IBM','Laptop 2','Model 2','Owner 2','XX0001','1')
, ('3', '2024-02-01T00:00:00Z','Microsoft','Laptop 3','Model 3','Owner 2','WX0001','1')
, ('4', '2024-01-01T00:00:00Z','Microsoft','Laptop 3','Model 3','Owner 2','WX0001','1')
, ('5', '2024-01-15T00:00:00Z','Dell','Laptop 4','Model 4','Owner 3','DD0001','1')
, ('6', '2024-01-16T00:00:00Z','Dell','Laptop 4','Model 4','Owner 4','DD0001','1');

-- retrieving all the records
SELECT *
FROM Assets
ORDER BY Vendor;

-- updating a record
UPDATE Assets 
SET CreationDate = '2024-01-19T00:00:00Z'
WHERE Id = 6;

-- deleting a record
DELETE FROM Assets
WHERE Id = 2;

-- reviewing the changes
SELECT *
FROM Assets
ORDER BY Vendor;

Comments:
[1] There seems to be no way to provide a schema (see the SQL syntax for Spark), however the dbo schema seems to be used in the background (see table metadata post). Calling a table with the schema in SQL Spark results in an error:
"[TABLE_OR_VIEW_NOT_FOUND] The table or view `<schema>`.`<object_name>` cannot be found. Verify the spelling and correctness of the schema and catalog."

[2] The data types are different from the ones in SQL Server (e.g. timestamp for dates, string instead of varchars). Dates and decimals support precision, while strings translate to a varchar with 8000 characters (see table metadata post).

[3] Implicit conversions seem to occur when the format is correct. Otherwise, an explicit conversion is needed.

[4] Consider using semicolons at the end of each statement. That's mandatory when running multiple statements within the same cell. Otherwise, an error results.

[5] It would be interesting to test what's the performance when doing CRUD operations on large datasets.

[6] One can use the LIMIT x clause as alternative for TOP x from T-SQL:

-- reviewing the last 3 records
SELECT *
FROM Assets
ORDER BY CreationDate
LIMIT 3;

[7] SELECT FROM VALUES seems to work as well, though the data types must be the same (no implicit conversions occur):

SELECT *
FROM (
VALUES ('1', '2024-03-01T00:00:00Z', 'IBM', 'Laptop 1','Model 1','Owner 1','XX0001','1')
, ('2', '2024-02-01T00:00:00Z','IBM','Laptop 2','Model 2','Owner 2','XX0001','1')
) DAT(Id, CreationDate, Vendor, Asset, Model, Owner, Tag, Quantity);

Therefore, when bringing the two INSERTs together, you'll need to change the first value from the first INSERT for the ID from numeric to string.

You can use the DESCRIBE QUERY command to troubleshoot the differences:

 -- retrieve query's output metadata
 DESCRIBE QUERY 
 VALUES (1, '2024-03-01T00:00:00Z', 'IBM','Laptop 1','Model 1','Owner 1','XX0001','1') 
 AS DAT(Id, CreationDate, Vendor, Asset, Model, Owner, Tag, Quantity)

Output:

col_name	data_type	comment
Id	int	NULL
CreationDate	string	NULL
Vendor	string	NULL
Asset	string	NULL
Model	string	NULL
Owner	string	NULL
Tag	string	NULL
Quantity	string	NULL

Happy coding!

Resources:
[1] Microsoft Learn (2023) How to use Microsoft Fabric notebooks (link)
[2] Apache Spark (2023) SQL syntax (link)

SQL Troubles

Pages

04 February 2024

💎🏭SQL Reloaded: Microsoft Fabric's Delta Tables in Action - CRUD Operations I (Basic Operations)

No comments:

About Me