When building a data-related solution, the most basic functionality for a SQL developer is the CRUD (create, read, update, delete) operations. That's why I decided to test this functionality on Lakehouse delta tables for a first demonstrative post on Microsoft Fabric.
Unfortunately, at least for the moment, the SQL Endpoint allows only read access to the delta tables, which is an important limitation for a SQL developer. However, it's possible to run the code over Spark SQL, which supports a basic dialect of SQL (similar to HiveQL, though it might lack some features). To run the code, you'll need to create a notebook, where you can set Spark SQL as the language for the whole notebook. Make sure that a Lakehouse is assigned to the notebook. Of course, you'll also need access to Microsoft Fabric.
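As a side note, if the notebook's default language is left as PySpark, a single cell can still be switched to Spark SQL via the %%sql magic command (a minimal sketch; the query itself is just a placeholder):

```sql
%%sql
-- run only this cell in Spark SQL within a PySpark notebook
SELECT current_timestamp() AS RunDate;
```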
Create a cell for each of the following blocks of code. To make sure that the code works as expected, it's a good idea to run the cells individually!
-- drop the test table (if it exists already)
DROP TABLE IF EXISTS Assets;

-- create the test table
CREATE TABLE Assets (
    Id int NOT NULL,
    CreationDate timestamp NOT NULL,
    Vendor string NOT NULL,
    Asset string NOT NULL,
    Model string NOT NULL,
    Owner string NOT NULL,
    Tag string NOT NULL,
    Quantity decimal(13, 2) NOT NULL
) USING DELTA;
-- insert test data
INSERT INTO Assets
VALUES (1, '2024-03-01T00:00:00Z', 'IBM', 'Laptop 1', 'Model 1', 'Owner 1', 'XX0001', '1');
-- insert more test records
INSERT INTO Assets
VALUES ('2', '2024-02-01T00:00:00Z', 'IBM', 'Laptop 2', 'Model 2', 'Owner 2', 'XX0001', '1')
, ('3', '2024-02-01T00:00:00Z', 'Microsoft', 'Laptop 3', 'Model 3', 'Owner 2', 'WX0001', '1')
, ('4', '2024-01-01T00:00:00Z', 'Microsoft', 'Laptop 3', 'Model 3', 'Owner 2', 'WX0001', '1')
, ('5', '2024-01-15T00:00:00Z', 'Dell', 'Laptop 4', 'Model 4', 'Owner 3', 'DD0001', '1')
, ('6', '2024-01-16T00:00:00Z', 'Dell', 'Laptop 4', 'Model 4', 'Owner 4', 'DD0001', '1');
-- retrieving all the records
SELECT *
FROM Assets
ORDER BY Vendor;
-- updating a record
UPDATE Assets
SET CreationDate = '2024-01-19T00:00:00Z'
WHERE Id = 6;
-- deleting a record
DELETE FROM Assets
WHERE Id = 2;
-- reviewing the changes
SELECT *
FROM Assets
ORDER BY Vendor;
Comments:
[1] There seems to be no way to provide a schema (see the SQL syntax for Spark), though the dbo schema seems to be used in the background (see the table metadata post). Referencing a table together with its schema in Spark SQL results in an error:
"[TABLE_OR_VIEW_NOT_FOUND] The table or view `<schema>`.`<object_name>` cannot be found. Verify the spelling and correctness of the schema and catalog."
[2] The data types differ from the ones in SQL Server (e.g. timestamp for dates, string instead of varchar). Dates and decimals support precision, while strings translate to a varchar with 8000 characters (see the table metadata post).
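To review the data types actually stored for a table, one can describe it (shown here against the Assets table created above):

```sql
-- review the columns and data types of the test table
DESCRIBE TABLE Assets;
```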
[3] Implicit conversions seem to occur when the value's format is correct. Otherwise, an explicit conversion is needed.
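For example, a sketch of an insert with explicit conversions via cast (hypothetical values, against the Assets table created above):

```sql
-- insert a record using explicit conversions
INSERT INTO Assets
VALUES (7
, cast('2024-01-20T00:00:00Z' as timestamp)
, 'Dell', 'Laptop 5', 'Model 5', 'Owner 5', 'DD0002'
, cast('1' as decimal(13, 2)));
```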
[4] Consider using semicolons at the end of each statement. They are mandatory when running multiple statements within the same cell; otherwise, an error results.
[5] It would be interesting to test the performance of CRUD operations on large datasets.
[6] One can use the LIMIT x clause as an alternative to TOP x from T-SQL:
-- retrieving the first 3 records (by CreationDate)
SELECT *
FROM Assets
ORDER BY CreationDate
LIMIT 3;
[7] SELECT FROM VALUES seems to work as well, though the data types must be the same across rows (no implicit conversions occur):
SELECT *
FROM (
 VALUES ('1', '2024-03-01T00:00:00Z', 'IBM', 'Laptop 1', 'Model 1', 'Owner 1', 'XX0001', '1')
 , ('2', '2024-02-01T00:00:00Z', 'IBM', 'Laptop 2', 'Model 2', 'Owner 2', 'XX0001', '1')
) DAT(Id, CreationDate, Vendor, Asset, Model, Owner, Tag, Quantity);
Therefore, when bringing the rows from the two INSERTs together into a single VALUES list, you'll need to change the Id in the first row from numeric to string.
You can use the DESCRIBE QUERY command to troubleshoot the differences:

-- retrieve the query's output metadata
DESCRIBE QUERY
VALUES (1, '2024-03-01T00:00:00Z', 'IBM', 'Laptop 1', 'Model 1', 'Owner 1', 'XX0001', '1') AS DAT(Id, CreationDate, Vendor, Asset, Model, Owner, Tag, Quantity);
col_name     | data_type | comment
-------------|-----------|--------
Id           | int       | NULL
CreationDate | string    | NULL
Vendor       | string    | NULL
Asset        | string    | NULL
Model        | string    | NULL
Owner        | string    | NULL
Tag          | string    | NULL
Quantity     | string    | NULL
Happy coding!
Resources:
[1] Microsoft Learn (2023) How to use Microsoft Fabric notebooks (link)
[2] Apache Spark (2023) SQL syntax (link)