SQL Troubles: 💎🏭SQL Reloaded: Misusing Views in Serverless SQL Pool

11 February 2023

💎🏭SQL Reloaded: Misusing Views in Serverless SQL Pool

The lack of user-defined tables in serverless SQL pool is probably one of the biggest gaps one needs to overcome as developer in Azure Synapse. On the other side, given that views are "virtual tables", views can be misused to store small chunks of data that don't change that often.

In the process there are two activities involved - preparing the data, respectively creating the view. Some developers might prefer preparing the data in Excel though, when possible, I prepare the data directly from the source system. For example, the first four columns from the below query can be used for the first approach, while the last column prepares the data as needed by a VALUES.

-- Product subcategories denormalized
SELECT PCT.ProductSubcategoryID
, PCT.ProductCategoryID
, PSC.Name Category
, PCT.Name Subcategory 
, CONCAT(', (''', PCT.ProductSubcategoryID, ''', ''', PCT.ProductCategoryID, ''', ''', PSC.Name, ''', ''', PCT.Name, ''')') [Values] 
FROM Production.ProductSubcategory PCT
     JOIN Production.ProductCategory PSC
	   ON PCT.ProductCategoryID = PSC.ProductCategoryID

Independently of the method used for data preparation, the view can be built as follows:

-- creating the view
CREATE VIEW dbo.ProductCategories
AS
-- Product categories
SELECT Cast(DAT.ProductSubcategoryID as int) ProductSubcategoryID
, Cast(DAT.ProductCategoryID as int) ProductCategoryID
, Cast(DAT.Category as nvarchar(255)) Category
, Cast(DAT.Subcategory  as nvarchar(255)) Subcategory
, Dense_Rank() OVER (PARTITION BY DAT.Category, DAT.Subcategory

         ORDER BY DAT.ProductSubcategoryID, DAT.ProductCategoryID) Ranking
FROM (-- prepared data
VALUES ('1', '1', 'Bikes', 'Mountain Bikes')
, ('2', '1', 'Bikes', 'Road Bikes')
, ('3', '1', 'Bikes', 'Touring Bikes')
, ('4', '2', 'Components', 'Handlebars')
, ('5', '2', 'Components', 'Bottom Brackets')
, ('6', '2', 'Components', 'Brakes')

) DAT(ProductSubcategoryID, ProductCategoryID, Category, Subcategory)

-- testing the view
SELECT *
FROM dbo.ProductCategories

Observe that for each column was defined explicitly a data type and, even if the definition might change, it's still a better idea than relying on the database engine for inferring the data type. Secondly, I prefer to include also a "Ranking" column based on some attributes I'm expecting to be unique over the whole dataset. This, just in case some duplicate might make its way into the dataset. This is not a must, but something to consider in exceptional cases.

The alternative would be to save the same data into a file and make it available as a CETAS. For small datasets the overhead can be minimal. This can be a better idea if the users need to modify the data manually. On the other side, storing the definition of a view into an SQL file and making corrections as needed can prove to be faster for a developer, at least until the dataset stabilizes and less changes are needed.

It's important to understand that in this way a feature is misused and there are more likely some penalties deriving from it. As mentioned in the beginning, the dataset must be small (e.g. maximum a few hundreds of records) and change seldom. In extremis, it's not advisable to build a solution based on this!

One scenario I had to use this option was creating a dataset based on recursive logic built inside of a stored procedure, which would output the data in the form needed by the view, as done above. This was used as workaround, as I couldn't save the data via a pipeline. (I had a deja vu from the first attempts of exporting data in SSIS 2005!)

In theory the maintenance of such a view can be automated via a pipeline, if the effort makes sense, and the solution is stable enough.

I blogged some years back on Misusing Views and Pseudo-Constants.

Happy coding!

Previous Post <<||>> Next Post

SQL Troubles

Pages

11 February 2023

💎🏭SQL Reloaded: Misusing Views in Serverless SQL Pool

No comments:

About Me