26 December 2024

💎🏭SQL Reloaded: Microsoft Fabric's SQL Databases (Part III: Backups)

The SQL database in Microsoft Fabric supports a minimal setup, besides the name no additional information being provided. So a database administrator might ask himself whether the database is backed up and how often this happens, what types of backups are generated, where are they stored, how long are retained, respectively what's the process for restoring a database in case it was corrupted or other issues occur? 

The standard documentation provides the following information about the SQL databases in Fabric

  • retain sufficient regular full (every week), differential (every 12 hours), and log backups (every 10 minutes) to allow a PITR within the last 7 days by default (aka retention period), frequency that can't be modified currently! [1];
  • when a database is deleted, the backup is kept for the same retention period [1];
  • sufficient backups are taken starting with database's creation [1]
  • the exact timing of all database backups is determined by the SQL database service [1]
  • the schedule of backup jobs can't be changed or disabled [1]
  • all backups are stored on ZRS Azure storage accounts, 3 backups being copied synchronously across 3 Azure availability zones in the primary region; when ZRS is not available, backups are stored on LRS [1]
  • backups that are no longer needed to provide PITR functionality are automatically deleted [1]
  • all backups are compressed to reduce backup storage compression (on average the ratio is 3 to 4 times)
  • backups can't be accessed directly, as they are isolated from Microsoft Fabric platform and these are inaccessible by end-users, the PITR capability being the only way one can interact with backup files [1];
  • cross-workspace PITR is not supported [1];
  • deleting a workspace deletes all afferent databases and they can't be restored anymore [1];
  • there are no geo-replicated backups [1];
  • no long-term retention backups support is available [1];
  • backups can be restored only through Fabric portal ⇐ no REST API, Azure PowerShell, or CLI commands are supported currently [1];
  • one can't overwrite an existing database with a restore, therefore restoring a database resumes to creating a new SQL database in the same workspace as the original database [3];
  • restoring backups from dropped databases is not currently possible after the retention period [3].

The frequency and type of backups usually is matched to address the business' requirements, and not the other way around. This can be a harsh limitation for some businesses and probably they will be forced to find alternatives (e.g. versioning). 

The relatively small retention period can be an issue for many organizations. Especially in the BI area, one probably met situations in which the configuration data was deleted or modified, and the functionality stopped working, and this was discovered after weeks or months, especially when the functionality is not business critical. Properly documenting the functionality and its use can help alleviate some issues, though one can doubt that everything is adequately addressed. 

Providing solutions for each department and deployment type (e.g. production, test, development) will increase the complexity of solutions, though that's valid for scenarios (including BI). It will be most probably a nightmare to manage this, fully or semiautomated deployments being needed ideally in the process. 

It's also the question of what architectures can be adopted to overcome current limitations. 

According to the documentation [3], given that the functionality is still in preview, there's no cost or consumption of CPU resources associated with restoring databases.

The below script can be used to review the backups taken. Hopefully Microsoft will be in control of the backups, though what happens if for any reason no backup is taken for a long period? Such cases happened in the past in other type of environments, and it has chances to still happen in the future. It's also the question how the records for a failed backup look like, how the admins are informed, respectively how we can troubleshoot all this. 

-- SQL dadatabase - database backups history
SELECT dbb.backup_file_id
, dbb.logical_database_id 
, dbb.physical_database_name
, dbb.logical_server_name
, dbb.logical_database_name 
, dbb.backup_start_date 
, dbb.backup_finish_date 
, DateDiff(s, dbb.backup_finish_date, dbb.backup_start_date) Duration_Sec
, dbb.backup_type 
, CASE dbb.backup_type  
    WHEN 'D' THEN 'Database'  
    WHEN 'I' THEN 'Incremental or differential backup'
    WHEN 'L' THEN 'Log'  
  END backup_type_desc  
, dbb.in_retention
, CASE dbb.in_retention
    WHEN 1 THEN 'In retention'
    WHEN 0 THEN 'Out of retention'
  END in_retention_desc
FROM sys.dm_database_backups dbb
WHERE dbb.backup_type = 'D'
ORDER BY dbb.backup_start_date DESC
, dbb.backup_type

However, one needs also an overview to understand how often the backups are taken:

-- SQL dadatabase - database backups overview
SELECT dbb.logical_database_name
, FORMAT(dbb.backup_start_date, 'yyyy-MMM-dd') Date
, SUM(CASE WHEN dbb.backup_type = 'D' THEN 1 ELSE 0 END) [Database]
, SUM(CASE WHEN dbb.backup_type = 'I' THEN 1 ELSE 0 END) [Incremental] 
, SUM(CASE WHEN dbb.backup_type = 'L' THEN 1 ELSE 0 END) [Log]
, CASE dbb.in_retention
    WHEN 1 THEN 'In retention'
    WHEN 0 THEN 'Out of retention'
  END in_retention_desc
FROM sys.dm_database_backups dbb
--WHERE dbb.logical_database_name = 'AdventureWorks01'
GROUP BY dbb.logical_database_name
, FORMAT(dbb.backup_start_date, 'yyyy-MMM-dd')
, dbb.in_retention
ORDER BY Date DESC
, dbb.in_retention

Output:

logical_database_name Date Database Incremental Log in_retention_desc
AdventureWorks... 2024-Dec-26 0 1 54 In retention
AdventureWorks... 2024-Dec-25 0 2 134 In retention
AdventureWorks... 2024-Dec-24 0 2 110 In retention
AdventureWorks... 2024-Dec-23 0 1 20 In retention
AdventureWorks... 2024-Dec-22 0 0 10 In retention
AdventureWorks... 2024-Dec-21 1 0 0 Out of retention
AdventureWorks... 2024-Dec-21 0 1 105 In retention

Notes:
1) It's interesting that the only database backup is out of retention (see 6th record in the above table).
2) One can build further logic to focus for example on the backups occurred during critical periods (e.g. business hours).
3) According to [4] because a SQL database is a serverless product, it will automatically paused if there is no user activity, and this independently of the mirroring activity. So, the more activity in a database, the more backups will be available.
4) A user must be granted VIEW DATABASE STATE in the database to query the sys.dm_database_backups DMV or must be a member of any Fabric workspace role that can query it [2]. To use the PITR to restore a database the user must have Admin, Contributor or Member roles on the Fabric Workspace [3].

Happy coding!

Previous Post <<||>> Next Post

Resources:
[1] Microsoft Fabric (2024) Automatic backups in SQL database in Microsoft Fabric [link]
[2] Microsoft Fabric (2024) sys.dm_database_backups [link]
[3] Microsoft Fabric (2024) Restore from a backup in SQL database in Microsoft Fabric [link]
[4] Microsoft Fabric (2024) Mirroring Fabric SQL database in Microsoft Fabric [link]

Acronyms:
CLI - Command Line Interface
DMV - Dynamic Management View
LRS - locally redundant storage
PITR - point-in-time restore
ZRS - zone-redundant storage

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.