Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!
Last updated: 11-Mar-2024
|  | 
| Warehouse vs SQL analytics endpoint in Microsoft Fabric [3] | 
[Microsoft Fabric] Warehouse
- {def} highly available relational data warehouse that can be used to store and query data in the Lakehouse
- supports the full transactional T-SQL capabilities
- modernized version of the traditional data warehouse
- unifies capabilities from Synapse Dedicated and Serverless SQL Pools
- resources are managed elastically to provide the best possible performance
- ⇒ no need to think about indexing or distribution
- a new parser gives enhanced CSV file ingestion time
- metadata is cached in addition to data
- improved assignment of compute resources to milliseconds
- multi-TB result sets are streamed to the client
- leverages a distributed query processing engine
- provides with workloads that have a natural isolation boundary [3]
- true isolation is achieved by separating workloads with different characteristics, ensuring that ETL jobs never interfere with their ad hoc analytics and reporting workloads [3]
- {operation} data ingestion
- involves moving data from source systems into the data warehouse [2]
- the data becomes available for analysis [1]
- via Pipelines, Dataflows, cross-database querying, COPY INTO command
- no need to copy data from the lakehouse to the data warehouse [1]
- one can query data in the lakehouse directly from the data warehouse using cross-database querying [1]
- {operation} data storage
- involves storing the data in a format that is optimized for analytics [2]
- {operation} data processing
- involves transforming the data into a format that is ready for consumption by analytical tools [1]
- {operation} data analysis and delivery
- involves analyzing the data to gain insights and delivering those insights to the business [1]
- {operation} designing a warehouse (aka warehouse design)
- standard warehouse design can be used
- {operation} sharing a warehouse (aka warehouse sharing)
- a way to provide users read access to the warehouse for downstream consumption
- via SQL, Spark, or Power BI
- the level of permissions can be customized to provide the appropriate level of access
- {feature} mirroring
- provides a modern way of accessing and ingesting data continuously and seamlessly from any database or data warehouse into the Data Warehousing experience in Fabric
- any database can be accessed and managed centrally from within Fabric without having to switch database clients
- data is replicated in a reliable way in real-time and lands as Delta tables for consumption in any Fabric workload
- {feature} v-order
- write time optimization to the parquet file format that enables lightning-fast reads under the MF compute engine [5]
- {feature} caching
- stores frequently accessed data and metadata in a faster storage layer [6]
- {feature} snapshots
- {def} read-only representation of a warehouse at a specific point in time [7]
- {feature} automatic purging
- routinely and systematically eliminating expired data periodically [8]
- {benefit} proactively helps maintain an efficient and cost-effective data infrastructure [8]
- via garbage collection
- background process, that periodically identifies and cleans [8]
- all the data and log files of dropped tables
- aborted transactions
- temporary tables
- expired files
- executes every 24 hours, when the warehouse is active [8]
- ensures the data warehouse remains optimized and efficient [8]
- {goal} storage cost optimization
- {goal} minimize maintenance overhead
- {goal} adhering to data retention regulations
- {concept} SQL analytics endpoint
- a warehouse that is automatically generated from a Lakehouse in Microsoft Fabric [3]
- {concept} virtual warehouse
- can containing data from virtually any source by using shortcuts [3]
- {concept} cross database querying
- enables to quickly and seamlessly leverage multiple data sources for fast insights and with zero data duplication [3]
References:
[1] Microsoft Learn (2023) Fabric: Get started with data warehouses in Microsoft Fabric (link) 
[2] Microsoft Learn (2023) Fabric: Microsoft Fabric decision guide: choose a data store (link)
[3] Microsoft Learn (2024) Fabric: What is data warehousing in Microsoft Fabric? (link)
[3] Microsoft Learn (2024) Fabric: What is data warehousing in Microsoft Fabric? (link)
[4] Microsoft Learn (2023) Fabric: Better together: the lakehouse and warehouse (link)
[5] Microsoft Learn (2024) Fabric: Understand V-Order for Microsoft Fabric Warehouse [link]
[6] Microsoft Learn (2024) Fabric: Caching in Fabric data warehousing [link]
[7] Microsoft Learn (2024) Fabric: Warehouse Snapshots in Microsoft Fabric (Preview) [link]
[8] Microsoft Fabric Updates Blog (2025) Intelligent Data Cleanup: Smart Purging for Smarter Data Warehouses [link] 
Resources:
[R1] Microsoft Learn (2023) Fabric: Data warehousing documentation in Microsoft Fabric (link)
Acronyms:
ETL - Extract, Transfer, Load
MF - Microsoft Fabric
 

No comments:
Post a Comment