
17 March 2025

🏭🗒️Microsoft Fabric: Caching in Warehouse [Notes]

Disclaimer: This is a work in progress intended to consolidate information from various sources for learning purposes. For the latest information, please consult the documentation (see the links below)!

Last updated: 17-Mar-2025

[Microsoft Fabric] Caching
  • {def} technique that improves the performance of data processing by storing frequently accessed data and metadata in a faster storage layer [1]
    • e.g. local memory, local SSD disk
    • ⇐ subsequent requests can be served faster, directly from the cache [1]
      • if a set of data has been previously accessed by a query, any subsequent queries will retrieve that data directly from the in-memory cache [1]
      • local memory operations are notably faster compared to fetching data from remote storage [1]
      • ⇐ significantly diminishes IO latency [1]
    • fully transparent to the user
    • consistently active and operates seamlessly in the background [1]
    • managed and maintained entirely by MF
      • users have no capability to manually clear the cache [1] 
    • provides transactional consistency
      • ensures that any modification made to the data in storage after it was initially loaded into the in-memory cache still results in consistent query results [1]
    • when the cache reaches its capacity threshold and fresh data is read for the first time, the objects that have remained unused for the longest duration are removed from the cache [1]
      • this frees space for the incoming data and maintains an optimal cache utilization strategy [1] 
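Fabric manages eviction internally and doesn't expose it to users, but the least-recently-used policy described above can be sketched in a few lines of Python (the `LRUCache` class is purely illustrative, not a Fabric API):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal least-recently-used cache: when the capacity threshold
    is reached, the entry unused for the longest time is evicted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)        # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")           # "a" becomes the most recently used entry
cache.put("c", 3)        # evicts "b", the least recently used
print(cache.get("b"))    # → None
print(cache.get("a"))    # → 1
```

The `OrderedDict` keeps entries in access order, so eviction is always the first (oldest) item.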
  • {type} in-memory cache
    • data in cache is organized in a compressed columnar format (aka columnar storage) [1]
      • ⇐ optimized for analytical queries
      • each column of data is stored separately [1]
        • {benefit} allows for better compression [1]
          • since similar data values are stored together [1]
        • {benefit} reduces the memory footprint
    • when queries need to perform operations on a specific column, the engine can work more efficiently, speeding up query execution [1]
      • it doesn't have to process unnecessary data from other columns
      • can perform operations on multiple columns simultaneously [1]
        • taking advantage of modern multi-core processors [1]
    • when the engine retrieves data from storage, the data is transformed from its original file-based format into highly optimized structures in the in-memory cache [1]
    • {scenario} analytical workloads where queries involve scanning large amounts of data to perform aggregations, filtering, and other data manipulations [1]
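The compression benefit of storing each column separately can be illustrated with a small experiment: a high-entropy id column interleaved with a low-entropy category column compresses worse than the same values grouped per column (the column contents here are invented for the demonstration):

```python
import random
import zlib

random.seed(0)
ids = ["%08x" % random.getrandbits(32) for _ in range(5000)]  # high-entropy column
cat = ["EMEA"] * 5000                                         # low-entropy column

# Row-oriented layout: values of different columns interleaved
row_layout = b"".join(f"{i},{c};".encode() for i, c in zip(ids, cat))

# Column-oriented layout: similar values stored next to each other
col_layout = (",".join(ids) + "|" + ",".join(cat)).encode()

row_c = len(zlib.compress(row_layout))
col_c = len(zlib.compress(col_layout))
print(col_c < row_c)   # the columnar layout compresses better
```

Grouping the repetitive category values into one contiguous run lets the compressor collapse them almost entirely, which is the same effect the note above describes for the columnar in-memory cache.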
  • {type} disk cache
    • complementary extension to the in-memory cache
    • any data loaded into the in-memory cache is also serialized to the SSD cache [1]
      • data removed from the in-memory cache remains in the SSD cache for an extended period
      • when a subsequent query requests that data, it is retrieved from the SSD cache back into the in-memory cache much more quickly than from remote storage [1]
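The interplay of the two tiers can be sketched as follows; `TwoTierCache` and its capacities are illustrative assumptions, not Fabric internals:

```python
from collections import OrderedDict

class TwoTierCache:
    """Sketch of the pattern above: everything loaded into the fast
    (in-memory) tier is also written to the slow (SSD) tier, so an
    eviction from memory is not a full cache miss."""

    def __init__(self, fast_capacity, slow_capacity):
        self.fast_capacity = fast_capacity
        self.slow_capacity = slow_capacity
        self.fast = OrderedDict()   # models the in-memory cache
        self.slow = OrderedDict()   # models the SSD cache

    def put(self, key, value):
        self.fast[key] = value
        self.fast.move_to_end(key)
        self.slow[key] = value              # serialized to the SSD tier as well
        self.slow.move_to_end(key)
        if len(self.fast) > self.fast_capacity:
            self.fast.popitem(last=False)   # evict from memory only
        if len(self.slow) > self.slow_capacity:
            self.slow.popitem(last=False)

    def get(self, key):
        if key in self.fast:                # memory hit: fastest path
            self.fast.move_to_end(key)
            return self.fast[key], "memory"
        if key in self.slow:                # SSD hit: reload into memory
            value = self.slow[key]
            self.put(key, value)
            return value, "ssd"
        return None, "storage"              # cold read from remote storage

tiers = TwoTierCache(fast_capacity=1, slow_capacity=4)
tiers.put("t1", "rows of t1")
tiers.put("t2", "rows of t2")     # "t1" is evicted from memory, kept on SSD
print(tiers.get("t1"))            # served from the SSD tier, promoted to memory
```

The design choice mirrors the note: writing to both tiers on load means an in-memory eviction only degrades a future hit from "memory" to "ssd", never straight to remote storage.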
  • {issue} cold run (aka cold cache) performance
    • the first 1-3 executions of a query perform noticeably slower than subsequent executions [2]
      • if the first run's performance is crucial, try manually creating statistics (aka pre-warming the cache) [2]
      • otherwise, one can rely on the automatic statistics generated in the first query run and leveraged in subsequent runs [2]
        • as long as the underlying data does not change significantly [2]
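The cold-versus-warm behavior can be simulated with a memoized function standing in for a query (the table name and the simulated latency are invented for the illustration):

```python
import time
from functools import lru_cache

@lru_cache(maxsize=None)
def scan_table(name):
    """Stand-in for a query's first scan of remote data: the cold run
    pays the fetch cost; warm runs are served from the cache."""
    time.sleep(0.2)                 # simulated remote-storage latency
    return f"result of scanning {name}"

t0 = time.perf_counter()
scan_table("fact_sales")            # cold run: pays the storage round trip
cold = time.perf_counter() - t0

t0 = time.perf_counter()
scan_table("fact_sales")            # warm run: served from the cache
warm = time.perf_counter() - t0

print(cold > warm)                  # the warm run is markedly faster
```

This is the same effect benchmarkers are warned about in [2]: timing only the first run of a query measures the cold path, not steady-state performance.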
  • differentiated from
    • [Kusto] caching policy [link]
    • [Apache Spark] intelligent cache [link]
    • [Power BI] query caching [link]
    • [Azure] caching [link]
    References:
    [1] Microsoft Learn (2024) Fabric: Caching in Fabric data warehousing [link]
    [2] Microsoft Learn (2024) Fabric Data Warehouse performance guidelines [link]
    [3] Sandeep Pawar (2023) Pre-Warming The Direct Lake Dataset For Warm Cache Import-Like Performance [link]

    Resources:
    [R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

    Acronyms:
    IO - Input/Output
    MF - Microsoft Fabric
    SSD - Solid State Drive

    13 June 2011

    🔹SQL Server: Procedure Cache (Definitions)

    "An area of SQL Server memory used to hold trees and query plans that have recently been used or are currently in use by SQL Server." (Owen Williams, "MCSE TestPrep: SQL Server 6.5 Design and Implementation", 1998)

    "A temporary storage location for the current, executing version of a specific stored procedure." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

    "The part of the SQL Server memory pool that is used to store execution plans for Transact-SQL batches, stored procedures, and triggers." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

    "A shared, reserved memory area that stores the most recently executed SQL statements or PL/SQL procedures (including triggers and functions). Also called SQL cache." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

    "The part of the SQL Server memory pool that is used to store execution plans for Transact-SQL batches, stored procedures, and triggers." (Microsoft, "SQL Server 2012 Glossary", 2012)
