Showing posts with label cache. Show all posts
Showing posts with label cache. Show all posts

21 June 2025

🏭🗒️Microsoft Fabric: Result Set Caching in SQL Analytics Endpoints [Notes] 🆕

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 21-Jun-2025

[Microsoft Fabric] Result Set Caching in SQL Analytics Endpoints

  • {def} built-in performance optimization for Warehouse and Lakehouse that improves read latency [1]
    • fully transparent to the user [3]
    • persists the final result sets for applicable SELECT T-SQL queries
      • caches all the data accessed by a query [3]
      • subsequent runs that "hit" cache will process just the final result set
        • can bypass complex compilation and data processing of the original query[1]
          • ⇐ returns subsequent queries faster [1]
      • the cache creation and reuse is applied opportunistically for queries
    • works on
      • warehouse tables
      • shortcuts to OneLake sources
      • shortcuts to non-Azure sources
    • the management of cache is handled automatically [1]
      • regularly evicts cache as needed
    • as data changes, result consistency is ensured by invalidating cache created earlier [1]
  • {operation} enable setting
    • via ALTER DATABASE <database_name> SET RESULT_SET_CACHING ON
  • {operation} validate setting
    • via SELECT name, is_result_set_caching_on FROM sys.databases
  • {operation} configure setting
    • configurable at item level
      • once enabled, it can then be disabled 
        • at the item level
        • for individual queries
          • e.g. debugging or A/B testing a query
        • via OPTION ( USE HINT ('DISABLE_RESULT_SET_CACHE') 
    • {default} during the preview, result set caching is off for all items [1]
  • [monitoring] 
    • via Message Output
      • applicable to Fabric Query editor, SSMS
      • the statement "Result set cache was used" is displayed after query execution if the query was able to use an existing result set cache
    • via queryinsights.exec_requests_history system view
      • result_cache_hit displays indicates result set cache usage for each query execution [1]
        • {value} 2: the query used result set cache (cache hit)
        • {value} 1: the query created result set cache
        • {value} 0: the query wasn't applicable for result set cache creation or usage [1]
          • {reason} the cache no longer exists
          • {reason} the cache was invalidated by a data change, disqualifying it for reuse [1]
          • {reason} query isn't deterministic
            • isn't eligible for cache creation [1]
          • {reason} query isn't a SELECT statement
  • [warehousing] 
    • {scenario} analytical queries that process large amounts of data to produce a relatively small result [1]
    • {scenario} workloads that trigger the same analytical queries repeatedly [1]
      • the same heavy computation can be triggered multiple times, even though the final result remains the same [1]

References:
[1] Microsoft Learn (2025) Result set caching (preview) [link]
[2] Microsoft Fabric Update Blog (2025) Result Set Caching for Microsoft Fabric Data Warehouse (Preview) [link|aka]
[3] Microsoft Learn (2025) In-memory and disk caching [link]
[4] Microsoft Learn (2025) Performance guidelines in Fabric Data Warehouse [link

Resources:
[R1] Microsoft Fabric (2025) Fabric Update - June 2025 [link]

Acronyms:
MF - Microsoft Fabric
SSMS - SQL Server Management Studio

17 March 2025

🏭🗒️Microsoft Fabric: Caching in Warehouse [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 17-Mar-2024

[Microsoft Fabric] Caching
  • {def} technique that improves the performance of data processing by storing frequently accessed data and metadata in a faster storage layer [1]
    • e.g. local memory, local SSD disk
    • ⇐ subsequent requests can be served faster, directly from the cache [1]
      • if a set of data has been previously accessed by a query, any subsequent queries will retrieve that data directly from the in-memory cache [1]
      • local memory operations are notably faster compared to fetching data from remote storage [1]
      • ⇐ significantly diminishes IO latency [1]
    • fully transparent to the user
    • consistently active and operates seamlessly in the background [1]
    • orchestrated and upheld by MF
      • it doesn't offer users the capability to manually clear the cache [1] 
    • provides transactional consistency
      • ensures that any modifications to the data in storage after it has been initially loaded into the in-memory cache, will result in consistent data [1]
    • when the cache reaches its capacity threshold and fresh data is being read for the first time, objects that have remained unused for the longest duration will be removed from the cache [1]
      • process is enacted to create space for the influx of new data and maintain an optimal cache utilization strategy [1] 
  • {type} in-memory cache
    • data in cache is organized in a compressed columnar format (aka columnar storage) [1]
      • ⇐ optimized for analytical queries
      • each column of data is stored separately [1]
        • {benefit} allows for better compression [1]
          • since similar data values are stored together [1]
        • {benefit} reduces the memory footprint
    • when queries need to perform operations on a specific column, the engine can work more efficiently speeding up queries' execution [1]
      • it doesn't have to process unnecessary data from other columns
      • can perform operations on multiple columns simultaneously [1]
        • taking advantage of modern multi-core processors [1]
    • when retrieves data from storage, the data is transformed from its original file-based format into highly optimized structures in in-memory cache [1]
    • {scenario} analytical workloads where queries involve scanning large amounts of data to perform aggregations, filtering, and other data manipulations [1]
  • {type} disk cache
    • complementary extension to the in-memory cache
    • any data loaded into the in-memory cache is also serialized to the SSD cache [1]
      • data removed from the in-memory cache remains within the SSD cache for an extended period
      • when subsequent query requests the data, it is retrieved from the SSD cache into the in-memory cache quicker [1]
  • {issue} cold run (aka cold cache) performance
    • the first 1-3 executions of a query perform noticeably slower than subsequent executions [2]
      • if the first run's performance is crucial, try manually creating statistics (aka pre-warming the cache) [2]
      • otherwise, one can rely on the automatic statistics generated in the first query run and leveraged in subsequent runs [2]
      • as long as underlying data does not change significantly [2]
  • differentiated from
    • [Kusto] caching policy [link]
    • [Apache Spark] intelligent cache [link]
    • [Power BI] query caching [link]
    • [Azure] caching [link]
    References:
    [1] Microsoft Learn (2024) Fabric: Caching in Fabric data warehousing [link]
    [2] Microsoft Learn (2024) Fabric Data Warehouse performance guidelines [link]
    [3] Sandeep Pawar (2023) Pre-Warming The Direct Lake Dataset For Warm Cache Import-Like Performance [link]

    Resources:
    [R1] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

    Acronyms:
    IO - Input/Output
    MF - Microsoft Fabric
    SSD - Solid State Drive

    13 June 2011

    🔹SQL Server: Procedure Cache (Definitions)

    "An area of SQL Server memory used to hold trees and query plans that have recently been used or are currently in use by SQL Server." (Owen Williams, "MCSE TestPrep: SQL Server 6.5 Design and Implementation", 1998)

    "A temporary storage location for the current, executing version of a specific stored procedure." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

    "The part of the SQL Server memory pool that is used to store execution plans for Transact-SQL batches, stored procedures, and triggers." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

    "A shared, reserved memory area that stores the most recently executed SQL statements or PL/SQL procedures (including triggers and functions). Also called SQL cache." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed., 2011)

    "The part of the SQL Server memory pool that is used to store execution plans for Transact-SQL batches, stored procedures, and triggers." (Microsoft, "SQL Server 2012 Glossary", 2012)

    Related Posts Plugin for WordPress, Blogger...

    About Me

    My photo
    Koeln, NRW, Germany
    IT Professional with more than 25 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.