09 March 2025

🏭🎗️🗒️Microsoft Fabric: Eventhouses [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 9-Mar-2025

Real-Time Intelligence architecture [4]

[Microsoft Fabric] Eventhouses

  • {def} a service that empowers users to extract insights and visualize data in motion
    • offers an end-to-end solution for 
      • event-driven scenarios
        • ⇐ rather than schedule-driven solutions [1]
    • a workspace of databases
      • can be shared across projects [1]
  • allows users to manage multiple databases at once
    • sharing capacity and resources to optimize performance and cost
    • provides unified monitoring and management across all databases and per database [1]
  • provides a solution for handling and analyzing large volumes of data
    • particularly in scenarios requiring real-time analytics and exploration [1]
    • designed to handle real-time data streams efficiently [1]
      • lets organizations ingest, process, and analyze data in near real-time [1]
  • provides a scalable infrastructure that allows organizations to handle growing volumes of data, ensuring optimal performance and resource use
    • preferred engine for semistructured and free text analysis
    • tailored to time-based, streaming events with structured, semistructured, and unstructured data [1]
    • allows ingesting data
      • from multiple sources
      • via multiple pipelines
        • e.g. Eventstream, SDKs, Kafka, Logstash, data flows, etc.
      • in multiple data formats [1]
    • data is automatically indexed and partitioned based on ingestion time
  • designed to optimize cost by suspending the service when not in use [1]
    • reactivating the service can lead to a latency of a few seconds [1]
      • for highly time-sensitive systems that can't tolerate this latency, use the Minimum consumption setting [1]
        • enables the service to be always available at a selected minimum level [1]
          • customers pay for 
            • the minimum compute level selected [1]
            • the actual consumption when the compute level is above the minimum set [1]
        • the specified compute is available to all the databases within the eventhouse [1]
  • {scenario} solutions that include event-based data
    • e.g. telemetry and log data, time series and IoT data, security and compliance logs, or financial records [1]
  • KQL databases 
    • can be created within an eventhouse [1]
    • can be either a standard database or a database shortcut [1]
    • an exploratory query environment is created for each KQL database, which can be used for exploration and data management [1]
    • data availability in OneLake can be enabled on a database or table level [1]
  • Eventhouse page 
    • serves as the central hub for all your interactions within the Eventhouse environment [1]
    • Eventhouse ribbon
      • provides quick access to essential actions within the Eventhouse
    • explorer pane
      • provides an intuitive interface for navigating between Eventhouse views and working with databases [1]
    • main view area 
      • displays the system overview details for the eventhouse [1]
  • {feature} Eventhouse monitoring
    • offers comprehensive insights into the usage and performance of the eventhouse by collecting end-to-end metrics and logs for all aspects of an Eventhouse [2]
    • part of workspace monitoring that allows you to monitor Fabric items in your workspace [2]
    • provides a set of tables that can be queried to get insights into the usage and performance of the eventhouse [2]
      • can be used to optimize the eventhouse and improve the user experience [2]
  • {feature} query logs table
    • contains the list of queries run on an Eventhouse KQL database
      • for each query, a log event record is stored in the EventhouseQueryLogs table [3]
    • can be used to
      • analyze query performance and trends [3]
      • troubleshoot slow queries [3]
      • identify heavy queries consuming large amounts of system resources [3]
      • identify the users/applications running the highest number of queries [3]
  • {feature} OneLake availability
    • {benefit} turning on the feature creates one logical copy of the data of a KQL database in an eventhouse [4]
      • users can query the data in the KQL database in Delta Lake format via other Fabric engines [4]
        • e.g. Direct Lake mode in Power BI, Warehouse, Lakehouse, Notebooks, etc.
    • {prerequisite} a workspace with a Microsoft Fabric-enabled capacity [4]
    • {prerequisite} a KQL database with editing permissions and data [4]
    • {constraint} tables can't be renamed
    • {constraint} table schemas can't be altered
    • {constraint} RLS can't be applied to tables
    • {constraint} data can't be deleted, truncated, or purged
    • when turned on, a mirroring policy is enabled
      • the policy can be used to monitor data latency, or altered to partition delta tables [4]
  • {feature} robust adaptive mechanism
    • intelligently batches incoming data streams into one or more Parquet files, structured for analysis [4]
    • ⇐ important when dealing with trickling data [4]
      • ⇐ writing many small Parquet files into the lake can be inefficient, resulting in higher costs and poor performance [4]
    • delays write operations if there isn't enough data to create optimal Parquet files [4]
      • ensures Parquet files are optimal in size and adhere to Delta Lake best practices [4]
      • ensures that the Parquet files are primed for analysis and balances the need for prompt data availability with cost and performance considerations [4]
      • {default} the write operation can take up to 3 hours or until files of sufficient size are created [4]
        • typically, the files are 200-256 MB in size
        • the value can be adjusted between 5 minutes and 3 hours [4]
          • {warning} adjusting the delay to a shorter period might result in a suboptimal delta table with a large number of small files [4]
            • can lead to inefficient query performance [4]
        • {restriction} the resultant table in OneLake is read-only and can't be optimized after creation [4]
    • delta tables can be partitioned to improve query speed [4]
      • each partition is represented as a separate column using the PartitionName listed in the Partitions list [4]
        • ⇒ the OneLake copy has more columns than the source table [4]
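The note that data is partitioned based on ingestion time can be illustrated with a minimal Python sketch. It assumes fixed one-hour buckets. The bucket size and the function names are hypothetical, because the notes don't say how the engine sizes its partitions. Records are grouped by when they arrived, regardless of any timestamp inside the payload.

```python
# Hypothetical sketch of partitioning by ingestion time. Records are grouped
# into fixed buckets by *when they were ingested*, not by payload content.
# Assumption: one-hour buckets; the engine's real partition sizing is not
# described in the notes.
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def bucket_start(ts: datetime, size: timedelta) -> datetime:
    """Floor a timestamp to the start of its bucket."""
    epoch = datetime(1970, 1, 1, tzinfo=ts.tzinfo)
    return ts - (ts - epoch) % size


def partition_by_ingestion_time(
    records: List[Tuple[datetime, dict]], size: timedelta = timedelta(hours=1)
) -> Dict[datetime, List[dict]]:
    """Group (ingested_at, payload) pairs into ingestion-time buckets."""
    parts: Dict[datetime, List[dict]] = defaultdict(list)
    for ingested_at, payload in records:
        parts[bucket_start(ingested_at, size)].append(payload)
    return dict(parts)


if __name__ == "__main__":
    incoming = [
        (datetime(2025, 3, 9, 10, 5), {"id": 1}),
        (datetime(2025, 3, 9, 10, 55), {"id": 2}),
        (datetime(2025, 3, 9, 11, 10), {"id": 3}),
    ]
    for start, payloads in partition_by_ingestion_time(incoming).items():
        print(start, payloads)
```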
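The Minimum consumption billing rule charges for the selected minimum, and for actual consumption when it goes above that minimum. Per interval, this amounts to max(minimum, actual). The sketch below assumes compute is measured in abstract units per interval. `billed_units` is a hypothetical helper, not Fabric's capacity meter.

```python
# Hypothetical sketch of the "Minimum consumption" billing rule.
# Assumption: compute is expressed in abstract units per billing interval;
# this is not Fabric's actual capacity meter, only an illustration of the rule.
from typing import List


def billed_units(actual_usage: List[float], minimum: float = 0.0) -> List[float]:
    """Return the billed compute for each interval.

    minimum == 0 models a service that suspends when idle, so an idle interval
    (usage 0) is billed nothing. With a minimum set, each interval is billed
    max(minimum, actual usage).
    """
    if minimum < 0:
        raise ValueError("minimum must be non-negative")
    return [max(minimum, usage) for usage in actual_usage]


if __name__ == "__main__":
    usage = [0.0, 0.0, 3.0, 8.0, 1.0]     # two idle intervals, then some load
    print(sum(billed_units(usage)))       # 12.0 (pay only for what is used)
    print(sum(billed_units(usage, 4.0)))  # 24.0 (4 + 4 + 4 + 8 + 4)
```

The trade-off is visible in the two totals. The minimum removes the reactivation latency but is billed even for idle intervals.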
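The query-log use cases listed above (slow queries, heavy queries, most active users/applications) can be sketched over in-memory records. The `QueryLogRecord` fields are assumptions made for illustration, not the actual EventhouseQueryLogs columns. Against a real eventhouse, the equivalent analysis would be written as KQL queries on that table.

```python
# Hypothetical sketch of the query-log analyses, run over in-memory records.
# The field names are illustrative assumptions, not the actual
# EventhouseQueryLogs schema (see the query logs documentation).
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class QueryLogRecord:
    user: str          # user or application that ran the query
    query_text: str
    duration_ms: int   # wall-clock duration
    cpu_ms: int        # CPU time consumed


def top_users(records: Iterable[QueryLogRecord], n: int = 3) -> List[Tuple[str, int]]:
    """Users/applications running the highest number of queries."""
    return Counter(r.user for r in records).most_common(n)


def slow_queries(records: Iterable[QueryLogRecord], min_duration_ms: int) -> List[QueryLogRecord]:
    """Queries at or above a duration threshold, slowest first."""
    hits = [r for r in records if r.duration_ms >= min_duration_ms]
    return sorted(hits, key=lambda r: r.duration_ms, reverse=True)


def heavy_queries(records: Iterable[QueryLogRecord], min_cpu_ms: int) -> List[QueryLogRecord]:
    """Queries at or above a CPU-time threshold, heaviest first."""
    hits = [r for r in records if r.cpu_ms >= min_cpu_ms]
    return sorted(hits, key=lambda r: r.cpu_ms, reverse=True)


if __name__ == "__main__":
    logs = [
        QueryLogRecord("alice", "T | count", 120, 40),
        QueryLogRecord("bob", "T | summarize count() by State", 9000, 5200),
        QueryLogRecord("alice", "T | take 10", 80, 15),
        QueryLogRecord("dashboard-app", "T | where Level == 'Error'", 2500, 1800),
        QueryLogRecord("alice", "T | top 5 by Duration", 300, 90),
    ]
    print(top_users(logs, 2))                           # [('alice', 3), ('bob', 1)]
    print([r.user for r in heavy_queries(logs, 1000)])  # ['bob', 'dashboard-app']
```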
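The adaptive mechanism waits until either enough data has accumulated or the maximum delay has elapsed. That trade-off can be modeled as a size-or-time flush policy. This is a simplified, hypothetical model and not Fabric's implementation. The `BatchingWriter` class is invented, and the 200 MB target is taken from the lower end of the typical file size. Only the 5-minute and 3-hour bounds come from the notes.

```python
# Hypothetical model of a size-or-time flush policy. Not Fabric's
# implementation: the class, its fields, and the 200 MB target (lower end of
# the typical 200-256 MB file size) are assumptions.
from dataclasses import dataclass, field
from typing import List, Optional

MB = 1024 * 1024
MIN_DELAY_S = 5 * 60        # 5 minutes: shortest configurable delay
MAX_DELAY_S = 3 * 60 * 60   # 3 hours: default and longest delay


@dataclass
class BatchingWriter:
    target_bytes: int = 200 * MB
    max_delay_s: int = MAX_DELAY_S
    files: List[int] = field(default_factory=list, init=False)     # sizes of written files
    _pending: List[int] = field(default_factory=list, init=False)  # buffered chunk sizes
    _first_arrival_s: Optional[float] = field(default=None, init=False)

    def __post_init__(self) -> None:
        if not MIN_DELAY_S <= self.max_delay_s <= MAX_DELAY_S:
            raise ValueError("delay must be between 5 minutes and 3 hours")

    def ingest(self, size_bytes: int, now_s: float) -> None:
        """Buffer an incoming chunk, then check whether a write is due."""
        if self._first_arrival_s is None:
            self._first_arrival_s = now_s
        self._pending.append(size_bytes)
        self.tick(now_s)

    def tick(self, now_s: float) -> None:
        """Write one file if enough data is buffered or the oldest chunk waited long enough."""
        if self._first_arrival_s is None:
            return
        big_enough = sum(self._pending) >= self.target_bytes
        waited_enough = now_s - self._first_arrival_s >= self.max_delay_s
        if big_enough or waited_enough:
            self.files.append(sum(self._pending))
            self._pending.clear()
            self._first_arrival_s = None


if __name__ == "__main__":
    # Trickling data: 1 MB per minute for 3 hours (181 arrivals, t = 0..10800 s).
    for delay in (MIN_DELAY_S, MAX_DELAY_S):
        writer = BatchingWriter(max_delay_s=delay)
        for minute in range(181):
            writer.ingest(1 * MB, now_s=minute * 60)
        print(delay, len(writer.files), max(writer.files) // MB)
    # prints "300 30 6", then "10800 1 181"
```

With 1 MB arriving per minute, the 5-minute delay produces thirty 6 MB files over three hours. The 3-hour default produces a single 181 MB file. This is the small-files effect the warning refers to.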
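A few lines of Python show why a partitioned OneLake copy ends up with more columns than the source table. The partition used here ("Day", derived from a timestamp) is a hypothetical example. Modeling partitions as Python callables is also an assumption, made for illustration only. Per the notes, partitioning is actually configured by altering the mirroring policy.

```python
# Hypothetical sketch: a partitioned OneLake copy gains one column per
# partition, named after its PartitionName. Partitions are modeled as Python
# callables for illustration only; this is not how they are configured.
from datetime import datetime
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def add_partition_columns(rows: List[Row], partitions: Dict[str, Callable[[Row], Any]]) -> List[Row]:
    """Return copies of the rows with one extra column per partition."""
    return [{**row, **{name: derive(row) for name, derive in partitions.items()}} for row in rows]


if __name__ == "__main__":
    source = [
        {"Timestamp": datetime(2025, 3, 9, 10, 15), "Value": 1.5},
        {"Timestamp": datetime(2025, 3, 10, 8, 0), "Value": 2.0},
    ]
    # "Day" is a hypothetical PartitionName derived from the Timestamp column.
    copy = add_partition_columns(source, {"Day": lambda r: r["Timestamp"].strftime("%Y-%m-%d")})
    print(len(source[0]), len(copy[0]))  # 2 3
    print(copy[1]["Day"])                # 2025-03-10
```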
References:
[1] Microsoft Learn (2025) Microsoft Fabric: Eventhouse overview [link]
[2] Microsoft Learn (2025) Microsoft Fabric: Eventhouse monitoring [link]
[3] Microsoft Learn (2025) Microsoft Fabric: Query logs [link]
[4] Microsoft Learn (2025) Microsoft Fabric: Eventhouse OneLake Availability [link]
[5] Microsoft Learn (2025) Real Time Intelligence L200 Pitch Deck [link]

Acronyms:
KQL - Kusto Query Language
SDK - Software Development Kit
RLS - Row Level Security 
RTI - Real-Time Intelligence

