Showing posts with label eventhouse. Show all posts

09 March 2025

🏭🎗️🗒️Microsoft Fabric: Eventhouses [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 9-Mar-2025

Real-Time Intelligence architecture [4]

[Microsoft Fabric] Eventhouses

  • [def] 
  • a service that empowers users to extract insights and visualize data in motion
    • offers an end-to-end solution for 
      • event-driven scenarios
        • ⇐ rather than schedule-driven solutions  [1]
    • a workspace of databases
      • can be shared across projects [1]
  • allows managing multiple databases at once
    • sharing capacity and resources to optimize performance and cost
    • provides unified monitoring and management across all databases and per database [1]
  • provide a solution for handling and analyzing large volumes of data
    • particularly in scenarios requiring real-time analytics and exploration [1]
    • designed to handle real-time data streams efficiently [1]
      • lets organizations ingest, process, and analyze data in near real-time [1]
  • provide a scalable infrastructure that allows organizations to handle growing volumes of data, ensuring optimal performance and resource use.
    • preferred engine for semistructured and free text analysis
    • tailored to time-based, streaming events with structured, semistructured, and unstructured data [1]
    • allows to get data 
      • from multiple sources, 
      • in multiple pipelines
        • e.g. Eventstream, SDKs, Kafka, Logstash, data flows, etc.
      • multiple data formats [1]
    • data is automatically indexed and partitioned based on ingestion time
  • designed to optimize cost by suspending the service when not in use [1]
    • reactivating the service can lead to a latency of a few seconds [1]
      • for highly time-sensitive systems that can't tolerate this latency, use the Minimum consumption setting [1]
        • enables the service to be always available at a selected minimum level [1]
          • customers pay for 
            • the minimum compute level selected [1]
            • the actual consumption when the compute level is above the minimum set [1]
        • the specified compute is available to all the databases within the eventhouse [1]
    • {scenario} solutions that include event-based data
      • e.g. telemetry and log data, time series and IoT data, security and compliance logs, or financial records [1]
  • KQL databases 
    • can be created within an eventhouse [1]
    • can either be a standard database, or a database shortcut [1]
    • an exploratory query environment is created for each KQL Database, which can be used for exploration and data management [1]
    • data availability in OneLake can be enabled on a database or table level [1]
  • Eventhouse page 
    • serves as the central hub for all your interactions within the Eventhouse environment [1]
    • Eventhouse ribbon
      • provides quick access to essential actions within the Eventhouse
    • explorer pane
      • provides an intuitive interface for navigating between Eventhouse views and working with databases [1]
    • main view area 
      • displays the system overview details for the eventhouse [1]
  • {feature} Eventhouse monitoring
    • offers comprehensive insights into the usage and performance of the eventhouse by collecting end-to-end metrics and logs for all aspects of an Eventhouse [2]
    • part of workspace monitoring that allows you to monitor Fabric items in your workspace [2]
    • provides a set of tables that can be queried to get insights into the usage and performance of the eventhouse [2]
      • can be used to optimize the eventhouse and improve the user experience [2]
  • {feature} query logs table
    • contains the list of queries run on an Eventhouse KQL database
      • for each query, a log event record is stored in the EventhouseQueryLogs table [3]
    • can be used to
      • analyze query performance and trends [3]
      • troubleshoot slow queries [3]
      • identify heavy queries consuming large amount of system resources [3]
      • identify the users/applications running the highest number of queries [3]
  • {feature} OneLake availability
    • {benefit} allows creating one logical copy of the KQL database data in an eventhouse by turning on the feature [4]
      • users can query the data in the KQL database in Delta Lake format via other Fabric engines [4]
        • e.g. Direct Lake mode in Power BI, Warehouse, Lakehouse, Notebooks, etc.
    • {prerequisite} a workspace with a Microsoft Fabric-enabled capacity [4]
    • {prerequisite} a KQL database with editing permissions and data [4]
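    • {constraint} while the feature is turned on, tables can't be renamed
    • {constraint} while the feature is turned on, table schemas can't be altered
    • {constraint} while the feature is turned on, RLS can't be applied to tables
    • {constraint} while the feature is turned on, data can't be deleted, truncated, or purged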
    • when turned on, a mirroring policy is enabled
      • can be used to monitor data latency or alter it to partition delta tables [4]
  • {feature} robust adaptive mechanism
    • intelligently batches incoming data streams into one or more Parquet files, structured for analysis [4]
    • ⇐ important when dealing with trickling data [4]
      • ⇐ writing many small Parquet files into the lake can be inefficient resulting in higher costs and poor performance [4]
    • delays write operations if there isn't enough data to create optimal Parquet files [4]
      • ensures Parquet files are optimal in size and adhere to Delta Lake best practices [4]
      • ensures that the Parquet files are primed for analysis and balances the need for prompt data availability with cost and performance considerations [4]
      • {default} the write operation can take up to 3 hours or until files of sufficient size are created [4]
        • typically the files are 200-256 MB in size
        • the value can be adjusted between 5 minutes and 3 hours [4]
          • {warning} adjusting the delay to a shorter period might result in a suboptimal delta table with a large number of small files [4]
            • can lead to inefficient query performance [4]
        • {restriction} the resultant table in OneLake is read-only and can't be optimized after creation [4]
    • delta tables can be partitioned to improve query speed [4]
      • each partition is represented as a separate column using the PartitionName listed in the Partitions list [4]
        • ⇒ OneLake copy has more columns than the source table [4]
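
The trade-off described under the adaptive mechanism above (wait for enough data versus write promptly) can be illustrated with a toy model. This is only a sketch under stated assumptions, not how Fabric actually implements the mechanism: the class `AdaptiveBatcher`, the function `simulate`, and the flush rule are hypothetical, and only the 256 MB target and the 5 minutes to 3 hours delay range come from the notes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Figures taken from the notes above; everything else is an assumption.
TARGET_FILE_MB = 256   # upper end of the typical 200-256 MB file size
MAX_DELAY_MIN = 180    # default maximum delay of 3 hours


@dataclass
class AdaptiveBatcher:
    """Toy model: buffer trickling data, flush on size or on maximum delay."""
    target_mb: float = TARGET_FILE_MB
    max_delay_min: float = MAX_DELAY_MIN
    buffered_mb: float = 0.0
    first_arrival_min: Optional[float] = None
    files: List[Tuple[float, float]] = field(default_factory=list)  # (flush time, size in MB)

    def ingest(self, now_min: float, size_mb: float) -> None:
        # Write out the buffer first if its oldest data has waited long enough.
        if (self.first_arrival_min is not None
                and now_min - self.first_arrival_min >= self.max_delay_min):
            self._flush(now_min)
        if self.first_arrival_min is None:
            self.first_arrival_min = now_min
        self.buffered_mb += size_mb
        # Write out as soon as enough data accumulated for a well-sized file.
        if self.buffered_mb >= self.target_mb:
            self._flush(now_min)

    def _flush(self, now_min: float) -> None:
        self.files.append((now_min, self.buffered_mb))
        self.buffered_mb = 0.0
        self.first_arrival_min = None


def simulate(max_delay_min: float, minutes: int = 360, mb_per_min: float = 1.0):
    """Feed a constant trickle of data and return the files written."""
    batcher = AdaptiveBatcher(max_delay_min=max_delay_min)
    for minute in range(minutes):
        batcher.ingest(minute, mb_per_min)
    return batcher.files


default_delay = simulate(max_delay_min=180)  # few, larger files
short_delay = simulate(max_delay_min=5)      # many small files
print(len(default_delay), len(short_delay))
```

With a trickle of 1 MB per minute, the short delay produces many 5 MB files where the default produces a single much larger one, which is the small-files problem the {warning} above refers to.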
References:
[1] Microsoft Learn (2025) Microsoft Fabric: Eventhouse overview [link]
[2] Microsoft Learn (2025) Microsoft Fabric: Eventhouse monitoring [link]
[3] Microsoft Learn (2025) Microsoft Fabric: Query logs [link]
[4] Microsoft Learn (2025) Microsoft Fabric: Eventhouse OneLake Availability [link]
[5] Microsoft Learn (2025) Real Time Intelligence L200 Pitch Deck [link]

Resources:
[R1] Microsoft Learn (2024) Microsoft Fabric exercises [link]
[R2] Eventhouse Monitoring (Preview) [link]
[R3] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:
KQL - Kusto Query Language
SDK - Software Development Kit
RLS - Row Level Security 
RTI - Real-Time Intelligence

08 March 2025

🏭🎗️🗒️Microsoft Fabric: Real-Time Intelligence (RTI) [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 9-Mar-2025

Real-Time Intelligence architecture [4]

[Microsoft Fabric] Real-Time Intelligence [RTI]

  • [def]
    • {goal} provide a complete real-time SaaS platform within MF
      • {benefit} helps gain actionable insights from data, with the ability to ingest, transform, query, visualize, and act on it in real time [4]
    • {goal} provides a single place for data-in-motion
      • {benefit} allows pulling event streams from Real-Time hub
        • provides a single data estate for data in motion simplifying the ingestion, curation and processing of streaming data from Microsoft and external sources [4]
        • empowers users to extract insights and visualize data in motion [1]
    • {goal} enable rapid solution development
      • {benefit} provides a range of no-code, low-code and pro-code experiences for various scenarios [4]
        • everything from business insight discovery to complex stream processing, and application and model development [4]
    • {goal} enable real-time AI insights
      • {benefit} scales beyond human monitoring and drives actions with built-in, automated capabilities [4]
        • allows anyone in the organization to take advantage of them [4]
    • offers an end-to-end solution for 
      • event-driven scenarios
        • ⇐ rather than schedule-driven solutions
      • streaming data
      • data logs
    • {benefit} helps customers accelerate the speed and precision of business by providing [4]
      • {goal} operational efficiency
        • by allowing to streamline processes and make data driven decisions with accurate, up to date information [4]
      • {goal} end-to-end visibility
        • by allowing to gain a holistic understanding of business health and discover actionable insights for timely action [4]
      • {goal} competitive advantage
        • by allowing to quickly react to shifting market trends, identify opportunities and mitigate risk in real time [4]
    • seamlessly connects time-based data from various sources using no-code connectors [1]
      • enables immediate 
        • visual insights
        • geospatial analysis
        • trigger-based reactions 
        • ⇐ all are part of an organization-wide data catalog [1]
      • ⇐ time oriented data is difficult to manage, yet critical for success [4]
        • {challenge} capture high throughput data from disparate sources in real time [4]
        • {challenge} model scenarios using event data [4]
        • {challenge} choose from an array of bespoke technologies and data formats [4]
        • {challenge} leverage the power of AI against data in real time [4]
        • without the ability to leverage time oriented data, businesses are vulnerable to risks [4]
          • {risk} poor decision-making
          • {risk} financial loss
          • {risk} reduced operational efficiency
          • {risk} impaired data integrity
          • {risk} non-compliance
          • {risk} negative user experience
      • {capability} single unified SaaS solution
        • in opposition to a fragmented, fragile tech stack
        • allows to ingest & process all event sources, in any data format [4]
          • one can connect to diverse  streaming sources and leverage no code and low code experiences to process and route quickly [4]
            • via out of the box connectors for streaming and event data sources [4]
          • events can be routed to other Fabric and 3rd party entities [4]
          • organizational BI reports can be enhanced with enriched data [4]
        • allows to analyze and transform data event streams using queries and visual exploration to discover insights in real time [4]
          • one can manage an unlimited amount of data [4]
          • multiple databases can be monitored and managed at once [4]
        • allows to act quickly on top of data
          • via triggers and alerts on changing data to respond automatically and set action when specific conditions are detected [4]
            • helps drive actions on a per instance state that evolves over time [4]
            • helps to act on data without needing a deep schema and semantic modeling [4]
      • {capability} accessible data and analytics tools
        • in opposition to advanced skillsets required
      • {capability} real-time stream processing
        • in opposition to batch data processing
    • handles 
      • data ingestion
      • data transformation
      • data storage
      • data analytics
      • data visualization
      • data tracking
      • AI
      • real-time actions
    • can be used for 
      • data analysis
      • immediate visual insights
      • centralization of data in motion for an organization
      • actions on data
      • efficient querying, transformation, and storage of large volumes of structured or unstructured data [1]
  • helps evaluate data from 
    • IoT systems
    • system logs
    • free text
    • semi-structured data
  • helps contribute data for consumption by others in the organization
  • provides a versatile solution
    • transforms the data into a dynamic, actionable resource that drives value across the entire organization
  • its components are built on trusted, core Microsoft services
    • ⇐ together they extend the overall Fabric capabilities to provide event-driven solutions [1]
  • {feature} Real-Time hub
    • serves as a centralized catalog that facilitates easy access to, addition, exploration, and sharing of data [1]
    • expands the range of data sources
      • ⇐ it enables broader insights and visual clarity across various domains [1]
    • ensures that data is accessible to all [1]
      • promoting quick decision-making and informed action
    • the sharing of streaming data from diverse sources unlocks the potential to build BI solutions across the organization [1]
    • use the data consumption tools to explore the data [1]
  • {feature} Real-Time dashboards 
    • come equipped with out-of-the-box interactions 
      • {benefit} simplify the process of understanding data, making it accessible to anyone who wants to make decisions based on data in motion using visual tools, Natural Language and Copilot [1]
    • query the data in real-time as it’s being loaded [6]
      • every time a query is run, it leverages the latest data available in an Eventhouse or OneLake [6]
        • behave much like DirectQuery, but without the need to load data into a semantic model [6]
  • {feature} Activator
    • {benefit} allows to turn insights into actions by setting up alerts from various parts of Fabric to react to data patterns or conditions in real-time [1]
    • takes events as they are being processed into Eventstreams or Eventhouses and connects them to downstream systems to make data actionable [6]
  • {feature} Real-Time hub events 
    • a catalog of data in motion
    • contains:
      • data streams 
        • all data streams that are actively running in Fabric and to which the user has access
        • once a stream of data is connected, the entire SaaS solution becomes accessible [1]
      • Microsoft sources: 
        • easily discover streaming sources that the users have and quickly configure ingestion of those sources into Fabric
          • e.g. Azure Event Hubs, Azure IoT Hub, Azure SQL DB CDC, Azure Cosmos DB CDC, PostgreSQL DB CDC
      • Fabric events
        • event-driven capabilities support real-time notifications and data processing 
          • ⇒ one can monitor and react to events [1]
            • e.g. Fabric Workspace Item events, Azure Blob Storage events
          • ⇐ the events can be used to trigger other actions or workflows [1]
            • e.g. invoking a data pipeline or sending a notification via email
        • the events can be sent to other destinations via eventstreams [1]
  • {feature} Eventstreams
    • event processing capabilities 
      • ⇐ behave like event listeners that wait for data to be pushed to them [6]
    • {benefit} allow to capture, transform, and route high volumes of real-time events to various destinations with a no-code experience [1]
    • support multiple data sources and data destinations [1]
    • {benefit} allow to do filtering, data cleansing, transformation, windowed aggregations, and dupe detection, to land the data in the needed shape [1]
    • one can use the content-based routing capabilities to send data to different destinations based on filters [1]
    • derived eventstreams allow constructing new streams as a result of transformations and/or aggregations that can be shared with consumers in Real-Time hub [1]
  • {feature} Eventhouses
    • the ideal analytics engine to process data in motion
      • scalable ingestion engine with the ability to handle up to millions of events per hour [6]
    • tailored to time-based, streaming events with structured, semi structured, and unstructured data [1]
    • data is automatically indexed and partitioned based on ingestion time
      • ⇐ provides fast and complex analytic querying capabilities on high-granularity data [1]
    • the stored data can be made available in OneLake for consumption by other Fabric experiences [1]
      • ⇐ the data is ready for lightning-fast query using various code, low-code, or no-code options in Fabric [1]
    • the data can be queried in native KQL or in T-SQL in the KQL query set [1]
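
The Eventstream operations listed above (filtering, duplicate detection, windowed aggregation, content-based routing) are configured through a no-code experience in Fabric. Purely as a conceptual illustration, here is what those steps amount to in plain Python. Nothing below is an Eventstream API: the event layout, the function names, the 10-second window, and the 90-degree threshold are all made up for the example.

```python
from collections import defaultdict

# Hypothetical events: (event_id, device, timestamp in seconds, temperature)
events = [
    ("e1", "sensor-a", 0, 20.0),
    ("e2", "sensor-a", 4, 22.0),
    ("e2", "sensor-a", 4, 22.0),   # duplicate delivery
    ("e3", "sensor-b", 7, 95.0),
    ("e4", "sensor-a", 12, 24.0),
    ("e5", "sensor-b", 15, None),  # malformed reading
]


def deduplicate(stream):
    """Duplicate detection: drop events whose id was already seen."""
    seen = set()
    for event in stream:
        if event[0] not in seen:
            seen.add(event[0])
            yield event


def tumbling_average(stream, window_s=10):
    """Windowed aggregation: average per device per fixed, non-overlapping window."""
    sums = defaultdict(lambda: [0.0, 0])
    for _, device, ts, temp in stream:
        key = (device, ts // window_s * window_s)  # window start time
        sums[key][0] += temp
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def route(stream, threshold=90.0):
    """Content-based routing: hot readings go to 'alerts', the rest to 'storage'."""
    destinations = {"alerts": [], "storage": []}
    for event in stream:
        target = "alerts" if event[3] >= threshold else "storage"
        destinations[target].append(event[0])
    return destinations


# Filtering and cleansing: keep only de-duplicated events with a valid reading.
clean = [event for event in deduplicate(events) if event[3] is not None]
averages = tumbling_average(clean)
routed = route(clean)
print(averages)
print(routed)
```

In this picture, the "alerts" destination plays the role of something like Activator and "storage" that of an Eventhouse, though the mapping is only illustrative.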
References:
[1] Microsoft Fabric (2024) What is Real-Time Intelligence? [link]
[2] Microsoft Fabric (2024) Real-Time Intelligence documentation in Microsoft Fabric [link]
[3] Microsoft Fabric Updates Blog (2024) Fabric workloads are now generally available! [link]
[4] Microsoft Learn (2025) Real Time Intelligence L200 Pitch Deck [link]
[5] Microsoft Fabric Community (2024) Benefits of Migrating to Fabric RTI [link]
[6] Microsoft Fabric Update Blog (2025) Operational Reporting with Microsoft Fabric Real-Time Intelligence [link]
[7] Microsoft Learn (2025) Get started with Real-Time Intelligence in Microsoft Fabric [link]
[8] Microsoft Learn (2025) Implement Real-Time Intelligence with Microsoft Fabric [link]

Resources:
[R1] Microsoft Learn (2024) Microsoft Fabric exercises [link]
[R2] Microsoft Learn (2024) Microsoft Fabric RTI Demo Application [link] [GitHub]
[R3] Microsoft Fabric Updates Blog (2024) Understanding Real-Time Intelligence usage reporting and billing [link]
[R4] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:
AI - Artificial Intelligence
CDC - Change Data Capture
DB - database
IoT - Internet of Things
KQL - Kusto Query Language
MF - Microsoft Fabric
RTI - Real-Time Intelligence
SaaS - Software-as-a-Service
SQL - Structured Query Language

18 December 2024

🧭🏭Business Intelligence: Microsoft Fabric (Part VII: Data Stores Comparison)

Business Intelligence Series

Microsoft made available a reference guide for the data stores supported for Microsoft Fabric workloads [1], including the new Fabric SQL database (see previous post). Here's the consolidated table followed by a few aspects to consider: 

Area | Lakehouse | Warehouse | Eventhouse | Fabric SQL database | Power BI Datamart
--- | --- | --- | --- | --- | ---
Data volume | Unlimited | Unlimited | Unlimited | 4 TB | Up to 100 GB
Type of data | Unstructured, semi-structured, structured | Structured, semi-structured (JSON) | Unstructured, semi-structured, structured | Structured, semi-structured, unstructured | Structured
Primary developer persona | Data engineer, data scientist | Data warehouse developer, data architect, data engineer, database developer | App developer, data scientist, data engineer | AI developer, App developer, database developer, DB admin | Data scientist, data analyst
Primary dev skill | Spark (Scala, PySpark, Spark SQL, R) | SQL | No code, KQL, SQL | SQL | No code, SQL
Data organized by | Folders and files, databases, and tables | Databases, schemas, and tables | Databases, schemas, and tables | Databases, schemas, tables | Database, tables, queries
Read operations | Spark, T-SQL | T-SQL, Spark* | KQL, T-SQL, Spark | T-SQL | Spark, T-SQL
Write operations | Spark (Scala, PySpark, Spark SQL, R) | T-SQL | KQL, Spark, connector ecosystem | T-SQL | Dataflows, T-SQL
Multi-table transactions | No | Yes | Yes, for multi-table ingestion | Yes, full ACID compliance | No
Primary development interface | Spark notebooks, Spark job definitions | SQL scripts | KQL Queryset, KQL Database | SQL scripts | Power BI
Security | RLS, CLS**, table level (T-SQL), none for Spark | Object level, RLS, CLS, DDL/DML, dynamic data masking | RLS | Object level, RLS, CLS, DDL/DML, dynamic data masking | Built-in RLS editor
Access data via shortcuts | Yes | Yes | Yes | Yes | No
Can be a source for shortcuts | Yes (files and tables) | Yes (tables) | Yes | Yes (tables) | No
Query across items | Yes | Yes | Yes | Yes | No
Advanced analytics | Interface for large-scale data processing, built-in data parallelism, and fault tolerance | Interface for large-scale data processing, built-in data parallelism, and fault tolerance | Time Series native elements, full geo-spatial and query capabilities | T-SQL analytical capabilities, data replicated to delta parquet in OneLake for analytics | Interface for data processing with automated performance tuning
Advanced formatting support | Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive compatible file format | Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive compatible file format | Full indexing for free text and semi-structured data like JSON | Table support for OLTP, JSON, vector, graph, XML, spatial, key-value | Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive compatible file format
Ingestion latency | Available instantly for querying | Available instantly for querying | Queued ingestion, streaming ingestion has a couple of seconds latency | Available instantly for querying | Available instantly for querying

It can be used as a map of what one needs to know to use each feature, and to identify how existing experience carries over, and here I'm referring to the many SQL developers. One must also consider the capabilities and limitations of each storage repository.
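
To make the "map" idea concrete, a few rows of the table above can be encoded and filtered by requirement. This is just a sketch: `DATA_STORES` transcribes only three rows (read operations, multi-table transactions, access via shortcuts), the helper `candidates` is hypothetical, and nuances such as the footnote on Spark reads for the Warehouse or "Yes, for multi-table ingestion" for the Eventhouse are flattened to plain values.

```python
# Three rows transcribed from the comparison table above (not the full table).
DATA_STORES = {
    "Lakehouse": {"read": {"Spark", "T-SQL"}, "multi_table_tx": False, "shortcuts": True},
    "Warehouse": {"read": {"T-SQL", "Spark"}, "multi_table_tx": True, "shortcuts": True},
    # "Yes, for multi-table ingestion" is simplified to True here.
    "Eventhouse": {"read": {"KQL", "T-SQL", "Spark"}, "multi_table_tx": True, "shortcuts": True},
    "Fabric SQL database": {"read": {"T-SQL"}, "multi_table_tx": True, "shortcuts": True},
    "Power BI Datamart": {"read": {"Spark", "T-SQL"}, "multi_table_tx": False, "shortcuts": False},
}


def candidates(read_with=None, needs_multi_table_tx=False, needs_shortcuts=False):
    """Return the data stores that satisfy all the given requirements."""
    result = []
    for name, props in DATA_STORES.items():
        if read_with is not None and read_with not in props["read"]:
            continue
        if needs_multi_table_tx and not props["multi_table_tx"]:
            continue
        if needs_shortcuts and not props["shortcuts"]:
            continue
        result.append(name)
    return result


# A SQL developer who needs multi-table transactions:
print(candidates(read_with="T-SQL", needs_multi_table_tx=True))
# Someone who wants to query with KQL:
print(candidates(read_with="KQL"))
```

Such a filter only narrows the choice; it says nothing about performance, which still has to be tested.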

However, what I'm missing are references regarding the performance of data access, especially compared with on-premises workloads. Moreover, the devil hides in the details, therefore one must test thoroughly before committing to any of the above choices. For the newest overview please check the referenced documentation!

For lakehouses, the hardest limitation is the lack of multi-table transactions, though that's understandable given their scope. However, probably the most important aspect is whether they can scale with the volume of reads/writes, as currently the SQL endpoint seems to lag.

The warehouse seems to be more versatile, though careful attention needs to be given to its design. 

The Eventhouse opens the door to a wide range of time-based scenarios, though it will be interesting to see how developers cope with its lack of functionality in some areas.

Fabric SQL databases are a new addition, and hopefully they'll allow considering a wide range of OLTP scenarios. Starting on 28 March 2025, SQL databases will be ON by default, and tenant admins must manually turn them OFF before that date [3].

Power BI datamarts have been in preview for a couple of years.


References:
[1] Microsoft Fabric (2024) Microsoft Fabric decision guide: choose a data store [link]
[2] Reitse's blog (2024) Testing Microsoft Fabric Capacity: Data Warehouse vs Lakehouse Performance [link]
[3] Microsoft Fabric Update Blog (2025) Extending flexibility: default checkbox changes on tenant settings for SQL database in Fabric [link]
[4] Microsoft Fabric Update Blog (2025) Enhancing SQL database in Fabric: share your feedback and shape the future [link]
[5] Microsoft Fabric Update Blog (2025) Why SQL database in Fabric is the best choice for low-code/no-code Developers [link]

About Me

Koeln, NRW, Germany
IT Professional with more than 25 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.