SQL Troubles: eventhouse


09 March 2025

🏭🎗️🗒️Microsoft Fabric: Eventhouses [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!

Last updated: 9-Mar-2025

Real-Time Intelligence architecture [5]

[Microsoft Fabric] Eventhouses

[def]
a service that empowers users to extract insights and visualize data in motion

offers an end-to-end solution for

event-driven scenarios

⇐ rather than schedule-driven solutions [1]

a workspace of databases

can be shared across projects [1]

allows managing multiple databases at once

sharing capacity and resources to optimize performance and cost
provides unified monitoring and management across all databases and per database [1]

provide a solution for handling and analyzing large volumes of data

particularly in scenarios requiring real-time analytics and exploration [1]
designed to handle real-time data streams efficiently [1]

lets organizations ingest, process, and analyze data in near real-time [1]

provide a scalable infrastructure that allows organizations to handle growing volumes of data, ensuring optimal performance and resource use.

preferred engine for semistructured and free text analysis
tailored to time-based, streaming events with structured, semistructured, and unstructured data [1]
allows getting data

from multiple sources,
in multiple pipelines

e.g. Eventstream, SDKs, Kafka, Logstash, data flows, etc.

multiple data formats [1]

data is automatically indexed and partitioned based on ingestion time

designed to optimize cost by suspending the service when not in use [1]

reactivating the service can lead to a latency of a few seconds [1]

for highly time-sensitive systems that can't tolerate this latency, use the Minimum consumption setting [1]

enables the service to be always available at a selected minimum level [1]

customers pay for

the minimum compute level selected [1]
the actual consumption when the compute level is above the minimum set [1]

the specified compute is available to all the databases within the eventhouse [1]
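The billing rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Fabric billing logic: the function name, the per-interval granularity, and the unitless compute values are all hypothetical.

```python
def billed_compute(minimum_level: float, actual_usage: list[float]) -> list[float]:
    """Return the billed compute per interval.

    Assumption: in each interval the customer pays for the selected minimum
    level, or for the actual consumption when it is above that minimum.
    """
    return [max(minimum_level, used) for used in actual_usage]


# Hypothetical consumption per interval (the unit is left abstract on purpose).
usage = [0.0, 1.5, 4.0, 2.0]
print(billed_compute(2.0, usage))  # [2.0, 2.0, 4.0, 2.0]
```

With a minimum of 2.0, idle intervals are still billed at 2.0, while the interval that peaks at 4.0 is billed at its actual consumption.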

{scenario} solutions that include event-based data

e.g. telemetry and log data, time series and IoT data, security and compliance logs, or financial records [1]

KQL databases

can be created within an eventhouse [1]
can either be a standard database, or a database shortcut [1]
an exploratory query environment is created for each KQL Database, which can be used for exploration and data management [1]
data availability in OneLake can be enabled on a database or table level [1]

Eventhouse page

serves as the central hub for all your interactions within the Eventhouse environment [1]
Eventhouse ribbon

provides quick access to essential actions within the Eventhouse

explorer pane

provides an intuitive interface for navigating between Eventhouse views and working with databases [1]

main view area

displays the system overview details for the eventhouse [1]

{feature} Eventhouse monitoring

offers comprehensive insights into the usage and performance of the eventhouse by collecting end-to-end metrics and logs for all aspects of an Eventhouse [2]
part of workspace monitoring that allows you to monitor Fabric items in your workspace [2]
provides a set of tables that can be queried to get insights into the usage and performance of the eventhouse [2]

can be used to optimize the eventhouse and improve the user experience [2]

{feature} query logs table

contains the list of queries run on an Eventhouse KQL database

for each query, a log event record is stored in the EventhouseQueryLogs table [3]

can be used to

analyze query performance and trends [3]
troubleshoot slow queries [3]
identify heavy queries consuming a large amount of system resources [3]
identify the users/applications running the highest number of queries [3]
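The kinds of analysis listed above can be sketched in plain Python over a handful of made-up log records. The field names (user, duration_ms, cpu_ms) are illustrative assumptions and not the actual EventhouseQueryLogs schema; in practice the table would be queried directly, for example with KQL.

```python
from collections import Counter

# Hypothetical query log records; field names are assumptions for illustration.
logs = [
    {"user": "app-a", "duration_ms": 120, "cpu_ms": 40},
    {"user": "app-b", "duration_ms": 5400, "cpu_ms": 3100},
    {"user": "app-a", "duration_ms": 300, "cpu_ms": 90},
    {"user": "analyst", "duration_ms": 9800, "cpu_ms": 200},
]


def slow_queries(records, threshold_ms):
    """Queries whose duration exceeds the given threshold."""
    return [r for r in records if r["duration_ms"] > threshold_ms]


def heaviest_queries(records, n):
    """The n queries that consumed the most CPU time."""
    return sorted(records, key=lambda r: r["cpu_ms"], reverse=True)[:n]


def queries_per_user(records):
    """Users/applications ordered by the number of queries they ran."""
    return Counter(r["user"] for r in records).most_common()


print(len(slow_queries(logs, 1000)))         # 2
print(heaviest_queries(logs, 1)[0]["user"])  # app-b
print(queries_per_user(logs)[0])             # ('app-a', 2)
```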

{feature} OneLake availability

{benefit} allows creating one logical copy of the KQL database data in an eventhouse by turning on the feature [4]

users can query the data in the KQL database in Delta Lake format via other Fabric engines [4]

e.g. Direct Lake mode in Power BI, Warehouse, Lakehouse, Notebooks, etc.

{prerequisite} a workspace with a Microsoft Fabric-enabled capacity [4]
{prerequisite} a KQL database with editing permissions and data [4]
{constraint} tables can't be renamed while the feature is turned on [4]
{constraint} table schemas can't be altered [4]
{constraint} RLS can't be applied to tables [4]
{constraint} data can't be deleted, truncated, or purged [4]
when turned on, a mirroring policy is enabled

can be used to monitor data latency or alter it to partition delta tables [4]

{feature} robust adaptive mechanism

intelligently batches incoming data streams into one or more Parquet files, structured for analysis [4]
⇐ important when dealing with trickling data [4]

⇐ writing many small Parquet files into the lake can be inefficient resulting in higher costs and poor performance [4]

delays write operations if there isn't enough data to create optimal Parquet files [4]

ensures Parquet files are optimal in size and adhere to Delta Lake best practices [4]
ensures that the Parquet files are primed for analysis and balances the need for prompt data availability with cost and performance considerations [4]
{default} the write operation can take up to 3 hours or until files of sufficient size are created [4]

typically the files are 200-256 MB in size
the value can be adjusted between 5 minutes and 3 hours [4]

{warning} adjusting the delay to a shorter period might result in a suboptimal delta table with a large number of small files [4]

can lead to inefficient query performance [4]

{restriction} the resultant table in OneLake is read-only and can't be optimized after creation [4]
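The trade-off described for the adaptive mechanism (wait for enough data to write a well-sized file, but never longer than a maximum delay) can be sketched as below. This is a simplified model under stated assumptions, not the service's actual implementation; the class name, its parameters, and the explicit clock values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AdaptiveBatcher:
    target_bytes: int    # hypothetical size at which a file counts as "optimal"
    max_delay_s: float   # hypothetical upper bound on how long data may wait
    _pending: List[bytes] = field(default_factory=list)
    _pending_bytes: int = 0
    _first_arrival: Optional[float] = None

    def add(self, payload: bytes, now: float) -> Optional[List[bytes]]:
        """Buffer a payload; return the batch to write if a flush is due."""
        if self._first_arrival is None:
            self._first_arrival = now
        self._pending.append(payload)
        self._pending_bytes += len(payload)
        return self._flush_if_due(now)

    def tick(self, now: float) -> Optional[List[bytes]]:
        """Check the delay condition without adding new data."""
        return self._flush_if_due(now)

    def _flush_if_due(self, now: float) -> Optional[List[bytes]]:
        if not self._pending:
            return None
        big_enough = self._pending_bytes >= self.target_bytes
        waited_long_enough = now - self._first_arrival >= self.max_delay_s
        if not (big_enough or waited_long_enough):
            return None  # delay the write: not enough data yet
        batch, self._pending = self._pending, []
        self._pending_bytes = 0
        self._first_arrival = None
        return batch


batcher = AdaptiveBatcher(target_bytes=10, max_delay_s=60)
print(batcher.add(b"abcd", now=0))  # None (4 bytes, keep waiting)
print(batcher.add(b"efgh", now=1))  # None (8 bytes, keep waiting)
print(batcher.add(b"ij", now=2))    # [b'abcd', b'efgh', b'ij'] (size reached)
print(batcher.add(b"x", now=3))     # None (trickling data)
print(batcher.tick(now=63))         # [b'x'] (maximum delay reached)
```

Lowering max_delay_s makes data available sooner but yields more small batches, which mirrors the warning above about suboptimal delta tables.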

delta tables can be partitioned to improve query speed [4]

each partition is represented as a separate column using the PartitionName listed in the Partitions list [4]

⇒ OneLake copy has more columns than the source table [4]


References:
[1] Microsoft Learn (2025) Microsoft Fabric: Eventhouse overview [link]
[2] Microsoft Learn (2025) Microsoft Fabric: Eventhouse monitoring [link]

[3] Microsoft Learn (2025) Microsoft Fabric: Query logs [link]

[4] Microsoft Learn (2025) Microsoft Fabric: Eventhouse OneLake Availability [link]

[5] Microsoft Learn (2025) Real Time Intelligence L200 Pitch Deck [link]

Resources:

[R1] Microsoft Learn (2024) Microsoft Fabric exercises [link]
[R2] Eventhouse Monitoring (Preview) [link]

[R3] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:
KQL - Kusto Query Language
SDK - Software Development Kit
RLS - Row Level Security
RTI - Real-Time Intelligence

08 March 2025

🏭🎗️🗒️Microsoft Fabric: Real-Time Intelligence (RTI) [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)!

Last updated: 9-Mar-2025

Real-Time Intelligence architecture [4]

[Microsoft Fabric] Real-Time Intelligence [RTI]

[def]

{goal} provide a complete real-time SaaS platform within MF

{benefit} helps gain actionable insights from data, with the ability to ingest, transform, query, visualize, and act on it in real time [4]

{goal} provides a single place for data-in-motion

{benefit} allows pulling event streams from Real-Time hub

provides a single data estate for data in motion simplifying the ingestion, curation and processing of streaming data from Microsoft and external sources [4]
empowers users to extract insights and visualize data in motion [1]

{goal} enable rapid solution development

{benefit} provides a range of no-code, low-code and pro-code experiences for various scenarios [4]

everything from business insight discovery to complex stream processing, and application and model development [4]

{goal} enable real-time AI insights

{benefit} scales beyond human monitoring and drives actions with built-in, automated capabilities [4]

allows anyone in the organization to take advantage of them [4]

offers an end-to-end solution for

event-driven scenarios

⇐ rather than schedule-driven solutions.

streaming data
data logs

{benefit} help customers accelerate speed and precision of business by providing [4]

{goal} operational efficiency

by allowing them to streamline processes and make data-driven decisions with accurate, up-to-date information [4]

{goal} end-to-end visibility

by allowing them to gain a holistic understanding of business health and discover actionable insights for timely action [4]

{goal} competitive advantage

by allowing them to quickly react to shifting market trends, identify opportunities and mitigate risk in real time [4]

seamlessly connects time-based data from various sources using no-code connectors [1]

enables immediate

visual insights
geospatial analysis
trigger-based reactions
⇐ all are part of an organization-wide data catalog [1]

⇐ time oriented data is difficult to manage, yet critical for success [4]

{challenge} capture high throughput data from disparate sources in real time [4]
{challenge} model scenarios using event data [4]
{challenge} choose from an array of bespoke technologies and data formats [4]
{challenge} leverage the power of AI against data in real time [4]
without the ability to leverage time oriented data, businesses are vulnerable to risks [4]

{risk} poor decision-making
{risk} financial loss
{risk} reduced operational efficiency
{risk} impaired data integrity
{risk} non-compliance
{risk} negative user experience

{capability} single unified SaaS solution

in opposition to a fragmented, fragile tech stack
allows ingesting and processing all event sources, in any data format [4]

one can connect to diverse streaming sources and leverage no code and low code experiences to process and route quickly [4]

via out of the box connectors for streaming and event data sources [4]

events can be routed to other Fabric and 3rd party entities [4]
organizational BI reports can be enhanced with enriched data [4]

allows analyzing and transforming data event streams using queries and visual exploration to discover insights in real time [4]

one can manage an unlimited amount of data [4]
multiple databases can be monitored and managed at once [4]

allows acting quickly on top of data

via triggers and alerts on changing data to respond automatically and set action when specific conditions are detected [4]

helps drive actions on a per instance state that evolves over time [4]
helps to act on data without needing a deep schema and semantic modeling [4]

{capability} accessible data and analytics tools

in opposition to advanced skillsets required

{capability} real-time stream processing

in opposition to batch data processing

handles

data ingestion
data transformation
data storage
data analytics
data visualization
data tracking
AI
real-time actions

can be used for

data analysis
immediate visual insights
centralization of data in motion for an organization
actions on data
efficient querying, transformation, and storage of large volumes of structured or unstructured data [1]

helps evaluate data from

IoT systems
system logs
free text
semi-structured data, or to contribute data for consumption by others in your organization

provides a versatile solution

transforms the data into a dynamic, actionable resource that drives value across the entire organization

its components are built on trusted, core Microsoft services [1]

⇐ together they extend the overall Fabric capabilities to provide event-driven solutions [1]

{feature} Real-Time hub

serves as a centralized catalog that facilitates easy access to data, as well as adding, exploring, and sharing it [1]
expands the range of data sources

⇐ it enables broader insights and visual clarity across various domains [1]

ensures that data is accessible to all [1]

promoting quick decision-making and informed action

the sharing of streaming data from diverse sources unlocks the potential to build BI solutions across the organization [1]
use the data consumption tools to explore the data [1]

{feature} Real-Time dashboards

come equipped with out-of-the-box interactions

{benefit} simplify the process of understanding data, making it accessible to anyone who wants to make decisions based on data in motion using visual tools, natural language and Copilot [1]

query the data in real-time as it’s being loaded [6]

every time a query is run, it leverages the latest data available in an Eventhouse or OneLake [6]

behave much like DirectQuery, but without the need to load data into a semantic model [6]

{feature} Activator

{benefit} allows turning insights into actions by setting up alerts from various parts of Fabric to react to data patterns or conditions in real time [1]
takes events as they are being processed into Eventstreams or Eventhouses and connects them to downstream systems to make data actionable [6]
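The idea of reacting to a condition on incoming events can be sketched as below. This is a conceptual illustration only and not Activator's API: the make_trigger helper, the per-object state, and the event fields are hypothetical. The trigger fires when the condition becomes true for an object, rather than on every event for which it holds.

```python
from typing import Callable, Dict


def make_trigger(condition: Callable[[dict], bool],
                 action: Callable[[str, dict], None]) -> Callable[[str, dict], bool]:
    """Build an event handler that runs `action` when `condition` becomes true."""
    previous: Dict[str, bool] = {}  # last evaluated condition per object id

    def on_event(object_id: str, event: dict) -> bool:
        now_true = condition(event)
        was_true = previous.get(object_id, False)
        previous[object_id] = now_true
        if now_true and not was_true:
            action(object_id, event)
            return True
        return False

    return on_event


alerts = []
too_hot = make_trigger(lambda e: e["temp"] > 30,
                       lambda oid, e: alerts.append(oid))

too_hot("sensor-1", {"temp": 25})  # below threshold, nothing happens
too_hot("sensor-1", {"temp": 31})  # crosses the threshold, action runs
too_hot("sensor-1", {"temp": 35})  # still above, no repeated alert
too_hot("sensor-2", {"temp": 40})  # state is tracked per object
print(alerts)  # ['sensor-1', 'sensor-2']
```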

{feature} Real-Time hub events

a catalog of data in motion
contains:

data streams

all data streams that are actively running in Fabric to which the user has access
once a stream of data is connected, the entire SaaS solution becomes accessible [1]

Microsoft sources:

easily discover streaming sources that the users have and quickly configure ingestion of those sources into Fabric

e.g. Azure Event Hubs, Azure IoT Hub, Azure SQL DB CDC, Azure Cosmos DB CDC, PostgreSQL DB CDC

Fabric events

event-driven capabilities support real-time notifications and data processing

⇒ one can monitor and react to events [1]

e.g. Fabric Workspace Item events, Azure Blob Storage events

⇐ the events can be used to trigger other actions or workflows [1]

e.g. invoking a data pipeline or sending a notification via email.

the events can be sent to other destinations via eventstreams [1]

{feature} Eventstreams

event processing capabilities

⇐ behave like event listeners that wait for data to be pushed to them [6]

{benefit} allow capturing, transforming, and routing high volumes of real-time events to various destinations with a no-code experience [1]
support multiple data sources and data destinations [1]
{benefit} allow filtering, data cleansing, transformation, windowed aggregations, and duplicate detection, to land the data in the needed shape [1]
one can use the content-based routing capabilities to send data to different destinations based on filters [1]
derived eventstreams allow constructing new streams as a result of transformations and/or aggregations that can be shared with consumers in Real-Time hub [1]
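The processing steps named above (duplicate detection, filtering, windowed aggregation, content-based routing) can be sketched in plain Python. Eventstreams offer these as no-code operations, so this only illustrates the concepts; the event fields, the routing threshold, and the destination names are hypothetical.

```python
from collections import defaultdict

# Hypothetical events: ts is seconds since the start of the stream.
events = [
    {"id": 1, "device": "a", "ts": 0, "value": 10},
    {"id": 2, "device": "a", "ts": 4, "value": 20},
    {"id": 2, "device": "a", "ts": 4, "value": 20},   # duplicate delivery
    {"id": 3, "device": "b", "ts": 7, "value": -1},   # invalid reading
    {"id": 4, "device": "a", "ts": 12, "value": 30},
]


def deduplicate(stream):
    """Drop events whose id has already been seen."""
    seen = set()
    for event in stream:
        if event["id"] not in seen:
            seen.add(event["id"])
            yield event


def tumbling_sum(stream, window_s):
    """Sum values per device over fixed, non-overlapping time windows."""
    totals = defaultdict(int)
    for event in stream:
        window_start = (event["ts"] // window_s) * window_s
        totals[(event["device"], window_start)] += event["value"]
    return dict(totals)


def route(event):
    """Content-based routing: pick a destination from the event's content."""
    return "alerts" if event["value"] >= 25 else "archive"


clean = [e for e in deduplicate(events) if e["value"] >= 0]  # filter/cleanse
print(tumbling_sum(clean, window_s=10))  # {('a', 0): 30, ('a', 10): 30}
print([route(e) for e in clean])         # ['archive', 'archive', 'alerts']
```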

{feature} Eventhouses

the ideal analytics engine to process data in motion

scalable ingestion engine with the ability to handle up to millions of events per hour [6]

tailored to time-based, streaming events with structured, semi structured, and unstructured data [1]
data is automatically indexed and partitioned based on ingestion time

⇐ provides fast and complex analytic querying capabilities on high-granularity data [1]

the stored data can be made available in OneLake for consumption by other Fabric experiences [1]

⇐ the data is ready for lightning-fast query using various code, low-code, or no-code options in Fabric [1]

the data can be queried in native KQL or in T-SQL in the KQL queryset [1]


References:

[1] Microsoft Fabric (2024) What is Real-Time Intelligence? [link]
[2] Microsoft Fabric (2024) Real-Time Intelligence documentation in Microsoft Fabric [link]

[3] Microsoft Fabric Updates Blog (2024) Fabric workloads are now generally available! [link]

[4] Microsoft Learn (2025) Real Time Intelligence L200 Pitch Deck [link]
[5] Microsoft Fabric Community (2024) Benefits of Migrating to Fabric RTI [link]
[6] Microsoft Fabric Update Blog (2025) Operational Reporting with Microsoft Fabric Real-Time Intelligence [link]
[7] Microsoft Learn (2025) Get started with Real-Time Intelligence in Microsoft Fabric [link]
[8] Microsoft Learn (2025) Implement Real-Time Intelligence with Microsoft Fabric [link]

Resources:

[R1] Microsoft Learn (2024) Microsoft Fabric exercises [link]
[R2] Microsoft Learn (2024) Microsoft Fabric RTI Demo Application [link] [GitHub]
[R3] Microsoft Fabric Updates Blog (2024) Understanding Real-Time Intelligence usage reporting and billing [link]

[R4] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:
AI - Artificial Intelligence
CDC - Change Data Capture
DB - database
IoT - Internet of Things
KQL - Kusto Query Language
MF - Microsoft Fabric
RTI - Real-Time Intelligence
SaaS - Software-as-a-Service
SQL - Structured Query Language

18 December 2024

🧭🏭Business Intelligence: Microsoft Fabric (Part VII: Data Stores Comparison)

Business Intelligence Series

Microsoft made available a reference guide for the data stores supported for Microsoft Fabric workloads [1], including the new Fabric SQL database (see previous post). Here's the consolidated table followed by a few aspects to consider:

Area	Lakehouse	Warehouse	Eventhouse	Fabric SQL database	Power BI Datamart
Data volume	Unlimited	Unlimited	Unlimited	4 TB	Up to 100 GB
Type of data	Unstructured, semi-structured, structured	Structured, semi-structured (JSON)	Unstructured, semi-structured, structured	Structured, semi-structured, unstructured	Structured
Primary developer persona	Data engineer, data scientist	Data warehouse developer, data architect, data engineer, database developer	App developer, data scientist, data engineer	AI developer, App developer, database developer, DB admin	Data scientist, data analyst
Primary dev skill	Spark (Scala, PySpark, Spark SQL, R)	SQL	No code, KQL, SQL	SQL	No code, SQL
Data organized by	Folders and files, databases, and tables	Databases, schemas, and tables	Databases, schemas, and tables	Databases, schemas, tables	Database, tables, queries
Read operations	Spark, T-SQL	T-SQL, Spark*	KQL, T-SQL, Spark	T-SQL	Spark, T-SQL
Write operations	Spark (Scala, PySpark, Spark SQL, R)	T-SQL	KQL, Spark, connector ecosystem	T-SQL	Dataflows, T-SQL
Multi-table transactions	No	Yes	Yes, for multi-table ingestion	Yes, full ACID compliance	No
Primary development interface	Spark notebooks, Spark job definitions	SQL scripts	KQL Queryset, KQL Database	SQL scripts	Power BI
Security	RLS, CLS**, table level (T-SQL), none for Spark	Object level, RLS, CLS, DDL/DML, dynamic data masking	RLS	Object level, RLS, CLS, DDL/DML, dynamic data masking	Built-in RLS editor
Access data via shortcuts	Yes	Yes	Yes	Yes	No
Can be a source for shortcuts	Yes (files and tables)	Yes (tables)	Yes	Yes (tables)	No
Query across items	Yes	Yes	Yes	Yes	No
Advanced analytics	Interface for large-scale data processing, built-in data parallelism, and fault tolerance	Interface for large-scale data processing, built-in data parallelism, and fault tolerance	Time Series native elements, full geo-spatial and query capabilities	T-SQL analytical capabilities, data replicated to delta parquet in OneLake for analytics	Interface for data processing with automated performance tuning
Advanced formatting support	Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive compatible file format	Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive compatible file format	Full indexing for free text and semi-structured data like JSON	Table support for OLTP, JSON, vector, graph, XML, spatial, key-value	Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive compatible file format
Ingestion latency	Available instantly for querying	Available instantly for querying	Queued ingestion, streaming ingestion has a couple of seconds latency	Available instantly for querying	Available instantly for querying

It can be used as a map of what one needs to know to use each feature, and to identify how one can build on previous experience; here I'm referring to the many SQL developers. One must also consider the capabilities and limitations of each storage repository.

However, what I'm missing are some references regarding the performance of data access, especially compared with on-premises workloads. Moreover, the devil hides in the details, therefore one must test thoroughly before committing to any of the above choices. For the newest overview please check the referenced documentation!

For lakehouses, the hardest limitation is the lack of multi-table transactions, though that's understandable given their scope. However, probably the most important aspect is whether they can scale with the volume of reads/writes, as currently the SQL endpoint seems to lag.

The warehouse seems to be more versatile, though careful attention needs to be given to its design.

The Eventhouse opens the door to a wide range of time-based scenarios, though it will be interesting to see how developers cope with its lack of functionality in some areas.

Fabric SQL databases are a new addition, and hopefully they'll allow considering a wide range of OLTP scenarios. Starting with 28th of March 2025, SQL databases will be ON by default and tenant admins must manually turn them OFF before the respective date [3].

Power BI datamarts have been in preview for a couple of years.


References:
[1] Microsoft Fabric (2024) Microsoft Fabric decision guide: choose a data store [link]

[2] Reitse's blog (2024) Testing Microsoft Fabric Capacity: Data Warehouse vs Lakehouse Performance [link]

[3] Microsoft Fabric Update Blog (2025) Extending flexibility: default checkbox changes on tenant settings for SQL database in Fabric [link]

[4] Microsoft Fabric Update Blog (2025) Enhancing SQL database in Fabric: share your feedback and shape the future [link]

[5] Microsoft Fabric Update Blog (2025) Why SQL database in Fabric is the best choice for low-code/no-code Developers [link]
