
08 March 2025

🏭🎗️🗒️Microsoft Fabric: Eventstreams [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 8-Mar-2025

Real-Time Intelligence architecture [4]

[Microsoft Fabric] Eventstream(s)

  • {def} feature in Microsoft Fabric's Real-Time Intelligence experience that allows bringing real-time events into Fabric
    • bring real-time events into Fabric, transform them, and then route them to various destinations without writing any code 
      • ⇐ aka no-code solution
      • {feature} drag and drop experience 
        • gives users an intuitive and easy way to create their event data processing, transforming, and routing logic without writing any code 
    • works by creating a pipeline of events from multiple internal and external sources to different destinations
      • a conveyor belt that moves data from one place to another [1]
      • transformations to the data can be added along the way [1]
        • filtering, aggregating, or enriching
  • {def} eventstream
    • an instance of the Eventstream item in Fabric [2]
    • {feature} end-to-end data flow diagram 
      • provides a comprehensive understanding of the data flow and organization [2]
  • {feature} eventstream visual editor
    • used to design pipelines by dragging and dropping different nodes [1]
    • sources
      • where event data comes from 
      • one can choose
        • the source type
        • the data format
        • the consumer group
      • Azure Event Hubs
        • allows getting event data from an Azure event hub [1]
        • allows creating a cloud connection with the appropriate authentication and privacy level [1]
      • Azure IoT Hub
        • SaaS service used to connect, monitor, and manage IoT assets with a no-code experience [1]
      • CDC-enabled databases
        • software process that identifies and tracks changes to data in a database, enabling real-time or near-real-time data movement [1]
        • Azure SQL Database 
        • PostgreSQL Database
        • MySQL Database
        • Azure Cosmos DB
      • Google Cloud Pub/Sub
        • messaging service for exchanging event data among applications and services [1]
      • Amazon Kinesis Data Streams
        • collect, process, and analyze real-time, streaming data [1]
      • Confluent Cloud Kafka
        • fully managed service based on Apache Kafka for stream processing [1]
      • Fabric workspace events
        • events triggered by changes in Fabric Workspace
          • e.g. creating, updating, or deleting items. 
        • allows capturing, transforming, and routing events for in-depth analysis and monitoring within Fabric [1]
        • the integration offers enhanced flexibility in tracking and understanding workspace activities [1]
      • Azure blob storage events
        • events triggered by the system for actions like creating, replacing, or deleting a blob [1]
          • these actions are linked to Fabric events
            • allowing Blob Storage events to be processed as continuous data streams for routing and analysis within Fabric [1]
        • support streamed or unstreamed events [1]
      • custom endpoint
        • a REST API or SDKs can be used to send event data from a custom app to the eventstream [1] (see the Event Hubs sketch after this outline)
        • allows specifying the data format and the consumer group of the custom app [1]
      • sample data
        • out-of-box sample data
    • destinations
      • where transformed event data is stored
        • in a table in an eventhouse or a lakehouse [1]
        • redirect data to 
          • another eventstream for further processing [1]
          • an activator to trigger an action [1]
      • Eventhouse
        • offers the capability to funnel your real-time event data into a KQL database [1]
      • Lakehouse
        • allows preprocessing real-time events before their ingestion into the lakehouse
          • the events are transformed into Delta Lake format and later stored in specific lakehouse tables [1]
            • facilitating the data warehousing needs [1]
      • custom endpoint
        • directs real-time event traffic to a bespoke application [1]
        • enables the integration of proprietary applications with the event stream, allowing for the immediate consumption of event data [1]
        • {scenario} aim to transfer real-time data to an independent system not hosted in Microsoft Fabric [1]
      • Derived Stream
        • specialized destination created after stream operations like Filter or Manage Fields are applied to an eventstream
        • represents the altered default stream after processing, which can be routed to various destinations within Fabric and monitored in the Real-Time hub [1]
      • Fabric Activator
        • enables using Fabric Activator to trigger automated actions based on values in streaming data [1]
    • transformations
      • filter or aggregate the data as it is processed from the stream [1]
      • include common data operations
        • filtering
          • filter events based on the value of a field in the input
          • depending on the data type (number or text), the transformation keeps the values that match the selected condition, such as is null or is not null [1]
        • joining
          • transformation that combines data from two streams based on a matching condition between them [1]
        • aggregating
          • calculates an aggregation every time a new event occurs over a period of time [1]
            • Sum, Minimum, Maximum, or Average
          • allows renaming calculated columns, and filtering or slicing the aggregation based on other dimensions in your data [1]
          • one can have one or more aggregations in the same transformation [1]
        • grouping
          • allows calculating aggregations across all events within a certain time window [1]
            • one can group by the values in one or more fields [1]
          • allows for the renaming of columns
            • similar to the Aggregate transformation 
            • ⇐ provides more options for aggregation and includes more complex options for time windows [1]
          • allows adding more than one aggregation per transformation [1]
            • allows defining the logic needed for processing, transforming, and routing event data [1]
        • union
          • allows connecting two or more nodes and adding events with shared fields (with the same name and data type) into one table [1]
            • fields that don't match are dropped and not included in the output [1]
        • expand
          • array transformation that allows creating a new row for each value within an array [1]
        • manage fields
          • allows adding or removing fields, changing their data type, or renaming fields coming in from an input or another transformation [1]
        • temporal windowing functions 
          • enable analyzing data events within discrete time periods [1]
          • way to perform operations on the data contained in temporal windows [1]
            • e.g. aggregating, filtering, or transforming streaming events that occur within a specified time period [1]
            • allow analyzing streaming data that changes over time [1]
              • e.g. sensor readings, web-clicks, on-line transactions, etc.
              • provide great flexibility to keep an accurate record of events as they occur [1]
          • {type} tumbling windows
            • divides incoming events into fixed and nonoverlapping intervals based on arrival time [1]
          • {type} sliding windows 
            • divide events into fixed and overlapping intervals based on time [1]
          • {type} session windows 
            • divides events into variable and nonoverlapping intervals that are based on a gap or lack of activity [1]
          • {type} hopping windows
            • are different from tumbling windows as they model scheduled, overlapping windows [1]
          • {type} snapshot windows 
            • group events that have the same timestamp; unlike the other windowing functions, a named window function isn't required [1]
            • one can apply them by adding System.Timestamp() to the GROUP BY clause [1]
          • {parameter} window duration
            • the length of each window interval [1]
            • can be in seconds, minutes, hours, and even days [1]
          • {parameter} window offset
            • optional parameter that shifts the start and end of each window interval by a specified amount of time [1] (see the windowing sketch after this outline)
          • {concept} grouping key
            • one or more columns in your event data that you wish to group by [1]
          • {concept} aggregation function
            • one or more of the functions applied to each group of events in each window [1]
              • e.g. counts, sums, averages, min/max, and even custom functions [1]
    • see the event data flowing through the pipeline in real-time [1]
    • handles the scaling, reliability, and security of the event stream automatically [1]
      • no need to write any code or manage any infrastructure [1]
    • {feature} eventstream editing canvas
      • used to 
        • add and manage sources and destinations [1]
        • see the event data [1]
        • check the data insights [1]
        • view logs for each source or destination [1]
  • {feature} Apache Kafka endpoint on the Eventstream item
    • {benefit} enables users to connect and consume streaming events through the Kafka protocol [2]
      • applications using the protocol can send or receive streaming events on specific topics [2]
      • requires updating the connection settings to use the Kafka endpoint provided in the Eventstream [2] (see the Kafka sketch after this outline)
  • {feature} supports runtime logs for the connector sources in Live View mode [3]
    • allows examining detailed logs generated by the connector engines for the specific connector [3]
      • help with identifying failure causes or warnings [3]
      • ⇐ accessible in the bottom pane of an eventstream by selecting the relevant connector source node on the canvas in Live View mode [3]
  • {feature} supports data insights for the connector sources in Live View mode [3]
  • {feature} integrates eventstreams with CI/CD tools
    • {benefit} developers can efficiently build and maintain eventstreams end-to-end in a web-based environment, while ensuring source control and smooth versioning across projects [3]
  • {feature} REST APIs
    • allow automating and managing eventstreams programmatically (see the REST sketch after this outline)
      • {benefit} simplify CI/CD workflows and make it easier to integrate eventstreams with external applications [3]
  • {recommendation} use the eventstreams feature with at least an F4 SKU [2]
  • {limitation} maximum message size: 1 MB [2]
  • {limitation} maximum retention period of event data: 90 days [2]
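
Example (custom endpoint source, referenced above): a custom endpoint exposes Event Hubs-compatible connection details, so a minimal Python sketch with the azure-eventhub SDK could look as follows; the connection string, entity name, and payload schema below are placeholders, not values from the documentation:

    import json
    from azure.eventhub import EventHubProducerClient, EventData

    # placeholders: copy the real values from the custom endpoint's details pane
    CONN_STR = "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=..."
    ENTITY = "<eventstream-entity-name>"

    producer = EventHubProducerClient.from_connection_string(CONN_STR, eventhub_name=ENTITY)

    # send a small batch of JSON events; the schema is arbitrary here
    with producer:
        batch = producer.create_batch()
        for reading in [{"deviceId": "sensor-1", "temperature": 21.7},
                        {"deviceId": "sensor-2", "temperature": 19.4}]:
            batch.add(EventData(json.dumps(reading)))
        producer.send_batch(batch)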
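
Example (Apache Kafka endpoint, referenced above): any standard Kafka client should work once the connection settings are updated; a sketch with confluent-kafka, assuming SASL/PLAIN settings of the kind surfaced in the endpoint's details pane (bootstrap server, topic, and connection string are placeholders):

    from confluent_kafka import Producer

    conf = {
        "bootstrap.servers": "<namespace>.servicebus.windows.net:9093",  # placeholder
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": "$ConnectionString",    # literal username convention
        "sasl.password": "<connection-string>",  # the endpoint's connection string
    }

    producer = Producer(conf)
    producer.produce("<topic-name>", value=b'{"deviceId": "sensor-1", "temperature": 21.7}')
    producer.flush()  # block until the event is delivered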
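
Example (window duration and offset, referenced above): this is plain Python, not Fabric code; it only illustrates how an event's timestamp maps to tumbling (fixed, nonoverlapping) vs. hopping (scheduled, overlapping) windows given a duration, a hop, and an optional offset:

    from datetime import datetime, timedelta, timezone

    EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

    def tumbling_window(ts, duration, offset=timedelta(0)):
        """The single fixed, nonoverlapping window containing ts."""
        n = (ts - EPOCH - offset) // duration       # index of the containing window
        start = EPOCH + offset + n * duration
        return (start, start + duration)

    def hopping_windows(ts, duration, hop, offset=timedelta(0)):
        """All scheduled, overlapping windows containing ts (hop < duration)."""
        i = (ts - EPOCH - offset) // hop            # last window starting at or before ts
        windows = []
        while True:
            start = EPOCH + offset + i * hop
            if start + duration <= ts:              # window closed before the event
                break
            windows.append((start, start + duration))
            i -= 1
        return windows

    event = datetime(2025, 3, 8, 10, 7, 30, tzinfo=timezone.utc)
    print(tumbling_window(event, timedelta(minutes=5)))                        # one window
    print(hopping_windows(event, timedelta(minutes=5), timedelta(minutes=1)))  # five windows

Note that a tumbling window is the special case where the hop equals the duration, which is why each event lands in exactly one tumbling window but in duration/hop hopping windows.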
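
Example (REST APIs, referenced above): the sketch below lists the eventstreams in a workspace with the requests library; the endpoint path follows the Fabric REST API documentation at the time of writing, and the workspace ID and token acquisition (e.g. via MSAL or azure-identity) are placeholders:

    import requests

    WORKSPACE_ID = "<workspace-guid>"  # placeholder
    TOKEN = "<bearer-token>"           # Microsoft Entra token with the Fabric API scope

    resp = requests.get(
        f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}/eventstreams",
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    for item in resp.json().get("value", []):
        print(item["id"], item["displayName"])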

References:
[1] Microsoft Learn (2024) Microsoft Fabric: Use real-time eventstreams in Microsoft Fabric [link]
[2] Microsoft Learn (2025) Microsoft Fabric: Fabric Eventstream - overview [link]
[3] Microsoft Learn (2024) Microsoft Fabric: What's new in Fabric event streams? [link]
[4] Microsoft Learn (2025) Real Time Intelligence L200 Pitch Deck [link]
[5] Microsoft Learn (2025) Use real-time eventstreams in Microsoft Fabric [link]

Resources:
[R1] Microsoft Learn (2024) Microsoft Fabric exercises [link]
[R2] Microsoft Fabric Updates Blog (2024) CI/CD – Git Integration and Deployment Pipeline [link]
[R3] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:
API - Application Programming Interface
CDC - Change Data Capture
CI/CD  - Continuous Integration/Continuous Delivery
DB - database
IoT - Internet of Things
KQL - Kusto Query Language
RTI - Real-Time Intelligence
SaaS - Software-as-a-Service
SDK - Software Development Kit
SKU - Stock Keeping Unit

20 August 2009

🛢DBMS: Change Data Capture [CDC] (Definitions)

"As changes are made to a production data source, change data capture reads the source database log. This information can be used to prepare a batch to update the data warehouse, or it can update the data warehouse on a transaction-by-transaction basis. With SQL Server 7.0, transactional replication is an example of change data capture." (Microsoft Corporation, "Microsoft SQL Server 7.0 Data Warehouse Training Kit", 2000)

"The process of capturing changes made to a production data source. Change data capture is typically performed by reading the log file of the Database Management System of the source database. Change data capture consolidates units of work, ensures data is synchronized with the original source, and reduces data volume in a Data Warehousing environment." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The process of capturing changes made to a production data source; typically used in data warehousing environments." (Craig S Mullins, "Database Administration", 2012)

"Change data capture (CDC) is a set of software design patterns used to determine (and track) the data that has changed so that action can be taken using the changed data. CDC often uses the database transaction log to populate the deltas, although it can also query the database directly." (Piethein Strengholt, "Data Management at Scale", 2020)

"CDC is the process of capturing changes that were made in the source systems and applying these changes throughout the enterprise for both decision support systems such as information warehouse and operational data stores as well as other downstream consuming applications." (Saumya Chaki, "Enterprise Information Management in Practice", 2015)

"An automated approach for ensuring that data changes are synchronized across an enterprise by replicating data changes from a source system to other systems." (Jonathan Ferrar et al, "The Power of People", 2017)

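Several of the definitions above contrast reading the transaction log with querying the database directly. As a minimal log-based illustration, assuming SQL Server's built-in CDC feature is enabled on a hypothetical dbo.Orders table (capture instance dbo_Orders), a consumer could poll the change functions like this (Python with pyodbc; the connection string is a placeholder):

    import pyodbc

    # placeholder connection string; CDC must be enabled on the database and table
    conn = pyodbc.connect("DRIVER={ODBC Driver 18 for SQL Server};"
                          "SERVER=<server>;DATABASE=<db>;Trusted_Connection=yes")
    cur = conn.cursor()

    # read all changes recorded in the log for the capture instance between two LSNs;
    # __$operation: 1=delete, 2=insert, 3/4=update (before/after image)
    cur.execute("""
        DECLARE @from_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_Orders');
        DECLARE @to_lsn   binary(10) = sys.fn_cdc_get_max_lsn();
        SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from_lsn, @to_lsn, 'all');
    """)
    for row in cur.fetchall():
        print(list(row))
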
