09 March 2025

🏭🎗️🗒️Microsoft Fabric: Eventhouses [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 9-Mar-2025

Real-Time Intelligence architecture [4]

[Microsoft Fabric] Eventhouses

  • [def] a service that empowers users to extract insights and visualize data in motion
    • offers an end-to-end solution for 
      • event-driven scenarios
        • ⇐ rather than schedule-driven solutions  [1]
    • a workspace of databases
      • can be shared across projects [1]
  • allows managing multiple databases at once
    • sharing capacity and resources to optimize performance and cost
    • provides unified monitoring and management across all databases and per database [1]
  • provides a solution for handling and analyzing large volumes of data
    • particularly in scenarios requiring real-time analytics and exploration [1]
    • designed to handle real-time data streams efficiently [1]
      • lets organizations ingest, process, and analyze data in near real-time [1]
  • provides a scalable infrastructure that allows organizations to handle growing volumes of data, ensuring optimal performance and resource use
    • preferred engine for semistructured and free-text analysis
    • tailored to time-based, streaming events with structured, semistructured, and unstructured data [1]
    • allows ingesting data 
      • from multiple sources, 
      • in multiple pipelines
        • e.g. Eventstream, SDKs, Kafka, Logstash, data flows, etc.
      • in multiple data formats [1]
    • data is automatically indexed and partitioned based on ingestion time
  • designed to optimize cost by suspending the service when not in use [1]
    • reactivating the service can lead to a latency of a few seconds [1]
      • for highly time-sensitive systems that can't tolerate this latency, use the Minimum consumption setting [1] 
        • enables the service to be always available at a selected minimum level [1]
          • customers pay for 
            • the minimum compute level selected [1]
            • the actual consumption when the compute level is above the minimum set [1]
        • the specified compute is available to all the databases within the eventhouse [1]
    • {scenario} solutions that include event-based data
      • e.g. telemetry and log data, time series and IoT data, security and compliance logs, or financial records [1]
  • KQL databases 
    • can be created within an eventhouse [1]
    • can either be a standard database, or a database shortcut [1]
    • an exploratory query environment is created for each KQL Database, which can be used for exploration and data management [1]
    • data availability in OneLake can be enabled on a database or table level [1]
  • Eventhouse page 
    • serves as the central hub for all your interactions within the Eventhouse environment [1]
    • Eventhouse ribbon
      • provides quick access to essential actions within the Eventhouse
    • explorer pane
      • provides an intuitive interface for navigating between Eventhouse views and working with databases [1]
    • main view area 
      • displays the system overview details for the eventhouse [1]
  • {feature} Eventhouse monitoring
    • offers comprehensive insights into the usage and performance of the eventhouse by collecting end-to-end metrics and logs for all aspects of an Eventhouse [2]
    • part of workspace monitoring that allows you to monitor Fabric items in your workspace [2]
    • provides a set of tables that can be queried to get insights into the usage and performance of the eventhouse [2]
      • can be used to optimize the eventhouse and improve the user experience [2]
  • {feature} query logs table
    • contains the list of queries run on an Eventhouse KQL database
      • for each query, a log event record is stored in the EventhouseQueryLogs table [3]
    • can be used to
      • analyze query performance and trends [3]
      • troubleshoot slow queries [3]
      • identify heavy queries consuming large amounts of system resources [3]
      • identify the users/applications running the highest number of queries [3]
        • ⇐ see the query sketch after this list
  • {feature} OneLake availability
    • {benefit} allows creating one logical copy of a KQL database's data in an eventhouse by turning on the feature [4]
      • users can query the data in the KQL database in Delta Lake format via other Fabric engines [4]
        • e.g. Direct Lake mode in Power BI, Warehouse, Lakehouse, Notebooks, etc.
    • {prerequisite} a workspace with a Microsoft Fabric-enabled capacity [4]
    • {prerequisite} a KQL database with editing permissions and data [4]
    • {constraint} tables can't be renamed
    • {constraint} table schemas can't be altered
    • {constraint} RLS can't be applied to tables
    • {constraint} data can't be deleted, truncated, or purged
    • when turned on, a mirroring policy is enabled
      • can be used to monitor data latency or alter it to partition delta tables [4]
  • {feature} robust adaptive mechanism
    • intelligently batches incoming data streams into one or more Parquet files, structured for analysis [4]
    • ⇐ important when dealing with trickling data [4]
      • ⇐ writing many small Parquet files into the lake can be inefficient resulting in higher costs and poor performance [4]
    • delays write operations if there isn't enough data to create optimal Parquet files [4]
      • ensures Parquet files are optimal in size and adhere to Delta Lake best practices [4]
      • ensures that the Parquet files are primed for analysis and balances the need for prompt data availability with cost and performance considerations [4]
      • {default} the write operation can take up to 3 hours or until files of sufficient size are created [4]
        • typically the files are 200-256 MB in size
        • the value can be adjusted between 5 minutes and 3 hours [4]
          • {warning} adjusting the delay to a shorter period might result in a suboptimal delta table with a large number of small files [4]
            • can lead to inefficient query performance [4]
        • {restriction} the resultant table in OneLake is read-only and can't be optimized after creation [4]
    • delta tables can be partitioned to improve query speed [4]
      • each partition is represented as a separate column using the PartitionName listed in the Partitions list [4]
        • ⇒ OneLake copy has more columns than the source table [4]
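
A minimal query sketch for the query logs table described above, for illustration only: it assumes the EventhouseQueryLogs table is reachable from the T-SQL surface of a KQL queryset, and the column names (QueryStartTime, QueryDurationMs, QueryText) are hypothetical placeholders rather than the documented schema.

-- hypothetical columns; list the slowest queries of the last day (sketch only)
SELECT TOP (10) QueryText, QueryDurationMs
FROM EventhouseQueryLogs
WHERE QueryStartTime >= DATEADD(day, -1, GETUTCDATE())
ORDER BY QueryDurationMs DESC;
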
References:
[1] Microsoft Learn (2025) Microsoft Fabric: Eventhouse overview [link]
[2] Microsoft Learn (2025) Microsoft Fabric: Eventhouse monitoring [link]
[3] Microsoft Learn (2025) Microsoft Fabric: Query logs [link]  
[4] Microsoft Learn (2025) Microsoft Fabric: Eventhouse OneLake Availability [link]
[5] Microsoft Learn (2025) Real Time Intelligence L200 Pitch Deck [link]

Resources:
[R1] Microsoft Learn (2024) Microsoft Fabric exercises [link]
[R2] Eventhouse Monitoring (Preview) [link]
[R3] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:
KQL - Kusto Query Language
SDK - Software Development Kit
RLS - Row Level Security 
RTI - Real-Time Intelligence

08 March 2025

🏭🎗️🗒️Microsoft Fabric: Real-Time Intelligence (RTI) [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 9-Mar-2025

Real-Time Intelligence architecture [4]

[Microsoft Fabric] Real-Time Intelligence [RTI]

  • [def]
    • {goal} provide a complete real-time SaaS platform within MF
      • {benefit} helps gain actionable insights from data, with the ability to ingest, transform, query, visualize, and act on it in real time [4]
    • {goal} provides a single place for data-in-motion
      • {benefit} allows to pull event streams from Real Time Hub
        • provides a single data estate for data in motion simplifying the ingestion, curation and processing of streaming data from Microsoft and external sources [4]
        • empowers users to extract insights and visualize data in motion [1]
    • {goal} enable rapid solution development
      • {benefit} provides a range of no-code, low-code and pro-code experiences for various scenarios [4]
        • everything from business insight discovery to complex stream processing, and application and model development [4]
    • {goal} enable real-time AI insights
      • {benefit} scales beyond human monitoring and drives actions with built-in, automated capabilities [4]
        • allows anyone in the organization to take advantage of them [4]
    • offers an end-to-end solution for 
      • event-driven scenarios
        • ⇐ rather than schedule-driven solutions. 
      • streaming data
      • data logs
    • {benefit} help customers accelerate speed and precision of business by providing [4]
      • {goal} operational efficiency
        • by allowing to streamline processes and make data-driven decisions with accurate, up-to-date information [4]
      • {goal} end-to-end visibility
        • by allowing to gain a holistic understanding of business health and discover actionable insights for timely action [4]
      • {goal} competitive advantage
        • by allowing to quickly react to shifting market trends, identify opportunities and mitigate risk in real time [4]
    • seamlessly connects time-based data from various sources using no-code connectors [1]
      • enables immediate 
        • visual insights
        • geospatial analysis
        • trigger-based reactions 
        • ⇐ all are part of an organization-wide data catalog [1]
      • ⇐ time oriented data is difficult to manage, yet critical for success [4]
        • {challenge} capture high throughput data from disparate sources in real time [4]
        • {challenge} model scenarios using event data [4]
        • {challenge} choose from an array of bespoke technologies and data formats [4]
        • {challenge} leverage the power of AI against data in real time [4]
        • without the ability to leverage time oriented data, businesses are vulnerable to risks [4]
          • {risk} poor decision-making
          • {risk} financial loss
          • {risk} reduced operational efficiency
          • {risk} impaired data integrity
          • {risk} non-compliance
          • {risk} negative user experience
      • {capability} single unified SaaS solution
        • in opposition to a fragmented, fragile tech stack
        • allows to ingest & process all event sources, in any data format [4]
          • one can connect to diverse streaming sources and leverage no-code and low-code experiences to process and route quickly [4]
            • via out of the box connectors for streaming and event data sources [4]
          • events can be routed to other Fabric and 3rd party entities [4]
          • organizational BI reports can be enhanced with enriched data [4]
        • allows to analyze and transform data event streams using queries and visual exploration to discover insights in real time [4]
          • one can manage an unlimited amount of data [4]
          • multiple databases can be monitored and managed at once [4]
        • allows to act quickly on top of data
          • via triggers and alerts on changing data to respond automatically and set action when specific conditions are detected [4]
            • helps drive actions on a per instance state that evolves over time [4]
            • helps to act on data without needing a deep schema and semantic modeling [4]
      • {capability} accessible data and analytics tools
        • in opposition to advanced skillsets required
      • {capability} real-time stream processing
        • in opposition to batch data processing
    • handles 
      • data ingestion
      • data transformation
      • data storage
      • data analytics
      • data visualization
      • data tracking
      • AI
      • real-time actions
    • can be used for 
      • data analysis
      • immediate visual insights
      • centralization of data in motion for an organization
      • actions on data
      • efficient querying, transformation, and storage of large volumes of structured or unstructured data [1]
  • helps evaluate data from 
    • IoT systems
    • system logs
    • free text
    • semistructured data
    • ⇐ one can also contribute data for consumption by others in the organization [1]
  • provides a versatile solution
    • transforms the data into a dynamic, actionable resource that drives value across the entire organization
  • its components are built on trusted, core Microsoft services [1]
    • ⇐ together they extend the overall Fabric capabilities to provide event-driven solutions [1]
  • {feature} Real-Time hub
    • serves as a centralized catalog that facilitates easy access, addition, exploration, and sharing of data [1]
    • expands the range of data sources
      • ⇐ it enables broader insights and visual clarity across various domains [1]
    • ensures that data is accessible to all [1]
      • promoting quick decision-making and informed action
    • the sharing of streaming data from diverse sources unlocks the potential to build BI solutions across the organization [1]
    • use the data consumption tools to explore the data [1]
  • {feature} Real-Time dashboards 
    • come equipped with out-of-the-box interactions 
      • {benefit} simplify the process of understanding data, making it accessible to anyone who wants to make decisions based on data in motion using visual tools, natural language and Copilot [1]
    • query the data in real-time as it’s being loaded [6]
      • every time a query is run, it leverages the latest data available in an Eventhouse or OneLake [6]
        • behave much like DirectQuery, but without the need to load data into a semantic model. [6]  
  • {feature} Activator
    • {benefit} allows to turn insights into actions by setting up alerts from various parts of Fabric to react to data patterns or conditions in real-time [1]
    • takes events as they are being processed into Eventstreams or Eventhouses and connects them to downstream systems to make data actionable [6]
  • {feature} Real-Time hub events 
    • a catalog of data in motion
    • contains:
      • data streams 
        • all data streams that are actively running in Fabric to which the user has access
        • once a stream of data is connected, the entire SaaS solution becomes accessible [1]
      • Microsoft sources: 
        • easily discover streaming sources that the users have and quickly configure ingestion of those sources into Fabric
          • e.g. Azure Event Hubs, Azure IoT Hub, Azure SQL DB CDC, Azure Cosmos DB CDC, PostgreSQL DB CDC
      • Fabric events
        • event-driven capabilities support real-time notifications and data processing 
          • ⇒ one can monitor and react to events [1]
            • e.g. Fabric Workspace Item events, Azure Blob Storage events
          • ⇐ the events can be used to trigger other actions or workflows [1]
            • e.g. invoking a data pipeline or sending a notification via email. 
        • the events can be sent to other destinations via eventstreams [1]
  • {feature} Eventstreams
    • event processing capabilities 
      • ⇐ behave like event listeners that wait for data to be pushed to them [6]
    • {benefit} allow to capture, transform, and route high volumes of real-time events to various destinations with a no-code experience [1]
    • support multiple data sources and data destinations [1]
    • {benefit} allow to do filtering, data cleansing, transformation, windowed aggregations, and dupe detection, to land the data in the needed shape [1]
    • one can use the content-based routing capabilities to send data to different destinations based on filters [1]
    • derived eventstreams allow constructing new streams as a result of transformations and/or aggregations that can be shared to consumers in Real-Time hub [1]
  • {feature} Eventhouses
    • the ideal analytics engine to process data in motion
      •  scalable ingestion engine with the ability to handle up to millions of events per hour [6]
    • tailored to time-based, streaming events with structured, semi structured, and unstructured data [1]
    • data is automatically indexed and partitioned based on ingestion time
      • ⇐ provides fast and complex analytic querying capabilities on high-granularity data [1]
    • the stored data can be made available in OneLake for consumption by other Fabric experiences [1]
      • ⇐ the data is ready for lightning-fast query using various code, low-code, or no-code options in Fabric [1]
    • the data can be queried in native KQL or in T-SQL in the KQL queryset [1], as sketched below
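
For illustration, a minimal T-SQL sketch of the last point, assuming a hypothetical eventhouse table named DeviceTelemetry with EventTime, DeviceId and Temperature columns exposed through the KQL queryset's T-SQL surface:

-- hypothetical table and columns; average a metric per device over the last hour
SELECT DeviceId, AVG(Temperature) AS AvgTemperature
FROM DeviceTelemetry
WHERE EventTime >= DATEADD(hour, -1, GETUTCDATE())
GROUP BY DeviceId;
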
References:
[1] Microsoft Fabric (2024) What is Real-Time Intelligence? [link]
[2] Microsoft Fabric (2024) Real-Time Intelligence documentation in Microsoft Fabric [link]
[3] Microsoft Fabric Updates Blog (2024) Fabric workloads are now generally available! [link]
[4] Microsoft Learn (2025) Real Time Intelligence L200 Pitch Deck [link]
[5] Microsoft Fabric Community (2024) Benefits of Migrating to Fabric RTI [link]
[6] Microsoft Fabric Update Blog (2025) Operational Reporting with Microsoft Fabric Real-Time Intelligence [link]
[7] Microsoft Learn (2025) Get started with Real-Time Intelligence in Microsoft Fabric [link]
[8] Microsoft Learn (2025) Implement Real-Time Intelligence with Microsoft Fabric [link]

Resources:
[R1] Microsoft Learn (2024) Microsoft Fabric exercises [link]
[R2] Microsoft Learn (2024) Microsoft Fabric RTI Demo Application [link] [GitHub]
[R3] Microsoft Fabric Updates Blog (2024) Understanding Real-Time Intelligence usage reporting and billing [link]
[R4] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:
AI - Artificial Intelligence
CDC - Change Data Capture
DB - database
IoT - Internet of Things
KQL  - Kusto Query Language
MF - Microsoft Fabric
RTI - Real-Time Intelligence
SaaS - Software-as-a-Service
SQL - Structured Query Language

🏭🎗️🗒️Microsoft Fabric: Eventstreams [Notes]

Disclaimer: This is work in progress intended to consolidate information from various sources for learning purposes. For the latest information please consult the documentation (see the links below)! 

Last updated: 8-Mar-2025

Real-Time Intelligence architecture [4]

[Microsoft Fabric] Eventstream(s)

  • {def} a feature in Microsoft Fabric's Real-Time Intelligence experience that allows bringing real-time events into Fabric
    • bring real-time events into Fabric, transform them, and then route them to various destinations without writing any code 
      • ⇐ aka no-code solution
      • {feature} drag and drop experience 
        • gives users an intuitive and easy way to create their event data processing, transforming, and routing logic without writing any code 
    • work by creating a pipeline of events from multiple internal and external sources to different destinations
      • a conveyor belt that moves data from one place to another [1]
      • transformations to the data can be added along the way [1]
        • filtering, aggregating, or enriching
  • {def} eventstream
    • an instance of the Eventstream item in Fabric [2]
    • {feature} end-to-end data flow diagram 
      • provides a comprehensive understanding of the data flow and organization [2]
  • {feature} eventstream visual editor
    • used to design pipelines by dragging and dropping different nodes [1]
    • sources
      • where event data comes from 
      • one can choose
        • the source type
        • the data format
        • the consumer group
      • Azure Event Hubs
        • allows to get event data from an Azure event hub [1]
        • allows to create a cloud connection  with the appropriate authentication and privacy level [1]
      • Azure IoT Hub
        • SaaS service used to connect, monitor, and manage IoT assets with a no-code experience [1]
      • CDC-enabled databases
        • CDC is a software process that identifies and tracks changes to data in a database, enabling real-time or near-real-time data movement [1]
        • Azure SQL Database 
        • PostgreSQL Database
        • MySQL Database
        • Azure Cosmos DB
      • Google Cloud Pub/Sub
        • messaging service for exchanging event data among applications and services [1]
      • Amazon Kinesis Data Streams
        • collect, process, and analyze real-time, streaming data [1]
      • Confluent Cloud Kafka
        • fully managed service based on Apache Kafka for stream processing [1]
      • Fabric workspace events
        • events triggered by changes in Fabric Workspace
          • e.g. creating, updating, or deleting items. 
        • allows to capture, transform, and route events for in-depth analysis and monitoring within Fabric [1]
        • the integration offers enhanced flexibility in tracking and understanding workspace activities [1]
      • Azure blob storage events
        • system triggers for actions like creating, replacing, or deleting a blob [1]
          • these actions are linked to Fabric events
            • allowing to process Blob Storage events as continuous data streams for routing and analysis within Fabric  [1]
        • support streamed or unstreamed events [1]
      • custom endpoint
        • REST API or SDKs can be used to send event data from custom app to eventstream [1]
        • allows to specify the data format and the consumer group of the custom app [1]
      • sample data
        • out-of-box sample data
    • destinations
      • where transformed event data is stored. 
        • in a table in an eventhouse or a lakehouse [1]
        • redirect data to 
          • another eventstream for further processing [1]
          • an activator to trigger an action [1]
      • Eventhouse
        • offers the capability to funnel your real-time event data into a KQL database [1]
      • Lakehouse
        • allows to preprocess real-time events before their ingestion in the lakehouse
          • the events are transformed into Delta Lake format and later stored in specific lakehouse tables [1]
            • facilitating the data warehousing needs [1]
      • custom endpoint
        • directs real-time event traffic to a bespoke application [1]
        • enables the integration of proprietary applications with the event stream, allowing for the immediate consumption of event data [1]
        • {scenario} aim to transfer real-time data to an independent system not hosted on the Microsoft Fabric [1]
      • Derived Stream
        • a specialized destination created after applying stream operations like Filter or Manage Fields to an eventstream
        • represents the altered default stream after processing, which can be routed to various destinations within Fabric and monitored in the Real-Time hub [1]
      • Fabric Activator
        • enables using Fabric Activator to trigger automated actions based on values in streaming data [1]
    • transformations
      • filter or aggregate the data as it is processed from the stream [1]
      • include common data operations
        • filtering
          • filter events based on the value of a field in the input
          • depending on the data type (number or text), the transformation keeps the values that match the selected condition, such as is null or is not null [1]
        • joining
          • transformation that combines data from two streams based on a matching condition between them [1]
        • aggregating
          • calculates an aggregation every time a new event occurs over a period of time [1]
            • Sum, Minimum, Maximum, or Average
          • allows renaming calculated columns, and filtering or slicing the aggregation based on other dimensions in your data [1]
          • one can have one or more aggregations in the same transformation [1]
        • grouping
          • allows to calculate aggregations across all events within a certain time window [1]
            • one can group by the values in one or more fields [1]
          • allows for the renaming of columns
            • similar to the Aggregate transformation 
            • ⇐ provides more options for aggregation and includes more complex options for time windows [1]
          • allows to add more than one aggregation per transformation [1]
            • allows to define the logic needed for processing, transforming, and routing event data [1]
        • union
          • allows to connect two or more nodes and add events with shared fields (with the same name and data type) into one table [1]
            • fields that don't match are dropped and not included in the output [1]
        • expand
          • array transformation that allows to create a new row for each value within an array [1]
        • manage fields
          • allows to add, remove, change data type, or rename fields coming in from an input or another transformation [1]
        • temporal windowing functions 
          • enable to analyze data events within discrete time periods [1]
          • way to perform operations on the data contained in temporal windows [1]
            • e.g. aggregating, filtering, or transforming streaming events that occur within a specified time period [1]
            • allow analyzing streaming data that changes over time [1]
              • e.g. sensor readings, web-clicks, on-line transactions, etc.
              • provide great flexibility to keep an accurate record of events as they occur [1]
          • {type} tumbling windows
            • divides incoming events into fixed and nonoverlapping intervals based on arrival time [1]
          • {type} sliding windows 
            • divide the events into fixed and overlapping intervals based on time [1]
          • {type} session windows 
            • divide events into variable and nonoverlapping intervals that are based on gaps of inactivity [1]
          • {type} hopping windows
            • differ from tumbling windows in that they model scheduled, overlapping windows [1]
          • {type} snapshot windows 
            • group events that have the same timestamp; unlike the other windowing functions, they don't require a named windowing function [1]
            • one can add System.Timestamp() to the GROUP BY clause [1]
          • {parameter} window duration
            • the length of each window interval [1]
            • can be in seconds, minutes, hours, and even days [1]
          • {parameter} window offset
            • optional parameter that shifts the start and end of each window interval by a specified amount of time [1]
          • {concept} grouping key
            • one or more columns in the event data used to group the data by [1]
          • {concept} aggregation function
            • one or more of the functions applied to each group of events in each window [1]
              • where the counts, sums, averages, min/max, and even custom functions become useful [1]
    • see the event data flowing through the pipeline in real-time [1]
    • handles the scaling, reliability, and security of event stream automatically [1]
      • no need to write any code or manage any infrastructure [1]
    • {feature} eventstream editing canvas
      • used to 
        • add and manage sources and destinations [1]
        • see the event data [1]
        • check the data insights [1]
        • view logs for each source or destination [1]
  • {feature} Apache Kafka endpoint on the Eventstream item
    • {benefit} enables users to connect and consume streaming events through the Kafka protocol [2]
      • application using the protocol can send or receive streaming events with specific topics [2]
      • requires updating the connection settings to use the Kafka endpoint provided in the Eventstream [2]
  • {feature} support runtime logs and data insights for the connector sources in Live View mode [3]
    • allows to examine detailed logs generated by the connector engines for the specific connector [3]
      • help with identifying failure causes or warnings [3]
      • ⇐ accessible in the bottom pane of an eventstream by selecting the relevant connector source node on the canvas in Live View mode [3]
  • {feature} support data insights for the connector sources in Live View mode [3]
  • {feature} integrates eventstreams CI/CD tools
      • {benefit} developers can efficiently build and maintain eventstreams from end-to-end in a web-based environment, while ensuring source control and smooth versioning across projects [3]
  • {feature} REST APIs
    • allow automating and managing eventstreams programmatically
      • {benefit} simplify CI/CD workflows and make it easier to integrate eventstreams with external applications [3]
  • {recommendation} use the eventstreams feature with at least SKU F4 [2]
  • {limitation} maximum message size: 1 MB [2]
  • {limitation} maximum retention period of event data: 90 days [2]

References:
[1] Microsoft Learn (2024) Microsoft Fabric: Use real-time eventstreams in Microsoft Fabric [link]
[2] Microsoft Learn (2025) Microsoft Fabric: Fabric Eventstream - overview [link]
[3] Microsoft Learn (2024) Microsoft Fabric: What's new in Fabric event streams? [link]
[4] Microsoft Learn (2025) Real Time Intelligence L200 Pitch Deck [link]
[5] Microsoft Learn (2025) Use real-time eventstreams in Microsoft Fabric [link]

Resources:
[R1] Microsoft Learn (2024) Microsoft Fabric exercises [link]
[R2] Microsoft Fabric Updates Blog (2024) CI/CD – Git Integration and Deployment Pipeline [link]
[R3] Microsoft Learn (2025) Fabric: What's new in Microsoft Fabric? [link]

Acronyms:
API - Application Programming Interface
CDC - Change Data Capture
CI/CD  - Continuous Integration/Continuous Delivery
DB - database
IoT - Internet of Things
KQL - Kusto Query Language
RTI - Real-Time Intelligence
SaaS - Software-as-a-Service
SDK - Software Development Kit
SKU - Stock Keeping Unit

#️⃣Software Engineering: Programming (Part XVI: The Software Quality Perspective and AI)

Software Engineering Series

Organizations tend to complain about poor software quality developed in-house, by consultancy companies or third parties, without doing much in this direction. Unfortunately, this agrees with the bigger picture reflected by the quality standards adopted by organizations - people talk and complain about them, though they aren’t that eager to include them in the various strategies, or even if they are considered, they are seldom enforced adequately!

Moreover, even if quality standards are adopted, and a lot of effort may be spent in this direction (as everybody has strong opinions and there are many exceptions), as projects progress, all the good intentions come to an end, the rules fading on the way either because they are too strict, too general, aren't adequately prioritized or communicated, or there's no time to implement (all of) them. This applies in general to programming and to the domains that revolve around data – Business Intelligence, Data Analytics or Data Science.

The volume of good quality code and deliverables is not only a reflection of an organization’s maturity in dealing with best practices but also of its maturity in handling technical debt, Project Management, software and data quality challenges. All these aspects are strongly related to each other and therefore require a systemic approach rather than focusing on the issues locally. The systemic approach allows organizations to bridge the gaps between business areas, teams, projects and any other areas of focus.

There are many questionable studies on the effect of methodologies on software quality and data issues, proclaiming that one methodology is better than the other in addressing the multifold aspects of software quality. Besides methodologies, some studies attempt to correlate quality with organizations’ size, management or programmers’ experience, the size of software, or whatever characteristic might seem to affect quality.

Bad code is written independently of companies' size or programmer's experience, management or organization's maturity. Bad code doesn't necessarily happen all at once; it can depend on circumstances, and on repeated team, requirement and code changes. There are decisions and actions that sooner or later can affect the overall outcome negatively.

Rewriting the code from scratch might look like an approachable measure though it’s seldom the cost-effective solution. Allocating resources for refactoring is usually a better approach, though this tends to increase considerably the cost of projects, and organizations might be tempted to face the risks, whatever they might be. Independently of the approaches used, sooner or later the complexity of projects, requirements or code tends to kick back.

There are many voices arguing that AI will help in addressing the problems of software development, quality assurance and probably other areas. It’s questionable how much AI will help to address the gaps, non-concordances and other mistakes in requirements, and how it will develop quality code when it has basic "understanding" issues. Even if step by step all current issues revolving around AI will be fixed, it will take time and multiple iterations until meaningful progress will be made.

At least for now, AI tools like Copilot or ChatGPT can be used for learning a programming language or framework through predefined or ad-hoc prompts. Probably, they can be used also to identify deviations from best practices or other norms in scope. This doesn't mean that AI will replace code reviews, testing and other practices used in assuring the quality of software for now, but it can be used as an additional method to check for what was eventually missed by the other methods.

AI may also have hidden gems that when discovered, polished and sized, may have a qualitative impact on software development and software. Only time will tell what’s possible and achievable.

06 March 2025

💎🤖SQL Reloaded: Copilot Stories (Part X: Working with the Left Function)

Left is one of the dull string functions in SQL Server, returning the left part of a character string with the specified number of characters. Besides Substring, Right and a few other functions, Left can help retrieve the subparts of a string (see previous post). This post evaluates the answers provided by Microsoft 365 Copilot in the use of the respective function.

Disclaimer: The below code was generated by Copilot via prompts and is used for demonstration purposes!

Prompt:
Give me 10 scenarios with examples in which the Left function can be used in SQL Server.

Output:

-- Example 1: Extracting Initials
SELECT LEFT('John', 1) + LEFT('Doe', 1) AS Initials -- Result: 'JD'

-- Example 2: Getting the First N Characters of a String:
SELECT LEFT('Hello World', 5) AS Substring; -- Result: 'Hello'

-- Example 3: Extracting Year from a Date string
SELECT LEFT('2025-03-06', 4) AS Year; -- Result: '2025'

-- Example 4: Truncating a string for display:
SELECT LEFT('This is a long description', 10) + '...' AS ShortDescription; -- Result: 'This is a ...'

-- Example 5: Extracting Country Code from Phone Number:
SELECT LEFT('+49 1234567890', 3) AS CountryCode; -- Result: '+49'

-- Example 6: Getting the First Word of a Sentence:
SELECT LEFT('Hello World', CHARINDEX(' ', 'Hello World') - 1) AS FirstWord;-- Result: 'Hello'

-- Example 7: Formatting Product Codes:
SELECT LEFT(ProductCode, 3) AS ProductPrefix
FROM Products; -- Result: 'ABC' for ProductCode 'ABC12345'

-- Example 8: Extracting File Extension:
SELECT LEFT(FileName, CHARINDEX('.', FileName) - 1) AS FileExtension
FROM Files; -- Result: 'txt' for FileName 'document.txt'

-- Example 9: Creating Abbreviations:
SELECT LEFT('International Business Machines', 3) AS Abbreviation; -- Result: 'Int'

-- Example 10: Extracting Date Part from DateTime:
SELECT LEFT(CONVERT(VARCHAR, GETDATE(), 120), 10) AS DatePart; -- Result: '2025-03-06'

Example 1: Extracting the initials can prove to be useful when the space in a report is limited and one wants to have the initial of the person who created or updated the record, respectively who is responsible for a part of the process involved. One can also combine an initial with the full name.

Example 2: Getting the first characters from a sentence is useful when an encoding can be derived from them. However, it's seldom that the first characters that form the encoding have a fixed number of letters; in that case one can use the technique from Example 6.

Example 3: Getting the year or other date part from a string can be done in this way, at least if the dates are always in the expected format. It's recommended to get the date parts via the DatePart or DateName functions. 
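
For illustration, the date-function based alternative looks as follows (the literal date is just an example):

-- preferable alternative: rely on the date functions instead of string positions
SELECT DATEPART(year, CAST('2025-03-06' AS date)) AS YearNumber   -- Result: 2025
     , DATENAME(month, CAST('2025-03-06' AS date)) AS MonthName;  -- Result: 'March'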

Example 4: This is a generalization of Example 2.

Example 5: One can get the country code from a phone number if the string is correctly delimited, which is seldom the case. Sometimes, one can use the Country field from the address to identify the correct code from the phone number, though there can also be exceptions.

Example 6: This is usually the recommended way to extract the string before a delimiter and this technique has applicability in many scenarios. 
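
For illustration, a defensive variant of that technique: when the delimiter is missing, CharIndex returns 0, so the fallback keeps the whole string instead of raising an error.

-- when no space exists, CHARINDEX returns 0; NULLIF/COALESCE fall back to the full string
SELECT LEFT('HelloWorld', COALESCE(NULLIF(CHARINDEX(' ', 'HelloWorld'), 0) - 1, LEN('HelloWorld'))) AS FirstWord; -- Result: 'HelloWorld'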

Example 7: Product numbers and other similar encodings are formed via general rules and the formula should take the rules into account.

Example 8: This technique is used quite often in combination with file names and other similar encodings, though as written the example returns the file name without the extension rather than the extension itself (see the sketch below).
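
For illustration, a sketch that returns the extension itself by measuring the distance from the last dot; the Files table and FileName column are taken over from Copilot's example, and it assumes every FileName contains a dot.

-- the extension is the text after the last dot; REVERSE finds that dot from the end
SELECT RIGHT(FileName, CHARINDEX('.', REVERSE(FileName)) - 1) AS FileExtension
FROM Files; -- Result: 'txt' for FileName 'document.txt'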

Example 9: The rules of abbreviations are usually more complex than this, though this may work occasionally. 

Example 10: That's a valid way to extract the date from a string that contains also the time. One should consider the alternatives as well.
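
For illustration, two such alternatives based on the date type rather than on string slicing:

-- alternatives that work with the date/time types directly
SELECT CAST(GETDATE() AS date) AS DateOnly                    -- returns a date value
     , CONVERT(varchar(10), GETDATE(), 120) AS DateText;      -- Result: e.g. '2025-03-06'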

Prompt:
Give me 10 more scenarios with examples in which the Left function can be used in SQL Server.

Output:

-- Example 11: Extracting the First N Digits of a Number:
SELECT LEFT(CAST(1234567890 AS VARCHAR), 5) AS FirstDigits; -- Result: '12345'

-- Example 12: Trimming a String to a Fixed Length:
SELECT LEFT('This is a test string', 10) AS TrimmedString; -- Result: 'This is a '

-- Example 13: Extracting the First Part of an Email Address:
SELECT LEFT('user@example.com', CHARINDEX('@', 'user@example.com') - 1) AS Username; -- Result: 'user'

-- Example 14: Getting the First N Characters of a Column Value:
SELECT LEFT(ColumnName, 8) AS ShortenedValue
FROM TableName;-- Result: 'Example' for ColumnName 'ExampleValue'

-- Example 15: Extracting the First Few Characters of a GUID:
SELECT LEFT(NEWID(), 8) AS ShortGUID; -- Result: '550e8400'

-- Example 16: Creating a Shortened Version of a URL:
SELECT LEFT('https://example.com/page', 15) + '...' AS ShortURL; -- Result: 'https://example...'

-- Example 17: Extracting the First N Characters of a File Path:
SELECT LEFT('C:\Users\Name\Documents\File.txt', 12) AS ShortPath; -- Result: 'C:\Users\Nam'

-- Example 18: Getting the First N Characters of a Description:
SELECT LEFT(Description, 20) AS ShortDescription
FROM Products; -- Result: 'This is a product de'

-- Example 19: Extracting the First N Characters of a JSON String:
SELECT LEFT('{"name":"John","age":30}', 10) AS ShortJSON; -- Result: '{"name":"J'

-- Example 20: Creating a Shortened Version of a Log Entry:
SELECT LEFT(LogEntry, 50) + '...' AS ShortLog
FROM Logs; -- Result: 'Error: Unable to connect to the database. Please...'

Example 11: The example makes sense especially when the first n characters have a special meaning. Just selecting n characters from a random number brings little benefit, though there might also be exceptions out there.

Example 12: This is similar to Example 3.

Example 13: Extracting the first part of an email can be useful for further processing especially when the emails have a predefined form (e.g. first name and last name, usually delimited by a special character).

Example 14: The example is a variation of 13 but with applicability to a given column from a table.

Example 15: It may look like a way to generate unique values, though no part of the GUID is unique when taken in isolation from any other part!

Example 16: This can be used to extract the host name of a URL, though a fixed length only works when all URLs share the same prefix length; a delimiter-based approach is more reliable (see the sketch below).
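
For illustration, a delimiter-based sketch for the host part; it assumes the URL starts with 'https://' and contains a path separator after the host.

-- search for the first '/' after the 'https://' prefix (the search starts at position 9)
SELECT LEFT('https://example.com/page', CHARINDEX('/', 'https://example.com/page', 9) - 1) AS HostPart; -- Result: 'https://example.com'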

Example 17: This technique can be used to provide the main path to a folder, though more complex rules to generate it usually apply.

Example 18: Descriptions tend to be long enough to provide identification information for the products. Therefore, selecting only long enough parts of a description can help minimize the volume of data stored, which can be useful in reports of any type. 

Example 19: The technique can be used indeed, though it has little value without the addition of further information (see first example).

Example 20: If log entries tend to be really long, this can help extract the essential information. It further allows aggregating the information and identifying the number of events stored in the log (see the sketch below).
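
For illustration, a sketch that groups the log entries by their prefix; the Logs table and LogEntry column are taken over from Copilot's example.

-- count how often each 50-character prefix occurs in the log
SELECT LEFT(LogEntry, 50) AS LogPrefix, COUNT(*) AS Occurrences
FROM Logs
GROUP BY LEFT(LogEntry, 50)
ORDER BY COUNT(*) DESC;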

Not all the examples are representative of the use of the Left function, and even if they seem to repeat themselves, the context brings slight differences. They can be met in practice in combination with CharIndex, date or similar functions. 

Happy coding!

Previous Post <<||>> Next Post 

💎🤖SQL Reloaded: Copilot Stories (Part IX: Working with the Stuff Function)

One of the underused string functions in SQL Server is Stuff, which deletes a specified number of characters from a string and inserts another string at that position. This post evaluates the answers provided by Microsoft 365 Copilot in the use of the respective function.

Disclaimer: The below code was generated by Copilot via prompts and is used for demonstration purposes!

Prompt:
Give me 10 scenarios with examples in which the Stuff function can be used in SQL Server.

Output:

/* Example 1: Inserting a Substring */

-- Insert a substring into a specific position within a string
SELECT STUFF('Hello World', 6, 0, 'SQL ') AS Result; -- Result: 'Hello SQL World'

/* Example 2: Replacing a Substring */

-- Replace a part of a string with another substring.
SELECT STUFF('Hello World', 7, 5, 'SQL Server') AS Result; -- Result: 'Hello SQL Server'

/* Example 3: Removing Characters */

-- Remove a specific number of characters from a string
SELECT STUFF('Hello World', 6, 5, '') AS Result; -- Result: 'Hello'

/* Example 4: Masking Sensitive Data */

-- Mask a credit card number
SELECT STUFF('1234-5678-9012-3456', 6, 9, '****-****') AS MaskedCard; -- Result: '1234-****-****-3456'

/* Example 5: Formatting Phone Numbers */

-- insert formatting characters into a phone number.
SELECT STUFF(STUFF('1234567890', 4, 0, '-'), 8, 0, '-') AS FormattedPhone; -- Result: '123-456-7890'

/* Example 6: Creating Initials */

-- extract initials from a full name.
SELECT STUFF(STUFF('John Doe', 2, 3, '.'), 5, 3, '.') AS Initials; -- Result: 'J.D.'

/* Example 7: Dynamic SQL Generation */

-- modify parts of a dynamic SQL query.
DECLARE @sql NVARCHAR(MAX) = 'SELECT * FROM Table WHERE Column = Value';
SELECT STUFF(@sql, CHARINDEX('Value', @sql), LEN('Value'), 'NewValue') AS ModifiedSQL; -- Result: 'SELECT * FROM Table WHERE Column = NewValue'

/* Example 8: URL Manipulation */

-- Insert or replace parts of a URL.
SELECT STUFF('http://example.com/page', 8, 0, 'www.') AS ModifiedURL; -- Result: 'http://www.example.com/page'

/* Example 9: Handling CSV Data */

-- Modify parts of a CSV string
SELECT STUFF('1,2,3,4,5', 3, 1, '10') AS ModifiedCSV; -- Result: '1,10,3,4,5'

/* Example 10: Custom String Formatting */ 

-- Insert custom formatting into a string
SELECT STUFF('20250305', 5, 0, '-') AS FormattedDate; -- Result: '2025-0305'

The first 6 examples are typical usages of the Stuff function, while the others might raise the eyebrows of some developers.

Example 1: The example is not entirely correct (counting the characters starts at 1, so the insert must happen at position 7); a corrected version is shown below.
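
For illustration, the corrected call:

-- the position is 1-based, so 'SQL ' must be inserted at position 7 (before 'World')
SELECT STUFF('Hello World', 7, 0, 'SQL ') AS Result; -- Result: 'Hello SQL World'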

Example 2: By contrast, this example is correct.

Example 3: The example is unfortunately incorrect (the same issue as in the first example; six characters need to be removed). The technique can be useful when only one contiguous region is to be replaced once; otherwise, one should use the Replace function (see the sketch below). 
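
For illustration, the corrected call and the Replace-based alternative:

-- six characters (the space plus 'World') must be removed starting at position 6
SELECT STUFF('Hello World', 6, 6, '') AS Result; -- Result: 'Hello'

-- Replace removes all occurrences of a substring, not just one contiguous region
SELECT REPLACE('Hello World', ' World', '') AS Result; -- Result: 'Hello'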

Example 4: Starting with SQL Server 2016 one can use dynamic data masking for similar purposes, which should provide more flexibility in many cases (see the sketch below). 
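
For illustration, a dynamic data masking sketch; the Payments table and CardNumber column are hypothetical.

-- dynamic data masking (SQL Server 2016+); non-privileged users see the masked value
ALTER TABLE Payments
ALTER COLUMN CardNumber ADD MASKED WITH (FUNCTION = 'partial(4, "-****-****-", 4)');
-- SELECT CardNumber FROM Payments; -- Result: '1234-****-****-3456' for non-privileged users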

Example 5: Formatting phone number can be more complex than this because there can be different formats, country codes or any other special formats.

Example 6: Names can be quite complex, each part having different size.

Example 7: The example might work in certain cases, though it's enough for a field to contain the keyword "value" for the whole logic to generate errors. 

Example 8: The example is valid, though web addresses can use also other transfer protocols (ftp, ftps, sftp, scp, etc.).

Example 9: When the length of the values is not known, a more reliable method is to split the string into its parts, replace the value at the nth position, and reassemble the string (see the sketch below).
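
For illustration, a sketch of that approach; it relies on String_Split's ordinal argument, which requires SQL Server 2022 or Azure SQL.

-- split, patch the 2nd element, then reassemble in the original order
SELECT STRING_AGG(CASE WHEN ordinal = 2 THEN '10' ELSE value END, ',')
       WITHIN GROUP (ORDER BY ordinal) AS ModifiedCSV
FROM STRING_SPLIT('1,2,3,4,5', ',', 1); -- Result: '1,10,3,4,5'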

Example 10: Even if the technique is valid, the example isn't entirely valid even if it might be met in practice. See 20th example below for a better approach. 

Prompt:
Give me 10 more scenarios with examples in which the Stuff function can be used in SQL Server.

Output:

/* Example 11: Reversing a string */

-- Reverse a string by replacing characters in a specific order.
SELECT STUFF(STUFF(STUFF('abcde', 1, 1, 'e'), 2, 1, 'd'), 3, 1, 'c') AS ReversedString; -- Result: 'edcba'

/* Example 12: Generating unique identifiers */

-- Modify parts of a GUID to create a custom identifier.
SELECT STUFF('123e4567-e89b-12d3-a456-426614174000', 1, 8, 'abcdef12') AS CustomGUID;-- Result: 'abcdef12-e89b-12d3-a456-426614174000'

/* Example 13: Combining columns */

-- Combine multiple columns into a single formatted string.
SELECT STUFF(FirstName + ' ' + LastName, LEN(FirstName) + 1, 0, ', ') AS FullName
FROM Employees;-- Result: 'John, Doe'

/* Example 14: Trimming strings */

-- Remove leading or trailing characters from a string.
SELECT STUFF('   Hello World   ', 1, 3, '') AS TrimmedString;-- Result: 'Hello World   '

/* Example 15: Creating abbreviations */

-- Generate abbreviations from a full string.
SELECT STUFF('North Atlantic Treaty Organization', 6, 22, 'ATO') AS Abbreviation;-- Result: 'NATO'

/* Example 16: Modifying JSON Data */

-- Insert or replace parts of a JSON string 
SELECT STUFF('{"name":"John","age":30}', 9, 4, 'Doe') AS ModifiedJSON;-- Result: '{"name":"Doe","age":30}'

/* Example 17: Handling XML Data */

-- Modify parts of an XML string.
SELECT STUFF('<name>John</name>', 7, 4, 'Doe') AS ModifiedXML;-- Result: '<name>Doe</name>'

/* Example 18: Generating random strings */

-- Insert random characters into a string.
SELECT STUFF('abcdef', 3, 0, CHAR(65 + ABS(CHECKSUM(NEWID())) % 26)) AS RandomString;-- Result: 'abXcdef' (X is a random character)

/* Example 19: Creating custom error messages  */

-- Insert dynamic values into error messages.
DECLARE @ErrorMessage NVARCHAR(100) = 'Error: Value {0} is not valid.';
SELECT STUFF(@ErrorMessage, CHARINDEX('{0}', @ErrorMessage), 3, '123') AS CustomErrorMessage;-- Result: 'Error: Value 123 is not valid.'

/* Example 20: Formatting dates */

-- Insert separators into a date string.
SELECT STUFF(STUFF('20250305', 5, 0, '-'), 8, 0, '-') AS FormattedDate; -- Result: '2025-03-05'

Example 11: Copilot got the use partially wrong, given that each character except the pivoting one needs to be replaced accordingly; for a 6-character string one would need to apply the Stuff function 5 times! The built-in Reverse function is the simpler option (see below).
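
For illustration, the built-in function that makes the gymnastics unnecessary:

-- Reverse returns the characters of the string in reverse order
SELECT REVERSE('abcde') AS ReversedString; -- Result: 'edcba'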

Example 12: Implementing custom GUIDs is a more complex process than this, as one needs to take care not to generate duplicates.

Example 13: This is an example of how to handle changes dynamically.

Example 14: One should use the Trim function whenever possible, respectively the combination of LTrim and RTrim if Trim is not available (it was introduced in SQL Server 2017); see the sketch below.
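
For illustration, both variants:

-- Trim (SQL Server 2017+) removes leading and trailing spaces in one call
SELECT TRIM('   Hello World   ') AS TrimmedString;          -- Result: 'Hello World'

-- LTrim/RTrim work on older versions
SELECT LTRIM(RTRIM('   Hello World   ')) AS TrimmedString;  -- Result: 'Hello World'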

Example 15: The example is incorrect. One should consider the length of the string in the formula. 

Example 16: One must know the values in advance, otherwise the example doesn't hold. Moreover, the same issue as in the first example occurs.

Example 17: The example is not dynamic.

Example 18:  It's an interesting technique for generating "random" characters given that unique values are generated across a dataset.

Example 19: One can write error messages with multiple placeholders, though the Replace function is simpler to use (see the sketch below). 
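
For illustration, the Replace-based variant of the same example:

-- Replace substitutes every occurrence of the placeholder
DECLARE @ErrorMessage NVARCHAR(100) = 'Error: Value {0} is not valid.';
SELECT REPLACE(@ErrorMessage, '{0}', '123') AS CustomErrorMessage; -- Result: 'Error: Value 123 is not valid.'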

Example 20: It's easier to cast the value as a date and apply the required formatting accordingly (see the sketch below). Not testing whether the value is a date can lead to curious results.
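
For illustration, a cast-based variant; Try_Cast returns NULL instead of raising an error when the value isn't a valid date.

-- the string in YYYYMMDD format is cast to date, then formatted with style 120 (yyyy-mm-dd)
SELECT CONVERT(varchar(10), TRY_CAST('20250305' AS date), 120) AS FormattedDate; -- Result: '2025-03-05'
SELECT TRY_CAST('20251305' AS date) AS InvalidDate;                              -- Result: NULL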

I met some of the usages exemplified above, though I used the Stuff function seldom (see a previous post), when no other functions were available. Frankly, Copilot could prove to be a useful tool for learning SQL or another programming language in similar ways.

Happy coding!

Previous Post <<||>> Next Post 
