19 May 2025

#️⃣Software Engineering: Mea Culpa (Part VIII: A Look Beyond)

Software Engineering Series

With AI on the verge of becoming ubiquitous, blogging and bloggers can easily become obsolete. Why bother navigating through the many blogs to get a broader perspective when the same can be obtained with AI? Just type in a prompt of the type "write a blogpost of 600 words on the importance of AI in society" and Copilot or any similar AI agent will provide an answer that may look much better than the first draft of most of the bloggers out there! It doesn't matter whether the text follows a well-articulated idea, a personal perspective or something creative! One gets an acceptable answer with a minimum of effort, and that's what matters for many.

The results tend to increase in complexity the more models are assembled together and the more uncontrolled the experiments become. Moreover, solutions that tend to work aren't necessarily optimal. Machines can't offer instant enlightenment or anything close to it, though they have an incomparable power of retrieval, association, aggregation, segregation and/or iteration which, coupled with vast amounts of data, information and knowledge, can generate anything in just a matter of seconds. Probably the only areas in which humans can still compete with machines are creativity and wisdom, though how many will be able to leverage these at scale? Machines may exhibit some characteristics that resemble these intrinsic human qualities, though more often than not brute computational power will prevail.

At Microsoft Build, Satya Nadella mentioned that Foundry already encompasses more than 1900 supported models. In theory, one can still evaluate and test that many models adequately. What will happen when the scale increases by a few orders of magnitude? What will happen when there are one or more personalized AI models for each person? AI can help in many areas by rapidly generating and evaluating many plausible alternatives, though as soon as the models deal with some kind of randomization in their processing, the chances for errors increase exponentially (at least in theory).

It's enough for one or more hallucinations or other unexpected behaviors to lead to further unexpected behavior. No matter how well a model was tested, as long as there's no stable, predictable mathematical model behind it, the chances of something going wrong increase with the number of inputs, parameters, uses, or changes of context the model deals with. Unfortunately, all these aspects are seldom documented. It's not like using a formula, where one knows that, given a set of inputs and operations, the result is always the same. The evolving nature of such models makes them unpredictable in the long term. Therefore, there must always be a way to observe the changes occurring in such models.

One of the important questions is: how many errors can we afford in such models? How long does it take until errors compound to create effects comparable with a tornado? And what if the tornado grows in magnitude to the degree that it wrecks everything that crosses its path? What if multiple tornadoes join forces? How many tornadoes does it take to devastate a field, a country or a continent? How many, or how big, must the tornadoes be to trigger a warning?

Science-fiction authors love to create apocalyptic scenarios in which everything happens in just a few steps, or rather chapters. In nature, it usually takes many orders of magnitude to generate unpredictable behavior. But, as nature often reveals, unpredictable behavior does happen, probably more often than we expect or wish for. The more we poke the bear, the higher the chances that something unexpected happens! Do we really want this? What will be the price we must pay for progress?

Previous Post <<||>> Next Post

06 May 2006

🎯William Smith - Collected Quotes

"Achieving a gold standard for data quality at ingestion involves a multifaceted approach: defining explicit schemas and contracts, implementing rigorous input validation reflecting domain semantics, supporting immediate rejection or secure quarantine of low-quality data, and embedding these capabilities into high-throughput, low-latency pipelines. This first line of defense not only prevents downstream data pollution but also establishes an enterprise-wide culture and infrastructure aimed at preserving data trust from the point of entry onward." (William Smith, "Great Expectations for Modern Data Quality: The Complete Guide for Developers and Engineers", 2025)

"Accuracy denotes the degree to which data correctly represents the real-world entities or events to which it refers." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"At its core, data quality encompasses multiple dimensions-including accuracy, completeness, consistency, timeliness, validity, uniqueness, and relevance-that require rigorous assessment and control. The progression from traditional data management practices to cloud-native, real-time, and federated ecosystems introduces both challenges challenges and opportunities for embedding quality assurance seamlessly across the entire data value chain." (William Smith, "Great Expectations for Modern Data Quality: The Complete Guide for Developers and Engineers", 2025) 

"At its core, observability rests on three fundamental pillars: metrics, logs, and traces. In the context of data systems, these pillars translate into quantitative measurements (such as data volume, processing latency, and schema changes), detailed event records (including data pipeline execution logs and error messages), and lineage traces that map the flow of data through interconnected processes. Together, they enable a granular and multidimensional understanding of data system behavior, facilitating not just detection but also rapid root-cause analysis." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"Completeness refers to the extent to which required data attributes or records are present in a dataset." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"Consistency signifies the absence of conflicting data within or across sources. As data ecosystems become distributed and federated, ensuring consistency transcends simple referential integrity checks."(William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"Data drift refers to shifts in the statistical properties or distributions of incoming data compared to those observed during training or baseline establishment. Common variants include covariate drift (changes in feature distributions), prior probability drift (changes in class or label proportions), and concept drift (changes in the relationship between features and targets)." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"Data governance establishes the overarching policies, standards, and strategic directives that define how data assets are to be managed across the enterprise. This top-level framework sets the boundaries of authority, compliance requirements, and key performance indicators for data quality." (William Smith, "Great Expectations for Modern Data Quality: The Complete Guide for Developers and Engineers", 2025) 

"Data Lakes embrace a schema-on-read approach, storing vast volumes of raw or lightly processed data in native formats with minimal upfront constraints. This design significantly enhances ingestion velocity and accommodates diverse, unstructured, or semi-structured datasets. However, enforcing data quality at scale becomes more complex, as traditional static constraints are absent." (William Smith, "Great Expectations for Modern Data Quality: The Complete Guide for Developers and Engineers", 2025) 

"Data mesh fundamentally reframes data governance and validation by distributing accountability to domain-oriented teams who act as custodians and producers of their respective data products. These teams possess intimate domain knowledge, which is essential for nuanced validation criteria that adapt to the semantics, context, and evolution of their datasets. By treating datasets as first-class products with clear ownership, interfaces, and service-level objectives, data mesh encourages autonomous validation workflows embedded directly within the domains where data originates and is consumed." (William Smith, "Great Expectations for Modern Data Quality: The Complete Guide for Developers and Engineers", 2025)

"Data quality insights generated through automated profiling and baseline analysis are only as valuable as their visibility and actionability within the broader organizational decision-making context." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"Data quality verification, when executed as a set of static, invariant rules, often fails to accommodate the inherent fluidity of real-world datasets and evolving analytical contexts. To ensure robustness and relevance, quality checks must evolve beyond static constraints, incorporating adaptability driven by metadata, runtime information, and domain-specific business logic. This transformation enables the development of dynamic and context-aware validation systems capable of offering intelligent, self-tuning quality enforcement with reduced false positives and operational noise." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"Effective management of data quality at scale requires a clear delineation of organizational roles and operational frameworks that ensure accountability, consistency, and continuous improvement. Central to this structure are the interrelated concepts of data governance, data stewardship, and operational ownership. Each serves distinct, yet complementary purposes in embedding responsibility within technology platforms, business processes, and organizational culture." (William Smith, "Great Expectations for Modern Data Quality: The Complete Guide for Developers and Engineers", 2025) 

"Establishing a comprehensive observability architecture necessitates a systematic approach that spans the entirety of the data pipeline, from initial telemetry collection to actionable insights accessible by diverse stakeholders. The core objective is to unify distributed data sources - metrics, logs, traces, and quality signals - into a coherent framework that enables rapid diagnosis, continuous monitoring, and strategic decision-making." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"Governance sets the strategic framework, stewardship bridges strategy with execution, and operational ownership grounds responsibility within systems and processes. Advanced organizations achieve sustainable data quality by establishing clear roles, defined escalation channels, embedded tooling, standardized processes, and a culture that prioritizes data excellence as a collective, enforceable mandate." (William Smith, "Great Expectations for Modern Data Quality: The Complete Guide for Developers and Engineers", 2025)  

"Modern complex organizations increasingly confront the challenge of ensuring data quality at scale without centralizing validation activities into a single bottlenecked team. The data mesh paradigm and federated controls emerge as pivotal architectural styles and organizational patterns that enable decentralized, self-serve data quality validation while preserving coherence and reliability across diverse data products." (William Smith, "Great Expectations for Modern Data Quality: The Complete Guide for Developers and Engineers", 2025)

"Observability [...] requires that systems be instrumented to expose rich telemetry, enabling ad hoc exploration and hypothesis testing regarding system health. Thus, observability demands design considerations at the architecture level, insisting on standardization of instrumentation, consistent metadata management, and tight integration across data processing, storage, and orchestration layers." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"Quality gates embody a comprehensive strategy for continuous data assurance by enforcing hierarchical checks, asserting dynamic SLAs, and automating compliance decisions grounded in explicit policies. Their architecture and operationalization directly address the complex interplay between technical robustness and regulatory compliance, ensuring that only trusted data permeates downstream systems." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"Robust access control forms the cornerstone of observability system security. At the core lies the principle of least privilege, wherein users and service identities are granted the minimal set of permissions required to perform their designated tasks. This principle substantially reduces the attack surface by minimizing unnecessary access and potential lateral movement paths within the system. Implementing least privilege necessitates fine-grained role-based access control (RBAC) models tailored to organizational roles and operational workflows. RBAC configurations should be explicit regarding the scopes and data domains accessible to each role, avoiding overly broad privileges." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"Relevance gauges the appropriateness of data for the given analytical or business context. Irrelevant data, though possibly accurate and complete, can introduce noise and degrade model performance or decision quality." (William Smith, "Great Expectations for Modern Data Quality: The Complete Guide for Developers and Engineers", 2025) 

"Robust methodologies to measure and prioritize data quality dimensions involve composite metrics and scoring systems that combine quantitative indicators-such as error rates, completeness percentages, latency distributions-with qualitative assessments from domain experts." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"The architecture of a robust data quality framework hinges fundamentally on three interconnected pillars: open standards, extensible application programming interfaces (APIs), and interoperable protocols. These pillars collectively enable the seamless exchange, validation, and enhancement of data across diverse platforms and organizational boundaries." (William Smith, "Great Expectations for Modern Data Quality: The Complete Guide for Developers and Engineers", 2025) 

"The data swamp anti-pattern arises from indiscriminate ingestion of uncurated data, which rapidly dilutes data warehouse utility and complicates quality monitoring." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"The selection of KPIs should be driven by a rigorous alignment with business objectives and user requirements. This mandates close collaboration with stakeholders spanning data scientists, operations teams, compliance officers, and executive sponsors." " (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"Timeliness captures the degree to which data is available when needed and reflects the relevant time frame of the underlying phenomena." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"Uniqueness ensures that each entity or event is captured once and only once, preventing duplication that can distort analysis and decision-making." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025

"Validity reflects whether data conforms to the syntactic and semantic rules predefined for its domain." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

