22 May 2006

🖋️Vasily Pantyukhin - Collected Quotes

"Encoding is called redundant when different visual channels are used to represent the same information. Redundant encoding is an efficient trick that helps to understand information from diagrams faster, easier, and more accurately. […] To decode information easier, align it with the reality in perspective of both the physical world and cultural conventions. Some things have particular colors, are larger or heavier than other, or are associated with the specific place. If your encoding is not compatible with these properties, readers may wonder why things do not look like they are expected to. Consequently, their auditory is forced to spend extra efforts decoding." (Vasily Pantyukhin, "Principles of Design Diagramming", 2015)

"In diagramming, function has to be first. Facts and logical arguments are essential to explain the idea. However, stylish and esthetically attractive diagrams do that job even better. An additional emotional channel of information perception reinforces the total effect on sharing the designer’s personal experience, enthusiasm, and solution elegance. Of course, functions and emotions must be balanced. Too much decoration makes diagrams excessively noisy. When we make cold minimalistic diagrams, we decline the extra possibility to utilize redundant explanatory channels." (Vasily Pantyukhin, "Principles of Design Diagramming", 2015) 

"To keep accuracy and efficiency of your diagrams appealing to a potential audience, explicitly describe the encoding principles we used. Titles, labels, and legends are the most common ways to define the meaning of the diagram and its elements." (Vasily Pantyukhin, "Principles of Design Diagramming", 2015)

"Upon discovering a visual image, the brain analyzes it in terms of primitive shapes and colors. Next, unity contours and connections are formed. As well, distinct variations are segmented. Finally, the mind attracts active attention to the significant things it found. That process is permanently running to react to similarities and dissimilarities in shapes, positions, rhythms, colors, and behavior. It can reveal patterns and pattern-violations among the hundreds of data values. That natural ability is the most important thing used in diagramming." (Vasily Pantyukhin, "Principles of Design Diagramming", 2015)

"Usually, diagrams contain some noise – information unrelated to the diagram’s primary goal. Noise is decorations, redundant, and irrelevant data, unnecessarily emphasized and ambiguous icons, symbols, lines, grids, or labels. Every unnecessary element draws attention away from the central idea that the designer is trying to share. Noise reduces clarity by hiding useful information in a fog of useless data. You may quickly identify noise elements if you can remove them from the diagram or make them less intense and attractive without compromising the function." (Vasily Pantyukhin, "Principles of Design Diagramming", 2015)

16 May 2006

🖋️Jesús Barrasa - Collected Quotes

"A taxonomy is a classification scheme that organizes categories in a broader-narrower hierarchy. Items that share similar qualities are grouped into the same category, and the taxonomy provides a global organization by relating categories to one another." (Jesús Barrasa et al, "Knowledge Graphs: Data in Context for Responsive Businesses", 2021)

"AI is intended to create systems for making probabilistic decisions, similar to the way humans make decisions. […] Today’s AI is not very able to generalize. Instead, it is effective for specific, well-defined tasks. It struggles with ambiguity and mostly lacks transfer learning that humans take for granted. For AI to make humanlike decisions that are more situationally appropriate, it needs to incorporate context." (Jesús Barrasa et al, "Knowledge Graphs: Data in Context for Responsive Businesses", 2021)

"Data architects often turn to graphs because they are flexible enough to accommodate multiple heterogeneous representations of the same entities as described by each of the source systems. With a graph, it is possible to associate underlying records incrementally as data is discovered. There is no need for big, up-front design, which serves only to hamper business agility. This is important because data fabric integration is not a one-off effort and a graph model remains flexible over the lifetime of the data domains." (Jesús Barrasa et al, "Knowledge Graphs: Data in Context for Responsive Businesses", 2021)

"Data fabrics are general-purpose, organization-wide data access interfaces that offer a connected view of the integrated domains by combining data stored in a local graph with data retrieved on demand from third-party systems. Their job is to provide a sophisticated index and integration points so that they can curate data across silos, offering consistent capabilities regardless of the underlying store (which might or might not be graph based) […]." (Jesús Barrasa et al, "Knowledge Graphs: Data in Context for Responsive Businesses", 2021)

"Despite their predictive power, most analytics and data science practices ignore relationships because it has been historically challenging to process them at scale." (Jesús Barrasa et al, "Knowledge Graphs: Data in Context for Responsive Businesses", 2021)

"Graph data models are uniquely able to represent complex, indirect relationships in a way that is both human readable, and machine friendly. Data structures like graphs might seem computerish and off-putting, but in reality they are created from very simple primitives and patterns. The combination of a humane data model and ease of algorithmic processing to discover otherwise hidden patterns and characteristics is what has made graphs so popular." (Jesús Barrasa et al, "Knowledge Graphs: Data in Context for Responsive Businesses", 2021)

"In an era of machine learning, where data is likely to be used to train AI, getting quality and governance under control is a business imperative. Failing to govern data surfaces problems late, often at the point closest to users (for example, by giving harmful guidance), and hinders explainability (garbage data in, machine-learned garbage out)." (Jesús Barrasa et al, "Knowledge Graphs: Data in Context for Responsive Businesses", 2021)

"Knowledge graphs are a specific type of graph with an emphasis on contextual understanding. Knowledge graphs are interlinked sets of facts that describe real-world entities, events, or things and their interrelations in a human- and machine-understandable format." (Jesús Barrasa et al, "Knowledge Graphs: Data in Context for Responsive Businesses", 2021)

"[…] knowledge graphs are useful because they provide contextualized understanding of data. They achieve this by adding a layer of metadata that imposes rules for structure and interpretation." (Jesús Barrasa et al, "Knowledge Graphs: Data in Context for Responsive Businesses", 2021)

"Knowledge graphs use an organizing principle so that a user (or a computer system) can reason about the underlying data. The organizing principle gives us an additional layer of organizing data (metadata) that adds connected context to support reasoning and knowledge discovery. […] Importantly, some processing can be done without knowledge of the domain, just by leveraging the features of the property graph model (the organizing principle)." (Jesús Barrasa et al, "Knowledge Graphs: Data in Context for Responsive Businesses", 2021)

"Many AI systems employ heuristic decision making, which uses a strategy to find the most likely correct decision to avoid the high cost (time) of processing lots of information. We can think of those heuristics as shortcuts or rules of thumb that we would use to make fast decisions." (Jesús Barrasa et al, "Knowledge Graphs: Data in Context for Responsive Businesses", 2021)

"Understanding the entire data ecosystem, from the production of a data point to its consumption in a dashboard or a visualization, provides the ability to invoke action, which is more valuable than the mere sum of its parts." (Jesús Barrasa et al, "Knowledge Graphs: Data in Context for Responsive Businesses", 2021)

"We think of context as the network surrounding a data point of interest that is relevant to a specific AI system. […] AI benefits greatly from context to enable probabilistic decision making for real-time answers, handle adjacent scenarios for broader applicability, and be maximally relevant to a given situation. But all systems, including AI, are only as good as their inputs." (Jesús Barrasa et al, "Knowledge Graphs: Data in Context for Responsive Businesses", 2021)

13 May 2006

🖋️George Siemens - Collected Quotes

"An ecology provides the special formations needed by organizations. Ecologies are: loose, free, dynamic, adaptable, messy, and chaotic. Innovation does not arise through hierarchies. As a function of creativity, innovation requires trust, openness, and a spirit of experimentation - where random ideas and thoughts can collide for re-creation." (George Siemens, "Knowing Knowledge", 2006)

"Change pressures arise from different sectors of a system. At times it is mandated from the top of a hierarchy, other times it forms from participants at a grass-roots level. Some changes are absorbed by the organization without significant impact on, or alterations of, existing methods. In other cases, change takes root. It causes the formation of new methods (how things are done and what is possible) within the organization." (George Siemens, "Knowing Knowledge", 2006)

"Complexity and diversity results in specialized nodes (a single entity can no longer know all required elements). The act of knowledge growth and learning involves connected specialized nodes." (George Siemens, "Knowing Knowledge", 2006)

"Connections create structures. Structures do not create (though they may facilitate) connections. Our approaches today reflect this error in thinking. We have tried to do the wrong thing first with knowledge. We determine that we will have a certification before we determine what it is that we want to certify. We need to enable the growth of connections and observe the structures that emerge." (George Siemens, "Knowing Knowledge", 2006)

"Context is not as simple as being in a different space [...] context includes elements like our emotions, recent experiences, beliefs, and the surrounding environment - each element possesses attributes, that when considered in a certain light, informs what is possible in the discussion." (George Siemens, "Knowing Knowledge", 2006)

"Knowledge flow can be likened to a river that meanders through the ecology of an organization. In certain areas, the river pools and in other areas it ebbs. The health of the learning ecology of the organization depends on effective nurturing of flow." (George Siemens, "Knowing Knowledge", 2006)

"Learning is a multi-faceted, integrated process where changes with any one element alters the larger network. Knowledge is subject to the nuances of complex, adaptive systems." (George Siemens, "Knowing Knowledge", 2006)

"Hierarchy adapts knowledge to the organization; a network adapts the organization to the knowledge." (George Siemens, "Knowing Knowledge", 2006)

"Learning is the process of creating networks. Nodes are external entities which we can use to form a network. Or nodes may be people, organizations, libraries, web sites, books, journals, database, or any other source of information. The act of learning (things become a bit tricky here) is one of creating an external network of nodes - where we connect and form information and knowledge sources. The learning that happens in our heads is an internal network (neural). Learning networks can then be perceived as structures that we create in order to stay current and continually acquire, experience, create, and connect new knowledge (external). And learning networks can be perceived as structures that exist within our minds (internal) in connecting and creating patterns of understanding." (George Siemens, "Knowing Knowledge", 2006)

"Nodes and connectors comprise the structure of a network. In contrast, an ecology is a living organism. It influences the formation of the network itself." (George Siemens, "Knowing Knowledge", 2006)

"Our pre-conceived structures of interpreting knowledge sometimes interfere with new knowledge." (George Siemens, "Knowing Knowledge", 2006)

"When we focus on designing ecologies in which people can forage for knowledge, we are less concerned about communicating the minutiae of changing knowledge. Instead, we are creating the conduit through which knowledge will flow." (George Siemens, "Knowing Knowledge", 2006)

06 May 2006

🖋️William Smith - Collected Quotes

"Achieving a gold standard for data quality at ingestion involves a multifaceted approach: defining explicit schemas and contracts, implementing rigorous input validation reflecting domain semantics, supporting immediate rejection or secure quarantine of low-quality data, and embedding these capabilities into high-throughput, low-latency pipelines. This first line of defense not only prevents downstream data pollution but also establishes an enterprise-wide culture and infrastructure aimed at preserving data trust from the point of entry onward." (William Smith, "Great Expectations for Modern Data Quality: The Complete Guide for Developers and Engineers", 2025)

"Accuracy denotes the degree to which data correctly represents the real-world entities or events to which it refers." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"At its core, data quality encompasses multiple dimensions-including accuracy, completeness, consistency, timeliness, validity, uniqueness, and relevance-that require rigorous assessment and control. The progression from traditional data management practices to cloud-native, real-time, and federated ecosystems introduces both challenges challenges and opportunities for embedding quality assurance seamlessly across the entire data value chain." (William Smith, "Great Expectations for Modern Data Quality: The Complete Guide for Developers and Engineers", 2025) 

"At its core, observability rests on three fundamental pillars: metrics, logs, and traces. In the context of data systems, these pillars translate into quantitative measurements (such as data volume, processing latency, and schema changes), detailed event records (including data pipeline execution logs and error messages), and lineage traces that map the flow of data through interconnected processes. Together, they enable a granular and multidimensional understanding of data system behavior, facilitating not just detection but also rapid root-cause analysis." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"Completeness refers to the extent to which required data attributes or records are present in a dataset." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"Consistency signifies the absence of conflicting data within or across sources. As data ecosystems become distributed and federated, ensuring consistency transcends simple referential integrity checks."(William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"Data drift refers to shifts in the statistical properties or distributions of incoming data compared to those observed during training or baseline establishment. Common variants include covariate drift (changes in feature distributions), prior probability drift (changes in class or label proportions), and concept drift (changes in the relationship between features and targets)." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"Data governance establishes the overarching policies, standards, and strategic directives that define how data assets are to be managed across the enterprise. This top-level framework sets the boundaries of authority, compliance requirements, and key performance indicators for data quality." (William Smith, "Great Expectations for Modern Data Quality: The Complete Guide for Developers and Engineers", 2025) 

"Data Lakes embrace a schema-on-read approach, storing vast volumes of raw or lightly processed data in native formats with minimal upfront constraints. This design significantly enhances ingestion velocity and accommodates diverse, unstructured, or semi-structured datasets. However, enforcing data quality at scale becomes more complex, as traditional static constraints are absent." (William Smith, "Great Expectations for Modern Data Quality: The Complete Guide for Developers and Engineers", 2025) 

"Data mesh fundamentally reframes data governance and validation by distributing accountability to domain-oriented teams who act as custodians and producers of their respective data products. These teams possess intimate domain knowledge, which is essential for nuanced validation criteria that adapt to the semantics, context, and evolution of their datasets. By treating datasets as first-class products with clear ownership, interfaces, and service-level objectives, data mesh encourages autonomous validation workflows embedded directly within the domains where data originates and is consumed." (William Smith, "Great Expectations for Modern Data Quality: The Complete Guide for Developers and Engineers", 2025)

"Data quality insights generated through automated profiling and baseline analysis are only as valuable as their visibility and actionability within the broader organizational decision-making context." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"Data quality verification, when executed as a set of static, invariant rules, often fails to accommodate the inherent fluidity of real-world datasets and evolving analytical contexts. To ensure robustness and relevance, quality checks must evolve beyond static constraints, incorporating adaptability driven by metadata, runtime information, and domain-specific business logic. This transformation enables the development of dynamic and context-aware validation systems capable of offering intelligent, self-tuning quality enforcement with reduced false positives and operational noise." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"Effective management of data quality at scale requires a clear delineation of organizational roles and operational frameworks that ensure accountability, consistency, and continuous improvement. Central to this structure are the interrelated concepts of data governance, data stewardship, and operational ownership. Each serves distinct, yet complementary purposes in embedding responsibility within technology platforms, business processes, and organizational culture." (William Smith, "Great Expectations for Modern Data Quality: The Complete Guide for Developers and Engineers", 2025) 

"Establishing a comprehensive observability architecture necessitates a systematic approach that spans the entirety of the data pipeline, from initial telemetry collection to actionable insights accessible by diverse stakeholders. The core objective is to unify distributed data sources - metrics, logs, traces, and quality signals - into a coherent framework that enables rapid diagnosis, continuous monitoring, and strategic decision-making." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"Governance sets the strategic framework, stewardship bridges strategy with execution, and operational ownership grounds responsibility within systems and processes. Advanced organizations achieve sustainable data quality by establishing clear roles, defined escalation channels, embedded tooling, standardized processes, and a culture that prioritizes data excellence as a collective, enforceable mandate." (William Smith, "Great Expectations for Modern Data Quality: The Complete Guide for Developers and Engineers", 2025)  

"Modern complex organizations increasingly confront the challenge of ensuring data quality at scale without centralizing validation activities into a single bottlenecked team. The data mesh paradigm and federated controls emerge as pivotal architectural styles and organizational patterns that enable decentralized, self-serve data quality validation while preserving coherence and reliability across diverse data products." (William Smith, "Great Expectations for Modern Data Quality: The Complete Guide for Developers and Engineers", 2025)

"Observability [...] requires that systems be instrumented to expose rich telemetry, enabling ad hoc exploration and hypothesis testing regarding system health. Thus, observability demands design considerations at the architecture level, insisting on standardization of instrumentation, consistent metadata management, and tight integration across data processing, storage, and orchestration layers." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"Quality gates embody a comprehensive strategy for continuous data assurance by enforcing hierarchical checks, asserting dynamic SLAs, and automating compliance decisions grounded in explicit policies. Their architecture and operationalization directly address the complex interplay between technical robustness and regulatory compliance, ensuring that only trusted data permeates downstream systems." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"Robust access control forms the cornerstone of observability system security. At the core lies the principle of least privilege, wherein users and service identities are granted the minimal set of permissions required to perform their designated tasks. This principle substantially reduces the attack surface by minimizing unnecessary access and potential lateral movement paths within the system. Implementing least privilege necessitates fine-grained role-based access control (RBAC) models tailored to organizational roles and operational workflows. RBAC configurations should be explicit regarding the scopes and data domains accessible to each role, avoiding overly broad privileges." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"Relevance gauges the appropriateness of data for the given analytical or business context. Irrelevant data, though possibly accurate and complete, can introduce noise and degrade model performance or decision quality." (William Smith, "Great Expectations for Modern Data Quality: The Complete Guide for Developers and Engineers", 2025) 

"Robust methodologies to measure and prioritize data quality dimensions involve composite metrics and scoring systems that combine quantitative indicators-such as error rates, completeness percentages, latency distributions-with qualitative assessments from domain experts." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"The architecture of a robust data quality framework hinges fundamentally on three interconnected pillars: open standards, extensible application programming interfaces (APIs), and interoperable protocols. These pillars collectively enable the seamless exchange, validation, and enhancement of data across diverse platforms and organizational boundaries." (William Smith, "Great Expectations for Modern Data Quality: The Complete Guide for Developers and Engineers", 2025) 

"The data swamp anti-pattern arises from indiscriminate ingestion of uncurated data, which rapidly dilutes data warehouse utility and complicates quality monitoring." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"The selection of KPIs should be driven by a rigorous alignment with business objectives and user requirements. This mandates close collaboration with stakeholders spanning data scientists, operations teams, compliance officers, and executive sponsors." " (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"Timeliness captures the degree to which data is available when needed and reflects the relevant time frame of the underlying phenomena." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

"Uniqueness ensures that each entity or event is captured once and only once, preventing duplication that can distort analysis and decision-making." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025

"Validity reflects whether data conforms to the syntactic and semantic rules predefined for its domain." (William Smith, "Soda Core for Modern Data Quality and Observability: The Complete Guide for Developers and Engineers", 2025)

04 May 2006

Programming: Array (Definitions)

"A group of cells arranged by dimensions. A table is a two-dimensional array in which the cells are arranged in rows and columns, with one dimension forming the rows and the other dimension forming the columns. A cube is a three-dimensional array and can be visualized as a cube, with each dimension of the array forming one edge of the cube." (Microsoft Corporation, "Microsoft SQL Server 7.0 Data Warehouse Training Kit", 2000)

"A collection of objects all of the same type." (Jesse Liberty, "Sams Teach Yourself C++ in 24 Hours 3rd Ed.", 2001)

"A list of variables that have the same name and data type." (Greg Perry, "Sams Teach Yourself Beginning Programming in 24 Hours" 2nd Ed., 2001)

"Values whose members, called elements, are accessed by an index rather than by name. An array has a rank that specifies the number of indices needed to locate an element (sometimes called the number of dimensions) within the array. It may have either zero or nonzero lower bounds in each dimension." (Damien Watkins et al, "Programming in the .NET Environment", 2002)

"A collection of data items, all of the same type, in which each item is uniquely addressed by a 32-bit integer index. Java arrays behave like objects but have some special syntax. Java arrays begin with the index value 0." (Marcus Green & Bill Brogden, "Java 2™ Programmer Exam Cram™ 2 (Exam CX-310-035)", 2003)

"A device that aggregates large collections of hard drives into a logical whole." (Tom Petrocelli, "Data Protection and Information Lifecycle Management", 2005)

"An arithmetically derived matrix or table of rows and columns that is used to impose an order for efficient experimentation. The rows contain the individual experiments. The columns contain the experimental factors and their individual levels or set points." (Clyde M Creveling, "Six Sigma for Technical Processes: An Overview for R Executives, Technical Leaders, and Engineering Managers", 2006)

"A data structure containing an ordered list of elements - any Ruby object - starting with an index of 0. Compare hash." (Michael Fitzgerald, "Learning Ruby", 2007)

"An arithmetically derived matrix or table of rows and columns that is used to impose an order for efficient experimentation. The rows contain the individual experiments. The columns contain the experimental factors and their individual levels or set points." (Lynne Hambleton, "Treasure Chest of Six Sigma Growth Methods, Tools, and Best Practices", 2007)

"In a SQL database, an ordered collection of elements of the same data type stored in a single column and row of a table." (Jan L Harrington, "SQL Clearly Explained 3rd Ed. ", 2010)

"A group of values stored together in a single variable and accessed by index." (Rod Stephens, "Stephens' Visual Basic® Programming 24-Hour Trainer", 2011)

"A grouping of similar items of the same storage type in a sequential pattern, and referenced by a sequential index value." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A variable that holds a series of values with the same data type. An index into the array lets the program select a particular value." (Rod Stephens, "Start Here!™ Fundamentals of Microsoft® .NET Programming", 2011)

"A basic collection of values that is a sequence represented by a single block of memory. Arrays have efficient direct access, but do not easily grow or shrink." (Mark C Lewis, "Introduction to the Art of Programming Using Scala", 2012)

"An ordered sequence of values, stored such that you can easily access any of the values using an integer subscript that specifies the value’s offset in the sequence." (Jon Orwant et al, "Programming Perl" 4th Ed., 2012)

"A group of variables stored under a single name." (Matt Telles, "Beginning Programming", 2014)

"A structure composed of multiple identical variables that can be individually addressed." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"A structure that contains an ordered collection of elements of the same data type in which each element can be referenced by its index value or ordinal position in the collection." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)
