Showing posts with label dimensions. Show all posts

05 May 2026

🤖〽️Prompt Engineering: Copilot Unabridged (Part 179: How AI Can Detect Interactions Between Multiple Variables in Complex Patterns)


Prompt Engineering Series


Prompt: "write a post of 600 words on how AI can detect interactions between multiple variables in complex patterns"

Introduction

One of the most powerful capabilities of modern Artificial Intelligence (AI) is its ability to detect interactions between multiple variables - interactions that are subtle, nonlinear, and often invisible to human intuition. Humans are good at spotting simple relationships: when one variable increases, another tends to rise or fall. But real‑world systems rarely behave so cleanly. Instead, outcomes often emerge from the interplay of many factors acting together, sometimes reinforcing each other, sometimes canceling each other out, and sometimes producing effects that only appear under very specific conditions. AI excels in precisely this territory. Its architecture allows it to uncover complex, multi‑variable interactions that would otherwise remain hidden.

The first reason AI can detect these interactions is its ability to analyze high‑dimensional data without cognitive limits. Humans can reason about two or three variables at a time, but beyond that, our intuition collapses. AI systems, especially deep learning models, can process hundreds or thousands of variables simultaneously. They can map how changes in one variable influence another, not in isolation, but in combination with many others. This is essential in fields like genomics, where the effect of a single gene may depend on the presence of dozens of others, or in economics, where market behavior emerges from the interplay of countless signals.

A second advantage lies in AI’s capacity to model nonlinear relationships. Interactions between variables are rarely linear. The effect of one variable may depend on the level of another, creating curved, threshold‑based, or conditional relationships. Traditional statistical methods often struggle with these nonlinearities unless explicitly instructed to look for them. AI models, by contrast, naturally capture nonlinear interactions through their layered structure. Neural networks, for example, learn complex transformations at each layer, allowing them to detect relationships that bend, twist, or reverse depending on context. This flexibility enables AI to uncover interactions that humans would never think to test.
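The point about interactions staying hidden unless one looks for them explicitly can be illustrated with a minimal sketch. It assumes a toy grid of two variables whose outcome is their product (the data and the helper `pearson` are invented for illustration): each variable alone shows no linear association with the outcome, while the explicit product term explains it fully.

```python
from itertools import product
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy grid (an assumption for illustration): the outcome is a pure interaction.
levels = [-2, -1, 1, 2]
rows = list(product(levels, levels))
x1 = [a for a, _ in rows]
x2 = [b for _, b in rows]
y = [a * b for a, b in rows]

r_x1 = pearson(x1, y)                             # x1 on its own
r_x2 = pearson(x2, y)                             # x2 on its own
r_product = pearson([a * b for a, b in rows], y)  # explicit interaction term

print(r_x1, r_x2, r_product)
```

A linear model restricted to `x1` and `x2` would find nothing here; flexible models can in principle learn the product without being told about it, though how well they do so depends on the data and the training setup.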

Another key factor is AI’s ability to detect higher‑order interactions - relationships that involve not just pairs of variables, but combinations of three, four, or more. These higher‑order interactions are common in complex systems. For example, a medical treatment might be effective only when a patient has a specific genetic profile and a particular environmental exposure and a certain lifestyle pattern. Humans rarely detect such interactions because they require examining an enormous number of possible combinations. AI, however, can explore these combinations efficiently, identifying the rare configurations that produce meaningful effects.
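To give a sense of the "enormous number of possible combinations" mentioned above, here is a small sketch that counts how many distinct variable combinations exist for a hypothetical dataset with 100 variables (the number 100 is an assumption chosen for illustration).

```python
from math import comb

n_variables = 100  # hypothetical dataset width

# Number of distinct variable subsets for each interaction order
counts = {order: comb(n_variables, order) for order in (2, 3, 4)}

for order, count in counts.items():
    print(f"{order}-way combinations: {count}")
# 2-way combinations: 4950
# 3-way combinations: 161700
# 4-way combinations: 3921225
```

The count grows quickly with the interaction order, which is why exhaustive testing is impractical and why models that search the space selectively are attractive.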

AI also excels at local pattern detection, which is crucial for identifying interactions that appear only under specific conditions. Humans tend to look for global rules that apply everywhere. AI can break a dataset into many small regions and learn different relationships in each one. A variable might matter only when another variable crosses a certain threshold, or only within a particular subgroup. Models like decision trees, random forests, and gradient boosting machines are particularly good at uncovering these conditional interactions. They reveal patterns that are invisible when looking at the dataset as a whole.
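The idea of a variable that matters only when another one crosses a threshold can be sketched as follows. This is not a decision tree implementation: the threshold is given up front (an assumption), whereas tree-based learners search for such split points themselves. The data and the names `slope` and `THRESHOLD` are invented for illustration.

```python
def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


THRESHOLD = 5  # hypothetical cut-off on x2

# Toy data: y responds to x1 only when x2 exceeds the threshold.
data = [
    (x1, x2, 3 * x1 if x2 > THRESHOLD else 0)
    for x1 in range(10)
    for x2 in range(10)
]

below = [(x1, y) for x1, x2, y in data if x2 <= THRESHOLD]
above = [(x1, y) for x1, x2, y in data if x2 > THRESHOLD]

slope_all = slope([x1 for x1, _, _ in data], [y for _, _, y in data])
slope_below = slope(*zip(*below))  # relationship in the region x2 <= THRESHOLD
slope_above = slope(*zip(*above))  # relationship in the region x2 > THRESHOLD

print(slope_all, slope_below, slope_above)
```

The single global slope blends two very different regimes; splitting the data by region recovers a flat relationship on one side and a steep one on the other.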

A further strength comes from AI’s ability to integrate heterogeneous data sources. Interactions often span different types of information - numerical measurements, text, images, signals, or categorical variables. Humans struggle to combine such diverse inputs. AI systems, however, can fuse them into a unified representation, allowing interactions to emerge across modalities. This is especially valuable in fields like healthcare, where symptoms, lab results, imaging data, and patient history interact in complex ways.

Finally, AI’s ability to detect multi‑variable interactions is amplified by continuous learning. As new data arrives, AI systems can update their internal models, refining their understanding of how variables interact. This dynamic adaptation allows them to track evolving systems where interactions shift over time.

AI’s ability to detect interactions between multiple variables is not a replacement for human insight. Instead, it expands our analytical reach, revealing structures that lie beyond the limits of intuition. When humans and AI collaborate - combining human judgment with machine‑level pattern detection - we gain a deeper, more accurate understanding of the complex systems that shape our world.

Disclaimer: The whole text was generated by Copilot (under Windows 11) at the first attempt. This is just an experiment to evaluate the feature's ability to answer standard general questions, regardless of whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.



04 May 2026

🤖〽️Prompt Engineering: Copilot Unabridged (Part 178: How AI Can Detect Curved Relationships in Complex Patterns)

Prompt Engineering Series

Prompt: "write a post of 600 words on how AI can detect curved relationships in complex patterns"

Introduction

One of the most intriguing strengths of modern Artificial Intelligence (AI) is its ability to detect curved, nonlinear relationships hidden inside complex datasets. Humans are naturally inclined toward linear thinking. We look for straight‑line connections: more of X leads to more of Y, or less of X leads to less of Y. This bias toward linearity is cognitively efficient, but it blinds us to the deeper structure of many real‑world systems. In nature, economics, biology, psychology, and technology, relationships often bend, twist, and loop in ways that defy simple intuition. AI, however, is uniquely equipped to uncover these curved patterns - relationships that change direction, accelerate, plateau, or reverse depending on context.

The first reason AI can detect curved relationships is its ability to model nonlinear functions directly. Traditional statistical tools often assume linearity unless explicitly told otherwise. AI models, especially neural networks, do the opposite: they assume very little about the shape of the relationship. Given enough capacity, their architecture allows them to approximate a very wide range of functions - straight, curved, or wildly irregular - by adjusting internal parameters. This flexibility enables AI to capture relationships that humans overlook because they do not fit our mental templates. For example, a medical variable might increase risk up to a point and then decrease it beyond that threshold. A human analyst might miss this inverted U‑shaped curve, but an AI model can detect it naturally.
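A minimal sketch of why a relationship that rises and then falls is easy to miss with a straight-line view: the toy risk values below (an assumption for illustration) peak in the middle of the range, so a fitted line is flat, while simple binned averages expose the curvature.

```python
def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def mean(values):
    return sum(values) / len(values)


x = list(range(11))
risk = [25 - (v - 5) ** 2 for v in x]  # rises until x = 5, then falls

linear_slope = slope(x, risk)  # what a straight-line fit would report

# Average risk in the lower, middle and upper part of the range
low = mean([r for v, r in zip(x, risk) if v <= 3])
middle = mean([r for v, r in zip(x, risk) if 4 <= v <= 6])
high = mean([r for v, r in zip(x, risk) if v >= 7])

print(linear_slope, low, middle, high)
```

The straight-line slope is zero, suggesting "no effect", while the binned averages show a clear peak in the middle; flexible models pick up this kind of shape without the bins being specified by hand.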

A second advantage lies in AI’s capacity to explore high‑dimensional interactions. Curved relationships often emerge only when multiple variables interact. A single variable may appear to have no meaningful effect, but when combined with two or three others, a curved pattern suddenly becomes visible. Humans struggle to visualize relationships beyond two dimensions. AI systems, by contrast, can analyze hundreds of variables simultaneously, mapping how they bend and twist together. This is particularly valuable in fields like genomics, where the effect of one gene may depend on the presence or absence of many others, creating curved interactions that only appear in high‑dimensional space.

Another key factor is AI’s ability to detect local patterns rather than forcing global assumptions. Humans tend to look for one overarching rule that explains everything. AI models can break a dataset into many small regions and learn different relationships in each one. A relationship might be linear in one region, curved in another, and flat in a third. Decision trees, random forests, and gradient boosting machines excel at this kind of local pattern detection. They can identify subtle bends in the data that only appear under specific conditions. This ability to adapt to local curvature allows AI to uncover patterns that would otherwise remain hidden.

AI also benefits from its capacity to learn from noise rather than be overwhelmed by it. Curved relationships are often subtle, emerging only after filtering out randomness. Humans tend to see noise as a distraction; AI treats it as part of the landscape. By analyzing massive datasets, AI can distinguish between random fluctuations and genuine curvature. This is essential in fields like climate science, where long‑term curved trends are buried beneath short‑term variability.

Finally, AI’s ability to detect curved relationships is strengthened by continuous learning and iterative refinement. As new data arrives, AI systems can update their internal models, refining the shape of the relationships they detect. Curved patterns often evolve over time - markets shift, ecosystems adapt, diseases mutate. Humans struggle to update their mental models quickly. AI can adjust in real time, capturing new bends and inflection points as they emerge.

The ability of AI to detect curved relationships in complex patterns is not merely a technical achievement. It expands our understanding of the world, revealing structures that lie beyond the reach of intuition. When humans and AI work together - combining human insight with machine‑level pattern detection - we gain a richer, more accurate view of the systems that shape our lives.

Disclaimer: The whole text was generated by Copilot (under Windows 11) at the first attempt. This is just an experiment to evaluate the feature's ability to answer standard general questions, regardless of whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.


03 May 2026

🤖〽️Prompt Engineering: Copilot Unabridged (Part 177: How AI Can Detect Multi‑Step Relationships That Span Many Variables)

Prompt Engineering Series


Prompt: "write a post of 600 words on how AI can detect multi‑step relationships that span many variables"

Introduction

One of the most remarkable capabilities of modern Artificial Intelligence (AI) is its ability to uncover multi‑step relationships that span across many variables - patterns so complex, layered, and indirect that they remain invisible to human reasoning. Humans excel at intuitive leaps and conceptual understanding, but we struggle when relationships unfold across multiple dimensions, involve subtle dependencies, or require tracking interactions over long chains of influence. AI, by contrast, thrives in precisely these environments. Its computational architecture allows it to map relationships that are not linear, not obvious, and not easily captured by traditional analytical methods.

The first reason AI can detect multi‑step relationships is its ability to operate in high‑dimensional spaces. Humans can mentally juggle only a handful of variables at once before cognitive overload sets in. But many real‑world systems - biological networks, financial markets, climate dynamics - operate across hundreds or thousands of interacting factors. AI models, especially deep neural networks, are designed to navigate these high‑dimensional landscapes. They can track how small changes in one variable propagate through a chain of dependencies, eventually influencing outcomes far removed from the original cause. This ability to follow long causal threads is something human intuition simply cannot replicate at scale.

A second advantage lies in AI’s capacity to model nonlinear interactions. Multi‑step relationships often involve nonlinearities: effects that amplify, dampen, or transform as they move through a system. Humans tend to assume linearity because it is cognitively simple. AI does not make this assumption. Neural networks, decision trees, and attention‑based architectures can capture nonlinear transformations at every layer. This allows AI to detect relationships where the influence of one variable depends on the state of several others - patterns that only emerge when multiple conditions align in specific ways.

Another key factor is AI’s ability to learn hierarchical representations. Deep learning models build understanding layer by layer. Early layers detect simple features; deeper layers combine these features into more abstract concepts. This hierarchical structure mirrors the multi‑step nature of complex relationships. For example, in medical diagnostics, an AI system might first detect subtle biomarkers, then combine them into intermediate patterns, and finally infer a higher‑level diagnosis. Each step builds on the previous one, allowing the model to trace relationships that unfold across multiple conceptual levels.

AI also excels at temporal reasoning, which is essential for detecting multi‑step relationships that evolve over time. Recurrent neural networks, transformers, and sequence models can track dependencies across long time horizons. They can identify how an event today influences outcomes weeks or months later, even when the connection is indirect. This is particularly valuable in fields like supply chain forecasting, epidemiology, and macroeconomic modeling, where delayed effects are the norm rather than the exception.
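Delayed effects of this kind can be illustrated with a much simpler technique than the sequence models named above: a lagged correlation scan. The sketch below is an assumption-laden toy (the series, the helper names and the three-step delay are invented) that checks at which lag one series best matches another.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def best_lag(cause, effect, max_lag):
    """Return the lag at which cause[t] correlates best with effect[t + lag]."""
    scores = {}
    for lag in range(max_lag + 1):
        xs = cause[: len(cause) - lag]
        ys = effect[lag:]
        scores[lag] = pearson(xs, ys)
    return max(scores, key=scores.get), scores


# Toy series: the effect repeats the cause three steps later.
cause = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4]
TRUE_LAG = 3
effect = [0] * TRUE_LAG + cause[:-TRUE_LAG]

lag, scores = best_lag(cause, effect, max_lag=6)
print(lag)
```

A scan like this only finds a single linear, fixed delay; the appeal of sequence models is that they can, in principle, learn delays that are nonlinear, variable, or spread across several inputs.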

A further strength comes from AI’s ability to integrate heterogeneous data sources. Multi‑step relationships often span different types of information - numerical data, text, images, signals, or categorical variables. Humans struggle to synthesize such diverse inputs. AI systems, however, can fuse them into a unified representation. This multimodal integration allows AI to detect relationships that cross boundaries between data types, revealing patterns that would remain hidden if each source were analyzed in isolation.

Finally, AI’s ability to detect multi‑step relationships is amplified by continuous learning and iterative refinement. As new data arrives, AI systems can update their internal models, strengthening or revising the relationships they have inferred. This dynamic adaptation allows them to track evolving systems where relationships shift over time. Humans, by contrast, tend to cling to outdated mental models, even when the underlying reality has changed.

AI’s ability to detect multi‑step relationships across many variables is not a replacement for human judgment. Instead, it expands our analytical reach, revealing structures that lie beyond the limits of intuition. When humans and AI collaborate - combining human understanding with machine‑level pattern detection - we gain a deeper, more accurate view of the complex systems that shape our world.

Disclaimer: The whole text was generated by Copilot (under Windows 11) at the first attempt. This is just an experiment to evaluate the feature's ability to answer standard general questions, regardless of whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.


19 April 2026

🤖〽️Prompt Engineering: Copilot Unabridged (Part 167: How AI Can Generalize Beyond Its Training Examples)

Prompt Engineering Series


Prompt: "write a post of 600 words on how AI can generalize beyond its training examples"

Introduction

Generalization is the defining feature of modern artificial intelligence. A model trained on finite data can answer questions it has never encountered, solve problems it was never explicitly taught, and synthesize ideas that do not appear anywhere in its training set. This is not accidental - it is the result of architecture, optimization, scale, and the mathematical structure of learning. Understanding how Artificial Intelligence (AI) generalizes reveals why these systems are so powerful and why they sometimes fail.

1. AI Learns Abstract Patterns, Not Just Examples

Traditional views of machine learning assumed that models interpolate within the boundaries of their training data. But large‑scale models behave differently. They learn abstract structures that allow them to infer rules rather than memorize instances. As one analysis notes, modern models 'do not memorize. They abstract… They infer… They move beyond the dataset'.

This abstraction allows AI to respond meaningfully to prompts it has never seen before.

2. High‑Dimensional Representations Enable Flexible Reasoning

AI models encode information as vectors in high‑dimensional spaces. These representations capture subtle relationships between concepts, enabling the model to:

  • Recognize analogies
  • Infer missing information
  • Map new inputs onto learned structures

This geometric structure is what allows models like CLIP to classify images into categories they were never explicitly trained on - a phenomenon known as zero‑shot generalization. 
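The geometric idea behind zero-shot classification can be sketched with cosine similarity: an input vector is assigned the label whose vector points in the most similar direction. The three-dimensional vectors below are invented for illustration and do not come from CLIP or any real model, where embeddings have hundreds of dimensions and are produced by trained encoders.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings of label descriptions
label_vectors = {
    "cat": [0.9, 0.1, 0.0],
    "car": [0.0, 0.2, 0.9],
    "tree": [0.1, 0.9, 0.1],
}

image_vector = [0.8, 0.3, 0.1]  # stand-in for an encoded image


def classify(vector, labels):
    """Pick the label whose vector is most similar to the input vector."""
    return max(labels, key=lambda name: cosine(vector, labels[name]))


predicted = classify(image_vector, label_vectors)
print(predicted)
```

Because new labels only need a vector of their own, categories can be added at inference time without retraining, which is the essence of the zero-shot setting described above.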

3. Optimization Drives Models Toward General Solutions

Generalization is not just a byproduct of data; it emerges from the optimization process itself. Research on 'grokking' shows that models may initially memorize training examples but later undergo a sudden shift, discovering the underlying algorithmic structure and generalizing perfectly - even without new data. 

This demonstrates that training dynamics can push models toward deeper understanding.

4. Scale Expands the Model’s Capacity to Generalize

Large models trained on diverse datasets develop internal mechanisms that support in‑context learning - the ability to learn new tasks from a few examples provided at inference time. This capability emerges even when the model is trained only on next‑token prediction. 

Scale allows the model to encode broad patterns that can be recombined in novel ways.

5. Reinforcement Learning Encourages Adaptation to New Situations

Generalization is not limited to language models. Reinforcement learning (RL) systems can learn policies that adapt to new environments. Studies show that RL agents trained in one set of conditions can perform well in different, previously unseen conditions - such as navigating new physical layouts or adjusting industrial control settings. 

RL encourages models to learn strategies rather than rote responses.

6. Statistical Learning Theory Provides the Foundations

Generalization is grounded in the principles of statistical learning theory, which explains how models can perform well on unseen data by learning underlying patterns rather than memorizing noise. Concepts like bias‑variance tradeoff, risk minimization, and model complexity help explain why some models generalize better than others. 

These foundations guide how models are designed and evaluated.
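For reference, the bias‑variance tradeoff mentioned above is usually written as a decomposition of the expected squared error. The notation is an assumption introduced here, not taken from the post: the target is y = f(x) + ε with zero-mean noise of variance σ², and f̂ is the model fitted on a random training set.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Simple models tend to have high bias and low variance, while very flexible models tend toward the opposite, which is one way to read the statement that some models generalize better than others.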

7. Diverse Training Data Expands the Model’s Conceptual Space

The broader and more varied the training data, the more robust the model’s generalization. Exposure to diverse linguistic styles, cultural contexts, and problem types allows the model to build flexible representations that transfer across domains.

This is why large, heterogeneous datasets are essential for modern AI performance.

Closing Statement

AI can generalize beyond its training examples because it learns abstract patterns, builds high‑dimensional representations, and develops internal mechanisms that support inference rather than memorization. Through scale, optimization, reinforcement learning, and diverse data, modern models can handle novel tasks, interpret unfamiliar inputs, and synthesize new ideas. Generalization is not a side effect - it is the core of what makes AI powerful. Understanding how it works helps us design systems that are more reliable, more capable, and more aligned with human needs.

Disclaimer: The whole text was generated by Copilot (under Windows 11) at the first attempt. This is just an experiment to evaluate the feature's ability to answer standard general questions, regardless of whether they are correctly or incorrectly posed. Moreover, the answers may reflect hallucinations and other types of inconsistent or incorrect reasoning.


23 March 2025

💫🗒️ERP Systems: Microsoft Dynamics 365's Financial Tags [Notes]

Disclaimer: This is work in progress intended to consolidate information from the various sources and not to provide a complete overview of all the features. Please refer to the documentation for a complete overview!

Last updated: 23-Mar-2025

[Dynamics 365] Financial Tags

  • {def} user-defined metadata elements used to track additional information on accounting entries for analytics or process purposes
    • provide an additional layer of metadata
    • {objective} eliminate the need to use document numbers, descriptions, or financial dimensions [1]
      • stored on the accounting entries that are created for the transactions [1]
    • {benefit} improved accuracy 
      • ensure each transaction is linked with the correct accounting and auditing elements, enhancing the accuracy in financial reporting and compliance [8]
    • {benefit} streamlined processes 
      • by automating the categorization of financial transactions, financial tags enable a more efficient invoicing process [8]
    • {benefit} better financial tracking 
      • allow for granular tracking of expenses and revenues, enabling more detailed financial analysis [8]
    • shown as separate columns on voucher transactions and similar GL inquiry forms 
    • legal entity specific
    • can be shared by using the Shared data feature [3]
    • designed to support any amount of reuse
    • do not default from master data
      • {feature|planned} defaulting will be enabled through user-defined rules
    • similar to financial dimensions
      • an alternative to creating financial dimensions
      • structured (account structures, account rules, validation) 
      • designed for medium to high reuse 
      • the two are most likely mutually exclusive
      • every transaction that supports dimensions will eventually support financial tags 
    • unstructured 
      • no structure, no rules, no validation
    • require a delimiter between the tag values
      • via General ledger parameters >> Financial tags
      • it can be deactivated but not deleted 
        • ⇐ helps ensure that the tag values remain available for reporting on posted general ledger entries
      • can easily be activated and deactivated at any time
    • the label of each financial tag can be changed at any time, even after transactions are posted
      • if transactions have been posted for a specific financial tag, the tag values don't change
    • tag values
      • are associated with an accounting entry
      • can be reused 
      • have header to line defaulting
      • are stored as simple text 
      • do not reference other data 
      • are not validated at any time, including during entry and posting
      • can be entered or edited at any time prior to posting 
      • can be changed at any time after posting 
        • by enabling "Allow edits to internal data on general ledger vouchers" feature
    • up to 20 financial tags can be defined
      • e.g. Customers, Vendors, Projects, PO numbers, Payment references
      • each is 100 characters [1]
  • {type} text 
    • free text with no lookup 
  • {type} custom list
    • free text with lookup 
  • {type} list
    • predefined list of many common types of data with lookup 
      • list values are also not validated
  • supported by
    • general journals
    • customer and vendor payment journals, including entities 
  • {operation} editing
    • values can be entered or edited at any time prior to posting 
    • values can be changed at any time after posting 
      • by enabling "Allow edits to internal data on general ledger vouchers" feature
  • can be disabled at any time [1]
    • any values that were entered for financial tags on transactions will be maintained in the database [1]
      • values will no longer be visible on any transactions or in inquiries [1]
  • journals and transactions support for tags
    • [10.0.32] introduced
    • [10.0.37] [1]
      • general journal, including entities 
      • global general journal
      • allocation journal
      • fixed asset journal
      • all asset leasing journals
      • periodic journal
      • reporting currency adjustment journal
      • customer payment journal, including entities 
      • vendor payment journal, including entities 
      • invoice journal (vendor)
      • global invoice journal (vendor)
      • invoice register
      • SO documents 
        • Sales order, packing slip and customer invoice
        • {feature} "Enable financial tags for sales order invoicing"
      • voucher transactions and Transactions for [account] forms 
      • general journal account entry reporting entity 
      • ledger settlement (manual settlement)
    • [10.0.41|PP] PO documents
      • {feature} "Enable financial tags for purchase order invoicing"
  • {feature} [10.0.42] financial tag rules 
    • allow entering default values or automatically populating values in financial tags [7]
    • {benefit} ensure consistency and efficiency in transaction tagging [7]
      • ⇐ essential for accurate financial tracking and reporting [7]
    • journals support [7]
      • general journal
      • global general journal
      • allocation journal
      • reporting currency adjustment journal
      • invoice journal (vendor)
    • {operation} Create a financial tag rule
      • via General ledger >> Chart of accounts >> Financial tags >> Financial tags >> New >>
    • {operation} Copy a financial tag rule within legal entity
      • copies a rule that is defined for one transaction entry point to another entry point in the same legal entity [7]
    • {operation} Copy a financial tag rule to another legal entity
      • copies rules to any legal entity where financial tags are defined and active. Select one or more rules to copy to another legal entity [7]
  • {feature} rule-based defaulting engine for financial tags 
    • e.g. default the vendor name to financial tag XX 
  • {feature} financial tag defaulting rules
  • {feature} evaluate storing financial tags directly on subledger data 
    • e.g. store financial tag values in the bank subledger to use with advanced bank reconciliation matching rules
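The notes above mention that tag values are stored as simple text, need a delimiter between them, and are limited to 20 tags of 100 characters each. The following sketch only models those stated constraints to show why a delimiter must not occur inside a value. It is not how Dynamics 365 implements financial tags: the helper names, the `|` delimiter and the checks are assumptions made for illustration.

```python
MAX_TAGS = 20          # per the notes: up to 20 financial tags can be defined
MAX_TAG_LENGTH = 100   # per the notes: each is 100 characters
DELIMITER = "|"        # assumption: the real delimiter is chosen in the parameters


def join_tag_values(values, delimiter=DELIMITER):
    """Combine tag values into one delimited text (hypothetical helper)."""
    if len(values) > MAX_TAGS:
        raise ValueError(f"at most {MAX_TAGS} tag values are supported")
    for value in values:
        if len(value) > MAX_TAG_LENGTH:
            raise ValueError(f"tag value longer than {MAX_TAG_LENGTH} characters")
        if delimiter in value:
            # A delimiter inside a value would make the text ambiguous to split.
            raise ValueError(f"tag value contains the delimiter {delimiter!r}")
    return delimiter.join(values)


def split_tag_values(text, delimiter=DELIMITER):
    """Split a delimited text back into individual tag values."""
    return text.split(delimiter)


combined = join_tag_values(["Contoso", "PO-1001", "REF-42"])
print(combined)
print(split_tag_values(combined))
```

Since the values do not reference other data and are not validated, any check like the ones above would have to live outside the posting logic, for example in a reporting or integration layer.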

References:
[1] Microsoft Learn (2025) Dynamics 365 Finance: Financial tags [link]
[2] Microsoft Learn (2025) Dynamics 365 Finance: Differences between financial tags and financial dimensions [link]
[3] Microsoft Learn (2025) Dynamics 365 Finance: Microsoft Learn (2022) Financial dimensions [link]
[4] Dynamics on Demand (2025) Financial Tags in Microsoft Dynamics 365 Finance | 10.0.32 [link]
[5] Ramit Paul (2025) Financial Tags in Microsoft Dynamics 365 Finance and Operations [link]
[6] Microsoft Learn (2025) Dynamics 365 Finance: Financial tag rule reference (preview) [link]
[7] Microsoft Learn (2025) Dynamics 365 Finance: Financial tag rules (preview) [link]
[8] Dynamics Global Edge IT Solutions (2024) Financial Tags For Purchase Order Invoicing In MS Dynamics365 F&O [link]

Resources:
[R1] Dynamics365lab (2024) Ep. 120:4 Exploring Financial Tags in Dynamics 365 F&O [link]
[R2] Nextone Consulting (2024) New Feature: Financial Tag Rules in Dynamics 365 SCM 10.0.42 [link]
[R3] Dynamics on Demand (2024) Financial Tags in Microsoft Dynamics 365 Finance | 10.0.32 [link]
[R4] Axcademy (2023) Is this the end to Financial dimensions in D365FO as we know them? [link]
[R5] Hitachi Solutions (2024) New Feature in Dynamics 365 Finance - Financial Tags [link]

Acronyms:
D365 F&O - Dynamics 365 for Finance and Operations
GL - General Ledger
GA - General Availability
LE - Legal Entity
PO - Purchase Order
PP - Public Preview
SO - Sales Order

18 May 2024

📊Graphical Representation: Graphics We Live By (Part IV: Area Charts in MS Excel)

Graphical Representation

An area chart or area graph (see A) is a graphical representation of quantitative data based on a line chart for which the areas between the axis and the lines of the series are commonly emphasized with colors, textures, or hatchings (Wikipedia). It resembles a combination of line and bar charts. Each data series forms a region (aka area), which makes it possible to identify overlaps and compare the lines within the same visual display. This approach usually works well for two or three data series if the lines don't overlap, though the more data series are added to the chart, the higher the chances for lines to overlap or for one area to be covered by another (see B). This can easily become more than the chart can handle, even if the data series can be filtered dynamically.

Area Charts

Stacked area charts are a variation of area charts in which the areas are stacked, much like stacked bar charts (see C). Research papers abound with such charts, probably because they allow stacking multiple data series within a small area, thus reflecting the many variables involved. Such charts allow tracking individual as well as intermediary and total aggregated trends.

Stacked Area Charts

Unfortunately, besides the fact that some areas are barely distinguishable or that distant areas can't be compared (especially when one area in between has strong fluctuations), the lack of ticks and/or gridlines (see D) makes it difficult to interpret such charts. Moreover, when the lines are smoothed, it becomes even more difficult to identify the actual points. To address this it makes sense to use markers for data points to show that one works with discrete and not continuous points (see further paragraphs).

In general, it's recommended to reduce the number of data series to 3-5. For example, one can split the data series into 2-3 groups or categories based on the series' characteristics (e.g. concentrate on the high values in one chart and on the low values in another, or group the low values under an "others" category), which allows better comparisons.
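Grouping the low values under an "others" category can be done before the data reaches the chart. The sketch below is one possible way to do it, with invented series names and values: it keeps the largest series by total and sums the remaining ones point by point.

```python
def group_small_series(series, keep=3, other_label="Others"):
    """Keep the `keep` largest series (by total) and sum the rest point-wise."""
    ranked = sorted(series, key=lambda name: sum(series[name]), reverse=True)
    grouped = {name: series[name] for name in ranked[:keep]}
    rest = [series[name] for name in ranked[keep:]]
    if rest:
        grouped[other_label] = [sum(points) for points in zip(*rest)]
    return grouped


# Hypothetical monthly values for six data series
monthly = {
    "A": [50, 55, 60],
    "B": [40, 42, 41],
    "C": [30, 28, 35],
    "D": [5, 6, 4],
    "E": [3, 2, 4],
    "F": [1, 1, 2],
}

grouped = group_small_series(monthly, keep=3)
for name, values in grouped.items():
    print(name, values)
```

Ranking by total is only one criterion; ranking by average, by last value, or by variability would fit the same pattern and might suit a given chart better.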

Being able to sort the time series on their average value or other criteria (e.g. showing the areas with minimal variations first) can improve the readability of such charts.

Moreover, areas under curves can easily hide missing data (see F) and occasionally negative values (which is the case of the 8th example), or distort the rate of change when the charts are wider than needed (compare F with C). 

Line Chart, respectively Area Chart based on a subset
Area Charts Variations

Area charts seem to encode a dimension as area, though that's not necessarily the case. It seems natural to display time series of different granularities (day, month, quarter, year), though one needs to be careful about one important aspect! On a time scale, the more one moves away from the day to weeks and months as time units, the bigger the distance between points is. In the end, all the points in a series are discrete points (not continuous), though the bigger the distance, the more category-like these series become (compare F with C, the charts have the same width).

Using the area under the curve as dimension makes sense when there's continuity or the discrete points are close enough to each other to resemble continuity. Thus, area charts are useful when the number of points is high (and the distance between them becomes negligible), e.g. showing daily values within a year or the months over several years.

According to [2], [3] and several other sources, using area to encode quantitative information is a poor graphical method, and this applies to pie charts and area charts alike. By contrast, in a bar chart (see G) one can use either height or width for comparisons, and the points are always clearly delimited as bars. Scatter plots (see H), even if they might miss the time dimension, better reflect the dispersion of the points along the lines delimited by the color encoding (compare H with E).

Column Chart and Scatter Plot
Alternatives for Area Charts

The more category-like the data series are and the fewer data points they have, the higher the chances that other graphical representation tools can better represent the data. For example, year- or even quarter-based data can be better visualized with Sankey charts (unfortunately, not yet available as a standard Excel visual).

Conversely, there are situations in which an area chart isn't supposed to convey specific values but to give a feeling for the shape of the areas, or in which its simplicity is more appropriate. In such situations area charts do a good job. In the end, a graphical representation's utility is linked to the chart's purpose (and audience, of course).

References:
[1] Wikipedia (2023) Area charts (link)
[2] William S Cleveland (1993) Visualizing Data
[3] Robert L Harris (1996) Information Graphics: A Comprehensive Illustrated Reference

20 November 2018

🔭Data Science: Dimensionality (Just the Quotes)

"[…] the intrinsic value of a small-scale model is that it compensates for the renunciation of sensible dimensions by the acquisition of intelligible dimensions." (Claude Levi- Strauss, "The Savage Mind", 1962)

"The idea of knowledge as an improbable structure is still a good place to start. Knowledge, however, has a dimension which goes beyond that of mere information or improbability. This is a dimension of significance which is very hard to reduce to quantitative form. Two knowledge structures might be equally improbable but one might be much more significant than the other." (Kenneth E Boulding, "Beyond Economics: Essays on Society", 1968)

"A time series is a sequence of observations, usually ordered in time, although in some cases the ordering may be according to another dimension. The feature of time series analysis which distinguishes it from other statistical analysis is the explicit recognition of the importance of the order in which the observations are made. While in many problems the observations are statistically independent, in time series successive observations may be dependent, and the dependence may depend on the positions in the sequence. The nature of a series and the structure of its generating process also may involve in other ways the sequence in which the observations are taken." (Theodore W Anderson, "The Statistical Analysis of Time Series", 1971)

"The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.(Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"In addition to dimensionality requirements, chaos can occur only in nonlinear situations. In multidimensional settings, this means that at least one term in one equation must be nonlinear while also involving several of the variables. With all linear models, solutions can be expressed as combinations of regular and linear periodic processes, but nonlinearities in a model allow for instabilities in such periodic solutions within certain value ranges for some of the parameters." (Courtney Brown, "Chaos and Catastrophe Theories", 1995)

"The dimensionality and nonlinearity requirements of chaos do not guarantee its appearance. At best, these conditions allow it to occur, and even then under limited conditions relating to particular parameter values. But this does not imply that chaos is rare in the real world. Indeed, discoveries are being made constantly of either the clearly identifiable or arguably persuasive appearance of chaos. Most of these discoveries are being made with regard to physical systems, but the lack of similar discoveries involving human behavior is almost certainly due to the still developing nature of nonlinear analyses in the social sciences rather than the absence of chaos in the human setting."  (Courtney Brown, "Chaos and Catastrophe Theories", 1995)

"A system may be called complex here if its dimension (order) is too high and its model (if available) is nonlinear, interconnected, and information on the system is uncertain such that classical techniques can not easily handle the problem." (M Jamshidi, "Autonomous Control on Complex Systems: Robotic Applications", Current Advances in Mechanical Design and Production VII, 2000)

"The greatest plus of data modeling is that it produces a simple and understandable picture of the relationship between the input variables and responses [...] different models, all of them equally good, may give different pictures of the relation between the predictor and response variables [...] One reason for this multiplicity is that goodness-of-fit tests and other methods for checking fit give a yes–no answer. With the lack of power of these tests with data having more than a small number of dimensions, there will be a large number of models whose fit is acceptable. There is no way, among the yes–no methods for gauging fit, of determining which is the better model." (Leo Breiman, "Statistical Modeling: The two cultures" Statistical Science 16(3), 2001)

"Three key aspects of presenting high dimensional data are: rendering, manipulation, and linking. Rendering determines what is to be plotted, manipulation determines the structure of the relationships, and linking determines what information will be shared between plots or sections of the graph." (Gerald van Belle, "Statistical Rules of Thumb", 2002)

"With the ever increasing amount of empirical information that scientists from all disciplines are dealing with, there exists a great need for robust, scalable and easy to use clustering techniques for data abstraction, dimensionality reduction or visualization to cope with and manage this avalanche of data."  (Jörg Reichardt, "Structure in Complex Networks", 2009)

"The more dimensions used in quantitative comparisons, the larger are the disparities that can be accommodated. As irony would have it, however, the ease of comparison generally diminishes in direct proportion to the number of dimensions involved." (Joel Katz, "Designing Information: Human factors and common sense in information design", 2012)

"Dimensionality reduction and regression modeling are particularly hard to interpret in terms of original attributes, when the underlying data dimensionality is high. This is because the subspace embedding is defined as a linear combination of attributes with positive or negative coefficients. This cannot easily be intuitively interpreted in terms specific properties of the data attributes." (Charu C Aggarwal, "Outlier Analysis", 2013)

"Dimensionality reduction is essential for coping with big data - like the data coming in through your senses every second. A picture may be worth a thousand words, but it’s also a million times more costly to process and remember. [...] A common complaint about big data is that the more data you have, the easier it is to find spurious patterns in it. This may be true if the data is just a huge set of disconnected entities, but if they’re interrelated, the picture changes." (Pedro Domingos, "The Master Algorithm", 2015)

"The correlational technique known as multiple regression is used frequently in medical and social science research. This technique essentially correlates many independent (or predictor) variables simultaneously with a given dependent variable (outcome or output). It asks, 'Net of the effects of all the other variables, what is the effect of variable A on the dependent variable?' Despite its popularity, the technique is inherently weak and often yields misleading results. The problem is due to self-selection. If we don’t assign cases to a particular treatment, the cases may differ in any number of ways that could be causing them to differ along some dimension related to the dependent variable. We can know that the answer given by a multiple regression analysis is wrong because randomized control experiments, frequently referred to as the gold standard of research techniques, may give answers that are quite different from those obtained by multiple regression analysis." (Richard E Nisbett, "Mindware: Tools for Smart Thinking", 2015)

"Understanding reduces the complexity of data by collapsing the dimensionality of information to a lower set of known variables. s revolutions, be they tiny or vast, technological or social." (Beau Lotto, "Deviate: The Science of Seeing Differently", 2017)

"Dimensionality reduction is a way of reducing a large number of different measures into a smaller set of metrics. The intent is that the reduced metrics are a simpler description of the complex space that retains most of the meaning." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"The higher the dimension, in other words, the higher the number of possible interactions, and the more disproportionally difficult it is to understand the macro from the micro, the general from the simple units. This disproportionate increase of computational demands is called the curse of dimensionality." (Nassim N Taleb, "Skin in the Game: Hidden Asymmetries in Daily Life", 2018)

"This problem with adding additional variables is referred to as the curse of dimensionality. If you add enough variables into your black box, you will eventually find a combination of variables that performs well - but it may do so by chance. As you increase the number of variables you use to make your predictions, you need exponentially more data to distinguish true predictive capacity from luck." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"We all know that the numerical values on each side of an equation have to be the same. The key to dimensional analysis is that the units have to be the same as well. This provides a convenient way to keep careful track of units when making calculations in engineering and other quantitative disciplines, to make sure one is computing what one thinks one is computing. When an equation exists only for the sake of mathiness, dimensional analysis often makes no sense." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"Smoothing and aggregating can help us see important features and relationships, but when we have only a handful of observations, smoothing techniques can be misleading. With just a few observations, we prefer rug plots over histograms, box plots, and density curves, and we use scatterplots rather than smooth curves and density contours. This may seem obvious, but when we have a large amount of data, the amount of data in a subgroup can quickly dwindle. This phenomenon is an example of the curse of dimensionality." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)

19 February 2015

📊Business Intelligence: Measures (Definitions)

"A quantitative, numerical column in a fact table. Measures typically represent the values that are analyzed. See also dimension." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A metric is a measurable or quantitative value." (Microsoft Corporation, "Microsoft SQL Server 7.0 Data Warehouse Training Kit", 2000)

"A measure is a dimensional modeling term that refers to values, usually numeric, that measure some aspect of the business. Measures reside in fact tables. The dimensional terms measure and attribute, taken together, are equivalent to the relational modeling use of the term attribute." (Claudia Imhoff et al, "Mastering Data Warehouse Design", 2003)

"(1) A mapping from empirical properties to quantities in a formal mathematical model called a measurement scale. (2) To obtain a measurement." (Richard D Stutzke, "Estimating Software-Intensive Systems: Projects, Products, and Processes", 2005)

"In Dimensional modeling, a specific data item that describes a fact or aggregation of facts. Measures are implemented as metric facts." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling" 2nd Ed., 2005)

"A summarizable numerical value used to monitor business activity; it is also known as a fact. " (Reed Jacobsen & Stacia Misner, "Microsoft SQL Server 2005 Analysis Services Step by Step", 2006)

"A column of quantifiable data mapped to a dimension within a cube. Measures are often used to provide access to aggregations of data (such as annual sales of a product or a store), while also giving the ability to drill down into the details (such as quarterly or monthly sales)." (Robert D. Schneider and Darril Gibson, "Microsoft SQL Server 2008 All-In-One Desk Reference For Dummies", 2008)

[business measure:] "Business performance metric captured by an operational system and represented as a physical or computed fact in a dimensional model." (Ralph Kimball, "The Data Warehouse Lifecycle Toolkit", 2008)

"A set of usually numeric values from a fact table that is aggregated in a cube across all dimensions." (Jim Joseph et al, Microsoft® SQL Server 2008 Reporting Services Unleashed, 2009)

[business measures:] "The complete set of facts, base and derived, that are defined and made available for reporting and analysis." (Laura Reeves, "A Manager's Guide to Data Warehousing", 2009)

"A quantitative performance indicator or success factor that can be traced on an ongoing basis to determine successful operation and progress toward objectives and goals." (David Lyle & John G. Schmidt, "Lean Integration", 2010)

"1.Loosely used, a metric. 2.In data modeling, a quantified characteristic; the unit used to quantify the dimensions, capacity, or amount of something." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Value assigned (noun) or the process of assigning a value (verb) to an object through calculation, appraisal, estimation, or some other method." (Leslie G Eldenburg & Susan K. Wolcott, "Cost Management" 2nd Ed., 2011)

"In a cube, a set of values that are usually numeric and are based on a column in the fact table of the cube. Measures are the central values that are aggregated and analyzed." (Microsoft, "SQL Server 2012 Glossary", 2012)

"The act of identifying what to measure as well as actually collecting the measures that would help an organization understand if the process is operating within acceptable limits." (Project Management Institute, "Organizational Project Management Maturity Model (OPM3®)" 3rd Ed., 2013)

"Metrics such as count, maximum, minimum, sum, or average that are used in a fact table. Measures can be calculated with an SQL expression or mapped directly to a numeric value in a column." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"The number or category assigned to an attribute of an entity by making a measurement. (ISO 14598)

19 December 2014

🕸Systems Engineering: Feedback (Just the Quotes)

"Feedback is a method of controlling a system by reinserting into it the results of its past performance. If these results are merely used as numerical data for the criticism of the system and its regulation, we have the simple feedback of the control engineers. If, however, the information which proceeds backward from the performance is able to change the general method and pattern of performance, we have a process which may be called learning." (Norbert Wiener, 1954)

"[...] the concept of 'feedback', so simple and natural in certain elementary cases, becomes artificial and of little use when the interconnexions between the parts become more complex. When there are only two parts joined so that each affects the other, the properties of the feedback give important and useful information about the properties of the whole. But when the parts rise to even as few as four, if every one affects the other three, then twenty circuits can be traced through them; and knowing the properties of all the twenty circuits does not give complete information about the system. Such complex systems cannot be treated as an interlaced set of more or less independent feedback circuits, but only as a whole. For understanding the general principles of dynamic systems, therefore, the concept of feedback is inadequate in itself. What is important is that complex systems, richly cross-connected internally, have complex behaviours, and that these behaviours can be goal-seeking in complex patterns." (W Ross Ashby, "An Introduction to Cybernetics", 1956)

"Traditional organizational theories have tended to view the human organization as a closed system. This tendency has led to a disregard of differing organizational environments and the nature of organizational dependency on environment. It has led also to an over-concentration on principles of internal organizational functioning, with consequent failure to develop and understand the processes of feedback which are essential to survival." (Daniel Katz, "The Social Psychology of Organizations", 1966)

"The structure of a complex system is not a simple feedback loop where one system state dominates the behavior. The complex system has a multiplicity of interacting feedback loops. Its internal rates of flow are controlled by non‐linear relationships. The complex system is of high order, meaning that there are many system states (or levels). It usually contains positive‐feedback loops describing growth processes as well as negative, goal‐seeking loops." (Jay W Forrester, "Urban Dynamics", 1969)

"To model the dynamic behavior of a system, four hierarchies of structure should be recognized: closed boundary around the system; feedback loops as the basic structural elements within the boundary; level variables representing accumulations within the feedback loops; rate variables representing activity within the feedback loops." (Jay W Forrester, "Urban Dynamics", 1969)

"Effect spreads its 'tentacles' not only forwards (as a new cause giving rise to a new effect) but also backwards, to the cause which gave rise to it, thus modifying, exhausting or intensifying its force. This interaction of cause and effect is known as the principle of feedback. It operates everywhere, particularly in all self-organising systems where perception, storing, processing and use of information take place, as for example, in the organism, in a cybernetic device, and in society. The stability, control and progress of a system are inconceivable without feedback." (Alexander Spirkin, "Dialectical Materialism", 1983)

"Ultimately, uncontrolled escalation destroys a system. However, change in the direction of learning, adaptation, and evolution arises from the control of control, rather than unchecked change per se. In general, for the survival and co-evolution of any ecology of systems, feedback processes must be embodied by a recursive hierarchy of control circuits." (Bradford P Keeney, "Aesthetics of Change", 1983)

"Every system of whatever size must maintain its own structure and must deal with a dynamic environment, i.e., the system must strike a proper balance between stability and change. The cybernetic mechanisms for stability (i.e., homeostasis, negative feedback, autopoiesis, equifinality) and change (i.e., positive feedback, algedonodes, self-organization) are found in all viable systems." (Barry Clemson, "Cybernetics: A New Management Tool", 1984) 

"The term closed loop-learning process refers to the idea that one learns by determining what s desired and comparing what is actually taking place as measured at the process and feedback for comparison. The difference between what is desired and what is taking place provides an error indication which is used to develop a signal to the process being controlled." (Harold Chestnut, 1984) 

"The term chaos is used in a specific sense where it is an inherently random pattern of behaviour generated by fixed inputs into deterministic (that is fixed) rules (relationships). The rules take the form of non-linear feedback loops. Although the specific path followed by the behaviour so generated is random and hence unpredictable in the long-term, it always has an underlying pattern to it, a 'hidden' pattern, a global pattern or rhythm. That pattern is self-similarity, that is a constant degree of variation, consistent variability, regular irregularity, or more precisely, a constant fractal dimension. Chaos is therefore order (a pattern) within disorder (random behaviour)." (Ralph D Stacey, "The Chaos Frontier: Creative Strategic Control for Business", 1991)

"In many parts of the economy, stabilizing forces appear not to operate. Instead, positive feedback magnifies the effects of small economic shifts; the economic models that describe such effects differ vastly from the conventional ones. Diminishing returns imply a single equilibrium point for the economy, but positive feedback – increasing returns – makes for many possible equilibrium points. There is no guarantee that the particular economic outcome selected from among the many alternatives will be the ‘best’ one."  (W Brian Arthur, "Returns and Path Dependence in the Economy", 1994)

"[…] self-organization is the spontaneous emergence of new structures and new forms of behavior in open systems far from equilibrium, characterized by internal feedback loops and described mathematically by nonlinear equations." (Fritjof Capra, "The web of life: a new scientific understanding of living systems", 1996)

"Something of the previous state, however, survives every change. This is called in the language of cybernetics (which took it form the language of machines) feedback, the advantages of learning from experience and of having developed reflexes." (Guy Davenport, "The Geography of the Imagination: Forty Essays", 1997)

"Cybernetics is the science of effective organization, of control and communication in animals and machines. It is the art of steersmanship, of regulation and stability. The concern here is with function, not construction, in providing regular and reproducible behaviour in the presence of disturbances. Here the emphasis is on families of solutions, ways of arranging matters that can apply to all forms of systems, whatever the material or design employed. [...] This science concerns the effects of inputs on outputs, but in the sense that the output state is desired to be constant or predictable – we wish the system to maintain an equilibrium state. It is applicable mostly to complex systems and to coupled systems, and uses the concepts of feedback and transformations (mappings from input to output) to effect the desired invariance or stability in the result." (Chris Lucas, "Cybernetics and Stochastic Systems", 1999)

"All dynamics arise from the interaction of just two types of feedback loops, positive (or self-reinforcing) and negative (or self-correcting) loops. Positive loops tend to reinforce or amplify whatever is happening in the system […] Negative loops counteract and oppose change." (John D Sterman, "Business Dynamics: Systems thinking and modeling for a complex world", 2000)

"Much of the art of system dynamics modeling is discovering and representing the feedback processes, which, along with stock and flow structures, time delays, and nonlinearities, determine the dynamics of a system. […] the most complex behaviors usually arise from the interactions (feedbacks) among the components of the system, not from the complexity of the components themselves." (John D Sterman, "Business Dynamics: Systems thinking and modeling for a complex world", 2000)

"The phenomenon of emergence takes place at critical points of instability that arise from fluctuations in the environment, amplified by feedback loops." (Fritjof Capra, "The Hidden Connections: A Science for Sustainable Living", 2002)

"Thus, nonlinearity can be understood as the effect of a causal loop, where effects or outputs are fed back into the causes or inputs of the process. Complex systems are characterized by networks of such causal loops. In a complex, the interdependencies are such that a component A will affect a component B, but B will in general also affect A, directly or indirectly.  A single feedback loop can be positive or negative. A positive feedback will amplify any variation in A, making it grow exponentially. The result is that the tiniest, microscopic difference between initial states can grow into macroscopically observable distinctions." (Carlos Gershenson, "Design and Control of Self-organizing Systems", 2007)

"The work around the complex systems map supported a concentration on causal mechanisms. This enabled poor system responses to be diagnosed as the unanticipated effects of previous policies as well as identification of the drivers of the sector. Understanding the feedback mechanisms in play then allowed experimentation with possible future policies and the creation of a coherent and mutually supporting package of recommendations for change."  (David C Lane et al, "Blending systems thinking approaches for organisational analysis: reviewing child protection", 2015)

More quotes on "Feedback" at the-web-of-knowledge.blogspot.com.

23 May 2014

🔬Data Science: Fractal (Definitions)

"A fractal is a mathematical set or concrete object that is irregular or fragmented at all scales [...]" (Benoît Mandelbrot, "The Fractal Geometry of Nature", 1982)

"Objects (in particular, figures) that have the same appearance when they are seen on fine and coarse scales." (David Rincón & Sebastià Sallent, Scaling Properties of Network Traffic, 2008) 

"A collection of objects that have a power-law dependence of number on size." (Donald L Turcotte, "Fractals in Geology and Geophysics", 2009) 

"A fractal is a geometric object which is self-similar and characterized by an effective dimension which is not an integer." (Leonard M Sander, "Fractal Growth Processes", 2009) 

"A fractal is a structure which can be subdivided into parts, where the shape of each part is similar to that of the original structure." (Yakov M Strelniker, "Fractals and Percolation", 2009) 

"A fractal is an image that comprises two distinct attributes: infinite detail and self-similarity." (Daniel C Doolan et al, "Unlocking the Hidden Power of the Mobile", 2009)

"A geometrical object that is invariant at any scale of magnification or reduction." (Sidney Redner, "Fractal and Multifractal Scaling of Electrical Conduction in Random Resistor Networks", 2009) 

[Fractal structure:] "A pattern or arrangement of system elements that are self-similar at different spatial scales." (Michael Batty, "Cities as Complex Systems: Scaling, Interaction, Networks, Dynamics and Urban Morphologies", 2009) 

"A set whose (suitably defined) geometrical dimensionis non-integral. Typically, the set appears selfsimilar on all scales. A number of geometrical objects associated with chaos (e. g. strange attractors) are fractals." (Oded Regev, "Chaos and Complexity in Astrophysics", 2009) 

[Fractal system:] "A system characterized by a scaling law with a fractal, i. e., non-integer exponent. Fractal systems are self-similar, i. e., a magnification of a small part is statistically equivalent to the whole." (Jan W Kantelhardt, "Fractal and Multifractal Time Series", 2009) 

"An adjective or a noun representing complex configurations having scale-free characteristics or self-similar properties. Mathematically, any fractal can be characterized by a power law distribution." (Misako Takayasu & Hideki Takayasu, "Fractals and Economics", 2009) 

"Fractals are complex mathematical objects that are invariant with respect to dilations (self-similarity) and therefore do not possess a characteristic length scale. Fractal objects display scale-invariance properties that can either fluctuate from point to point (multifractal) or be homogeneous (monofractal). Mathematically, these properties should hold over all scales. However, in the real world, there are necessarily lower and upper bounds over which self-similarity applies." (Alain Arneodo et al, "Fractals and Wavelets: What Can We Learn on Transcription and Replication from Wavelet-Based Multifractal Analysis of DNA Sequences?", 2009) 

"Mathematical object usually having a geometrical representation and whose spatial dimension is not an integer. The relation between the size of the object and its “mass” does not obey that of usual geometrical objects." (Bastien Chopard, "Cellular Automata: Modeling of Physical Systems", 2009) 

 "A fragmented geometric shape that can be split up into secondary pieces, each of which is approximately a smaller replica of the whole, the phenomenon commonly known as self similarity." (Khondekar et al, "Soft Computing Based Statistical Time Series Analysis, Characterization of Chaos Theory, and Theory of Fractals", 2013) 

 "A natural phenomenon or a mathematical set that exhibits a repeating pattern which can be replicated at every scale." (Rohnn B Sanderson, "Understanding Chaos as an Indicator of Economic Stability", 2016) 

 "Geometric pattern repeated at progressively smaller scales, where each iteration is about a reproduction of the image to produce completely irregular shapes and surfaces that can not be represented by classical geometry. Fractals are generally self-similar (each section looks at all) and are not subordinated to a specific scale. They are used especially in the digital modeling of irregular patterns and structures in nature." (Mauro Chiarella, Folds and Refolds: Space Generation, Shapes, and Complex Components, 2016)

23 November 2011

📉Graphical Representation: Dimensions (Just the Quotes)

"Two dimensional charts for the representation of mathematical equations or experimental data are in very common use nowadays and are everywhere recognized as valuable devices for giving a clear conception of the manner in which the variables are related. Their application is generally restricted, however, to cases where there is but one variable and its function, if the variation to be shown is continuous. Nevertheless cases often arise in which there are two variables and a function to be represented and where it is desirable to show a continuousvariation for all three." (John B Peddle, "The Construction of Graphical Charts", 1910)

"Graphic comparisons, wherever possible, should be made in one dimension only." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919)

"In general, the comparison of two circles of different size should be strictly avoided. Many excellent works on statistics approve the comparison of circles of different size, and state that the circles should always be drawn to represent the facts on an area basis rather than on a diameter basis. The rule, however, is not always followed and the reader has no way of telling whether the circles compared have been drawn on a diameter basis or on an area basis, unless the actual figures for the data are given so that the dimensions may be verified." (Willard C Brinton, "Graphic Methods for Presenting Facts", 1919)

"Readers of statistical diagrams should not be required to compare magnitudes in more than one dimension. Visual comparisons of areas are particularly inaccurate and should not be necessary in reading any statistical graphical diagram." (William C Marshall, "Graphical methods for schools, colleges, statisticians, engineers and executives", 1921)

"In short, the rule that no more dimensions or axes should be used in the chart than the data calls for, is fundamental. Violate this rule and you bring down upon your head a host of penalties. In the first place, you complicate your computing processes, or else achieve a grossly deceptive chart. If your chart becomes deceptive, it has defeated its purpose, which was to represent accurately. Unless, of course, you intended to deceive, in which case we are through with you and leave you to Mark Twain’s mercies. If you make your chart accurate, at the cost of considerable square or cube root calculating, you still have no hope, for the chart is not clear; your reader is more than likely to misunderstand it. Confusion, inaccuracy and deception always lie in wait for you down the path departing from the principle we have discussed - and one of them is sure to catch you." (Karl G Karsten, "Charts and Graphs", 1925)

"The bar chart is one of the most useful, simple, adaptable, and popular techniques in graphic presentation. The simple bar chart, with its many variations, is particularly appropriate for comparing the magnitude, or size, of coordinate items or of parts of a total. The basis of comparison in the bar chart is linear or one-dimensional. The length of each bar or of its components is proportional to the quantity or amount of each category represented." (Calvin F Schmid, "Handbook of Graphic Presentation", 1954)

"The common bar chart is particularly appropriate for comparing magnitude or size of coordinate items or parts of a total. It is one of the most useful, simple, and adaptable techniques in graphic presentation. The basis of comparison in the bar chart is linear or one-dimensional. The length of each bar or of its components is proportional to the quantity or amount of each category represented." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"An especially effective device for enhancing the explanatory power of time-series displays is to add spatial dimensions to the design of the graphic, so that the data are moving over space (in two or three dimensions) as well as over time. […] Occasionally graphics are belligerently multivariate, advertising the technique rather than the data." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"Graphical integrity is more likely to result if these six principles are followed:
The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented.
Clear, detailed, and thorough labeling should be used to defeat graphical distortion and ambiguity. Write out explanations of the data on the graphic itself. Label important events in the data.
Show data variations, not design variations. 
In time-series displays of money, deflated and standardized units of monetary measurements are nearly always better than nominal units.
The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.
Graphics must not quote data out of context." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"The time-series plot is the most frequently used form of graphic design. With one dimension marching along to the regular rhythm of seconds, minutes, hours, days, weeks, months, years, centuries, or millennia, the natural ordering of the time scale gives this design a strength and efficiency of interpretation found in no other graphic arrangement." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"Maximizing data ink (within reason) is but a single dimension of a complex and multivariate design task. The principle helps conduct experiments in graphical design. Some of those experiments will succeed. There remain, however, many other considerations in the design of statistical graphics - not only of efficiency, but also of complexity, structure, density, and even beauty." (Edward R Tufte, "Data-Ink Maximization and Graphical Design", Oikos Vol. 58 (2), 1990)

"The ducks of information design are false escapes from flatland, adding pretend dimensions to impoverished data sets, merely fooling around with information." (Edward R Tufte, "Envisioning Information", 1990)

"We envision information in order to reason about, communicate, document, and preserve that knowledge - activities nearly always carried out on two-dimensional paper and computer screen. Escaping this flatland and enriching the density of data displays are the essential tasks of information design." (Edward R Tufte, "Envisioning Information", 1990)

"Binning has two basic limitations. First, binning sacrifices resolution. Sometimes plots of the raw data will reveal interesting fine structure that is hidden by binning. However, advantages from binning often outweigh the disadvantage from lost resolution. [...] Second, binning does not extend well to high dimensions. With reasonable univariate resolution, say 50 regions each covering 2% of the range of the variable, the number of cells for a mere 10 variables is exceedingly large. For uniformly distributed data, it would take a huge sample size to fill a respectable fraction of the cells. The message is not so much that binning is bad but that high dimensional space is big. The complement to the curse of dimensionality is the blessing of large samples. Even in two and three dimensions having lots of data can be very helpful when the observations are noisy and the structure non-trivial." (Daniel B Carr, "Looking at Large Data Sets Using Binned Data Plots", [in "Computing and Graphics in Statistics"] 1991)

"Fitting is essential to visualizing hypervariate data. The structure of data in many dimensions can be exceedingly complex. The visualization of a fit to hypervariate data, by reducing the amount of noise, can often lead to more insight. The fit is a hypervariate surface, a function of three or more variables. As with bivariate and trivariate data, our fitting tools are loess and parametric fitting by least-squares. And each tool can employ bisquare iterations to produce robust estimates when outliers or other forms of leptokurtosis are present." (William S Cleveland, "Visualizing Data", 1993)

"The visual representation of a scale - an axis with ticks - looks like a ladder. Scales are the types of functions we use to map varsets to dimensions. At first glance, it would seem that constructing a scale is simply a matter of selecting a range for our numbers and intervals to mark ticks. There is more involved, however. Scales measure the contents of a frame. They determine how we perceive the size, shape, and location of graphics. Choosing a scale (even a default decimal interval scale) requires us to think about what we are measuring and the meaning of our measurements. Ultimately, that choice determines how we interpret a graphic." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"It is tempting to make charts more engaging by introducing fancy graphics or three dimensions so they leap off the page, but doing so obscures the real data and misleads people, intentionally or not." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"One way a chart can lie is through overemphasis of the size and scale of items, particularly when the dimension of depth isn’t considered." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"Using colour, it’s possible to increase the density of information even further. A single colour can be used to represent two variables simultaneously. The difficulty, however, is that there is a limited amount of information that can be packed into colour without confusion." (Brian Suda, "A Practical Guide to Designing with Data", 2010)

"Bear in mind that the use of color doesn’t always help. Use it sparingly and with a specific purpose in mind. Remember that the reader’s brain is looking for patterns, and will expect both recurrence itself and the absence of expected recurrence to carry meaning. If you’re using color to differentiate categorical data, then you need to let the reader know what the categories are. If the dimension of data you’re encoding isn’t significant enough to your message to be labeled or explained in some way - or if there is no dimension to the data underlying your use of different colors - then you should limit your use so as not to confuse the reader." (Noah Iliinsky & Julie Steele, "Designing Data Visualizations", 2011)

"[...] the human brain is not good at calculating surface sizes. It is much better at comparing a single dimension such as length or height. [...] the brain is also a hopelessly lazy machine." (Alberto Cairo, "The Functional Art", 2011)

"Explanatory data visualization is about conveying information to a reader in a way that is based around a specific and focused narrative. It requires a designer-driven, editorial approach to synthesize the requirements of your target audience with the key insights and most important analytical dimensions you are wishing to convey." (Andy Kirk, "Data Visualization: A successful design process", 2012)

"A signal is a useful message that resides in data. Data that isn’t useful is noise. […] When data is expressed visually, noise can exist not only as data that doesn’t inform but also as meaningless non-data elements of the display (e.g. irrelevant attributes, such as a third dimension of depth in bars, color variation that has no significance, and artificial light and shadow effects)." (Stephen Few, "Signal: Understanding What Matters in a World of Noise", 2015)

"When we use the number of dimensions as the classification criterion of visual displays, we get four distinct groups: charts, networks, and maps, along with figurative visualizations as a special group." (Jorge Camões, "Data at Work: Best practices for creating effective charts and information graphics in Microsoft Excel", 2016)

"A time series is a sequence of values, usually taken in equally spaced intervals. […] Essentially, anything with a time dimension, measured in regular intervals, can be used for time series analysis." (Andy Kriebel & Eva Murray, "#MakeoverMonday: Improving How We Visualize and Analyze Data, One Chart at a Time", 2018)

"Color is difficult to use effectively. A small number of well-chosen colors can be highly distinguishable, particularly for categorical data, but it can be difficult for users to distinguish between more than a handful of colors in a visualization. Nonetheless, color is an invaluable tool in the visualization toolbox because it is a channel that can carry a great deal of meaning and be overlaid on other dimensions. […] There are a variety of perceptual effects, such as simultaneous contrast and color deficiencies, that make precise numerical judgments about a color scale difficult, if not impossible." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"Maps also have the disadvantage that they consume the most powerful encoding channels in the visualization toolbox - position and size - on an aspect that is held constant. This leaves less effective encoding channels like color for showing the dimension of interest." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

About Me

Koeln, NRW, Germany
IT professional with more than 25 years of experience covering the full life cycle of Web/Desktop/Database application development, as well as software engineering, consultancy, data management, data quality, data migrations, reporting, ERP implementations & support, team/project/IT management, etc.