16 October 2024

Graphical Representation: Composition (Just the Quotes)

"Nothing is so illuminating as a set of properly proportioned diagrams. [...] In addition to the significance of graphics in analytical work, it is likewise a valuable aid to the memory. A picture is manifestly more readily retained in mind than a description of the same subject, no matter how vividly it may have been expressed. A pictorial or diagrammatic illustration usually produces a firmer and more lasting impression than any composition of words or tabulation of figures, however well they may be arranged or set forth." (Allan C Haskell, "How to Make and Use Graphic Charts", 1919)

"Without adequate planning, it is seldom possible to achieve either proper emphasis of each component element within the chart or a presentation that is pleasing in its entirely. Too often charts are developed around a single detail without sufficient regard for the work as a whole. Good chart design requires consideration of these four major factors:" (1) size," (2) proportion," (3) position and margins, and" (4) composition." (Anna C Rogers, "Graphic Charts Handbook", 1961)

"As a general rule, plotted points and graph lines should be given more 'weight' than the axes. In this way the 'meat' will be easily distinguishable from the 'bones'. Furthermore, an illustration composed of lines of unequal weights is always more attractive than one in which all the lines are of uniform thickness. It may not always be possible to emphasise the data in this way however. In a scattergram, for example, the more plotted points there are, the smaller they may need to be and this will give them a lighter appearance. Similarly, the more curves there are on a graph, the thinner the lines may need to be. In both cases, the axes may look better if they are drawn with a somewhat bolder line so that they are easily distinguishable from the data." (Linda Reynolds & Doig Simmonds, "Presentation of Data in Science" 4th Ed, 1984)

"While visuals are an essential part of data storytelling, data visualizations can serve a variety of purposes from analysis to communication to even art. Most data charts are designed to disseminate information in a visual manner. Only a subset of data compositions is focused on presenting specific insights as opposed to just general information. When most data compositions combine both visualizations and text, it can be difficult to discern whether a particular scenario falls into the realm of data storytelling or not." (Brent Dykes, "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals", 2019)

"A semantic approach to visualization focuses on the interplay between charts, not just the selection of charts themselves. The approach unites the structural content of charts with the context and knowledge of those interacting with the composition. It avoids undue and excessive repetition by instead using referential devices, such as filtering or providing detail-on-demand. A cohesive analytical conversation also builds guardrails to keep users from derailing from the conversation or finding themselves lost without context. Functional aesthetics around color, sequence, style, use of space, alignment, framing, and other visual encodings can affect how users follow the script." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"Aligning on data ink can be a powerful way to build relationships across charts. It can be used to obscure the lines between charts, making the composition feel more seamless. [....] Alignment paradigms can also influence the layout design needed. [...] The layout added to the alignment further supports this relationship." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"Beyond basic charts, practitioners must also learn to compose visualizations together elegantly. The perceptual stage focuses on making the literal charts more precise as well as working to de-emphasize the entire piece. Design choices start to consider distractions, reducing visual clutter and centering on the message. Minimalism is espoused as a core value with an emphasis on shifting toward precision as accuracy. This is the most common next step for practitioners. Minimalism is also a key stage in maturation. It is experimentation at one extreme that helps practitioners distill down to core, shared practices." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"Chart choices can also create weight within the entire composition. Presenting information as a comprehensive visualization, such as in a dashboard, requires thinking beyond individual charts. In writing, we not only craft sentences, but write the composition as an entire piece. Certain sentences may drive the writing more, but all sentences play a role in conveying the message." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

"Visualizations are abstractions, relying on primary graphicacy skills to fully understand the composition." (Vidya Setlur & Bridget Cogley, "Functional Aesthetics for data visualization", 2022)

Daniel B Carr - Collected Quotes

"Binning has two basic limitations. First, binning sacrifices resolution. Sometimes plots of the raw data will reveal interesting fine structure that is hidden by binning. However, advantages from binning often outweigh the disadvantage from lost resolution. [...] Second, binning does not extend well to high dimensions. With reasonable univariate resolution, say 50 regions each covering 2% of the range of the variable, the number of cells for a mere 10 variables is exceedingly large. For uniformly distributed data, it would take a huge sample size to fill a respectable fraction of the cells. The message is not so much that binning is bad but that high dimensional space is big. The complement to the curse of dimensionality is the blessing of large samples. Even in two and three dimensions having lots of data can bc very helpful when the observations are noisy and the structure non-trivial." (Daniel B Carr, "Looking at Large Data Sets Using Binned Data Plots", [in "Computing and Graphics in Statistics"] 1991)

"There is an interplay between statistical models and graphics, so it is advantageous to think about models before making a series of plots." (Daniel B Carr, "Looking at Large Data Sets Using Binned Data Plots", [in "Computing and Graphics in Statistics"] 1991)

"Working with binned data directly addresses large data set issues of computation and plotting speed. Almost everything that can bc done with the original data can be done faster with binned data. Further, working with binned data allows image processing algorithms to be adapted and applied to bin cells. Thus tools can bc brought to bare that are not traditionally associated with exploratory data analysis." (Daniel B Carr, "Looking at Large Data Sets Using Binned Data Plots", [in "Computing and Graphics in Statistics"] 1991)

"A scatterplot would show the relationship between [...] two variables in more detail, but would not convey the spatial patterns shown in […] micromap panels. Using conditioning to define a comparative grid of panels, […] changes an investigation from a sequential filtering of one variable at a time to more of a multivariable approach. In this context we can assess functional relationships, densities, or geospatial patterns within panels as well as changes across panels." (Daniel B Carr & Linda W Pickle, "Visualizing Data Patterns with Micromaps", 2010)

"Another method used to simplify the appearance of a graphic is smoothing. A regression line overlaid on a scatterplot is a smooth representation of the relationship between the two graph variables. For time series data, a moving average of the data over time is often used to smooth out the variation over small time steps in order to illustrate the overall trend." (Daniel B Carr & Linda W Pickle, "Visualizing Data Patterns with Micromaps", 2010)

"Designing good visual displays with an easy-to-use interactive system is difficult. The designer’s first attempts will usually fail, so it is critical that proposed systems be tested on at least several sets of typical users. These usability tests help the designer iterate to the best possible system." (Daniel B Carr & Linda W Pickle, "Visualizing Data Patterns with Micromaps", 2010)

"Given the small size of micromaps, the blocks of color on choropleth maps have the advantage of being more visible than if the values were displayed by small symbols or hatch patterns on the map. Using highly saturated colors makes small areas stand out even more. On the other hand, the eye can be drawn to large blocks of color that represent small populations […] A micromap re-design may attempt to mitigate this areal bias by increasing the size of small […] states, but the analyst needs to be aware of this potential problem when using micromaps to communicate to others. The conditioned micromap design can partially address this issue by conditioning on population." (Daniel B Carr & Linda W Pickle, "Visualizing Data Patterns with Micromaps", 2010)

"Hue is the color dimension that is associated with wavelength of light and with names of colors, such as red, yellow, and blue. Most languages around the world include words for black, white, red, green, yellow, blue, brown, pink, purple, orange, and gray. Differences in hue are best used for encoding different attributes, as in a qualitative graph or unordered variables. Different wavelengths have different focal lengths, so what we 'see' is a compromise between the actual and perceived distance to the image. Most people perceive long-wavelength colors, such as red and orange, as being closer to their eyes than short-wavelength colors, such as blue and green." (Daniel B Carr & Linda W Pickle, "Visualizing Data Patterns with Micromaps", 2010)

"In addition to smoothing boundaries, we can smooth the data. The simultaneous smoothing of variation over space, time, or attributes can help us to see the central patterns that would otherwise be hidden by local variation (noise). Local averaging of values usually can provide less biased estimates of spatial and temporal processes, just as the regression line can provide an unbiased estimate of a linear relationship between variables. However, smoothing can actually mask patterns, particularly important outliers, if we smooth over places that are dissimilar in some relevant attribute." (Daniel B Carr & Linda W Pickle, "Visualizing Data Patterns with Micromaps", 2010)

"Micromap graphics differ from most of [other] methodology in two ways. First, by definition, micromaps always include maps among the views of study units. Second, micromaps use different methods to highlight study units. Linked micromaps sort the study units, partition them into small subsets, and systematically highlight these subsets. The conditioned micromaps and many comparative micromaps use a three-class slider to partition." (Daniel B Carr & Linda W Pickle, "Visualizing Data Patterns with Micromaps", 2010)

"Much of a statistician’s training, especially in thinking about patterns, is related to the statistical tasks of describing and comparing distributions and to creating and refining models that describe how variables are related. There is little direct focus on the tasks of pattern identification, distribution comparison, and model building in the web page design and usability literature. Instead, that community is more focused on searching for and filtering information, drilling down to find a specific piece of information and navigation on the web. Nonetheless, good tools for one purpose often can be adapted to another purpose." (Daniel B Carr & Linda W Pickle, "Visualizing Data Patterns with Micromaps", 2010)

"People have different approaches to reasoning about data, depending on their skills and experience, but research has shown that there are commonalities in their processing steps. Some researchers call this sense making. A classical statistical analysis is usually straightforward, consisting of sequential steps of experimental design, the conduct of the experiment, and a statistical summary of results. An exploratory analysis is often interactive and less structured. Usually there is a phase of information gathering and preliminary processing, followed by choice of the representation method that will address the question at hand or questions raised by preliminary graphics." (Daniel B Carr & Linda W Pickle, "Visualizing Data Patterns with Micromaps", 2010)

"[…] perceptual accuracy decreases with distance, so columns that are to be compared should be side by side. Current linked micromap software requires the user." (Daniel B Carr & Linda W Pickle, "Visualizing Data Patterns with Micromaps", 2010)

"Saturation, also referred to as chroma or intensity, measures the purity of the color. A highly saturated color has little or no gray in it, while a highly desaturated color is almost gray, with none of the original color. You may be more familiar with the term shade, which refers to a mix of pigment and black paint, or tint, a mix of pigment and white paint. We only perceive a few different steps of varying saturation, so changing saturation alone is not effective for encoding a quantitative variable. However, the eye is drawn to highly saturated colors, so these can be used to good effect for drawing attention to a part of the visualization. In addition, highly saturated colors stand out more and so can be used as fill colors to improve the visibility of small symbols or areas." (Daniel B Carr & Linda W Pickle, "Visualizing Data Patterns with Micromaps", 2010)

"Scatterplots are the preferred medium for adding smooth curves to show a causal functional relationship or an association […] However, despite the advantage of the scatterplot for seeing some types of patterns, the linked micromap design adds geographic location to the information displayed and so enables searches for geographic patterns that the scatterplot omits." (Daniel B Carr & Linda W Pickle, "Visualizing Data Patterns with Micromaps", 2010)

"Statistical models typically decompose observed values into fit and residuals. Mapping fitted values shows broad patterns that may help us to understand and explain the process that generated the data. Mapping residuals can show us a mixture of noise and anomalies. Sometimes we are more interested in the broad patterns, but at other times we wish to identify the anomalies, e.g., where some corrective action needs to be taken." (Daniel B Carr & Linda W Pickle, "Visualizing Data Patterns with Micromaps", 2010)

"The power of graphics to aid understanding is well recognized, but with power comes the risk of misuse. Some people advocate the restriction of graphs and data to avoid misuse or to avoid drawing attention to problems. As educators we seek to provide both tools and education with the hope that learning will continue. Graphics can be misused, but our position is that people can learn from mistakes. We also believe that when many people can see and share perspectives, we are in a better position to see constructively and shape the world." (Daniel B Carr & Linda W Pickle, "Visualizing Data Patterns with Micromaps", 2010)

"The use of color is so fundamental in visualization design that its perception requires an in-depth discussion [...]. Using color well is not easy. Color is one of those concepts that everyone thinks they understand, but that is really more complex than it first appears." (Daniel B Carr & Linda W Pickle, "Visualizing Data Patterns with Micromaps", 2010)

15 October 2024

Data Management: Data Governance (Part III: Taming the Complexity)

Data Management Series

The Chief Data Officer (CDO) or the “Head of the Data Team” is one of the most challenging jobs because it is more of a "political" than a technical role. It requires the ideal candidate to be able to throw and catch curveballs almost all the time, and to play ball with all the parties having an interest in data (aka stakeholders). It’s a full-time job that requires a combination of managerial and technical skills, and both are important. The focus will occasionally shift more in one direction than in the other, with considerable fluctuations.

Moreover, even if one masters the technical and managerial aspects, the combination of the two gives rise to situations that require further expertise – applied systems thinking being probably the most important. This is also because there are so many points of failure that it's challenging to address all the important causes. Therefore, it’s critical to be a systems thinker, to have an experienced team and to make adequate use of its experience!

In a complex world, even the smallest constraint or opportunity can have an important impact, especially when it’s involved in the early stages of the processes taking place in organizations. The outcome depends on the manager’s and team’s skillset, their inspiration, the way the business reacts to the tasks involved and probably many other aspects that make things work. It takes considerable effort until the whole mechanism works, and even more time to make things work efficiently. The best metaphor is probably that of a small combat team in which everybody has their place and skillset in the mechanism, regardless of whether one talks about strategy, tactics or operations.

Unfortunately, building such teams takes time, and the more people are involved, the more complex this endeavor becomes. The manager and the team must meet somewhere in the middle regarding the philosophy, the execution of the various endeavors, and the way of working together to achieve the same goals. There are multiple forces pulling in all directions and it takes time until one can align the goals and the effort.

The most challenging forces are the ones between the business and the data team, respectively between the business and data requirements, forces that don’t necessarily converge. In small organizations, the two parties have in theory more chances to overcome the challenges, and a team’s experience can weigh a lot in the process, though as soon as the scale changes, the number of challenges to be overcome grows exponentially (there are, however, different exponential functions, in which the base and exponent determine how rapid the growth is).

In big organizations, other parties can appear that have the same force to pull the weight in one direction or another. Thus, the political aspects become more complex, to the degree that the technologies must follow the political decisions, with all the positive and negative implications deriving from this. As a comparison, think about the challenges of moving from two to three or more bodies orbiting each other, which results in a chaotic dynamical system for most initial conditions.

Of course, a business’ context doesn’t have to create such complexity, though when things are left unchecked, when delays in decision-making as well as other typical events occur, when there’s no structure, strategy, coordinated effort, or any other important component, the chances for chaotic behavior become quite high with the passage of time. This is just a model to explain real-life situations that seem similar on the surface but prove to be quite complex when diving deeper. That’s probably why a CDO’s role as tamer of complexity is important and challenging!


12 October 2024

Bart Kosko - Collected Quotes

"A bell curve shows the 'spread' or variance in our knowledge or certainty. The wider the bell the less we know. An infinitely wide bell is a flat line. Then we know nothing. The value of the quantity, position, or speed could lie anywhere on the axis. An infinitely narrow bell is a spike that is infinitely tall. Then we have complete knowledge of the value of the quantity. The uncertainty principle says that as one bell curve gets wider the other gets thinner. As one curve peaks the other spreads. So if the position bell curve becomes a spike and we have total knowledge of position, then the speed bell curve goes flat and we have total uncertainty (infinite variance) of speed." (Bart Kosko, "Fuzzy Thinking: The new science of fuzzy logic", 1993)

"Bivalence trades accuracy for simplicity. Binary outcomes of yes and no, white and black, true and false simplify math and computer processing. You can work with strings of 0s and 1s more easily than you can work with fractions. But bivalence requires some force fitting and rounding off [...] Bivalence holds at cube corners. Multivalence holds everywhere else." (Bart Kosko, "Fuzzy Thinking: The new science of fuzzy logic", 1993)

"Fuzziness has a formal name in science: multivalence. The opposite of fuzziness is bivalence or two-valuedness, two ways to answer each question, true or false, 1 or 0. Fuzziness means multivalence. It means three or more options, perhaps an infinite spectrum of options, instead of just two extremes. It means analog instead of binary, infinite shades of gray between black and white." (Bart Kosko, "Fuzzy Thinking: The new science of fuzzy logic", 1993)

"The binary logic of modern computers often falls short when describing the vagueness of the real world. Fuzzy logic offers more graceful alternatives." (Bart Kosko & Satoru Isaka, "Fuzzy Logic,” Scientific American Vol. 269, 1993)

"A bit involves both probability and an experiment that decides a binary or yes-no question. Consider flipping a coin. One bit of in-formation is what we learn from the flip of a fair coin. With an unfair or biased coin the odds are other than even because either heads or tails is more likely to appear after the flip. We learn less from flipping the biased coin because there is less surprise in the outcome on average. Shannon's bit-based concept of entropy is just the average information of the experiment. What we gain in information from the coin flip we lose in uncertainty or entropy." (Bart Kosko, "Noise", 2006)

"A signal has a finite-length frequency spectrum only if it lasts infinitely long in time. So a finite spectrum implies infinite time and vice versa. The reverse also holds in the ideal world of mathematics: A signal is finite in time only if it has a frequency spectrum that is infinite in extent." (Bart Kosko, "Noise", 2006)

"Bell curves don't differ that much in their bells. They differ in their tails. The tails describe how frequently rare events occur. They describe whether rare events really are so rare. This leads to the saying that the devil is in the tails." (Bart Kosko, "Noise", 2006)

"Chaos can leave statistical footprints that look like noise. This can arise from simple systems that are deterministic and not random. [...] The surprising mathematical fact is that most systems are chaotic. Change the starting value ever so slightly and soon the system wanders off on a new chaotic path no matter how close the starting point of the new path was to the starting point of the old path. Mathematicians call this sensitivity to initial conditions but many scientists just call it the butterfly effect. And what holds in math seems to hold in the real world - more and more systems appear to be chaotic." (Bart Kosko, "Noise", 2006)

"'Chaos' refers to systems that are very sensitive to small changes in their inputs. A minuscule change in a chaotic communication system can flip a 0 to a 1 or vice versa. This is the so-called butterfly effect: Small changes in the input of a chaotic system can produce large changes in the output. Suppose a butterfly flaps its wings in a slightly different way. can change its flight path. The change in flight path can in time change how a swarm of butterflies migrates." (Bart Kosko, "Noise", 2006)

"I wage war on noise every day as part of my work as a scientist and engineer. We try to maximize signal-to-noise ratios. We try to filter noise out of measurements of sounds or images or anything else that conveys information from the world around us. We code the transmission of digital messages with extra 0s and 1s to defeat line noise and burst noise and any other form of interference. We design sophisticated algorithms to track noise and then cancel it in headphones or in a sonogram. Some of us even teach classes on how to defeat this nemesis of the digital age. Such action further conditions our anti-noise reflexes." (Bart Kosko, "Noise", 2006)

"Linear systems do not benefit from noise because the output of a linear system is just a simple scaled version of the input [...] Put noise in a linear system and you get out noise. Sometimes you get out a lot more noise than you put in. This can produce explosive effects in feedback systems that take their own outputs as inputs." (Bart Kosko, "Noise", 2006)

"Many scientists who work not just with noise but with probability make a common mistake: They assume that a bell curve is automatically Gauss's bell curve. Empirical tests with real data can often show that such an assumption is false. The result can be a noise model that grossly misrepresents the real noise pattern. It also favors a limited view of what counts as normal versus non-normal or abnormal behavior. This assumption is especially troubling when applied to human behavior. It can also lead one to dismiss extreme data as error when in fact the data is part of a pattern." (Bart Kosko, "Noise", 2006)

"Noise is a signal we don't like. Noise has two parts. The first has to do with the head and the second with the heart. The first part is the scientific or objective part: Noise is a signal. [...] The second part of noise is the subjective part: It deals with values. It deals with how we draw the fuzzy line between good signals and bad signals. Noise signals are the bad signals. They are the unwanted signals that mask or corrupt our preferred signals. They not only interfere but they tend to interfere at random." (Bart Kosko, "Noise", 2006)

"Noise is an unwanted signal. A signal is anything that conveys information or ultimately anything that has energy. The universe consists of a great deal of energy. Indeed a working definition of the universe is all energy anywhere ever. So the answer turns on how one defines what it means to be wanted and by whom." (Bart Kosko, "Noise", 2006)

"The central limit theorem differs from laws of large numbers because random variables vary and so they differ from constants such as population means. The central limit theorem says that certain independent random effects converge not to a constant population value such as the mean rate of unemployment but rather they converge to a random variable that has its own Gaussian bell-curve description." (Bart Kosko, "Noise", 2006)

"The flaw in the classical thinking is the assumption that variance equals dispersion. Variance tends to exaggerate outlying data because it squares the distance between the data and their mean. This mathematical artifact gives too much weight to rotten apples. It can also result in an infinite value in the face of impulsive data or noise. [...] Yet dispersion remains an elusive concept. It refers to the width of a probability bell curve in the special but important case of a bell curve. But most probability curves don't have a bell shape. And its relation to a bell curve's width is not exact in general. We know in general only that the dispersion increases as the bell gets wider. A single number controls the dispersion for stable bell curves and indeed for all stable probability curves - but not all bell curves are stable curves." (Bart Kosko, "Noise", 2006)

More quotes from Bart Kosko at QuotableMath.blogspot.com.

11 October 2024

Business Intelligence: Perspectives (Part VII: Creating Value for Organizations)

Business Intelligence Series

How does one create value for an organization in the BI area? This should be one of the questions BI professionals ask themselves, and eventually their colleagues, on a periodic basis, because the mere act of providing reports and good-looking visualizations doesn’t provide value per se. Therefore, it’s important to identify the critical success factors and value drivers within each area!

One can start with the data, BI or IT strategies, provided that organizations invest time in defining their direction, and with the KPIs and/or OKRs defined; hopefully, the organization already has something similar in place! However, these are just topics that can be used to get a bird’s-eye view of the overall landscape and challenges. It’s advisable to dig deeper, especially when the strategic, tactical and operational plans aren’t in sync, and, let’s be realistic, this probably happens in many organizations more often than one wants to admit!

Ideally, the BI professional should be able to talk with the colleagues who could benefit from having a set of reports or dashboards that offer a deeper perspective into their challenges. Talking with each of them can be time-consuming and not necessarily value-driven. However, giving each team or department the chance to speak their mind, and brainstorm what can be done, could in theory bring more value. Even if their issues and challenges should be reflected in the strategy, there’s always an important gap between the actual business needs and those reflected in formal documents, especially when the latter are not revised periodically. Ideally, such issues should be traced back to a business goal, though it’s questionable to what extent such an alignment is possible in practice. Exceptions will always exist, no matter how well structured and thought out a strategy is!

Unfortunately, this approach also involves some risks. Despite their local importance, the topics raised might not be aligned with what the organization wants, and there can be a strong case against them, and even a set of negative aspects related to addressing them. However, talking about the costs incurred by losing an opportunity can hopefully change the balance favorably. In general, translating issues into their associated cost for the organization has (hopefully) the power to change people’s minds.

Organizations tend to bring forward the major issues and to address the minor ones only afterwards, with the effect that occasionally some of the small issues increase in impact when not addressed. It makes sense to prioritize with the risks, costs and quick wins in mind while looking at the broader perspective! Quick wins are usually addressed at the strategic level, but apparently seldom at the tactical and operational levels, and it is at these levels that one can create the most important impact, paving the way for other strategic measures and activities.

The question from the title is not limited to BI professionals - it should be in every manager’s and every employee’s mind. The user is the closest to the problems and opportunities, while the manager is the one who has a broader view and the authority to push the topic up the waiting list. Unfortunately, the waiting lists in some organizations are quite long, while not having a good set of requests on the list might indicate that issues exist in other areas!

BI professionals and organizations probably know the theory well but have difficulties in combining it with practice. It’s challenging to obtain the needed impact (ideally the maximum effect) with a minimum of effort while addressing the different topics. Sooner or later the complexity of the topic kicks in, messing things up!


About Me

IT Professional with more than 24 years of experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.