SQL Troubles

04 December 2006

✏️Antony Unwin - Collected Quotes

"Data Visulization is related to Information Visualization, but there are important differences. Data Visualization is for exploration, for uncovering information, as well as for presenting information. It is certainly a goal of Data Visualization to present any information in the data, but another goal is to display the raw data themselves, revealing the inherent variability and uncertainty." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"Deciding on which graphics to use is often a matter of taste. What one person thinks are good graphics for illustrating information may not appeal to someone else. It may also happen that different people interpret the same graphic in quite different ways." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"Histograms use area to represent counts of a distribution. This makes them somewhat related to barcharts and mosaic plots, although the number or the width of the bins of a histogram is not determined a priori and the bins are drawn without gaps between them reflecting the continuous scale of the data. Whereas barcharts and mosaic plots show the exact distribution of the sample, a histogram is always just one approximation to the distribution of the data. Sometimes histograms are also used as crude density estimators for some 'true', but usually unknown, underlying distribution for the data. There are much better density estimation methods that produce smooth distribution displays." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"How would a million be visualized today? If you have ever drawn a histogram or a scatterplot of a million cases, you know that it is possible, but that there are problems. The screen resolution of a computer cannot be high enough to show very small bars in the histogram, and in regions of high density the scatterplots look like black blobs with huge numbers of points piled on top of one another. (It is noteworthy - and useful - that the weaknesses of the two kinds of plot arise at opposite extremes of the distributional densities.) So what should be visualized? If the distributional form of the bulk of the data is of interest, then the histogram will be fine for one-dimensional views (and it may give some information about outliers too). If individual outliers are of interest, then the scatterplot will be pretty good (and it will give a fair bit of distributional information as well). One aim might be described as global, attempting to summarise the main structure, and the other as local, attempting to identify individual features. Ideally, both kinds of plot are needed to satisfy both aims." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"Largeness comes in different forms and has many different effects. Whereas some tasks remain easy, others become obstinately difficult. Largeness is not just an increase in dataset size. [...] Largeness may mean more complexity - more variables, more detail (additional categories, special cases), and more structure (temporal or spatial components, combinations of relational data tables). Again this is not so much of a problem with small datasets, where the complexity will be by definition limited, but becomes a major problem with large datasets. They will often have special features that do not fit the standard case by variable matrix structure well-known to statisticians." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"Like parallel coordinates, networks are drawn with many lines, and so an increase in magnitude has a more dramatic effect on networks than it does on point or area plots. The main issue is not drawing optimal layouts but drawing informative and acceptable layouts fast enough to be useful. In particular, this chapter makes clear that having to analyze applications with a million nodes is not at all unusual. With trees, the task is different again. Large datasets do not lead to specially large trees, but complex datasets may lead to many, many trees, and the visualization here concentrates on the task of combining and summarizing the information from large numbers of trees. A broad range of innovative displays is introduced for these specialist tasks, though they all have their origins in existing plots." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"Many different words can be used to describe graphic representations of data, but the overall aim is always to visualize the information in the data and so the term Data Visualization is the best universal term. Other terms have different connotations." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"Mosaic plots […] are designed to show the dependencies and interactions between multiple categorical variables in one plot. […] . A spineplot can be regarded as a kind of one-dimensional mosaic plot. […] In contrast with a barchart, where the bars are aligned to an axis, the mosaic plot uses a rectangular region, which is subdivided into tiles according to the numbers of observations falling into the different classes. This subdivision is done recursively, or in statistical terms conditionally, as more variables are included." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"Statistics has its own basic suite of domain-specific visualization tools. These statistical graphics can best be classified by the kind of data that they depict. Statistical data are usually characterized by their scale: nominal, ordinal (which are both categorical) or numerical (which is usually regarded as continuous). What is most important in distinguishing statistical graphics from other graphics is their universality: statistical graphics are not tailored towards only one specific application but are valid for any data measured on the appropriate scales." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"Tables are fine for viewing sections of a dataset, but simple scrolling is no longer a practical navigational option." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"There are plenty of graphical displays that work well for small datasets and that can be found in the commonly available software packages, but they do not automatically scale up. Dotplots, scatterplots, and parallel coordinate plots all suffer from overplotting with large datasets; just think of drawing a scatterplot of a million points." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"The days of trawling through endless volumes of frequency tables for every variable and of contingency tables for every pair of variables are still sadly with us. Automatic filtering and storing of results are essential first steps to help analysts to concentrate on the important issues that require human input to interpret the result." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"The recursive construction of a mosaic plot means that the only limit for the number of variables included is the number of tiles to display, i.e. the number of possible combinations of the variables. […] If interactive queries are not available, the following strategy has proved to be helpful. Variables with only few categories should be put in the plot first, to keep the number of conditioned groups small. If one of the variables in the plot is a binary response, showing this variable via highlighting will reduce the number of tiles by half. Note that the gaps between the tiles are not part of the rectangular region that is used to build the tiles. The gaps are there to improve visual discrimination." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"The simplest way to plot univariate continuous data is a dotplot. Because the points are distributed along only one axis, overplotting is a serious problem, no matter how small the sample is. The usual technique to avoid overplotting is jittering, i.e., the data are randomly spread along a virtual second axis." (Antony Unwin et al [in "Graphics of Large Datasets: Visualizing a Million"], 2006)

"Clearly principles and guidelines for good presentation graphics have a role to play in exploratory graphics, but personal taste and individual working style also play important roles. The same data may be presented in many alternative ways, and taste and customs differ as to what is regarded as a good presentation graphic. Nevertheless, there are principles that should be respected and guidelines that are generally worth following. No one should expect a perfect consensus where graphics are concerned." (Antony Unwin, "Good Graphics?" [in "Handbook of Data Visualization"], 2008)

"Data visualization [...] expresses the idea that it involves more than just representing data in a graphical form (instead of using a table). The information behind the data should also be revealed in a good display; the graphic should aid readers or viewers in seeing the structure in the data. The term data visualization is related to the new field of information visualization. This includes visualization of all kinds of information, not just of data, and is closely associated with research by computer scientists." (Antony Unwin et al, "Introduction" [in "Handbook of Data Visualization"], 2008)

"For a given dataset there is not a great deal of advice which can be given on content and context. hose who know their own data should know best for their specific purposes. It is advisable to think hard about what should be shown and to check with others if the graphic makes the desired impression. Design should be let to designers, though some basic guidelines should be followed: consistency is important (sets of graphics should be in similar style and use equivalent scaling); proximity is helpful (place graphics on the same page, or on the facing page, of any text that refers to them); and layout should be checked (graphics should be neither too small nor too large and be attractively positioned relative to the whole page or display)." (Antony Unwin, "Good Graphics?" [in "Handbook of Data Visualization"], 2008)

"There are two main reasons for using graphic displays of datasets: either to present or to explore data. Presenting data involves deciding what information you want to convey and drawing a display appropriate for the content and for the intended audience. [...] Exploring data is a much more individual matter, using graphics to find information and to generate ideas.Many displays may be drawn. They can be changed at will or discarded and new versions prepared, so generally no one plot is especially important, and they all have a short life span." (Antony Unwin, "Good Graphics?" [in "Handbook of Data Visualization"], 2008)

"Eye-catching data graphics tend to use designs that are unique (or nearly so) without being strongly focused on the data being displayed. In the world of Infovis, design goals can be pursued at the expense of statistical goals. In contrast, default statistical graphics are to a large extent determined by the structure of the data (line plots for time series, histograms for univariate data, scatterplots for bivariate nontime-series data, and so forth), with various conventions such as putting predictors on the horizontal axis and outcomes on the vertical axis. Most statistical graphs look like other graphs, and statisticians often think this is a good thing." (Andrew Gelman & Antony Unwin, "Infovis and Statistical Graphics: Different Goals, Different Looks" , Journal of Computational and Graphical Statistics Vol. 22(1), 2013)

"Providing the right comparisons is important, numbers on their own make little sense, and graphics should enable readers to make up their own minds on any conclusions drawn, and possibly see more. On the Infovis side, computer scientists and designers are interested in grabbing the readers' attention and telling them a story. When they use data in a visualization (and data-based graphics are only a subset of the field of Infovis), they provide more contextual information and make more effort to awaken the readers' interest. We might argue that the statistical approach concentrates on what can be got out of the available data and the Infovis approach uses the data to draw attention to wider issues. Both approaches have their value, and it would probably be best if both could be combined." (Andrew Gelman & Antony Unwin, "Infovis and Statistical Graphics: Different Goals, Different Looks" , Journal of Computational and Graphical Statistics Vol. 22(1), 2013)

"Statisticians tend to use standard graphic forms (e.g., scatterplots and time series), which enable the experienced reader to quickly absorb lots of information but may leave other readers cold. We personally prefer repeated use of simple graphical forms, which we hope draw attention to the data rather than to the form of the display." (Andrew Gelman & Antony Unwin, "Infovis and Statistical Graphics: Different Goals, Different Looks" , Journal of Computational and Graphical Statistics Vol. 22(1), 2013)

"[…] we do see a tension between the goal of statistical communication and the more general goal of communicating the qualitative sense of a dataset. But graphic design is not on one side or another of this divide. Rather, design is involved at all stages, especially when several graphics are combined to contribute to the overall picture, something we would like to see more of." (Andrew Gelman & Antony Unwin, "Tradeoffs in Information Graphics", Journal of Computational and Graphical Statistics, 2013)

"Yes, it can sometimes be possible for a graph to be both beautiful and informative […]. But such synergy is not always possible, and we believe that an approach to data graphics that focuses on celebrating such wonderful examples can mislead people by obscuring the tradeoffs between the goals of visual appeal to outsiders and statistical communication to experts." (Andrew Gelman & Antony Unwin, "Tradeoffs in Information Graphics", Journal of Computational and Graphical Statistics, 2013)

✏️Naomi B Robbins - Collected Quotes

"Choose an aspect ratio that shows variation in the data." (Naomi B Robbins, "Creating More effective Graphs", 2005)

"Choose scales wisely, as they have a profound influence on the interpretation of graphs. Not all scales require that zero be included, but bar graphs and other graphs where area is judged do require it." (Naomi B Robbins, "Creating More effective Graphs", 2005)

"Creating a more effective graph involves choosing a graphical construction in which the visual decoding uses tasks as high as possible on the ordered list of elementary graphical tasks while balancing this ordering with consideration of distance and detection." (Naomi B Robbins, "Creating More effective Graphs", 2005)

"Distance and detection also play a role in our ability to decode information from graphs. The closer together objects are, the easier it is to judge attributes that compare them. As distance between objects increases, accuracy of judgment decreases. It is certainly easier to judge the difference in lengths of two bars if they are next to one another than if they are pages apart." (Naomi B Robbins, "Creating More effective Graphs", 2005)

"Graphs are for the forest and tables are for the trees. Graphs give you the big picture and show you the trends; tables give you the details." (Naomi B Robbins, "Creating More effective Graphs", 2005)

"Graphs are pictorial representations of numerical quantities. It therefore seems reasonable to expect that the visual impression we get when looking at a graph is proportional to the numbers that the graph represents. Unfortunately, this is not always the case." (Naomi B Robbins, "Creating More effective Graphs", 2005)

"One graph is more effective than another if its quantitative information can be decoded more quickly or more easily by most observers. […] This definition of effectiveness assumes that the reason we draw graphs is to communicate information - but there are actually many other reasons to draw graphs." (Naomi B Robbins, "Creating More effective Graphs", 2005)

"The principles of drawing effective graphs are the same no matter what the medium: strive for clarity and conciseness. However, since a reader may spend more time studying a written report than is possible during a presentation, more detail can be included." (Naomi B Robbins, "Creating More effective Graphs", 2005)

"Use a logarithmic scale when it is important to understand percent change or multiplicative factors. […] Showing data on a logarithmic scale can cure skewness toward large values." (Naomi B Robbins, "Creating More effective Graphs", 2005)

"Use a scale break only when necessary. If a break cannot be avoided, use a full scale break. Taking logs can cure the need for a break." (Naomi B Robbins, "Creating More effective Graphs", 2005)

"We make angle judgments when we read a pie chart, but we don't judge angles very well. These judgments are biased; we underestimate acute angles (angles less than 90°) and overestimate obtuse angles (angles greater than 90°). Also, angles with horizontal bisectors (when the line dividing the angle in two is horizontal) appear larger than angles with vertical bisectors." (Naomi B Robbins, "Creating More effective Graphs", 2005)

03 December 2006

✏️Martin Theus - Collected Quotes

"Any conclusion drawn from an analysis of a transformed variable must be retranslated into the original domain - which is usually not an easy task. A special handling of outliers, be it a complete removal, or just visual suppression such as hot-selection or shadowing, must have a cogent motivation. At any rate, transformations of data are usually part of a data preprocessing step that might precede a data analysis. Also it can be motivated by initial findings in a data analysis which revealed yet undiscovered problems in the dataset." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"Basically, one can distinguish three motivations for weighted data. The first is a technical motivation. Whenever we look at purely categorical data, it is not necessary to supply a dataset case by case. A breakdown summary can capture the dataset without loss of any information. […] The second situation in which weights are introduced is when sampling unequally from a population. Statistics and graphics must then account for the weights. A third reason to use weights is a change of the sampling population." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"Choropleth maps are most effective when the range of the color-shading is fully used, i.e., the visual discrimination is maximized. A skewed distribution [...] will shrink the chosen colors to just a fraction of the possible color range. Using a continuously differentiable transformation function [...] is one way to expand the range of colors used. A more effective way to maximize the visual discrimination in a choropleth map is to transform the data to match a target distribution. One option is to force all colors to have the same frequency, i.e., to force the target distribution to be uniform. Another option is to force a normal target distribution. Obviously, the transfer function needed for this transformation is data dependent and piecewise linear." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"Due to their recursive definition, switching the order of variables in a mosaic plot has a strong impact on what can be read from the plot. For instance, exchanging the two variables in a two-dimensional mosaic plot results in a completely new plot rather than in a mere graphically transposed version of the original plot." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"Histograms are powerful in cases where meaningful class breaks can be defined and classes are used to select intervals and groups in the data. However, they often perform poorly when it comes to the visualization of a distribution." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"Log-linear models aim at modeling interactions between more than just two variables. Depending on how many variables are investigated simultaneously and how many interactions are included in the model/data, different model types can be distinguished by simply looking at the corresponding mosaic plot. Each of these models exhibits a specific pattern in a mosaic plot. If there are less than four variables included in the model, the specific interaction-structure of a model can be read from the mosaic plot." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"Mosaic plots are defined recursively, i.e., each variable that is introduced in a mosaic plot is plotted conditioned on the groups already established in the plot. As with barcharts, the area of bars or tiles is proportional to the number of observations (or the sum of the observation weights of a class). The direction along which bars are divided by a newly introduced variable is usually alternating, starting with the x-direction." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"Mosaic plots become more difficult to read for variables with more than two or three categories. One way out is to assign a constant space for all possible crossings of categories. This way, the data from the r×c table are plotted in a table-like layout. Whereas this regular layout makes it much easier to compare values across rows and columns, the plot space is used less efficiently than in a mosaic plot." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"Multivariate techniques often summarize or classify many variables to only a few groups or factors (e.g., cluster analysis or multi-dimensional scaling). Parallel coordinate plots can help to investigate the influence of a single variable or a group of variables on the result of a multivariate procedure. Plotting the input variables in a parallel coordinate plot and selecting the features of interest of the multivariate procedure will show the influence of different input variables." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"No other statistical graphic can hold so much information at a time than the parallel coordinate plot. Thus this plot is ideal to get an initial overview of a dataset, or at the very least a large subgroup of the variables." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"One big advantage of parallel coordinate plots over scatterplot matrices. (i.e., the matrix of scatterplots of all variable pairs) is that parallel coordinate plots need less space to plot the same amount of data. On the other hand, parallel coordinate plots with p variables show only p − 1 adjacencies. However, adjacent variables reveal most of the information in a parallel coordinate plot. Reordering variables in a parallel coordinate plot is therefore essential." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"Parallel coordinate plots are often overrated concerning their ability to depict multivariate features. Scatterplots are clearly superior in investigating the relationship between two continuous variables and multivariate outliers do not necessarily stick out in a parallel coordinate plot. Nonetheless, parallel coordinate plots can help to find and understand features such as groups/clusters, outliers and multivariate structures in their multivariate context. The key feature is the ability to select and highlight individual cases or groups in the data, and compare them to other groups or the rest of the data." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"Presentation graphics face the challenge to depict a key message in - usually a single - graphic which needs to fit very many observers at a time, without the chance to give further explanations or context. Exploration graphics, in contrast, are mostly created and used only by a single researcher, who can use as many graphics as necessary to explore particular questions. In most cases none of these graphics alone gives a comprehensive answer to those questions, but must be seen as a whole in the context of the analysis." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"Raster maps - often also called raster images - represent measurements on a regular grid. They are usually a result of remote sensing techniques via satellites or airborne surveillance systems. They fit neither the construct of scatterplots nor that of maps. Nevertheless, both scatterplots and maps can be used to display raster maps within statistics software which has no extra GIS capabilities." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"Shingling is the process of dividing a continuous variable into - possibly overlapping - intervals in order to convert a continuous variable into a discrete variable. Shingling is quite different from conditioning on categorical variables. Overlapping shingles/intervals lead to multiple representation of data within a trellis display, which is not the case for categorical variables. Furthermore, it is challenging to judge which intervals/cases have been chosen to build a shingle. Trellis displays represent the shingle interval visually by an interval of the strip label. Although no plotting space is wasted, the information on the intervals is difficult to read from the strip label. Despite these drawbacks, there is a valid motivation for shingling […]." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"Spineplots have the nice property that highlighted proportions can be compared directly. However, it must be noted that the x axis in a spinogram is no longer linear. It is only piecewise linear within the bars. Although this might be confusing at first sight, it yields two interesting characteristics. Areas where only very few cases have been observed are squeezed together and thus get less visual weight. [...] Spineplots use normalized bar lengths while the bar widths are proportional to the number of cases in the category" (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"Sorting data is one of the most efficient actions to derive different views of data in order to see the variables from many angles. Sorting is usually not applied to the data itself, but to statistical objects of a plot. We might want to sort the bars in a barchart, the variables in a parallel boxplot or the categories in a boxplot y by x." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"The problem of overplotting can be as severe that (smaller) groups can disappear completely, which will not only lead to quantitatively biased inferences, but even to qualitatively inappropriate conclusions." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"There are many reasons for the existence of missing values: the failure of a sensor, different recording standards for different parts of a sample, or structural differences of the objects observed that make it impossible to record all attributes for all observed instances." (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"Trellis displays introduce the concept of shingling. Shingling is the process of dividing a continuous variable into - possibly overlapping - intervals in order to convert a continuous variable into a discrete variable. Shingling is quite different from conditioning on categorical variables. Overlapping shingles/intervals lead to multiple representation of data within a trellis display, which is not the case for categorical variables. Furthermore, it is challenging to judge which intervals/cases have been chosen to build a shingle. Trellis displays represent the shingle interval visually by an interval of the strip label. Although no plotting space is wasted, the information on the intervals is difficult to read from the strip label. Despite these drawbacks, there is a valid motivation for shingling," (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

"Trellis displays use a lattice-like arrangement to place plots onto so-called panels. Each plot in a trellis display is conditioned upon at least one other variable. The same scales are used in all the panel plots in order to make them comparable across rows and columns. […] Trellis displays are an ideal tool to compare models for different subsets. " (Martin Theus & Simon Urbanek, "Interactive Graphics for Data Analysis: Principles and Examples", 2009)

✏️Gene Zelazny - Collected Quotes

"[…] a chart is a picture of relationships, and only the picture counts. Everything else - titles, labels, scale values - merely identifies and explains. The most important feature of the picture is the impression you receive. Scaling has an important controlling effect on that impression." (Gene Zelazny. "Say It with Charts: The executive’s guide to visual communication" 4th Ed., 2001)

"A component comparison can best be demonstrated using a pie chart. Because a circle gives such a clear impression of being a total, a pie chart is ideally suited for the one - and only - purpose it serves: showing the size of each part as a percentage of some whole, such as companies that make up an industry." (Gene Zelazny. "Say It with Charts: The executive’s guide to visual communication" 4th Ed., 2001)

"A correlation comparison shows whether the relationship between two variables follows - or fails to follow - the pattern you would normally expect." (Gene Zelazny. "Say It with Charts: The executive’s guide to visual communication" 4th Ed., 2001)

"[…] any point from the data you wish to emphasize - will always lead to one of five basic kinds of comparison, which I’ve chosen to call component, item, time series, frequency distribution, and correlation." (Gene Zelazny. "Say It with Charts: The executive’s guide to visual communication" 4th Ed., 2001)

"A component comparison can best be demonstrated using a pie chart. Because a circle gives such a clear impression of being a total, a pie chart is ideally suited for the one - and only - purpose it serves: showing the size of each part as a percentage of some whole, such as companies that make up an industr" (Gene Zelazny. "Say It with Charts: The executive’s guide to visual communication" 4th Ed., 2001)

"Choosing a chart form without a message in mind is like trying to color coordinate your wardrobe while blindfolded. Choosing the correct chart form depends completely on your being clear about what your message is. It is not the data - be they dollars, percentages, liters, yen, etc. - that determine the chart. It is not the measure - be it profits, return on investment, compensation, etc. - that determines the chart. Rather, it is your message, what you want to show, the specific point you want to make." (Gene Zelazny. "Say It with Charts: The executive’s guide to visual communication" 4th Ed., 2001)

"Don’t necessarily settle for the first idea that grabs you. Keep looking, playing with the diagrams, so that you find the right fit." (Gene Zelazny. "Say It with Charts: The executive’s guide to visual communication" 4th Ed., 2001)

"I’ve observed that the pie chart is the most popular. It shouldn’t be; it’s the least practical and should account for little more than 5 percent of the charts used in a presentation or report. On the other hand, the bar chart is the least appreciated. It should receive much more attention; it’s the most versatile and should account for as much as 25percent of all charts used. I consider the column chart to be 'good old reliable' and the line chart to be the workhorse; these two should account for half of all charts used. While possibly intimidating at first glance, the dot chart has its place 10 percent of the time." (Gene Zelazny. "Say It with Charts: The executive’s guide to visual communication" 4th Ed., 2001)

"In choosing between a column and a line chart, you can also be guided by the nature of the data. A column chart emphasizes levels or magnitudes and is more suitable for data on activities that occur within a set period of time, suggesting a fresh start for each period. […] A line chart emphasizes movement and angles of change and is therefore the best form for showing data that have a 'carry-over' from one time to the next." (Gene Zelazny. "Say It with Charts: The executive’s guide to visual communication" 4th Ed., 2001)

"In preparing bar charts, make certain that the space separating the bars is smaller than the width of the bars. Use the most contrasting color or shading to emphasize the important item, thereby reinforcing the message title." (Gene Zelazny. "Say It with Charts: The executive’s guide to visual communication" 4th Ed., 2001)

"Naturally, scale values are used in practice, but omitting them should not obscure the relationship each chart illustrates. In fact, it is a good test of your own charts to see whether messages come across clearly without showing the scales. This does not mean that scaling considerations are unimportant to the design of charts. On the contrary, the wrong scale can lead to producing a chart that is misleading or worse, dishonest." (Gene Zelazny. "Say It with Charts: The executive’s guide to visual communication" 4th Ed., 2001)

"[…] no matter what your message is, it will always imply one of the five kinds of comparison. It should come as no surprise that, no matter what the comparison is, it will always lead to one of the five basic chart forms: the pie chart, the bar chart, the column chart, the line chart, and the dot chart." (Gene Zelazny. "Say It with Charts: The executive’s guide to visual communication" 4th Ed., 2001)

"The suggestions for making the most of bar charts also apply to column charts: make the space between the columns smaller than the width of the columns; and use color or shading to emphasize one point in time more than others or to distinguish, say, historical from projected data." (Gene Zelazny. "Say It with Charts: The executive’s guide to visual communication" 4th Ed., 2001)

"When showing numbers, round out the figures and omit decimals whenever they have little effect on your message; […]" (Gene Zelazny. "Say It with Charts: The executive’s guide to visual communication" 4th Ed., 2001)

"When preparing a line chart, make sure the trend line is bolder than the baseline and that the baseline, in turn, is a little bit heavier than the vertical and horizontal scale lines that shape the reference grid." (Gene Zelazny. "Say It with Charts: The executive’s guide to visual communication" 4th Ed., 2001)

"Whenever the form becomes more important than the content - that is, whenever the design of the chart interferes with a clear grasp of the relationship - it does a disservice to the audience or readers who may be basing decisions on the strength of what they see." (Gene Zelazny. "Say It with Charts: The executive’s guide to visual communication" 4th Ed., 2001)

02 December 2006

✏️Howard Wainer - Collected Quotes

"Although arguments can be made that high data density does not imply that a graphic will be good, nor one with low density bad, it does reflect on the efficiency of the transmission of information. Obviously, if we hold clarity and accuracy constant, more information is better than less. One of the great assets of graphical techniques is that they can convey large amounts of information in a small space." (Howard Wainer, "How to Display Data Badly", The American Statistician Vol. 38(2), 1984)

"The essence of a graphic display is that a set of numbers having both magnitudes and an order are represented by an appropriate visual metaphor - the magnitude and order of the metaphorical representation match the numbers. We can display data badly by ignoring or distorting this concept." (Howard Wainer, "How to Display Data Badly", The American Statistician Vol. 38(2), 1984)

"The standard error of most statistics is proportional to 1 over the square root of the sample size. God did this, and there is nothing we can do to change it." (Howard Wainer, "Improving Tabular Displays, With NAEP Tables as Examples and Inspirations", Journal of Educational and Behavioral Statistics Vol 22 (1), 1997)

"[…] a graph is nothing but a visual metaphor. To be truthful, it must correspond closely to the phenomena it depicts: longer bars or bigger pie slices must correspond to more, a rising line must correspond to an increasing amount. If a graphical depiction of data does not faithfully follow this principle, it is almost sure to be misleading. But the metaphoric attachment of a graphic goes farther than this. The character of the depiction ism a necessary and sufficient condition for the character of the data. When the data change, so too must their depiction; but when the depiction changes very little, we assume that the data, likewise, are relatively unchanging. If this convention is not followed, we are usually misled." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"A graphic display has many purposes, but it achieves its highest value when it forces us to see what we were not expecting." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"Nothing that had been produced before was even close. Even today, after more than two centuries of graphical experience, Playfair’s graphs remain exemplary standards for clearcommunication of quantitative phenomena. […] Graphical forms were available before Playfair, but they were rarely used to plot empirical information." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"Oftentimes a statistical graphic provides the evidence for a plausible story, and the evidence, though perhaps only circumstantial, can be quite convincing. […] But such graphical arguments are not always valid. Knowledge of the underlying phenomena and additional facts may be required." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"Placing a fact within a context increases its value greatly. […] . An efficacious way to add context to statistical facts is by embedding them in a graphic. Sometimes the most helpful context is geographical, and shaded maps come to mind as examples. Sometimes the most helpful context is temporal, and time-based line graphs are the obvious choice. But how much time? The ending date (today) is usually clear, but where do you start? The starting point determines the scale. […] The starting point and hence the scale are determined by the questions that we expect the graph to answer." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"Simpson’s Paradox can occur whenever data are aggregated. If data are collapsed across a subclassification (such as grades, race, or age), the overall difference observed may not represent what is going on. Standardization can help correct this, but nothing short of random assignment of individuals to groups will prevent the possibility of yet another subclassificatiion, as yet unidentified, changing things around again. But I believe that knowing of the possibility helps us, so that we can contain the enthusiasm of our impulsive first inferences." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"The appearance, and hence the perception, of any statistical graphic is massively influenced by the choice of scale. If the scale of the vertical axis is too narrow relative to the scale of the horizontal axis, random meanders look meaningful." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"The difficult task of properly setting the scale of a graph remains difficult but not mysterious. There is agreement among experts spanning two hundred years. The default option should be to choose a scale that fills the plot with data. We can deviate from this under circumstances when it is better not to fill the plot with data, but those circumstances are usually clear. It is important to remember that the sin of using too small a scale is venial; the viewer can correct it. The sin of using too large a scale cannot be corrected without access to the original data; it can be mortal." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"Usually the effectiveness of a good display increases with the complexity of the data. When there are only a few points, almost anything will do; even a pie chart with only three or four categories is usually comprehensible." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"Thus when we look at, or prepare, a time-based statistical graphic, it is important to ask what is the right time scale, the right context, for the questions of greatest interest. The answer to this question is sometimes complex, but the very act of asking it provides us with some protection against surprises." (Howard Wainer, "Graphic Discovery: A trout in the milk and other visuals" 2nd, 2008)

"The only thing we know for sure about a missing data point is that it is not there, and there is nothing that the magic of statistics can do change that. The best that can be managed is to estimate the extent to which missing data have influenced the inferences we wish to draw." (Howard Wainer, "14 Conversations About Three Things", Journal of Educational and Behavioral Statistics Vol. 35(1, 2010)

"For an analyst to willfully avoid learning about the science is akin to malfeasance. Of course, it is likely that a deep understanding both of the science and of data analytic methods does not reside in the same person. When it does not, data analysis should be done jointly. It is my understanding that data mining is not often done as a team. This is unfortunate, for then it is too easy to miss what might have been found." (Howard Wainer, Comment, Journal of Computational and Graphical Statistics Vol. 20(1), 2011)

"Too often there is a disconnect between the people who run a study and those who do the data analysis. This is as predictable as it is unfortunate. If data are gathered with particular hypotheses in mind, too often they (the data) are passed on to someone who is tasked with testing those hypotheses and who has only marginal knowledge of the subject matter. Graphical displays, if prepared at all, are just summaries or tests of the assumptions underlying the tests being done. Broader displays, that have the potential of showing us things that we had not expected, are either not done at all, or their message is not able to be fully appreciated by the data analyst." (Howard Wainer, Comment, Journal of Computational and Graphical Statistics Vol. 20(1), 2011)

01 December 2006

✏️Kristen Sosulski - Collected Quotes

"A heat map is a graphical representation of a table of data. The individual values are arranged in a table/matrix and represented by colors. Use grayscale or gradient for coloring. Sorting of the variables changes the color pattern." (Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

"A picture may be worth a thousand words, but not all pictures are readable, interpretable, meaningful, or relevant." (Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

"Avoid using irrelevant words and pictures. Only use charts that add to your message. […] In addition, words should be read or heard - not both. Decide which one supports the key takeaway for your audience." (Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

"Building on the prior knowledge of your audience can foster understanding. Ask yourself, what does my audience already know about the topic? What don’t they yet know?" (Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

"Data graphics are used to show findings, new insights, or results. The data graphic serves as the visual evidence presented to the audience. The data graphic makes the evidence clear when it shows an interpretable result such as a trend or pattern. Data graphics are only as good as the insight or message communicated." (Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

"Ensure high contrast values for colors. Allow even those with a color vision deficiency or color blindness to distinguish the different shades by using contrasting colors. Convert graphs to grayscale or print them out in black and white to test contrast." (Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

"Pitfall #1: not sharing your work with others prior to your presentation [...]
Pitfall #2: lack of audience engagement [...]
Pitfall #3: little or no eye contact with the audience [...]
Pitfall #4: making your work unreadable (small font) [...]
Pitfall #5: over the time limit [...]
Pitfall #6: showing too much information on a single slide [...]
Pitfall #7: failing to use appropriate data graphics to show insights [...]
Pitfall #8: showing a chart without an explanation [...]
Pitfall #9: presenting a chart without a clear takeaway [...]
Pitfall #10: showing so many variables on a single visual display that they impair the readability of the chart or graph" (Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

"Stories can begin with a question or line of inquiry." (Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

"Good data visualizations are persuasive graphics that help tell your data story. When you begin any visualization project, how do you know if your audience will understand your message? Your audience has input in the data visualization process. Consider what they already know and don’t know. Determine how you will support them in identifying and understanding your key points. " (Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

"Use color only when it corresponds to differences in the data. Reserve color for highlighting a single data point or for differentiating a data series. Avoid thematic or decorative presentations. For example, avoid using red and green together. Be cognizant of the cultural meanings of the colors you select and the impact they may have on your audience." (Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

"When there are few data points, place the data labels directly on the data. Data density refers to the amount of data shown in a visualization through encodings (points, bars, lines, etc.). A common mistake is presenting too much data in a single data graph. The data itself can obscure the insight. It can make the chart unreadable because the data values are not discernible. Examples include: overlapping data points, too many lines in a line chart, or too many slices in a pie chart. Selecting the appropriate amount of data requires a delicate balance. It is your job to determine how much detail is necessary." (Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

✏️Roy D G Allen - Collected Quotes

"A knowledge of statistical methods is not only essential for those who present statistical arguments it is also needed by those on the receiving end." (Roy D G Allen, "Statistics for Economists", 1951)

"All statistical data are subject to errors in collection." (Roy D G Allen, "Statistics for Economists", 1951)

"Any time series can now be plotted in two ways. Time is measured along the horizontal axis on a natural scale; the variable is measured along the vertical axis either on a natural or on a ratio scale. A graph of the second kind is the new construction; it is often called a semi-logarithmic graph since the ratio or logarithmic scale is used on one of the two axes of the graph."

"As with tabulation, however, skill in constructing diagrams is only acquired after long experience. The main point can be easily made; a graph or diagram should be clear and simple since it adds nothing to our understanding if it does not show up the trends and relations of our data more obviously than in the original tables. A chart is meant to 'help out' in drawing broad conclusions from a table which may be quite complicated. Inevitably the graph or diagram is less exact and shows less detail than the table; it is a step in the constant process of summarizing data. This must not be overdone. It is only too easy to simplify so drastically as to be misleading." (Roy D G Allen, "Statistics for Economists", 1951)

"Graphs and diagrams help to show up trends and relations but they do not define or measure them precisely. This can be achieved by calculations on the numerical data and, in particular, by the derivation of figures to summarize and relate the significant facts in a table. The main purpose of statistical analysis is to make comparisons. A single figure has no meaning by itself; it only becomes significant and "alive" when compared explicitly or implicitly with another figure. Our first task in analysis is to make the comparisons explicit, to express the relation between one figure and another." (Roy D G Allen, "Statistics for Economists", 1951)

"Graphs [for time series] can be misleading, however, and we need to subject our first impression to a closer scrutiny. We must develop more precise methods of analysis of time series. The variations of a time series are of many kinds which can be grouped under three heads. There is, first, the general direction of movement or the trend of the variable over the long period. Then there are oscillations of various types, of greater or less regularity, superimposed on the trend. Finally, there are residual or irregular variations which may arise from isolated events such as a war or general strike, or which may be due to the operation of random influences." (Roy D G Allen, "Statistics for Economists", 1951)

"It is only by experience that skill is acquired in the framing of tables. It is partly a matter of design, to get a neat and concise layout which is both cheap to print and easy on the eye. It is partly a question of making sure that no essential information is omitted so as to leave the meaning of the table uncertain." (Roy D G Allen, "Statistics for Economists", 1951)

"Not even the most subtle and skilled analysis can overcome completely the unreliability of basic data." (Roy D G Allen, "Statistics for Economists", 1951)

"One very simple but effective form of statistical analysis is to represent the tabular data by drawing graphs or diagrams. If made with skill and care in avoiding bias, a diagram will show the data in a graphical form in which the salient features leap to the eye. The risk is that diagrams can be misleading when drawn by the unskilled and they can be very dangerous tools in unscrupulous hands." (Roy D G Allen, "Statistics for Economists", 1951)

"Summarization of statistical data into tabular form is an art rather than a routine following a set of formal rules. Tabulation inevitably implies a loss of detail. The original data are far too voluminous to be appreciated and understood; the significant details are mixed up with much that is irrelevant. The art of tabulation lies in the sacrifice of detail which is less significant for the purposes in hand so that what is really important can be emphasized. Tabulation implies classification, the grouping of items into classes according to various characteristics. And classification depends on clear and precise definitions." (Roy D G Allen, "Statistics for Economists", 1951)

"The error in a sum or difference of any number of rounded figures is the sum of the errors in the separate figures. [...] The relative error in a product or quotient. of two rounded figures is approximately the sum of the relative errors in the separate figures. [...] It is generally safe to write a product or quotient as correct to one less significant figure than the less accurate of the two values in the product or quotient." (Roy D G Allen, "Statistics for Economists", 1951)

"The function of the regression lines, as approximate representations of means of arrays, is to isolate the mean value of one variable corresponding to any given value of the other; the variation of the first variable about its mean is ignored. A regression line is an average relation, and with it there is a variation of values about the average. In the regression of y on x, the variation ignored is in the vertical direction, a variation of y up and down about the line." (Roy D G Allen, "Statistics for Economists", 1951)

30 November 2006

🎯David Parmenter - Collected Quotes

"All good KPIs that I have come across, that have made a difference, had the CEO’s constant attention, with daily calls to the relevant staff. [...] A KPI should tell you about what action needs to take place. [...] A KPI is deep enough in the organization that it can be tied down to an individual. [...] A good KPI will affect most of the core CSFs and more than one BSC perspective. [...] A good KPI has a flow on effect." (David Parmenter, "Pareto’s 80/20 Rule for Corporate Accountants", 2007)

"If the KPIs you currently have are not creating change, throw them out because there is a good chance that they may be wrong. They are probably measures that were thrown together without the in-depth research and investigation KPIs truly deserve." (David Parmenter, "Pareto’s 80/20 Rule for Corporate Accountants", 2007)

"Many management reports are not a management tool; they are merely memorandums of information. As a management tool, management reports should encourage timely action in the right direction, by reporting on those activities the Board, management, and staff need to focus on. The old adage “what gets measured gets done” still holds true." (David Parmenter, "Pareto’s 80/20 Rule for Corporate Accountants", 2007)

"Reporting to the Board is a classic 'catch-22' situation. Boards complain about getting too much information too late, and management complains that up to 20% of their time is tied up in the Board reporting process. Boards obviously need to ascertain whether management is steering the ship correctly and the state of the crew and customers before they can relax and 'strategize' about future initiatives. The process of assessing the current status of the organization from the most recent Board report is where the principal problem lies. Board reporting needs to occur more efficiently and effectively for both the Board and management." (David Parmenter, "Pareto’s 80/20 Rule for Corporate Accountants", 2007)

"Financial measures are a quantification of an activity that has taken place; we have simply placed a value on the activity. Thus, behind every financial measure is an activity. I call financial measures result indicators, a summary measure. It is the activity that you will want more or less of. It is the activity that drives the dollars, pounds, or yen. Thus financial measures cannot possibly be KPIs." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"'Getting it right the first time' is a rare achievement, and ascertaining the organization’s winning KPIs and associated reports is no exception. The performance measure framework and associated reporting is just like a piece of sculpture: you can be criticized on taste and content, but you can’t be wrong. The senior management team and KPI project team need to ensure that the project has a just-do-it culture, not one in which every step and measure is debated as part of an intellectual exercise." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"In order to get measures to drive performance, a reporting framework needs to be developed at all levels within the organization." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"Key performance indicators (KPIs) are those indicators that focus on the aspects of organizational performance that are the most critical for the current and future success of the organization." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"Key Performance Indicators (KPIs) in many organizations are a broken tool. The KPIs are often a random collection prepared with little expertise, signifying nothing. [...] KPIs should be measures that link daily activities to the organization’s critical success factors (CSFs), thus supporting an alignment of effort within the organization in the intended direction." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"Most organizational measures are very much past indicators measuring events of the last month or quarter. These indicators cannot be and never were KPIs." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"The traditional balanced-scorecard (BSC) approach uses performance measures to monitor the implementation of the strategic initiatives, and measures are typically cascaded down from a top-level organizational measure such as return on capital employed. This cascading of measures from one another will often lead to chaos, with hundreds of measures being monitored by staff in some form of BSC reporting application." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"We need indicators of overall performance that need only be reviewed on a monthly or bimonthly basis. These measures need to tell the story about whether the organization is being steered in the right direction at the right speed, whether the customers and staff are happy, and whether we are acting in a responsible way by being environmentally friendly. These measures are called key result indicators (KRIs)." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"Every day spent producing reports is a day less spent on analysis and projects." (David Parmenter)

🎯Zachary Karabell - Collected Quotes

"Culture is fuzzy, easy to caricature, amenable to oversimplifications, and often used as a catchall when all other explanations fail." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Defining an indicator as lagging, coincident, or leading is connected to another vital notion: the business cycle. Indicators are lagging or leading based on where economists believe we are in the business cycle: whether we are heading into a recession or emerging from one." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"[…] economics is a profession grounded in the belief that 'the economy' is a machine and a closed system. The more clearly that machine is understood, the more its variables are precisely measured, the more we will be able to manage and steer it as we choose, avoiding the frenetic expansions and sharp contractions. With better indicators would come better policy, and with better policy, states would be less likely to fall into depression and risk collapse." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"[…] humans make mistakes when they try to count large numbers in complicated systems. They make even greater errors when they attempt - as they always do - to reduce complicated systems to simple numbers." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"In the absence of clear information - in the absence of reliable statistics - people did what they had always done: filtered available information through the lens of their worldview." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Most people do not relate to or retain columns of numbers, however much those numbers reflect something that they care about deeply. Statistics can be cold and dull." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Our needs going forward will be best served by how we make use of not just this data but all data. We live in an era of Big Data. The world has seen an explosion of information in the past decades, so much so that people and institutions now struggle to keep pace. In fact, one of the reasons for the attachment to the simplicity of our indicators may be an inverse reaction to the sheer and bewildering volume of information most of us are bombarded by on a daily basis. […] The lesson for a world of Big Data is that in an environment with excessive information, people may gravitate toward answers that simplify reality rather than embrace the sheer complexity of it." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Statistics are meaningless unless they exist in some context. One reason why the indicators have become more central and potent over time is that the longer they have been kept, the easier it is to find useful patterns and points of reference." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Statistics are what humans do with the data they assemble; they are constructs meant to make sense of information. But the raw material is itself equally valuable, and rarely do we make sufficient use of it." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Statistics represents the fusion of mathematics with the collection and analysis of data." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"The concept that an economy (1) is characterized by regular cycles that (2) follow familiar patterns (3) illuminated by a series of statistics that (4) determine where we are in that cycle has become part and parcel of how we view the world." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"The indicators - through no particular fault of anyone in particular - have not kept up with the changing world. As these numbers have become more deeply embedded in our culture as guides to how we are doing, we rely on a few big averages that can never be accurate pictures of complicated systems for the very reason that they are too simple and that they are averages. And we have neither the will nor the resources to invent or refine our current indicators enough to integrate all of these changes." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"The search for better numbers, like the quest for new technologies to improve our lives, is certainly worthwhile. But the belief that a few simple numbers, a few basic averages, can capture the multifaceted nature of national and global economic systems is a myth. Rather than seeking new simple numbers to replace our old simple numbers, we need to tap into both the power of our information age and our ability to construct our own maps of the world to answer the questions we need answering." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"We don’t need new indicators that replace old simple numbers with new simple numbers. We need instead bespoke indicators, tailored to the specific needs and specific questions of governments, businesses, communities, and individuals." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"When statisticians, trained in math and probability theory, try to assess likely outcomes, they demand a plethora of data points. Even then, they recognize that unless it’s a very simple and controlled action such as flipping a coin, unforeseen variables can exert significant influence." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Yet our understanding of the world is still framed by our leading indicators. Those indicators define the economy, and what they say becomes the answer to the simple question 'Are we doing well?'" (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

28 November 2006

🎯Piethein Strengholt - Collected Quotes

"For advanced analytics, a well-designed data pipeline is a prerequisite, so a large part of your focus should be on automation. This is also the most difficult work. To be successful, you need to stitch everything together." (Piethein Strengholt, "Data Management at Scale: Best Practices for Enterprise Architecture", 2020)

"One of the patterns from domain-driven design is called bounded context. Bounded contexts are used to set the logical boundaries of a domain’s solution space for better managing complexity. It’s important that teams understand which aspects, including data, they can change on their own and which are shared dependencies for which they need to coordinate with other teams to avoid breaking things. Setting boundaries helps teams and developers manage the dependencies more efficiently." (Piethein Strengholt, "Data Management at Scale: Best Practices for Enterprise Architecture", 2020)

"The logical boundaries are typically explicit and enforced on areas with clear and higher cohesion. These domain dependencies can sit on different levels, such as specific parts of the application, processes, associated database designs, etc. The bounded context, we can conclude, is polymorphic and can be applied to many different viewpoints. Polymorphic means that the bounded context size and shape can vary based on viewpoint and surroundings. This also means you need to be explicit when using a bounded context; otherwise it remains pretty vague." (Piethein Strengholt, "Data Management at Scale: Best Practices for Enterprise Architecture", 2020)

"The transformation of a monolithic application into a distributed application creates many challenges for data management." (Piethein Strengholt, "Data Management at Scale: Best Practices for Enterprise Architecture", 2020)

"A domain aggregate is a cluster of domain objects that can be treated as a single unit. When you have a collection of objects of the same format and type that are used together, you can model them as a single object, simplifying their usage for other domains." (Piethein Strengholt, "Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric" 2nd Ed., 2023)

"Decentralization involves risks, because the more you spread out activities across the organization, the harder it gets to harmonize strategy and align and orchestrate planning, let alone foster the culture and recruit the talent needed to properly manage your data." (Piethein Strengholt, "Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric" 2nd Ed., 2023)

"Enterprises have difficulties in interpreting new concepts like the data mesh and data fabric, because pragmatic guidance and experiences from the field are missing. In addition to that, the data mesh fully embraces a decentralized approach, which is a transformational change not only for the data architecture and technology, but even more so for organization and processes. This means the transformation cannot only be led by IT; it’s a business transformation as well." (Piethein Strengholt, "Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric" 2nd Ed., 2023)

"The data fabric is an approach that addresses today’s data management and scalability challenges by adding intelligence and simplifying data access using self-service. In contrast to the data mesh, it focuses more on the technology layer. It’s an architectural vision using unified metadata with an end-to-end integrated layer (fabric) for easily accessing, integrating, provisioning, and using data." (Piethein Strengholt, "Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric" 2nd Ed., 2023)

"The data mesh is an exciting new methodology for managing data at large. The concept foresees an architecture in which data is highly distributed and a future in which scalability is achieved by federating responsibilities. It puts an emphasis on the human factor and addressing the challenges of managing the increasing complexity of data architectures." (Piethein Strengholt, "Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric" 2nd Ed., 2023)

27 November 2006

🔢Jordan Morrow - Collected Quotes

"A data visualization, or dashboard, is great for summarizing or describing what has gone on in the past, but if people don’t know how to progress beyond looking just backwards on what has happened, then they cannot diagnose and find the ‘why’ behind it." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Along with the important information that executives need to be data literate, there is one other key role they play: executives drive data literacy learning and initiatives at the organization." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Data fluency, as defined in this book, is the ability to speak and understand the language of data; it is essentially an ability to communicate with and about data. In different cases around the world, the term data fluency has sometimes been used interchangeably with data literacy. That is not the approach of this book. This book looks to define data literacy as the ability to read, work with, analyze, and communicate with data. Data fluency is the ability to speak and understand the language of data." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Data literacy empowers us to know the usage of data and how an algorithm can potentially be misleading, biased, and so forth; data literacy empowers us with the right type of skepticism that is needed to question everything." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Data literacy is for the masses, and data visualization is powerful to simplify what could be very complicated." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Data literacy is not a change in an individual’s abilities, talents, or skills within their careers, but more of an enhancement and empowerment of the individual to succeed with data. When it comes to data and analytics succeeding in an organization’s culture, the increase in the workforces’ skills with data literacy will help individuals to succeed with the strategy laid in front of them. In this way, organizations are not trying to run large change management programs; the process is more of an evolution and strengthening of individual’s talents with data. When we help individuals do more with data, we in turn help the organization’s culture do more with data." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"[...] data literacy is the ability to read, work with, analyze, and communicate with data." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Data science is, in reality, something that has been around for a very long time. The desire to utilize data to test, understand, experiment, and prove out hypotheses has been around for ages. To put it simply: the use of data to figure things out has been around since a human tried to utilize the information about herds moving about and finding ways to satisfy hunger. The topic of data science came into popular culture more and more as the advent of ‘big data’ came to the forefront of the business world." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Data scientists are advanced in their technical skills. They like to do coding, statistics, and so forth. In its purest form, data science is where an individual uses the scientific method on data." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Data visualization is a simplified approach to studying data." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Ensure you build into your data literacy strategy learning on data quality. If the individuals who are using and working with data do not understand the purpose and need for data quality, we are not sitting in a strong position for great and powerful insight. What good will the insight be, if the data has no quality within the model?" (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"I agree that data visualizations should be visually appealing, driving and utilizing the appeal and power for individuals to utilize it effectively, but sometimes this can take too much time, taking it away from more valuable uses in data. Plus, if the data visualization is not moving the needle of a business goal or objective, how effective is that visualization?" (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021

"I think sometimes organizations are looking at tools or the mythical and elusive data driven culture to be the strategy. Let me emphasize now: culture and tools are not strategies; they are enabling pieces." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"In the world of data and analytics, people get enamored by the nice, shiny object. We are pulled around by the wind of the latest technology, but in so doing we are pulled away from the sound and intelligent path that can lead us to data and analytical success. The data and analytical world is full of examples of overhyped technology or processes, thinking this thing will solve all of the data and analytical needs for an individual or organization. Such topics include big data or data science. These two were pushed into our minds and down our throats so incessantly over the past decade that they are somewhat of a myth, or people finally saw the light. In reality, both have a place and do matter, but they are not the only solution to your data and analytical needs. Unfortunately, though, organizations bit into them, thinking they would solve everything, and were left at the alter, if you will, when it came time for the marriage of data and analytical success with tools." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"One main reason descriptive analytics is so prevalent is the lack of data literacy skills that exist in the world. If one thinks about it, if you do not have a good understanding of how to use data, then how are you going to be good at the four levels of analytics?" (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Overall [...] everyone also has a need to analyze data. The ability to analyze data is vital in its understanding of product launch success. Everyone needs the ability to find trends and patterns in the data and information. Everyone has a need to ‘discover or reveal (something) through detailed examination’, as our definition says. Not everyone needs to be a data scientist, but everyone needs to drive questions and analysis. Everyone needs to dig into the information to be successful with diagnostic analytics. This is one of the biggest keys of data literacy: analyzing data." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Pure data science is the use of data to test, hypothesize, utilize statistics and more, to predict, model, build algorithms, and so forth. This is the technical part of the puzzle. We need this within each organization. By having it, we can utilize the power that these technical aspects bring to data and analytics. Then, with the power to communicate effectively, the analysis can flow throughout the needed parts of an organization." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Statistics is a field of probabilities and sometimes probabilities do not go the way we want." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"The process of asking, acquiring, analyzing, integrating, deciding, and iterating should become second nature to you. This should be a part of how you work on a regular basis with data literacy. Again, without a decision, what is the purpose of data literacy? Data literacy should lead you as an individual, and organizations, to make smarter decisions." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"The reality is, the majority of a workforce doesn’t need to be data scientists, they just need comfort with data literacy." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"When it comes to data literacy learning, there is one key aspect to ensure the program and project works and is successful: the role of leadership. It’s unlikely a project will succeed if you fail to secure the full buy-in from those in charge." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"When we are empowered with skills in data literacy, we have the ability to understand where our data is going, how it is being utilized, and so forth. Then, we can make smarter, data literacy informed decisions with regards to how we log in, create accounts and so forth. Data literacy gives a direct empowerment towards our personal data usage." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

🔢Mike Fleckenstein - Collected Quotes

"A big part of data governance should be about helping people (business and technical) get their jobs done by providing them with resources to answer their questions, such as publishing the names of data stewards and authoritative sources and other metadata, and giving people a way to raise, and if necessary escalate, data issues that are hindering their ability to do their jobs. Data governance helps answer some basic data management questions." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018)

"A data lake is a storage repository that holds a very large amount of data, often from diverse sources, in native format until needed. In some respects, a data lake can be compared to a staging area of a data warehouse, but there are key differences. Just like a staging area, a data lake is a conglomeration point for raw data from diverse sources. However, a staging area only stores new data needed for addition to the data warehouse and is a transient data store. In contrast, a data lake typically stores all possible data that might be needed for an undefined amount of analysis and reporting, allowing analysts to explore new data relationships. In addition, a data lake is usually built on commodity hardware and software such as Hadoop, whereas traditional staging areas typically reside in structured databases that require specialized servers." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018)

"Data governance presents a clear shift in approach, signals a dedicated focus on data management, distinctly identifies accountability for data, and improves communication through a known escalation path for data questions and issues. In fact, data governance is central to data management in that it touches on essentially every other data management function. In so doing, organizational change will be brought to a group is newly - and seriously - engaging in any aspect of data management." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018)

"Data is owned by the enterprise, not by systems or individuals. The enterprise should recognize and formalize the responsibilities of roles, such as data stewards, with specific accountabilities for managing data. A data governance framework and guidelines must be developed to allow data stewards to coordinate with their peers and to communicate and escalate issues when needed. Data should be governed cooperatively to ensure that the interests of data stewards and users are represented and also that value to the enterprise is maximized." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018)

"In truth, all three of these perspectives - process, technology, and data - are needed to create a good data strategy. Each type of person approaches things differently and brings different perspectives to the table. Think of this as another aspect of diversity. Just as a multicultural team and a team with different educational backgrounds will produce a better result, so will a team that includes people with process, technology and data perspectives." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018)

"Lack of trust is closely associated with uncertainty about the quality of the data, such as its sourcing, content definition, or content accuracy. The issue is not only that the data source has quality issues, but that the issues that it may or may not have are unknown." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018)

"The desire to collect as much data as possible must be balanced with an approximation of which data sources are useful to address a business issue. It is worth mentioning that often the value of internal data is high. Most internal data has been cleansed and transformed to suit the mission. It should not be overlooked simply because of the excitement of so much other available data." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018)

"Typically, a data steward is responsible for a data domain (or part of a domain) across its life cycle. He or she supports that data domain across an entire business process rather than for a specific application or a project. In this way, data governance provides the end user with a go-to resource for data questions and requests. When formally applied, data governance also holds managers and executives accountable for data issues that cannot be resolved at lower levels. Thus, it establishes an escalation path beginning with the end user. Most important, data governance determines the level - local, departmental or enterprise - at which specific data is managed. The higher the value of a particular data asset, the more rigorous its data governance." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018)

26 November 2006

🎯Cindi Howson - Collected Quotes

"A common misconception about BI standardization is the assumption that all users must use the same tool. It would be a mistake to pursue this strategy. Instead, successful BI companies use the right tool for the right user. For a senior executive, the right tool might be a dashboard. For a power user, it might be a business query tool. For a call center agent, it might be a custom application or a BI gadget embedded in an operational application."(Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"A key secret to making BI a killer application within your company is to provide a business intelligence environment that is flexible enough to adapt to a changing business environment at the pace of the business environment - fast and with frequent change." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"A key sign of successful business intelligence is the degree to which it impacts business performance." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Achieving a high level of data quality is hard and is affected significantly by organizational and ownership issues. In the short term, bandaging problems rather than addressing the root causes is often the path of least resistance." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Attracting the best people and keeping the BI team motivated is only possible when the importance of BI is recognized by senior management. When it’s not, the best BI people will leave." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Business intelligence tools can only present the facts. Removing biases and other errors in decision making are dynamics of company culture that affect how well business intelligence is used." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Communicate loudly and widely where there are data quality problems and the associated risks with deploying BI tools on top of bad data. Also advise the different stakeholders on what can be done to address data quality problems - systematically and organizationally. Complaining without providing recommendations fixes nothing." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Data quality is such an important issue, and yet one that is not well understood or that excites business users. It’s often perceived as being a problem for IT to handle when it’s not: it’s for the business to own and correct." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Depending on the extent of the data quality issues, be careful about where you deploy BI. Without a reasonable degree of confidence in the data quality, BI should be kept in the hands of knowledge workers and not extended to frontline workers and certainly not to customers and suppliers. Deploy BI in this limited fashion as data quality issues are gradually exposed, understood, and ultimately, addressed. Don’t wait for every last data quality issue to be resolved; if you do, you will never deliver any BI capabilities, business users will never see the problem, and quality will never improve." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Even if you have previously tried to engage tech-wary users and were met with a lackluster response, try again. Technical and information literacy is evolutionary. BI tools have gotten significantly easier to use with more interface options to suit diverse user requirements, even for users with less affinity for information technology." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Knowledge workers and BI experts must continually evaluate the reports, dashboards, alerts, and other mechanisms for disseminating factual information to ensure the design facilitates insight." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"I would argue that every BI deployment needs an OLAP component; not only is it necessary to facilitate analysis, but also it can significantly reduce the number of reports either IT developers or business users have to create." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"If you give users with low data literacy access to a business query tool and they create incorrect queries because they didn’t understand the different ways revenue could be calculated, the BI tool will be perceived as delivering bad data." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Successful BI companies start with a vision - whether it’s to improve air travel, improve patient care, or drive synergies. The business sees an opportunity to exploit the data to fulfill a broader vision. The detail requirements are not precisely known. Creativity and exploration are necessary ingredients to unlock these business opportunities and fulfill those visions." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Successful business intelligence is influenced by both technical aspects and organizational aspects. In general, companies rate organizational aspects (such as executive level sponsorship) as having a higher impact on success than technical aspects. And yet, even if you do everything right from an organizational perspective, if you don’t have high quality, relevant data, your BI initiative will fail." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"The data architecture is the most important technical aspect of your business intelligence initiative. Fail to build an information architecture that is flexible, with consistent, timely, quality data, and your BI initiative will fail. Business users will not trust the information, no matter how powerful and pretty the BI tools. However, sometimes it takes displaying that messy data to get business users to understand the importance of data quality and to take ownership of a problem that extends beyond business intelligence, to the source systems and to the organizational structures that govern a company’s data." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"The frustration and divide between the business and IT has ramifications far beyond business intelligence. Yet given the distinct aspect of this technology, lack of partnership has a more profound effect in BI’s success. As both sides blame one another, a key secret to reducing blame and increasing understanding is to recognize how these two sides are different." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"The problem is when biases and inaccurate data also get filtered into the gut. In this case, the gut-feel decision making should be supported with objective data, or errors in decision making may occur." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"There is one crucial aspect of extending the reach of business intelligence that has nothing to do with technology and that is Relevance. Understanding what information someone needs to do a job or to complete a task is what makes business intelligence relevant to that person. Much of business intelligence thus far has been relevant to power users and senior managers but not to front/line workers, customers, and suppliers." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

🎯Margaret Y Chu - Collected Quotes

"An organization needs to know the condition and quality of its data to be more effective in fixing them and making them blissful. Unfortunately, pride, shame, and a fear of looking incompetent all play a part when people are asked to openly discuss dirty data issues. Because data are an asset, some people are unwilling to share their data. They think this gives them control and power over others. The role of politics in the organization is the dirty secret of dirty data." (Margaret Y Chu, "Blissful Data", 2004)

"Blissful data consist of information that is accurate, meaningful, useful, and easily accessible to many people in an organization. These data are used by the organization’s employees to analyze information and support their decision-making processes to strategic action. It is easy to see that organizations that have reached their goal of maximum productivity with blissful data can triumph over their competition. Thus, blissful data provide a competitive advantage." (Margaret Y Chu, "Blissful Data", 2004)

"Business rules should be simple and owned and defined by the business; they are declarative, indivisible, expressed in clear, concise language, and business oriented." (Margaret Y Chu, "Blissful Data", 2004)

"Clear goals, multiple strategies, clear roles and responsibilities, boldness, teamwork, speed, flexibility, the ability to change, managing risk, and seizing opportunities when they arise are important characteristics in gaining objectives." (Margaret Y Chu, "Blissful Data", 2004)

"[…] dirt and stains are more noticeable on white or light-colored clothing. In the same way, dirty data and data quality issues have existed for a long time. But due to the inherent nature of operational data these issues have not been as visible or immense enough to affect the bottom line. Just as dark clothing hides spills and stains, dirty data have been hidden or ignored in operational data for decades." (Margaret Y Chu, "Blissful Data", 2004)

"Gauging the quality of the operational data becomes an important first step in predicting potential dirty data issues for an organization. But many organizations are reluctant to commit the time and expense to assess their data. Some organizations wait until dirty data issues blow up in their faces. The greater the pain being experienced, the bigger the commitment to improving data quality." (Margaret Y Chu, "Blissful Data", 2004)

"[...] incomplete, inaccurate, and invalid data can cause problems for an organization. These problems are not only embarrassing and awkward but will also cause the organization to lose customers, new opportunities, and market share." (Margaret Y Chu, "Blissful Data", 2004)

"Let’s define dirty data as: ‘… data that are incomplete, invalid, or inaccurate’. In other words, dirty data are simply data that are wrong. […] Incomplete or inaccurate data can result in bad decisions being made. Thus, dirty data are the opposite of blissful data. Problems caused by dirty data are significant; be wary of their pitfalls." (Margaret Y Chu, "Blissful Data", 2004)

"Organizations must know and understand the current organizational culture to be successful at implementing change. We know that it is the organization’s culture that drives its people to action; therefore, management must understand what motivates their people to attain goals and objectives. Only by understanding the current organizational culture will it be possible to begin to try and change it." (Margaret Y Chu, "Blissful Data", 2004)

"Processes must be implemented to prevent bad data from entering the system as well as propagating to other systems. That is, dirty data must be intercepted at its source. The operational systems are often the source of informational data; thus dirty data must be fixed at the operational data level. Implementing the right processes to cleanse data is, however, not easy." (Margaret Y Chu, "Blissful Data", 2004)

"So business rules are just like house rules. They are policies of an organization and contain one or more assertions that define or constrain some aspect of the business. Their purpose is to provide a structure and guideline to control or influence the behavior of the organization. Further, business rules represent the business and guide the decisions that are made by the people in the organization." (Margaret Y Chu, "Blissful Data", 2004)

"Vision and mission statements are important, but they are not an organization’s culture; they are its goals. A vision is the ideal they are striving to achieve. There may be a huge gap between the ideal and the current state of actions and behaviors." (Margaret Y Chu, "Blissful Data", 2004)

"What management notices and rewards is the best indication of the organization’s culture." (Margaret Y Chu, "Blissful Data", 2004)

🔢James Serra - Collected Quotes

"A common data model (CDM) is a standardized structure for storing and organizing data that is typically used when building a data warehouse solution. It provides a consistent way to represent data within tables and relationships between tables, making it easy for any system or application to understand the data." (James Serra, "Deciphering Data Architectures", 2024)

"A data architecture defines a high-level architectural approach and concept to follow, outlines a set of technologies to use, and states the flow of data that will be used to build your data solution to capture big data. [...] Data architecture refers to the overall design and organization of data within an information system." (James Serra, "Deciphering Data Architectures", 2024)

"A data mesh is a decentralized data architecture with four specific characteristics. First, it requires independent teams within designated domains to own their analytical data. Second, in a data mesh, data is treated and served as a product to help the data consumer to discover, trust, and utilize it for whatever purpose they like. Third, it relies on automated infrastructure provisioning. And fourth, it uses governance to ensure that all the independent data products are secure and follow global rules." (James Serra, "Deciphering Data Architectures", 2024)

"At its core, a data fabric is an architectural framework, designed to be employed within one or more domains inside a data mesh. The data mesh, however, is a holistic concept, encompassing technology, strategies, and methodologies." (James Serra, "Deciphering Data Architectures", 2024)

"Be aware that data product is not the same thing as data as a product. Data as a product describes the idea that data owners treat data as a fully contained product that they are responsible for, rather than a byproduct of a process that others manage, and should make the data available to other domains and consumers. Data product refers to the architecture of implementing data as a product." (James Serra, "Deciphering Data Architectures", 2024)

"Choosing the right data ingestion strategy is a significant business decision that partially determines how well your organization can leverage its data for business decision making and operations. The stakes are high; the wrong strategy can lead to poor data quality, performance issues, increased costs, and even regulatory compliance breaches." (James Serra, "Deciphering Data Architectures", 2024)

"Data governance is the overall management of data in an organization. It involves establishing policies and procedures for collecting, storing, securing, transforming, and reporting data." (James Serra, "Deciphering Data Architectures", 2024)

"Delta Lake is a transactional storage software layer that runs on top of an existing data lake and adds RDW-like features that improve the lake’s reliability, security, and performance. Delta Lake itself is not storage. In most cases, it’s easy to turn a data lake into a Delta Lake; all you need to do is specify, when you are storing data to your data lake, that you want to save it in Delta Lake format (as opposed to other formats, like CSV or JSON)." (James Serra, "Deciphering Data Architectures", 2024)

"It is very important to understand that data mesh is a concept, not a technology. It is all about an organizational and cultural shift within companies. The technology used to build a data mesh could follow the modern data warehouse, data fabric, or data lakehouse architecture - or domains could even follow different architectures." (James Serra, "Deciphering Data Architectures", 2024)

"The data fabric architecture is an evolution of the modern data warehouse (MDW) architecture: an advanced layer built onto the MDW to enhance data accessibility, security, discoverability, and availability. [...] The most important aspect of the data fabric philosophy is that a data fabric solution can consume any and all data within the organization." (James Serra, "Deciphering Data Architectures", 2024)

"The goal of any data architecture solution you build should be to make it quick and easy for any end user, no matter what their technical skills are, to query the data and to create reports and dashboards." (James Serra, "Deciphering Data Architectures", 2024)

"The term data lakehouse is a portmanteau (blend) of data lake and data warehouse. [...] The concept of a lakehouse is to get rid of the relational data warehouse and use just one repository, a data lake, in your data architecture." (James Serra, "Deciphering Data Architectures", 2024)

"With all the hype, you would think building a data mesh is the answer to all of these 'problems' with data warehousing. The truth is that while data warehouse projects do fail, it is rarely because they can’t scale enough to handle big data or because the architecture or the technology isn’t capable. Failure is almost always because of problems with the people and/or the process, or that the organization chose the completely wrong technology." (James Serra, "Deciphering Data Architectures", 2024)