SQL Troubles: authors

27 December 2016

✏️Karl G Karsten - Collected Quotes

"All of this information might be useful and even, for certain purposes, necessary. It is, so to speak, the statistical data of the question. But it yields no picture. A map or a globe gives us this mental picture almost in a flash. And that is precisely the use and service of a chart." (Carl Snyder, [in Karl G Karsten, "Charts and Graphs", 1925] 1923)

"A circular, like a square, area varies with the square of its linear measurements. If you make the radius of one circle twice as great as the radius of the other, the first area will be four times as great as the first. If you make the areas proportionate, the radii must be in the relation of 1 to the square root of 2. Both circle and square require the more or less tedious computation of square roots and repay this labor with inaccurate and ambiguous results." (Karl G Karsten, "Charts and Graphs", 1925)

"A curve cannot, however, always be used in the place of a bar-chart, for the line which connects the various points implies that the data itself can be considered connected. Much data can not be so considered. A careful inspection of the data will soon show whether it is connected or not, for the stubs of connected data always form a variable." (Karl G Karsten, "Charts and Graphs", 1925)

"A further detail of the 100% bar and its labelling, is the scale. This should generally be in hundredths or percents. The data may be entirely in absolute quantities, but nevertheless the scale should show percentages. To prevent the confusion of scale and divisions of the bar, the scale should be outside the bar, and the best practice seems to be to indicate the scale by little notches or short perpendicular lines dropped below the bar, from its lower edge." (Karl G Karsten, "Charts and Graphs", 1925)

"A quantity can always be illustrated by a straight line, or, as it is commonly called, a 'bar'. Bars are the simplest and often the best form of erate The total length of the line then represents the total value of the quantity. When we speak of a line in charting, we do not mean an imaginary straight line having neither width nor depth, for that would be invisible and could not, of course, be actually used in illustrations. In its place we use the bar, with a visible width (and the actual depth or thickness of a layer of ink). But it is still proper to speak of this bar as being a line or one-dimension chart, for its width and thickness are constants, necessary to give visibility to the line, and its length alone is significant." (Karl G Karsten, "Charts and Graphs", 1925)

"A series ot quantities or values can be most simply and often best shown by a series of corresponding lines or bars. All bars being drawn against one and the same scale, their lengths vary with the amounts which they represent." (Karl G Karsten, "Charts and Graphs", 1925)

"Another principle which will quickly appeal to your common sense, is the rule that when zero is real, the zero-line should be extra heavy to make it prominent. Remember that it takes the place of the floor or lower end of the bars in the bar-chart. It should stand out, therefore, in such a way that the reader can easily grasp its significance and compare with it the heights of the points on the curve. The rule is particularly important in cases where the chart extends down below the zero line into the negative side in order to show negative and positive values. On the same principle the 100% line, when it occurs in a chart, should be similarly heavy as it also may be considered a base for zero points, being the point of zero loss or gain. In fact, the rule may be extended to all cases of lines showing significant constant values, and the zero line should not be heavy, unless it has a special significance." (Karl G Karsten, "Charts and Graphs", 1925)

"Bar-charts are most flexible and can be varied to suit the individual whims of the maker. In general, however, there is one style or form which will be found most satisfactory. It consists of a horizontal grouping of bars alongside of the data. The chart is arranged in tabular form, with items or stubs in a column to the left, with figures in a column beside the stubs and with bars in a column beside the figures. Several columns of figures are sometimes desirable, just as in the table of data, to show sources or original figures from which the charted figures are obtained. In any case, the bars should represent the most important set or column of figures, and there should be normally but one column of bars."(Karl G Karsten, "Charts and Graphs", 1925)

"Having confessed so little patience with the doctrine of the incomprehensible per se, we have naturally sought to empty the entire bag of tricks, and to tell the whole story of the chart in the simplest words that we command. Our belief has been that it is a lesser sin to be too easily understood than never understood at all. But at the same time, we have sought to make the story full and complete." (Karl G Karsten, "Charts and Graphs", 1925)

"Having prepared your data, you will next decide upon a 'scale’ or ratio of reduction to use in the drawing, that is, what value or distance on the actual floor shall be represented by each space or distance between lines on the paper. It is important to pick a scale which is neither too large nor too small, so that the drawing will be the right size on the sheet." (Karl G Karsten, "Charts and Graphs", 1925)

"In all chart-making, the material to be shown must be accurately compiled before it can be charted. For an understanding of the classification chart, we must delve somewhat into the mysteries of the various methods of classification and indexing. The art of classifying calls into play the power of visualizing a 'whole' together with all its 'parts'. Even in the most exact science, it is not always easy to break up a whole into a complete set of the distinct, mutually exclusive parts which together exactly compose it." (Karl G Karsten, "Charts and Graphs", 1925)

"In fact, it can be laid down as a general rule that both the compound and the multiple bar-charts are too elaborate and complicated. A chart is always better the simpler it is, and we should make strong efforts to simplify these charts, and if possible reduce them to simple bar-charts. It usually pays well for sacrifices we make in this way, in legibility and interest to the reader, and after all, the chart of this type 1s generally directed at a reader, rather than at the maker." (Karl G Karsten, "Charts and Graphs", 1925)

"In short, the pie-chart appears to be a two-dimension (area) chart used for one-dimension data. The fact is, however, that, as in the case of the 100% bar, the area of the chart varies directly with one dimension, the other dimension being constant. In the 100% bar the width of the bar was constant in the 100% circle the radius must be constant for all circles compared. Then the area of the segments varies directly with their arcs or angles and the chart has but one significant dimension. It is only an apparent exception to the rule." (Karl G Karsten, "Charts and Graphs", 1925)

"In short, the rule that no more dimensions or axes should be used in the chart than the data calls for, is fundamental. Violate this rule and you bring down upon your head a host of penalties. In the first place, you complicate your computing processes, or else achieve a grossly deceptive chart. If your chart becomes deceptive, it has defeated its purpose, which was to represent accurately. Unless, of course, you intended to deceive, in which case we are through with you and leave you to Mark Twain’s mercies. If you make your chart accurate, at the cost of considerable square or cube root calculating, you still have no hope, for the chart is not clear; your reader is more than likely to misunderstand it. Confusion, inaccuracy and deception always lie in wait for you down the path departing from the principle we have discussed - and one of them is sure to catch you." (Karl G Karsten, "Charts and Graphs", 1925)

"In short, the scales on which a curve is drawn can affect very much our impressions of the data by magnifying or minimizing the apparent movements of the curve itself. Of course, this does not mean that the relative height from the base-line of the various points on the curve have been altered. If you have been careful to show the base-line always, the base-line itself will approach nearer to the curve as the vertical scale is reduced and the wiggles are flattened out, and will recede farther from the curve as the vertical scale is enlarged and the wiggles are exaggerated. But it means that the oscillation or fluctuation of the curve will have been made to appear more violent or milder according as either of the scales is changed. And it therefore behooves us to give serious thought to the matter of scales before’ we determine upon them finally for any particular chart. As a matter of fact, we may have to try out several combinations of scales before we find one which gives just the right amount of emphasis to curve fluctuations to suit us." (Karl G Karsten, "Charts and Graphs", 1925)

"In the labelling of the pie-chart, you will furthermore encounter typographical difficulties. It is not ordinarily a good thing to make a reader crane his neck at various angles to read writing along every point of the compass, so you should not, as so many do, write on radii from the center of the circle. On the other hand, unless the chart and its segments are very large as compared with the size of the printing, you will introduce tricky optical illusions if you write all labels in the same directions inside the segments." (Karl G Karsten, "Charts and Graphs", 1925)

"Moreover in the pipe-organ cr vertical-bar chart, we first encounter labelling or data difficulties. And if there is one motto which we should like to print at the bottom of every page in bold-face type, as do the publishers of other valuable reference-books, it is this: 'Never separate your chart from its data'. On the contrary, incorporate the data in the chart. For a chart without its data is a poor lost thing indeed. And the unhappy reader wishing to know what it means must hunt and hunt and hunt till he locates the particular information in some distant table. As a matter of fact, he won’t do it, for before he has found his data he has lost his interest in the matter, and then what good is your chart." (Karl G Karsten, "Charts and Graphs", 1925)

"Most of the good things in this world involve some sacrifice. Curves are no exception. In a curve the direct visible connection between the curve itself and the zero line, or x-axis, is sacrificed. As time goes on and you become more and more used to the curve chart, you will begin to think of its values as in some mysterious manner floating disembodied along the connecting line which forms the curve. You will be tempted to forget that the quantities rest very substantially upon the floor (base line, zero line, x-axis or whatever you want to call it), and that it is only their tops which reach the points plotted in the curve. And forgetting this, you will try to save space by omitting the zero line and lower part of the chart, and by showing only that small portion or band of the chart through which the plotted curve travels." (Karl G Karsten, "Charts and Graphs", 1925)

"Multiple curves are far better than multiple bar charts. A number of curves wiggling across the page at the tops of invisible bars are eminently more satisfactory than actual bars interlarded. In the first place, comparison of several series of data is greatly facilitated in curves - because each set has been condensed and simplified into a single line. There is no difficulty in comparing values of each series with each other. In the second place, such a comparison is more accurate in curves because all similar points on various sets or series have been brought together upon a single vertical line." (Karl G Karsten, "Charts and Graphs", 1925)

"Note also, and this is important, that if through standing too close you should take a picture showing only the upper ends of the upright boards, but not their full lengths, you would consider the resulting picture not only a failure but actually deceptive. In other words, you must not omit the zero-line or base-line. While you would succeed in showing the variation of the top ends more clearly you would no longer have comparable lengths." (Karl G Karsten, "Charts and Graphs", 1925)

"Now figures are not in themselves necessarily dry and dull - in fact the figures of your bank-account may be very engrossing to you. But figures on uninteresting subjects are a sure cure for insomnia, to all of us. And it goes without saying that if the figures are not of consequence, the chart of these figures will deserve equally little attention. The point is that a chart is as weak as its own data, and a chart-maker must carefully weigh and consider his data before permitting himself the pleasure of illustrating them with a chart." (Karl G Karsten, "Charts and Graphs", 1925)

"The advantage of the pie-chart is psychological. It instantly commands the reader’s attention. A circle is, of all geometrical patterns, the easiest resting spot for the eye. The fact is well known to advertisers, who frequently use circles and circular outlines to draw attentica to their advertisements. Hence if your chart is designed for publication, or for presenta tion to readers whose attention may be easily diverted, you will find the pie-chart a powerful means for presenting your facts. Attention will be focused upon it at once, and it is as simple to understand as its name - far too simple for anyone to misunderstand. Because it is circular, there is no question but that it represents a whole and the various slices of the pie belong to their respective items." (Karl G Karsten, "Charts and Graphs", 1925)

"The chief value of the 'pipe-organ char' [aka bar chart] as it is sometimes called, lies in the realistic picture it gives of quantities. From a base line these quantities are seen to rise the full length of the bars, as so much substantial material stacked neatly in piles where we can compare them. We view them from the ‘level or floor on which they are piled. We do not have to climb up and get a bird’s-eye view of them as in the ordinary bar-chart, where we seem to be looking down upon rows and rows of goods, but we see them from a natural view-point. Nor do we rely upon an arbitrary arrangement by which their left ends have been brought together as in the bar-chart, but we know instantly that if they are piled up, it is their tops which we must watch. The pipe-organ chart finds instant response in our minds, and appeals to us as both logical and natural. A child can comprehend it." (Karl G Karsten, "Charts and Graphs", 1925)

"The disadvantages of the pie-chart are many. It is worthless for study and research purposes. In the first place, the human eye cannot easily compare as to length the various arcs about the circle, lying as they do in different directions. In the second place, the human eye is not naturally skilled at comparing angles - those angles at the center of the circle, formed by the various rays or radii and subtending the various arcs. In the third place, the human eye is not an expert judge of comparative sizes of areas, especially those as irregular as the segments of parts of the circle. There is no way by which the parts of this round unit can be compared so accurately and quickly as the parts of a straight line or bar. Moreover, when, as frequently happens, several pie-charts are shown together, the various slices in one chart cannot be so easily compared with the corresponding slices in the next, as can the various parts of one 100% bar with corresponding parts of another bar." (Karl G Karsten, "Charts and Graphs", 1925)

"The division of a 'whole' into its 'parts' is logically one of the first steps in any analysis. Usually the graph illustrating this division belongs at the beginning of a statistical report. Thus, if your report covers the sales of the company, your first chart would break up total sales into the individual sales for each line or for each district. The remainder of the report, treating of details of the various 'parts' (e.g., lines or districts) will then follow a summary chart which has established their relative importance." (Karl G Karsten, "Charts and Graphs", 1925)

"The greatest contribution to chart-making, from any single source, is the Gantt Progress Chart. This chart is, unquestionably, the most powerful graphic device for business and for all executive and managerial purposes. While the description has been rather full, as given herein, it is by no means complete; and the Gantt charting methods, in all their co-ordinated ramifications, constitute an independent system of accounting and of executive control,in this [...]" (Karl G Karsten, "Charts and Graphs", [preface] 1925)

"The technique of bar-charts is so simple and they are so very effective, that they should be used freely in printed text-matter. No drawing or plates are needed. Printers have 'rules' as they call them, which can be used to make solid bars, and these rules can easily be set up together with the type. The scale and field can be omitted and the bars alone will effectively tell the story of the main figures in the table. The combined table and chart can be used in printed text just as well as the table alone." (Karl G Karsten, "Charts and Graphs", 1925)

"These apparently arbitrary rules of thumb are justified only so long as they serve to produce the best results. Your real purpose is to show the data most clearly and simply, either to yourself or to someone else. The chart is a window, as it were, through which the reader looks out upon an illuminating picture of the facts he is considering. Through this window he sees, if you like, a chain of mountains, whose height tells him the values or quantities he is considering. That he may see them to the best advantage, the window must be low enough for him to see the base of the mountain-range and high enough for him to see at least some sky above the highest peak. In general, the best view of the mountains would show neither too much nor too little clear sky above. And if the window is crossed with a framework for small window-panes, he can further judge of heights by the crisscross window-pane lines. Your curve is the silhouette of that mountain-range, your field the tiny window-pane outlines, and you, the chart-maker, must use your own judgment and artistic sense to place the reader’s chair near or far, high or low, in front of that window, to give him the clearest view." s it were, through which the reader looks out upon an illu- minating picture of the facts he is considering. Through this window he sees, if you like, a chain of mountains, whose height tells him the values or quantities he is considering. That he may see them to the best advantage, the window must be low enough for him to see the base of the mountain-range" (Karl G Karsten, "Charts and Graphs", 1925)

"This practice of omitting the zero line is all too common, but it is not for that reason excusable. The amputated chart is a deceptive one, tempting the average reader to compare the heights of points on the curve from the false bottom of the amputated chart-field, rather than from the true zero line, far below and invisible. A curve-chart without a zero line is in general no whit less of a printed lie, than a vertical bar-chart in which the lower part of the bars themselves are cut away. The representation of comparative sizes has been distorted and the fluctuations (changes in value) exaggerated." (Karl G Karsten, "Charts and Graphs", 1925)

"Throughout your study of charts you will find some which are more useful for popular consumption than others, but you will not find many which are more purely popular in appeal than the 100% circle or pie diagram. For analytical purposes it has nothing to recommend it, but for sensational values it is in general without an equal." (Karl G Karsten, "Charts and Graphs", 1925)

"To make a bar-chart popular, knock it over flat on its side, so that the bars stand up on end. Simple, isn’t it? But that’s the rule. There being nothing more to discuss in the matter of making popular bar-charts, we are tempted to close the dis- cussion at this point and produce a pleasant surprise to all. But the vertical bar-chart [aka column chart] is rich in suggestions for the higher forms of charts which we are approaching, and it deserves a close study." (Karl G Karsten, "Charts and Graphs", 1925)

"We have so consistently inveighed against the use of areas to illustrate quantities that the reader will indeed be surprised at some coming retractions. [...] But the fact is that we now propose to turn to advantage the very feature of areas which has previously been their greatest fault. [...] We now come to data in which we wish to show simultaneously three ratios or sets of ratios, one of which is always the product of the other two. In other words, we wish to show two factors or sets of factors and their product." (Karl G Karsten, "Charts and Graphs", 1925)

"When several curves are shown upon the same chart, it is often desirable to use different scales for them. That is, the same horizontal lines may be given two or even more different values for different curves. But even in these cases, it is better to place both scales, once and for all, at the left hand side. The practise of placing one of these scales at the right hand side, and another at the left hand side, has little to recommend it. Theoretically, at least, the left hand end of your chart is normally the y-axis itself, and the scale or ‘scales should logically be attached immediately thereto. In practice this logical position is justified." (Karl G Karsten, "Charts and Graphs", 1925)

26 December 2016

✏️Emile Cheysson - Collected Quotes

"If statistical graphics, although born just yesterday, extends its reach every day, it is because it replaces long tables of numbers and it allows one not only to embrace at glance the series of phenomena, but also to signal the correspondence or anomalies, to find the causes, to identify the laws." (Émile Cheysson, circa 1877)

"Geometric statistics compel the merchant who wishes to consult it to undertake a careful self-examination and deep investigation—steps he might not have felt necessary without this pressing summons. Indeed, this may be one of the method’s greatest benefits: it forces him to scrutinize countless factors that surround him daily yet go unnoticed, and to become aware of all the elements that, sometimes without his knowledge, influence the final outcome. It does not settle for approximations; before offering its insights, it demands to be informed with both abundance and accuracy." (Emile Cheysson, "La Statistique géométrique", 1888)

"It is this combination of observation at the foundation and geometry at the summit that I wished to express by naming this method Geometric Statistics. It cannot be subject to the usual criticisms directed at the use of pure mathematics in economic matters, which are said to be too complex to be confined within a formula." (Emile Cheysson, "La Statistique géométrique", 1888)

"It then becomes a method of graphical interpolation or extrapolation, which involves hypothetically extending a curve within or beyond the range of known data points, assuming the continuity of its pattern. In this way, one can fill in gaps in past observations and even probe the depths of the future." (Emile Cheysson, "La Statistique géométrique", 1888)

"This method is what I call Geometric Statistics. But despite its somewhat forbidding name-which I’ll explain in a moment - it is not a mathematical abstraction or a mere intellectual curiosity accessible only to a select few. It is intended, if not for all merchants and industrialists, then at least for that elite who lead the masses behind them. Practice is both its starting point and its destination. It was inspired in me more than fifteen years ago by the demands of the profession, and if I’ve decided to present it today, it’s because I’ve since verified its advantages through various applications, both in private industry and in public service." (Emile Cheysson,"La Statistique géométrique", 1888)

"Whenever it is a matter of resolving delicate questions where the solution depends on contradictory elements whose outcome is difficult to determine, Geometric Statistics has a clear role to play and can intervene usefully." (Emile Cheysson,"La Statistique géométrique", 1888)

"Graphical statistics thus possess a variety of resources that it deploys depending on the case, in order to find the most expressive and visually appealing way to depict the phenomenon. One must especially avoid trying to convey too much at once and becoming obscure by striving for completeness. Its main virtue - or one might say, its true reason for being - is clarity. If a diagram becomes so cluttered that it loses its clarity, then it is better to use the numerical table it was meant to translate." (Emile Cheysson, "Albume de statistique graphique", 1889)

"This method not only has the advantage of appealing to the senses as well as to the intellect, and of illustrating facts and laws to the eye that would be difficult to uncover in long numerical tables. It also has the privilege of escaping the obstacles that hinder the easy dissemination of scientific work - obstacles arising from the diversity of languages and systems of weights and measures among different nations. These obstacles are unknown to drawing. A diagram is not German, English, or Italian; everyone immediately grasps its relationships of scale, area, or color. Graphical statistics are thus a kind of universal language, allowing scholars from all countries to freely exchange their ideas and research, to the great benefit of science itself." (Emile Cheysson, "Albume de statistique graphique", 1889)

"Today, there is hardly any field of human activity that does not make use of graphical statistics. Indeed, it perfectly meets a dual need of our time: the demand for information that is both rapid and precise. Graphical methods fulfill these two conditions wonderfully. They allow us not only to grasp an entire series of phenomena at a glance, but also to highlight relationships or anomalies, identify causes, and extract underlying laws. They advantageously replace long tables of numbers, so that - without compromising the precision of statistics - they broaden and popularize its benefits." (Emile Cheysson, "Albume de statistique graphique", 1889)

"When a law is contained in figures, it is buried like metal in an ore; it is necessary to extract it. This is the work of graphical representation. It points out the coincidences, the relationships between phenomena, their anomalies, and we have seen what a powerful means of control it puts in the hands of the statistician to verify new data, discover and correct errors with which they have been stained." (Emile Cheysson, "Les methods de la statistique", 1890)

Sources: Bibliothèque nationale de France

23 November 2016

🔢Aniruddha Deswandikar - Collected Quotes

"A data contract is a definition of how two parties (the producer and the consumer) must exchange data. It defines the structure, format, service level agreement, sensitivity, and any other information that could be important for the producer or the consumer of the data." (Aniruddha Deswandikar, "Engineering Data Mesh in Azure Cloud", 2024)

"A data mesh is an architectural pattern implemented on top of a standardized enterprise cloud infrastructure." (Aniruddha Deswandikar,"Engineering Data Mesh in Azure Cloud", 2024)

"A data mesh splits the boundaries of the exchange of data into multiple data products. This provides a unique opportunity to partially distribute the responsibility of data security. Each data product team can be made responsible for how their data should be accessed and what privacy policies should be applied." (Aniruddha Deswandikar,"Engineering Data Mesh in Azure Cloud", 2024)

"A data quality solution cannot be taken up as one large project. It needs to be built brick by brick. Implement the easy checks first and then tackle the complex quality requirements." (Aniruddha Deswandikar, "Engineering Data Mesh in Azure Cloud", 2024)

"Accuracy questions whether the value of the data is as it should be. [...] Completeness indicates whether all the necessary data has been included. [...] Consistency means that the data is consistent across systems. [...] Timeliness is about how recent the data is. All data has an original source. [...] Validity refers to whether the data is valid. Data must adhere to some business rules. [...] Uniqueness means that the data should only appear once in a dataset. [...] Reliability ensures that data is being collected from the source of truth or from another verified data source." (Aniruddha Deswandikar, "Engineering Data Mesh in Azure Cloud", 2024)

"Authentication means validating a user by using credentials to ensure that they are a valid user on the enterprise system. Authorization validates their rights to access a particular resource or perform certain operations on it. [...] Authorization is the process of granting or denying a set of actions that can be performed on a resource based on a set of permissions." (Aniruddha Deswandikar, "Engineering Data Mesh in Azure Cloud", 2024)

"Data security is about preventing unauthorized access to data and the policies and methods surrounding this access. It also protects the system from hackers and malicious users who could steal data. Data privacy, on the other hand, is about collecting, retaining, and recycling personal and sensitive data. There could be a few overlaps between security and privacy." (Aniruddha Deswandikar, "Engineering Data Mesh in Azure Cloud", 2024)

"Federation is about providing autonomy to each data product owner to make their own decisions about the storage, computing, and sharing of data. However, this autonomy cannot come at a risk to the security and compliance standards of the company." (Aniruddha Deswandikar, "Engineering Data Mesh in Azure Cloud", 2024)

"To explain a data mesh in one sentence, a data mesh is a centrally managed network of decentralized data products. The data mesh breaks the central data lake into decentralized islands of data that are owned by the teams that generate the data. The data mesh architecture proposes that data be treated like a product, with each team producing its own data/output using its own choice of tools arranged in an architecture that works for them. This team completely owns the data/output they produce and exposes it for others to consume in a way they deem fit for their data." (Aniruddha Deswandikar, "Engineering Data Mesh in Azure Cloud", 2024)

05 December 2011

✏️Nancy Organ - Collected Quotes

"A line graph looks similar to a scatterplot, but each point is connected to form a wiggly line that runs from left to right. The values on the x-axis are either ordinal or numerical data that tell us the order of each data point. The connections between each point make it easier to see how much the values on the y-axis change from one point to the next. Because line charts show data in a particular order, a line in a line chart can only have one point for each value on the x-axis." (Nancy Organ, "Data Visualization for People of All Ages", 2024)

"[...] a sunburst chart where the center is either a pie chart or a donut chart of the biggest categories surrounded in donuts that show each of the other levels. The outside donut has the leaf nodes [...]" (Nancy Organ, "Data Visualization for People of All Ages", 2024)

"Another way to make points visible on a crowded visualization is to change the opacity of the points. This makes it easier to see where the points overlap. Opacity is a way of describing how hard it is to see through something. If it’s hard to see through, then it’s opaque or has a high opacity. Transparency is the opposite: if something is easy to see through, you can say that it is transparent." (Nancy Organ, "Data Visualization for People of All Ages", 2024)

"Fuel gauges are another common place to see data shown with angles. Depending on the direction that the needle points and how slanted it is, we can decide if it’s time to stop at the gas station. [...] Many kinds of meters, gauges, dials, knobs, and faucets tell us what’s happening by the angle of a needle, marker, or handle." (Nancy Organ, "Data Visualization for People of All Ages", 2024)

"[...] meter charts are a third type of visualization that uses angles. Meter charts, which are sometimes called gauge charts, are named after things like electric meters and gas gauges. These visualizations are shaped like donut charts with a bite taken out. They’re mostly used for showing progress toward a goal, or how empty, full, or extreme something is." (Nancy Organ, "Data Visualization for People of All Ages", 2024)

"Meter charts use angle and sometimes color to show amounts of something or progress toward a goal." (Nancy Organ, "Data Visualization for People of All Ages", 2024)

"Networks or network graphs show relationships between things using nodes and links. Nodes are similar to the points on a scatterplot - they show one data point each. Links are the lines or arrows that show how the nodes connect or relate to each other." (Nancy Organ, "Data Visualization for People of All Ages", 2024)

"Pie charts, donut charts, and meter charts are really just stacked bar charts that have been bent - but remember that they should always add up to 100%. Radar charts use angle to show categories and position to show amounts. You can also use angle with position to create charts that show movement, direction, or change - on maps and on graphs with number axes, as well as on visualizations with category axes." (Nancy Organ, "Data Visualization for People of All Ages", 2024)
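The "bent stacked bar" idea in the quote above can be sketched in a few lines of Python: each part gets a share of 100% and the same share of a 360-degree circle. The function name `to_angles` and the sample values are made up for illustration and are not taken from the book.

```python
def to_angles(values):
    """Convert part values into (percent, degrees) pairs of a full circle."""
    total = sum(values)
    return [(v * 100 / total, v * 360 / total) for v in values]

# Hypothetical parts of a whole.
slices = to_angles([50, 30, 20])
for percent, degrees in slices:
    print(f"{percent:.1f}% of the bar -> {degrees:.1f} degrees of the circle")
```

Because every slice is computed against the same total, the percentages add up to 100 and the angles to 360, which is the constraint the quote insists on.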

"Scatterplots and bubble charts are useful for showing the relationship between variables, but they aren’t very useful for showing the ordering of data points. If you want to understand the change from point-to-point in a certain order, you’ll need to use a different type of visualization, like a line graph." (Nancy Organ, "Data Visualization for People of All Ages", 2024)

"Tree maps show networks by arranging rectangular branches and leaf nodes into a big block. Each branch in a tree map is packed with leaf nodes of the same color. Sometimes, the leaf nodes are in different sizes to show different amounts." (Nancy Organ, "Data Visualization for People of All Ages", 2024)

"Visualizations that use different lengths of rectangles to show quantities are called bar charts. The rectangles in bar charts are called bars, and each bar represents a single category from a categorical variable. [...] When the bars in a bar chart are standing up, these visualizations are sometimes called column charts. Column charts and bar charts work in exactly the same way, but you might choose one over the other to fit better on a page or because it suits the data better." (Nancy Organ, "Data Visualization for People of All Ages", 2024)

10 November 2007

🎯Rukmani Gopalan - Collected Quotes

"A cloud data warehouse is an enterprise data warehouse offered as a managed service (PaaS) on public clouds with optimized integrations for data ingestion, analytics processing, and BI analytics." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022)

"Churn refers to rapidly changing the activities and your plan when they are in flux - this is disruptive to your organization and slows your progress. Change refers to an inevitable movement in requirements and helps you plan for and execute this movement thoughtfully." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022)

"Data mesh relies on a distributed architecture that consists of domains. Each domain is an independent unit of data and its associated storage and compute components. When an organization contains various product units, each with its own data needs, each product team owns a domain that is operated and governed independently by the product team. […] Data mesh has a unique value proposition, not just offering scale of infrastructure and scenarios but also helping shift the organization’s culture around data," (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022)

"If there is one thing I strongly recommend, it is to invest in a cloud data lake and start collecting and processing data that you believe is useful to your organization today." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022)

"It’s true that data and data strategy are critical to the organization; however, it’s also true that data by itself is a means to the end of business or customer impact unless you’re a provider of data or data-related services." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022)

"Plan for customer impact, and prepare to learn and fine-tune as you progress. Make choices based on the impact they offer to customers, and stay consistent in your implementation while keeping open-minded for learnings. Especially if you are an early adopter of a technology, you can help develop the technology with the provider and thus get ample support from the technology provider in return. Similarly, identify highly motivated early adopters within your customer base and offer to develop your solution with them." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022)

"Real-time stream processing refers to the ingestion, processing, and consumption of data with a specific focus on speed, targeting near real time - that is, almost instantaneous results. […] Real-time stream processing pipelines involve data that is arriving from its source at very high velocity; in other words, it is data that is streaming into the system, just like rain or a waterfall." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022)

"The lakehouse provides a key advantage over the modern data warehouse by eliminating the need to have two places to store the same data. [...] Data lakehouses offer the key benefit of being able to run performant BI/SQL-based scenarios directly on the data lake, right alongside the other exploratory data science and machine learning scenarios." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022)

"The promise of a cloud data lake architecture lies in the boundless diversity of scenarios that it enables." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022)

"The very simple definition of cloud data lake storage is a service available as a cloud offering that can serve as a central repository for all kinds of data (structured, unstructured, and semistructured) and can support data and transactions at a large scale." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022)

"When it comes to data lakes, some things usually stay constant: the storage and processing patterns. Change could come in any of the following ways: Adding new components and processing or consumption patterns to respond to new requirements. […] Optimizing existing architecture for better cost or performance" (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022)

31 December 2006

✏️Danyel Fisher - Collected Quotes

"A dimension is an attribute that groups, separates, or filters data items. A measure is an attribute that addresses the question of interest and that the analyst expects to vary across the dimensions. Both the measures and the dimensions might be attributes directly found in the dataset or derived attributes calculated from the existing data." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)
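As a rough illustration of the distinction drawn above, the following Python sketch groups a few records by a dimension and sums a measure within each group. The records, the `region` dimension, the `sales` measure, and the helper `total_by` are all hypothetical and do not come from the book.

```python
from collections import defaultdict

# Hypothetical records: "region" acts as a dimension, "sales" as a measure.
records = [
    {"region": "North", "sales": 120},
    {"region": "South", "sales": 80},
    {"region": "North", "sales": 60},
    {"region": "South", "sales": 40},
]

def total_by(rows, dimension, measure):
    """Group rows by the dimension and sum the measure within each group."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)

totals = total_by(records, "region", "sales")
print(totals)
```

The dimension decides how the rows are separated; the measure is the value expected to vary across those groups.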

"A well-operationalized task, relative to the underlying data, fulfills the following criteria: (1) Can be computed based on the data; (2) Makes specific reference to the attributes of the data; (3) Has a traceable path from the high-level abstract questions to a set of concrete, actionable tasks." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"An actionable task means that it is possible to act on its result. That action might be to present a useful result to a decision maker or to proceed to a next step in a different result. An answer is actionable when it no longer needs further work to make sense of it." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"Every dataset has subtleties; it can be far too easy to slip down rabbit holes of complications. Being systematic about the operationalization can help focus our conversations with experts, only introducing complications when needed." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"Color is difficult to use effectively. A small number of well-chosen colors can be highly distinguishable, particularly for categorical data, but it can be difficult for users to distinguish between more than a handful of colors in a visualization. Nonetheless, color is an invaluable tool in the visualization toolbox because it is a channel that can carry a great deal of meaning and be overlaid on other dimensions. […] There are a variety of perceptual effects, such as simultaneous contrast and color deficiencies, that make precise numerical judgments about a color scale difficult, if not impossible." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"Creating effective visualizations is hard. Not because a dataset requires an exotic and bespoke visual representation - for many problems, standard statistical charts will suffice. And not because creating a visualization requires coding expertise in an unfamiliar programming language [...]. Rather, creating effective visualizations is difficult because the problems that are best addressed by visualization are often complex and ill-formed. The task of figuring out what attributes of a dataset are important is often conflated with figuring out what type of visualization to use. Picking a chart type to represent specific attributes in a dataset is comparatively easy. Deciding on which data attributes will help answer a question, however, is a complex, poorly defined, and user-driven process that can require several rounds of visualization and exploration to resolve." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"Dashboards are a type of multiform visualization used to summarize and monitor data. These are most useful when proxies have been well validated and the task is well understood. This design pattern brings a number of carefully selected attributes together for fast, and often continuous, monitoring - dashboards are often linked to updating data streams. While many allow interactivity for further investigation, they typically do not depend on it. Dashboards are often used for presenting and monitoring data and are typically designed for at-a-glance analysis rather than deep exploration and analysis." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"Designing effective visualizations presents a paradox. On the one hand, visualizations are intended to help users learn about parts of their data that they don’t know about. On the other hand, the more we know about the users’ needs and the context of their data, the better we can design a visualization to serve them." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"Dimensionality reduction is a way of reducing a large number of different measures into a smaller set of metrics. The intent is that the reduced metrics are a simpler description of the complex space that retains most of the meaning. […] Clustering techniques are similarly useful for reducing a large number of items into a smaller set of groups. A clustering technique finds groups of items that are logically near each other and gathers them together." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"Maps also have the disadvantage that they consume the most powerful encoding channels in the visualization toolbox - position and size - on an aspect that is held constant. This leaves less effective encoding channels like color for showing the dimension of interest." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"[…] no single visualization is ever quite able to show all of the important aspects of our data at once - there just are not enough visual encoding channels. […] designing effective visualizations to make sense of data is not an art - it is a systematic and repeatable process." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"[…] the data itself can lead to new questions too. In exploratory data analysis (EDA), for example, the data analyst discovers new questions based on the data. The process of looking at the data to address some of these questions generates incidental visualizations - odd patterns, outliers, or surprising correlations that are worth looking into further." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"The field of [data] visualization takes on that goal more broadly: rather than attempting to identify a single metric, the analyst instead tries to look more holistically across the data to get a usable, actionable answer. Arriving at that answer might involve exploring multiple attributes, and using a number of views that allow the ideas to come together. Thus, operationalization in the context of visualization is the process of identifying tasks to be performed over the dataset that are a reasonable approximation of the high-level question of interest." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"The general concept of refining questions into tasks appears across all of the sciences. In many fields, the process is called operationalization, and refers to the process of reducing a complex set of factors to a single metric. The field of visualization takes on that goal more broadly: rather than attempting to identify a single metric, the analyst instead tries to look more holistically across the data to get a usable, actionable answer. Arriving at that answer might involve exploring multiple attributes, and using a number of views that allow the ideas to come together. Thus, operationalization in the context of visualization is the process of identifying tasks to be performed over the dataset that are a reasonable approximation of the high-level question of interest." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"The goal of operationalization is to refine and clarify the question until the analyst can forge an explicit link between the data that they can find and the questions they would like to answer. […] To achieve this, the analyst searches for proxies. Proxies are partial and imperfect representations of the abstract thing that the analyst is really interested in. […] Selecting and interpreting proxies requires judgment and expertise to assess how well, and with what sorts of limitations, they represent the abstract concept." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"The operationalization process is an iterative one and the end point is not precisely defined. The answer to the question of how far to go is, simply, far enough. The process is done when the task is directly actionable, using the data at hand. The analyst knows how to describe the objects, measures, and groupings in terms of the data - where to find it, how to compute, and how to aggregate it. At this point, they know what the question will look like and they know what they can do to get the answer." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"The intention behind prototypes is to explore the visualization design space, as opposed to the data space. A typical project usually entails a series of prototypes; each is a tool to gather feedback from stakeholders and help explore different ways to most effectively support the higher-level questions that they have. The repeated feedback also helps validate the operationalization along the way." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"Rapid prototyping is a process of trying out many visualization ideas as quickly as possible and getting feedback from stakeholders on their efficacy. […] The design concept of 'failing fast' informs this: by exploring many different possible visual representations, it quickly becomes clear which tasks are supported by which techniques." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"Too many simultaneous encodings will be overwhelming to the reader; colors must be easily distinguishable, and of a small enough number that the reader can interpret them." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

"Visualizations provide a direct and tangible representation of data. They allow people to confirm hypotheses and gain insights. When incorporated into the data analysis process early and often, visualizations can even fundamentally alter the questions that someone is asking." (Danyel Fisher & Miriah Meyer, "Making Data Visual", 2018)

✏️Dina Gray - Collected Quotes

"Although performance measurement is often linked to tools such as scorecards, dashboards, performance targets, indicators and information systems, it would be naïve to consider the measurement of performance as just a technical issue. Indeed, measurement is often used as a way of attempting to bring clarity to complex and confusing situations." (Dina Gray et al, "Measurement Madness: Recognizing and avoiding the pitfalls of performance measurement", 2015)

"'Big Data' is certainly changing the way organizations operate, and our capacity to do planning, budgeting and forecasting, as well as the management of our processes and supply chains, has radically improved. However, greater availability of data is also being accompanied by two major challenges: firstly, many managers are now required to develop data-oriented management systems to make sense of the phenomenal amount of data their organizations and their main partners are producing. Secondly, whilst the volume of data that we now have access to is certainly seductive and potentially very useful, it can also be overwhelming." (Dina Gray et al, "Measurement Madness: Recognizing and avoiding the pitfalls of performance measurement", 2015)

"[...] introducing an excessive number of measures is only the start of the problem. The other is that measures tend to stick, unless questioned and revised. As the world changes, so does the environment in which an organization operates. Priorities change, new drivers of performance emerge, and different operating models are employed. It would therefore make sense that the performance measurement system is also revised to reflect these changes." (Dina Gray et al, "Measurement Madness: Recognizing and avoiding the pitfalls of performance measurement", 2015)

"Measurement is often associated with the objectivity and neatness of numbers, and performance measurement efforts are typically accompanied by hope, great expectations and promises of change; however, these are then often followed by disbelief, frustration and what appears to be sheer madness." (Dina Gray et al, "Measurement Madness: Recognizing and avoiding the pitfalls of performance measurement", 2015)

"Measurement is often seen as a tool that helps reduce the complexity of the world. Organizations, with their uncertainty and confusion, are full of people, patterns and trends; and measurement seems to offer a promise of bringing order, rationality and control into this chaos." (Dina Gray et al, "Measurement Madness: Recognizing and avoiding the pitfalls of performance measurement", 2015)

"One of the most puzzling things about performance measurement is that, regardless of the countless negative experiences, as well as a constant stream of similar failures reported in the media, organizations continue to apply the same methods and constantly fall into the same traps. This is because commonly held beliefs about the measurement and management of performance are rarely challenged." (Dina Gray et al, "Measurement Madness: Recognizing and avoiding the pitfalls of performance measurement", 2015)

"Performance measures by themselves are simply tools that may or may not be used by managers and staff. However, if your organization has an addiction to measurement, sooner or later people will start relying on measures excessively, and common sense will gradually begin to be replaced by the measures themselves leading the organization into the eye of the measurement madness hurricane." (Dina Gray et al, "Measurement Madness: Recognizing and avoiding the pitfalls of performance measurement", 2015)

"Regularly, and unfortunately more often than might be expected, organizations can become so fixated on the narrow task of measuring and reporting performance that measures lose their meaning, and no one relies on them for real decision-making. [...] More worryingly, sometimes performance measures are introduced without any intention of providing meaningful data for making decisions in the first place. In this case, such indicators are often treated with contempt." (Dina Gray et al, "Measurement Madness: Recognizing and avoiding the pitfalls of performance measurement", 2015)

"Since perfect measures of performance do not exist, organizations use proxies - indicators that approximate or represent performance in the absence of perfect measures. [...] Over time, proxies are perceived to represent true performance." (Dina Gray et al, "Measurement Madness: Recognizing and avoiding the pitfalls of performance measurement", 2015)

"When all you see and believe is numbers, it becomes increasingly difficult to decide when to react and intervene. [...] The most obvious course of action is to set aside the numbers and try to understand the underlying causes of these changes. However, the over-reliance on measurement instead drives many managers to design 'thresholds' or 'colour codes' for numbers, thus adding another layer of abstraction to measurement and keeping these managers firmly desensitized to the meaning of performance information." (Dina Gray et al, "Measurement Madness: Recognizing and avoiding the pitfalls of performance measurement", 2015)

✏️Edward R Tufte - Collected Quotes

"A good rule of thumb for deciding how long the analysis of the data actually will take is (1) to add up all the time for everything you can think of - editing the data, checking for errors, calculating various statistics, thinking about the results, going back to the data to try out a new idea, and (2) then multiply the estimate obtained in this first step by five." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Almost all efforts at data analysis seek, at some point, to generalize the results and extend the reach of the conclusions beyond a particular set of data. The inferential leap may be from past experiences to future ones, from a sample of a population to the whole population, or from a narrow range of a variable to a wider range. The real difficulty is in deciding when the extrapolation beyond the range of the variables is warranted and when it is merely naive. As usual, it is largely a matter of substantive judgment - or, as it is sometimes more delicately put, a matter of 'a priori nonstatistical considerations'." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"[…] fitting lines to relationships between variables is often a useful and powerful method of summarizing a set of data. Regression analysis fits naturally with the development of causal explanations, simply because the research worker must, at a minimum, know what he or she is seeking to explain." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Fitting lines to relationships between variables is the major tool of data analysis. Fitted lines often effectively summarize the data and, by doing so, help communicate the analytic results to others. Estimating a fitted line is also the first step in squeezing further information from the data." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"If two or more describing variables in an analysis are highly intercorrelated, it will be difficult and perhaps impossible to assess accurately their independent impacts on the response variable. As the association between two or more describing variables grows stronger, it becomes more and more difficult to tell one variable from the other. This problem, called 'multicollinearity' in the statistical jargon, sometimes causes difficulties in the analysis of nonexperimental data. […] No statistical technique can go very far to remedy the problem because the fault lies basically with the data rather than the method of analysis. Multicollinearity weakens inferences based on any statistical method - regression, path analysis, causal modeling, or cross-tabulations (where the difficulty shows up as a lack of deviant cases and as near-empty cells)." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"[…] it is not enough to say: 'There's error in the data and therefore the study must be terribly dubious'. A good critic and data analyst must do more: he or she must also show how the error in the measurement or the analysis affects the inferences made on the basis of that data and analysis." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Logging size transforms the original skewed distribution into a more symmetrical one by pulling in the long right tail of the distribution toward the mean. The short left tail is, in addition, stretched. The shift toward symmetrical distribution produced by the log transform is not, of course, merely for convenience. Symmetrical distributions, especially those that resemble the normal distribution, fulfill statistical assumptions that form the basis of statistical significance testing in the regression model." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)
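The effect described above can be checked on a toy sample. The sketch below, in Python, computes the population skewness of a set of invented "sizes" (powers of two, chosen so that their logarithms are evenly spaced) before and after taking logs. The data and the helper `skewness` are assumptions made for illustration, not material from Tufte's book.

```python
import math

def skewness(values):
    """Population skewness: third central moment over the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical sizes with a long right tail: 1, 2, 4, ..., 256.
sizes = [2 ** k for k in range(9)]
logged = [math.log(s) for s in sizes]

skew_raw = skewness(sizes)       # clearly positive: long right tail
skew_logged = skewness(logged)   # essentially zero: evenly spaced values
print(round(skew_raw, 2), round(skew_logged, 2))
```

Real data will rarely become perfectly symmetrical after logging; the toy sample only makes the direction of the change easy to see.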

"Logging skewed variables also helps to reveal the patterns in the data. […] the rescaling of the variables by taking logarithms reduces the nonlinearity in the relationship and removes much of the clutter resulting from the skewed distributions on both variables; in short, the transformation helps clarify the relationship between the two variables. It also […] leads to a theoretically meaningful regression coefficient." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Our inability to measure important factors does not mean either that we should sweep those factors under the rug or that we should give them all the weight in a decision. Some important factors in some problems can be assessed quantitatively. And even though thoughtful and imaginative efforts have sometimes turned the 'unmeasurable' into a useful number, some important factors are simply not measurable. As always, every bit of the investigator's ingenuity and good judgment must be brought into play. And, whatever unknowns may remain, the analysis of quantitative data nonetheless can help us learn something about the world - even if it is not the whole story." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Random data contain no substantive effects; thus if the analysis of the random data results in some sort of effect, then we know that the analysis is producing that spurious effect, and we must be on the lookout for such artifacts when the genuine data are analyzed." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)
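One way to try the check described above is to run an analysis on pure noise and see what it "finds". The Python sketch below is only an assumed example of such an analysis: it searches 200 random predictors for the one most correlated with a random response. The sample sizes, the seed, and the helper `pearson` are arbitrary choices for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)            # fixed seed so the run is repeatable
n_cases, n_predictors = 10, 200   # few cases, many candidate predictors

# Everything here is noise: no predictor has any real effect on the response.
response = [rng.gauss(0, 1) for _ in range(n_cases)]
predictors = [[rng.gauss(0, 1) for _ in range(n_cases)]
              for _ in range(n_predictors)]

# The "analysis": keep the predictor with the largest absolute correlation.
strongest = max(abs(pearson(p, response)) for p in predictors)
print(round(strongest, 2))
```

Any sizeable correlation reported here is produced by the search procedure itself, which is the kind of artifact to watch for when the same procedure is applied to genuine data.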

"Sometimes clusters of variables tend to vary together in the normal course of events, thereby rendering it difficult to discover the magnitude of the independent effects of the different variables in the cluster. And yet it may be most desirable, from a practical as well as scientific point of view, to disentangle correlated describing variables in order to discover more effective policies to improve conditions. Many economic indicators tend to move together in response to underlying economic and political events." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"The problem of multicollinearity involves a lack of data, a lack of information. […] Recognition of multicollinearity as a lack of information has two important consequences: (1) In order to alleviate the problem, it is necessary to collect more data - especially on the rarer combinations of the describing variables. (2) No statistical technique can go very far to remedy the problem because the fault lies basically with the data rather than the method of analysis. Multicollinearity weakens inferences based on any statistical method - regression, path analysis, causal modeling, or cross-tabulations (where the difficulty shows up as a lack of deviant cases and as near-empty cells)." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)
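To see how quickly the information runs out, the sketch below uses the variance inflation factor, a standard diagnostic that the quote itself does not mention. With exactly two describing variables it reduces to 1 / (1 - r²), where r is their correlation. The data and the helper `pearson` are invented for illustration.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical describing variables that move almost in lockstep.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]

r = pearson(x1, x2)
# Variance inflation factor for the two-predictor case (assumed diagnostic).
vif = 1 / (1 - r ** 2)
print(round(r, 3), round(vif, 1))
```

A large factor means the sampling variance of each coefficient is inflated by roughly that multiple compared with uncorrelated predictors, and only more data on the rarer combinations of `x1` and `x2` would bring it down.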

"Statistical techniques do not solve any of the common-sense difficulties about making causal inferences. Such techniques may help organize or arrange the data so that the numbers speak more clearly to the question of causality - but that is all statistical techniques can do. All the logical, theoretical, and empirical difficulties attendant to establishing a causal relationship persist no matter what type of statistical analysis is applied." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"The language of association and prediction is probably most often used because the evidence seems insufficient to justify a direct causal statement. A better practice is to state the causal hypothesis and then to present the evidence along with an assessment with respect to the causal hypothesis - instead of letting the quality of the data determine the language of the explanation." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"The logarithmic transformation serves several purposes: (1) The resulting regression coefficients sometimes have a more useful theoretical interpretation compared to a regression based on unlogged variables. (2) Badly skewed distributions - in which many of the observations are clustered together combined with a few outlying values on the scale of measurement - are transformed by taking the logarithm of the measurements so that the clustered values are spread out and the large values pulled in more toward the middle of the distribution. (3) Some of the assumptions underlying the regression model and the associated significance tests are better met when the logarithm of the measured variables is taken." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)
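Point (1) can be illustrated with a small Python sketch. When both variables are logged, the fitted slope is commonly read as an elasticity: the percentage change in y associated with a one percent change in x. The power-law data (y = 3x²) and the helper `fit_slope` are assumptions made for this example, not Tufte's own.

```python
import math

def fit_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data following y = 3 * x**2.
xs = [1, 2, 4, 8, 16]
ys = [3 * x ** 2 for x in xs]

slope_raw = fit_slope(xs, ys)  # depends on the units and range of x
slope_log = fit_slope([math.log(x) for x in xs],
                      [math.log(y) for y in ys])  # recovers the exponent
print(round(slope_raw, 2), round(slope_log, 2))
```

On the logged scale the slope equals the exponent of the underlying relationship, which is easier to interpret than the raw slope of a curve forced into a straight line.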

"The matching procedure often helps inform the reader what is going on in the data […] Matching has some defects, chiefly that it is difficult to do a very good job of matching in complex situations without a large number of cases. […] One limitation of matching, then, is that quite often the match is not very accurate. A second limitation is that if we want to control for more than one variable using matching procedures, the tables begin to have combinations of categories without any cases at all in them, and they become somewhat more difficult for the reader to understand." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"The use of statistical methods to analyze data does not make a study any more 'scientific', 'rigorous', or 'objective'. The purpose of quantitative analysis is not to sanctify a set of findings. Unfortunately, some studies, in the words of one critic, 'use statistics as a drunk uses a street lamp, for support rather than illumination'. Quantitative techniques will be more likely to illuminate if the data analyst is guided in methodological choices by a substantive understanding of the problem he or she is trying to learn about. Good procedures in data analysis involve techniques that help to (a) answer the substantive questions at hand, (b) squeeze all the relevant information out of the data, and (c) learn something new about the world." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Typically, data analysis is messy, and little details clutter it. Not only confounding factors, but also deviant cases, minor problems in measurement, and ambiguous results lead to frustration and discouragement, so that more data are collected than analyzed. Neglecting or hiding the messy details of the data reduces the researcher's chances of discovering something new." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"An especially effective device for enhancing the explanatory power of time-series displays is to add spatial dimensions to the design of the graphic, so that the data are moving over space (in two or three dimensions) as well as over time. […] Occasionally graphics are belligerently multivariate, advertising the technique rather than the data." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"Clear, detailed, and thorough labeling should be used to defeat graphical distortion and ambiguity. Write out explanations of the data on the graphic itself. Label important events in the data." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"Each part of a graphic generates visual expectations about its other parts and, in the economy of graphical perception, these expectations often determine what the eye sees. Deception results from the incorrect extrapolation of visual expectations generated at one place on the graphic to other places." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"For many people the first word that comes to mind when they think about statistical charts is 'lie'. No doubt some graphics do distort the underlying data, making it hard for the viewer to learn the truth. But data graphics are no different from words in this regard, for any means of communication can be used to deceive. There is no reason to believe that graphics are especially vulnerable to exploitation by liars; in fact, most of us have pretty good graphical lie detectors that help us see right through frauds." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"Graphical excellence is the well-designed presentation of interesting data - a matter of substance, of statistics, and of design. Graphical excellence consists of complex ideas communicated with clarity, precision, and efficiency. Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space. Graphical excellence is nearly always multivariate. And graphical excellence requires telling the truth about the data." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"Graphical competence demands three quite different skills: the substantive, statistical, and artistic. Yet now most graphical work, particularly at news publications, is under the direction of but a single expertise - the artistic. Allowing artist-illustrators to control the design and content of statistical graphics is almost like allowing typographers to control the content, style, and editing of prose. Substantive and quantitative expertise must also participate in the design of data graphics, at least if statistical integrity and graphical sophistication are to be achieved." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

" In time-series displays of money, deflated and standardized units of monetary measurement are nearly always better than nominal units." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"Inept graphics also flourish because many graphic artists believe that statistics are boring and tedious. It then follows that decorated graphics must pep up, animate, and all too often exaggerate what evidence there is in the data. […] If the statistics are boring, then you've got the wrong numbers." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"Modern data graphics can do much more than simply substitute for small statistical tables. At their best, graphics are instruments for reasoning about quantitative information. Often the most effective way to describe, explore, and summarize a set of numbers even a very large set - is to look at pictures of those numbers. Furthermore, of all methods for analyzing and communicating statistical information, well-designed data graphics are usually the simplest and at the same time the most powerful." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"Nearly all those who produce graphics for mass publication are trained exclusively in the fine arts and have had little experience with the analysis of data. Such experiences are essential for achieving precision and grace in the presence of statistics. [...] Those who get ahead are those who beautified data, never mind statistical integrity." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"Of course, false graphics are still with us. Deception must always be confronted and demolished, even if lie detection is no longer at the forefront of research. Graphical excellence begins with telling the truth about the data." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"Of course, statistical graphics, just like statistical calculations, are only as good as what goes into them. An ill-specified or preposterous model or a puny data set cannot be rescued by a graphic (or by calculation), no matter how clever or fancy. A silly theory means a silly graphic." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"Relational graphics are essential to competent statistical analysis since they confront statements about cause and effect with evidence, showing how one variable affects another." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"The conditions under which many data graphics are produced - the lack of substantive and quantitative skills of the illustrators, dislike of quantitative evidence, and contempt for the intelligence of the audience-guarantee graphic mediocrity. These conditions engender graphics that (1) lie; (2) employ only the simplest designs, often unstandardized time-series based on a small handful of data points; and (3) miss the real news actually in the data." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"The interior decoration of graphics generates a lot of ink that does not tell the viewer anything new. The purpose of decoration varies - to make the graphic appear more scientific and precise, to enliven the display, to give the designer an opportunity to exercise artistic skills. Regardless of its cause, it is all non-data-ink or redundant data-ink, and it is often chartjunk." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"[…] the only worse design than a pie chart is several of them, for then the viewer is asked to compare quantities located in spatial disarray both within and between pies. […] Given their low data-density and failure to order numbers along a visual dimension, pie charts should never be used." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"The problem with time-series is that the simple passage of time is not a good explanatory variable: descriptive chronology is not causal explanation. There are occasional exceptions, especially when there is a clear mechanism that drives the Y-variable." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"The theory of the visual display of quantitative information consists of principles that generate design options and that guide choices among options. The principles should not be applied rigidly or in a peevish spirit; they are not logically or mathematically certain; and it is better to violate any principle than to place graceless or inelegant marks on paper. Most principles of design should be greeted with some skepticism, for word authority can dominate our vision, and we may come to see only though the lenses of word authority rather than with our own eyes." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"The time-series plot is the most frequently used form of graphic design. With one dimension marching along to the regular rhythm of seconds, minutes, hours, days, weeks, months, years, centuries, or millennia, the natural ordering of the time scale gives this design a strength and efficiency of interpretation found in no other graphic arrangement." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"Vigorous writing is concise. A sentence should contain no unnecessary words, a paragraph no unnecessary sentences, for the same reason that a drawing should have no unnecessary lines and a machine no unnecessary parts. This requires not that the writer make all his sentences short, or that heavoid all detail and treat his subjects only in outline, but that every word tell." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"A range-frame does not require any viewing or decoding instructions; it is not a graphical puzzle and most viewers can easily tell what is going on. Since it is more informative about the data in a clear and precise manner, the range-frame should replace the non-data bearing frame inmany graphical applications." (Edward R Tufte, "Data-Ink Maximization and Graphical Design", Oikos Vol. 58 (2), 1990)

"At the heart of quantitative reasoning is a single question: Compared to what? Small multiple designs, multivariate and data bountiful, answer directly by visually enforcing comparisons of changes, of the differences among objects, of the scope of alternatives. For a wide range of problems in data presentation, small multiples are the best design solution." (Edward R Tufte, "Envisioning Information", 1990)

"Confusion and clutter are failures of design, not attributes of information. And so the point is to find design strategies that reveal detail and complexity - rather than to fault the data for an excess of complication. Or, worse, to fault viewers for a lack of understanding. Among the most powerful devices for reducing noise and enriching the content of displays is the technique of layering and separation, visually stratifying various aspects of the data." (Edward R Tufte, "Envisioning Information", 1990)

"Consider this unsavory exhibit at right – chockablock with cliché and stereotype, coarse humor, and a content-empty third dimension. [...] Credibility vanishes in clouds of chartjunk; who would trust a chart that looks like a video game?" (Edward R Tufte, "Envisioning Information", 1990) [on diamond charts]

"Graphics are almost always going to improve as they go through editing, revision, and testing against different design options. The principles of maximizing data-ink and erasing generate graphical alternatives and also suggest a direction in which revisions should move." (Edward R Tufte, "Data-Ink Maximization and Graphical Design", Oikos Vol. 58 (2), 1990)

"Gray grids almost always work well and, with a delicate line, may promote more accurate data reading and reconstruction than a heavy grid. Dark grid lines are chartjunk. When a graphic serves as a look-up table (rare indeed), then a grid may help with reading and interpolation. But even then the grid should be muted relative to the data." (Edward R Tufte, "Envisioning Information", 1990)

"Information consists of differences that make a difference." (Edward R Tufte, "Envisioning Information", 1990)

"Lurking behind chartjunk is contempt both for information and for the audience. Chartjunk promoters imagine that numbers and details are boring, dull, and tedious, requiring ornament to enliven. Cosmetic decoration, which frequently distorts the data, will never salvage an underlying lack of content. If the numbers are boring, then you've got the wrong numbers." (Edward R Tufte, "Envisioning Information", 1990)

"Maximizing data ink (within reason) is but a single dimension of a complex and multivariate design task. The principle helps conduct experiments in graphical design. Some of those experiments will succeed. There remain, however, many other considerations in the design of statistical graphics - not only of efficiency, but also of complexity, structure, density, and even beauty." (Edward R Tufte, "Data-Ink Maximization and Graphical Design", Oikos Vol. 58 (2), 1990)

"The ducks of information design are false escapes from flatland, adding pretend dimensions to impoverished data sets, merely fooling around with information." (Edward R Tufte, "Envisioning Information", 1990)

"Then there is the audience: will those looking at the new designs be confused? Some of the designs are selfexplanatory, as in the case of the range-frame. The dot-dash-plot is more difficult, although it still shows all the standard information found in the scatterplot. Nothing is lost to those puzzled by the frame of dashes, and something is gained by those who do understand. Moreover, it is a frequent mistake in thinking about statistical graphics to underestimate the audience. Instead, why not assume that if you understand it, most other readers will, too? Graphics should be as intelligent and sophisticated as the accompanying text." (Edward R Tufte, "Data-Ink Maximization and Graphical Design", Oikos Vol. 58 (2), 1990)

"Visual displays rich with data are not only an appropriate and proper complement to human capabilities, but also such designs are frequently optimal. If the visual task is contrast, comparison, and choice - as so often it is - then the more relevant information within eyespan, the better. Vacant, low-density displays, the dreaded posterization of data spread over pages and pages, require viewers to rely on visual memory - a weak skill - to make a contrast, a comparison, a choice." (Edward R Tufte, "Envisioning Information", 1990)

"We envision information in order to reason about, communicate, document, and preserve that knowledge - activities nearly always carried out on two-dimensional paper and computer screen. Escaping this flatland and enriching the density of data displays are the essential tasks of information design." (Edward R Tufte, "Envisioning Information", 1990)

"What about confusing clutter? Information overload? Doesn't data have to be ‘boiled down’ and ‘simplified’? These common questions miss the point, for the quantity of detail is an issue completely separate from the difficulty of reading. Clutter and confusion are failures of design, not attributes of information. Often the less complex and less subtle the line, the more ambiguous and less interesting is the reading. Stripping the detail out of data is a style based on personal preference and fashion, considerations utterly indifferent to substantive content." (Edward R Tufte, "Envisioning Information", 1990)

"Good information design is clear thinking made visible, while bad design is stupidity in action." (Edward Tufte, "Visual Explanations" , 1997)

"Audience boredom is usually a content failure, not a decoration failure." (Edward R Tufte, "The cognitive style of PowerPoint", 2003)

"If your words or images are not on point, making them dance in color won't make them relevant." (Edward R Tufte, "The cognitive style of PowerPoint", 2003)

"A sparkline is a small, intense, simple, word-sized graphic with typographic resolution. Sparklines mean that graphics are no longer cartoonish special occasions with captions and boxes, but rather sparkline graphics can be everywhere a word or number can be: embedded in a sentence, table, headline, map, spreadsheet, graphic." (Edward R Tufte, "Beautiful Evidence", 2006)

"Areas surrounding data-lines may generate unintentional optical clutter. Strong frames produce melodramatic but content-diminishing visual effects. [...] A good way to assess a display for unintentional optical clutter is to ask 'Do the prominent visual effects convey relevant content?'" (Edward R Tufte, "Beautiful Evidence", 2006)

"By segregating evidence by mode (word, number, image, graph) , the current-day computer approach contradicts the spirit of sparklines, a spirit that makes no distinction among words, numbers, graphics, images. It is all evidence, after all. A good system for evidence display should be centered on evidence, not on a collection of application programs each devoted to a single mode of information." (Edward R Tufte, "Beautiful Evidence", 2006)

"By showing recent change in relation to many past changes, sparklines provide a context for nuanced analysis - and, one hopes, better decisions. [...] Sparklines efficiently display and narrate binary data (presence/absence, occurrence/non-occurrence, win/loss). [...] Sparklines can simultaneously accommodate several variables. [...] Sparklines can narrate on-going results detail for any process producing sequential binary outcomes." (Edward R Tufte, "Beautiful Evidence", 2006)

"Closely spaced lines produce moiré vibration, usually at its worst when data-lines (the figure) and spaces (the ground) between data-lines are approximately equal in size, and also when figure and ground contrast strongly in color value." (Edward R Tufte, "Beautiful Evidence", 2006)

"Conflicting with the idea of integrating evidence regardless of its these guidelines provoke several issues: First, labels are data. even intriguing data. [...] Second, when labels abandon the data points, then a code is often needed to relink names to numbers. Such codes, keys, and legends are Impediments to learning, causing the reader's brow to furrow. Third, segregating nouns from data-dots breaks up evidence on the basis of mode (verbal vs. nonverbal), a distinction lacking substantive relevance. Such separation is uncartographic; contradicting the methods of map design often causes trouble for any type of graphical display. Fourth, design strategies that reduce data-resolution take evidence displays in the wrong direction. Fifth, what clutter? Even this supposedly cluttered graph clearly shows the main ideas: brain and body mass are roughly linear in logarithms, and as both variables increase, this linearity becomes less tight." (Edward R Tufte, "Beautiful Evidence", 2006) [argumentation against Cleveland's recommendation of not using words on data plots]

"Documentation allows more effective watching, and we have the Fifth Principle for the analysis and presentation of data: 'Thoroughly describe the evidence. Provide a detailed title, indicate the authors and sponsors, document the data sources, show complete measurement scales, point out relevant issues.'" (Edward R Tufte, "Beautiful Evidence", 2006)

"Explanatory, journalistic, and scientific images should nearly always be mapped, contextualized, and placed on the universal grid. Mapped pictures combine representational images with scales, diagrams, overlays, numbers, words, images." (Edward R Tufte, "Beautiful Evidence", 2006)

"Evidence is evidence, whether words, numbers, images, din grams- still or moving. It is all information after all. For readers and viewers, the intellectual task remains constant regardless of the particular mode Of evidence: to understand and to reason about the materials at hand, and to appraise their quality, relevance. and integrity." (Edward R Tufte, "Beautiful Evidence", 2006)

"Excellent graphics exemplify the deep fundamental principles of analytical design in action. If this were not the case, then something might well be wrong with the principles." (Edward R Tufte, "Beautiful Evidence", 2006)

"Good design, however, can dispose of clutter and show all the data points and their names. [...] Clutter calls for a design solution, not a content reduction." (Edward R Tufte, "Beautiful Evidence", 2006)

"In general. statistical graphics should be moderately greater in length than in height. And, as William Cleveland discovered, for judging slopes and velocities up and down the hills in time-series, best is an aspect ratio that yields hill - slopes averaging 45°, over every cycle in the time-series. Variations in slopes are best detected when the slopes are around 45°, uphill or downhill." (Edward R Tufte, "Beautiful Evidence", 2006)

"Making a presentation is a moral act as well as an intellectual activity. The use of corrupt manipulations and blatant rhetorical ploys in a report or presentation - outright lying, flagwaving, personal attacks, setting up phony alternatives, misdirection, jargon-mongering, evading key issues, feigning disinterested objectivity, willful misunderstanding of other points of view - suggests that the presenter lacks both credibility and evidence. To maintain standards of quality, relevance, and integrity for evidence, consumers of presentations should insist that presenters be held intellectually and ethically responsible for what they show and tell. Thus consuming a presentation is also an intellectual and a moral activity." (Edward R Tufte, "Beautiful Evidence", 2006)

"Making an evidence presentation is a moral act as well as an intellectual activity. To maintain standards of quality, relevance, and integrity for evidence, consumers of presentations should insist that presenters be held intellectually and ethically responsible for what they show and tell. Thus consuming a presentation is also an intellectual and a moral activity." (Edward R Tufte, "Beautiful Evidence", 2006)

"Most techniques for displaying evidence are inherently multimodal, bringing verbal, visual. and quantitative elements together. Statistical graphics and maps arc visual-numerical fields labeled with words and framed by numbers. Even an austere image may evoke other images, new or remembered narrative, and perhaps a sense of scale and quantity. Words can simultaneously convey semantic and visual content, as the nouns on a map both name places and locate them in the two - space of latitude and longitude." (Edward R Tufte, "Beautiful Evidence", 2006)

"Principles of design should attend to the fundamental intellectual tasks in the analysis of evidence; thus we have the Second Principle for the analysis And presentation of data: Show causality, mechanism, explanation, systematic structure." (Edward R Tufte, "Beautiful Evidence", 2006)

"Sparklines are wordlike graphics, With an intensity of visual distinctions comparable to words and letters. [...] Words visually present both an overall shape and letter-by-letter detail; since most readers have seen the word previously, the visual task is usually one of quick recognition. Sparklines present an overall shape and aggregate pattern along with plenty of local detail. Sparklines are read the same way as words, although much more carefully and slowly." (Edward R Tufte, "Beautiful Evidence", 2006)

"Sparklines vastly increase the amount of data within our eyespan and intensify statistical graphics up to the everyday routine capabilities of the human eye-brain system for reasoning about visual evidence, seeing distinctions, and making comparisons. [...] Providing a straightforward and contextual look at intense evidence, sparkline graphics give us some chance to be approximately right rather than exactly wrong." (Edward R Tufte, "Beautiful Evidence", 2006)

"Sparklines work at intense resolutions, at the level of good typography and cartography. [...] Just as sparklines are like words, so then distributions of sparklines on a page are like sentences and paragraphs. The graphical idea here is make it wordlike and typographic - an idea that leads to reasonable answers for most questions about sparkline arrangements." (Edward R Tufte, "Beautiful Evidence", 2006)

"[...] the First Principle for the analysis and presentation data: 'Show comparisons, contrasts, differences'. The fundamental analytical act in statistical reasoning is to answer the question "Compared with what?". Whether we are evaluating changes over space or time, searching big data bases, adjusting and controlling for variables, designing experiments , specifying multiple regressions, or doing just about any kind of evidence-based reasoning, the essential point is to make intelligent and appropriate comparisons. Thus visual displays, if they are to assist thinking, should show comparisons." (Edward R Tufte, "Beautiful Evidence", 2006)

"The only thing that is 2-dimensional about evidence is the physical flatland of paper and computer screen. Flatlandy technologies of display encourage flatlandy thinking. Reasoning about evidence should not be stuck in 2 dimensions, for the world seek to understand is profoundly multivariate. Strategies of design should make multivariateness routine, nothing out of the ordinary. To think multivariate, show multivariate; the Third Principle for the analysis and presentation of data: 'Show multivariate data; that is, show more than 1 or 2 variables.'" (Edward R Tufte, "Beautiful Evidence", 2006)

"The principles of analytical design are universal - like mathematics, the laws of Nature, the deep structure of language - and are not tied to any particular language, culture, style, century, gender, or technology of information display." (Edward R Tufte, "Beautiful Evidence", 2006)

"The purpose of an evidence presentation is to assist thinking. Thus presentations should be constructed so as to assist with the fundamental intellectual tasks in reasoning about evidence: describing the data, making multivariate comparisons, understanding causality, integrating a diversity Of evidence, and documenting the analysis. Thus the Grand Principle of analytical design: 'The principles of analytical design are derived from the principles of analytical thinking.' Cognitive tasks are turned into principles of evidence presentation and design." (Edward R Tufte, "Beautiful Evidence", 2006)

"The Sixth Principle for the analysis and display of data: 'Analytical presentations ultimately stand or fall depending on the quality, relevance, and integrity of their content.' This suggests that the most effective way to improve a presentation is to get better content. It also suggests that design devices and gimmicks cannot salvage failed content." (Edward R Tufte, "Beautiful Evidence", 2006)

"These little data lines, because of their active quality over time, are named sparklines - small, high-resolution graphics usually embedded in a full context of words, numbers, images. Sparklines are datawords: data-intense, design-simple, word-sized graphics." (Edward R Tufte, "Beautiful Evidence", 2006)

"Words. numbers. pictures, diagrams, graphics, charts, tables belong together. Excellent maps, which are the heart and soul of good practices in analytical graphics, routinely integrate words, numbers, line-art, grids, measurement scales. Rarely is a distinction among the different modes of evidence useful for making sound inferences. It is all information after all. Thus the Fourth Principle for the analysis and presentation of data: 'Completely integrate words, numbers, images, diagrams.'" (Edward R Tufte, "Beautiful Evidence", 2006)
