18 April 2023

Graphical Representation: Diagrams We Live By I (The Analytics Marathon)


In a diagram adapted from an older article [1], Brent Dykes, the author of "Effective Data Storytelling" [2], draws a parallel between Data Analytics and marathon running. He argues that an organization must pass through the depicted milestones, and the percentages represent how many organizations reach each milestone:



It's a nice visualization, and the metaphor makes sense. Running a marathon requires a long-term strategy that addresses the gaps between one's current and targeted physical and mental form and the required skillset, both for approaching a series of marathons and for each course individually. Similarly, implementing a Data Analytics initiative requires a Data Strategy that addresses the gaps between the current and targeted state of the art, as well as the many projects run to reach the organization's goals.

It makes sense, doesn't it? On the other hand, the devil is in the details, and frankly the diagram raises several questions when compared with the practices and processes existing in organizations. This doesn't mean that the diagram is wrong, just that it doesn't seem to reflect reality entirely.

The percentages represent the author's perception of how many organizations reach the respective milestones, probably in a repeatable manner (as there are several projects). Thus, only 10% have a data strategy, 100% collect data and 80% prepare the data, while at the opposite end only 15% communicate insight and 5% act on information.

Considering only the milestones, the diagram looks like both a funnel and a capability maturity model (CMM). CMMs are typically more complex than this and evolve with the capabilities of technologies. Each of the mentioned milestones comprises a set of capabilities that increase in complexity and usually help differentiate organizations' maturity. Therefore, the model seems too simple for an actual categorization.

Typically, data collection has a specific scope, limited to surveys, interviews and/or research. However, the definition can be extended to the storage of data within organizations. Understood as the gathering of raw data, data collection happens mainly as part of organizations' value-supporting processes. Given today's degree of digitization, one can assume that most organizations gather data for various purposes, even if only part of that data may be digitized.

Even if many organizations build data warehouses, marts, lakehouses, meshes or whatever architecture might be en vogue these days, a significant percentage of reporting needs are covered by standard reports or reporting tools that access the source systems directly, without data preparation or even data visualization. The first important question is what is understood by data analytics. Is it only the use of machine learning and statistical analysis? Is it limited to finding patterns and insight, or does it also include what is typically considered under the Business Intelligence umbrella?

Pragmatically speaking, Data Analytics should consider BI capabilities as well, as it extends the current infrastructure with analytic capabilities. On the other hand, DAMA considers Data Warehousing and BI together as part of its Data Management methodology. Moreover, organizations may have a Data Strategy and a BI strategy, as well as a Data Analytics strategy, as these might have different goals, challenges and bodies to support them. To make it even more complicated, an organization might consider all these important topics as part of Data or even Information Governance, or consider BI or Analytics without Data Management.

So, a Data Strategy might or might not address Data Analytics at all. It's a matter of management philosophy, organizational structure, politics and other factors. Probably any data-related strategy should count. Even if a written and communicated data-related strategy is recommended for all medium to large organizations, only a small percentage of them have one, while small organizations might ignore the topic completely.

At least in the past, data analysis and its various subcomponents were performed before preparing and visualizing the data, or at least in parallel with data visualization. Frankly, it's a strange succession of steps. Or does it refer to exploratory data analysis (EDA) from a statistical perspective, which requires statistical experience to model and interpret the facts? Moreover, data exploration and discovery usually happen in the early stages.

The most puzzling step is the last one: what did the author intend with it? Ideally, data should be actionable; at least, that's what one says about KPIs, OKRs and other metrics. Does it make sense to extend Data Analytics into the decision-making process? Where do a data professional's responsibilities end, and what are those boundaries? Or does it refer to the actions that need to be performed by data professionals?

The natural step after communicating insight is for management to take action and provide feedback. Furthermore, the decisions taken have an impact on the artifacts built, requiring a reevaluation of the business problem, the assumptions and other components. The many steps of analytics projects are iterative, with some iterations affecting the Data Strategy as well. The diagram shows the process as linear, which is not the case.

There is certainly an interface between Data Analytics and Decision-Making and their associated processes; however, there should be clear boundaries between them. E.g., it's a data professional's responsibility to make sure that the data/information is actionable and eventually to advise upon it, though whether the entitled people act on it is a management topic. Not acting upon information is also a decision. Overstepping these boundaries can put the data professional in a strange situation in which he becomes responsible and eventually accountable for an action not taken, which is unrealistic.

The final question: is the last mile representative of the analytical process? The challenge lies not in the analysis and communication of data but in making sure that the feedback processes work and changes are addressed accordingly, that value is created continuously from the data analytics infrastructure, and that data-related risks and opportunities are addressed as soon as they are recognized.

Like any model, a diagram doesn't need to be correct to be useful, and it might not even be wrong given the right context and argumentation. A data analytics CMM might allow better estimates and comparisons between organizations, though it can easily become more complex to use. A better solution for modeling the data analytics process probably lies somewhere between the two models.

Resources:
[1] Brent Dykes (2022) "Data Analytics Marathon: Why Your Organization Must Focus On The Finish", Forbes
[2] Brent Dykes (2019) "Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals"

16 April 2023

Book Review: Willard C Brinton's Graphic Methods for Presenting Facts (1919)

"It is often with impotent exasperation that a person having the knowledge sees some fallacious conclusion accepted, or some wrong policy adopted, just because known facts cannot be marshalled and presented in such manner as to be effective." 

This is the concluding sentence of the first paragraph of Willard C Brinton's "Graphic Methods for Presenting Facts", in which the author expresses his disappointment at the impossibility of bridging the important gap between data collection and presentation on one side and decision-making on the other. Despite being written more than a century ago (1915), the issue seems as current as ever: the average data professional has probably met this kind of situation at least once, if not on a regular basis.

I found out about this book from Bridget Cogley & Vidya Setlur's "Functional Aesthetics for Data Visualization" (2020), which credits Brinton with "shaping the path toward broad use of charts". I found a digitized copy of the book at the Internet Archive, and while browsing through it, I found it appealing enough for a deeper reading and a first review.

Written in a simple style free of mathematical or statistical formulae, and thus approachable by the average nontechnical reader, the book addresses the techniques and challenges faced by graphic authors in preparing charts and other graphical content for use in organizations for insight and decision-making, as well as for the general public. It also mentions projecting graphs as lantern slides to accompany a talk, a precursor of today's presentations.

The author's engineering and statistical background can be seen in the meticulousness with which the book was written. The book discusses the graphic methods for presenting facts, their component parts, and how these can be used to attract readers' attention and present the facts effectively. Several principle-like statements are formulated throughout the book and listed together in the last chapters. These rules can be found in modern books as well, though probably with fewer examples.

From organization charts to maps, from circle and bar charts to time plots, the number and variety of graphical displays is overwhelming and at the same time surprising for a book that old, especially considering the publishing technologies available then. As the author mentions, color printing was prohibitive given the costs, so only one ink color was used. However, this doesn't diminish the quality of the visuals. Compared with today's books, which seem to attempt to compensate for a lack of novelty with too much color and mentions of technologies, the book's graphics stand out through their simplicity and richness of examples. It is sad to note that the graphical displays are better chosen and the book is better written than some of today's books on data visualization.

Comparing today's language and vocabulary with those used then, the reader can see the difficulties of approaching a subject still in its early years, with the author acknowledging the lack of standards and the difficulty of showing quantitative facts in true proportions. It's also true that more modern authors like Tufte or Cleveland faced the same challenges some 70 years later.

About the author, it is worth mentioning that he was chairman of the "Joint Committee on Standards for Graphic Presentation", initiated in 1913. In 1915 the committee published its first brief report, consisting of 17 simple basic rules, which was a first attempt at standardizing the principles of graphic presentation. In 1939 Brinton published a second book, "Graphic Presentation", with less text and abundant colorful graphical displays. Even if some charts appear in the second book as well, overall the two books seem to complement each other and are recommended reading for data professionals as well as for the average reader interested in understanding the use of graphical methods.


References:
[1] Willard C Brinton (1919) "Graphic Methods for Presenting Facts"
[2] Bridget Cogley & Vidya Setlur (2020) "Functional Aesthetics for Data Visualization"
[3] Willard C Brinton (1939) "Graphic Presentation"
[4] Joint Committee on Standards for Graphic Presentation (1915) "Publications of the American Statistical Association" Vol. 14 (112), JSTOR


About Me

IT Professional with more than 24 years of experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.