01 December 2006

✏️Kristen Sosulski - Collected Quotes

"A heat map is a graphical representation of a table of data. The individual values are arranged in a table/matrix and represented by colors. Use grayscale or gradient for coloring. Sorting of the variables changes the color pattern." (Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

"A picture may be worth a thousand words, but not all pictures are readable, interpretable, meaningful, or relevant." (Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

"Avoid using irrelevant words and pictures. Only use charts that add to your message. […] In addition, words should be read or heard - not both. Decide which one supports the key takeaway for your audience." (Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

"Building on the prior knowledge of your audience can foster understanding. Ask yourself, what does my audience already know about the topic? What don’t they yet know?" (Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

"Data graphics are used to show findings, new insights, or results. The data graphic serves as the visual evidence presented to the audience. The data graphic makes the evidence clear when it shows an interpretable result such as a trend or pattern. Data graphics are only as good as the insight or message communicated." (Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

"Ensure high contrast values for colors. Allow even those with a color vision deficiency or color blindness to distinguish the different shades by using contrasting colors. Convert graphs to grayscale or print them out in black and white to test contrast.(Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

"Pitfall #1: not sharing your work with others prior to your presentation [...]
Pitfall #2: lack of audience engagement [...]
Pitfall #3: little or no eye contact with the audience [...]
Pitfall #4: making your work unreadable (small font) [...]
Pitfall #5: over the time limit [...]
Pitfall #6: showing too much information on a single slide [...]
Pitfall #7: failing to use appropriate data graphics to show insights [...]
Pitfall #8: showing a chart without an explanation [...]
Pitfall #9: presenting a chart without a clear takeaway [...]
Pitfall #10: showing so many variables on a single visual display that they impair the readability of the chart or graph" (Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

"Stories can begin with a question or line of inquiry." (Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

"Good data visualizations are persuasive graphics that help tell your data story. When you begin any visualization project, how do you know if your audience will understand your message? Your audience has input in the data visualization process. Consider what they already know and don’t know. Determine how you will support them in identifying and understanding your key points. " (Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

"Use color only when it corresponds to differences in the data. Reserve color for highlighting a single data point or for differentiating a data series. Avoid thematic or decorative presentations. For example, avoid using red and green together. Be cognizant of the cultural meanings of the colors you select and the impact they may have on your audience." (Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

"When there are few data points, place the data labels directly on the data. Data density refers to the amount of data shown in a visualization through encodings (points, bars, lines, etc.). A common mistake is presenting too much data in a single data graph. The data itself can obscure the insight. It can make the chart unreadable because the data values are not discernible. Examples include: overlapping data points, too many lines in a line chart, or too many slices in a pie chart. Selecting the appropriate amount of data requires a delicate balance. It is your job to determine how much detail is necessary." (Kristen Sosulski, "Data Visualization Made Simple: Insights into Becoming Visual", 2018)

✏️Roy D G Allen - Collected Quotes

"A knowledge of statistical methods is not only essential for those who present statistical arguments it is also needed by those on the receiving end." (Roy D G Allen, "Statistics for Economists", 1951)

"All statistical data are subject to errors in collection." (Roy D G Allen, "Statistics for Economists", 1951)

"Any time series can now be plotted in two ways. Time is measured along the horizontal axis on a natural scale; the variable is measured along the vertical axis either on a natural or on a ratio scale. A graph of the second kind is the new construction; it is often called a semi-logarithmic graph since the ratio or logarithmic scale is used on one of the two axes of the graph."

"As with tabulation, however, skill in constructing diagrams is only acquired after long experience. The main point can be easily made; a graph or diagram should be clear and simple since it adds nothing to our understanding if it does not show up the trends and relations of our data more obviously than in the original tables. A chart is meant to 'help out' in drawing broad conclusions from a table which may be quite complicated. Inevitably the graph or diagram is less exact and shows less detail than the table; it is a step in the constant process of summarizing data. This must not be overdone. It is only too easy to simplify so drastically as to be misleading." (Roy D G Allen, "Statistics for Economists", 1951)

"Graphs and diagrams help to show up trends and relations but they do not define or measure them precisely. This can be achieved by calculations on the numerical data and, in particular, by the derivation of figures to summarize and relate the significant facts in a table. The main purpose of statistical analysis is to make comparisons. A single figure has no meaning by itself; it only becomes significant and "alive" when compared explicitly or implicitly with another figure. Our first task in analysis is to make the comparisons explicit, to express the relation between one figure and another." (Roy D G Allen, "Statistics for Economists", 1951)

"Graphs [for time series] can be misleading, however, and we need to subject our first impression to a closer scrutiny. We must develop more precise methods of analysis of time series. The variations of a time series are of many kinds which can be grouped under three heads. There is, first, the general direction of movement or the trend of the variable over the long period. Then there are oscillations of various types, of greater or less regularity, superimposed on the trend. Finally, there are residual or irregular variations which may arise from isolated events such as a war or general strike, or which may be due to the operation of random influences." (Roy D G Allen, "Statistics for Economists", 1951)

"It is only by experience that skill is acquired in the framing of tables. It is partly a matter of design, to get a neat and concise layout which is both cheap to print and easy on the eye. It is partly a question of making sure that no essential information is omitted so as to leave the meaning of the table uncertain." (Roy D G Allen, "Statistics for Economists", 1951)

"Not even the most subtle and skilled analysis can overcome completely the unreliability of basic data." (Roy D G Allen, "Statistics for Economists", 1951)

"One very simple but effective form of statistical analysis is to represent the tabular data by drawing graphs or diagrams. If made with skill and care in avoiding bias, a diagram will show the data in a graphical form in which the salient features leap to the eye. The risk is that diagrams can be misleading when drawn by the unskilled and they can be very dangerous tools in unscrupulous hands." (Roy D G Allen, "Statistics for Economists", 1951)

"Summarization of statistical data into tabular form is an art rather than a routine following a set of formal rules. Tabulation inevitably implies a loss of detail. The original data are far too voluminous to be appreciated and understood; the significant details are mixed up with much that is irrelevant. The art of tabulation lies in the sacrifice of detail which is less significant for the purposes in hand so that what is really important can be emphasized. Tabulation implies classification, the grouping of items into classes according to various characteristics. And classification depends on clear and precise definitions." (Roy D G Allen, "Statistics for Economists", 1951)

"The error in a sum or difference of any number of rounded figures is the sum of the errors in the separate figures. [...] The relative error in a product or quotient. of two rounded figures is approximately the sum of the relative errors in the separate figures. [...] It is generally safe to write a product or quotient as correct to one less significant figure than the less accurate of the two values in the product or quotient." (Roy D G Allen, "Statistics for Economists", 1951)

"The function of the regression lines, as approximate representations of means of arrays, is to isolate the mean value of one variable corresponding to any given value of the other; the variation of the first variable about its mean is ignored. A regression line is an average relation, and with it there is a variation of values about the average. In the regression of y on x, the variation ignored is in the vertical direction, a variation of y up and down about the line." (Roy D G Allen, "Statistics for Economists", 1951)

30 November 2006

🎯David Parmenter - Collected Quotes

"All good KPIs that I have come across, that have made a difference, had the CEO’s constant attention, with daily calls to the relevant staff. [...] A KPI should tell you about what action needs to take place. [...] A KPI is deep enough in the organization that it can be tied down to an individual. [...] A good KPI will affect most of the core CSFs and more than one BSC perspective. [...] A good KPI has a flow on effect." (David Parmenter, "Pareto’s 80/20 Rule for Corporate Accountants", 2007)

"If the KPIs you currently have are not creating change, throw them out because there is a good chance that they may be wrong. They are probably measures that were thrown together without the in-depth research and investigation KPIs truly deserve." (David Parmenter, "Pareto’s 80/20 Rule for Corporate Accountants", 2007)

"Many management reports are not a management tool; they are merely memorandums of information. As a management tool, management reports should encourage timely action in the right direction, by reporting on those activities the Board, management, and staff need to focus on. The old adage “what gets measured gets done” still holds true." (David Parmenter, "Pareto’s 80/20 Rule for Corporate Accountants", 2007)

"Reporting to the Board is a classic 'catch-22' situation. Boards complain about getting too much information too late, and management complains that up to 20% of their time is tied up in the Board reporting process. Boards obviously need to ascertain whether management is steering the ship correctly and the state of the crew and customers before they can relax and 'strategize' about future initiatives. The process of assessing the current status of the organization from the most recent Board report is where the principal problem lies. Board reporting needs to occur more efficiently and effectively for both the Board and management." (David Parmenter, "Pareto’s 80/20 Rule for Corporate Accountants", 2007)

"Financial measures are a quantification of an activity that has taken place; we have simply placed a value on the activity. Thus, behind every financial measure is an activity. I call financial measures result indicators, a summary measure. It is the activity that you will want more or less of. It is the activity that drives the dollars, pounds, or yen. Thus financial measures cannot possibly be KPIs." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"'Getting it right the first time' is a rare achievement, and ascertaining the organization’s winning KPIs and associated reports is no exception. The performance measure framework and associated reporting is just like a piece of sculpture: you can be criticized on taste and content, but you can’t be wrong. The senior management team and KPI project team need to ensure that the project has a just-do-it culture, not one in which every step and measure is debated as part of an intellectual exercise." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"In order to get measures to drive performance, a reporting framework needs to be developed at all levels within the organization." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"Key performance indicators (KPIs) are those indicators that focus on the aspects of organizational performance that are the most critical for the current and future success of the organization." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"Key Performance Indicators (KPIs) in many organizations are a broken tool. The KPIs are often a random collection prepared with little expertise, signifying nothing. [...] KPIs should be measures that link daily activities to the organization’s critical success factors (CSFs), thus supporting an alignment of effort within the organization in the intended direction." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"Most organizational measures are very much past indicators measuring events of the last month or quarter. These indicators cannot be and never were KPIs." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"The traditional balanced-scorecard (BSC) approach uses performance measures to monitor the implementation of the strategic initiatives, and measures are typically cascaded down from a top-level organizational measure such as return on capital employed. This cascading of measures from one another will often lead to chaos, with hundreds of measures being monitored by staff in some form of BSC reporting application." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"We need indicators of overall performance that need only be reviewed on a monthly or bimonthly basis. These measures need to tell the story about whether the organization is being steered in the right direction at the right speed, whether the customers and staff are happy, and whether we are acting in a responsible way by being environmentally friendly. These measures are called key result indicators (KRIs)." (David Parmenter, "Key Performance Indicators: Developing, implementing, and using winning KPIs" 3rd Ed., 2015)

"Every day spent producing reports is a day less spent on analysis and projects." (David Parmenter)

🎯Zachary Karabell - Collected Quotes

"Culture is fuzzy, easy to caricature, amenable to oversimplifications, and often used as a catchall when all other explanations fail." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Defining an indicator as lagging, coincident, or leading is connected to another vital notion: the business cycle. Indicators are lagging or leading based on where economists believe we are in the business cycle: whether we are heading into a recession or emerging from one." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"[…] economics is a profession grounded in the belief that 'the economy' is a machine and a closed system. The more clearly that machine is understood, the more its variables are precisely measured, the more we will be able to manage and steer it as we choose, avoiding the frenetic expansions and sharp contractions. With better indicators would come better policy, and with better policy, states would be less likely to fall into depression and risk collapse." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"[…] humans make mistakes when they try to count large numbers in complicated systems. They make even greater errors when they attempt - as they always do - to reduce complicated systems to simple numbers." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"In the absence of clear information - in the absence of reliable statistics - people did what they had always done: filtered available information through the lens of their worldview." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Most people do not relate to or retain columns of numbers, however much those numbers reflect something that they care about deeply. Statistics can be cold and dull." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Our needs going forward will be best served by how we make use of not just this data but all data. We live in an era of Big Data. The world has seen an explosion of information in the past decades, so much so that people and institutions now struggle to keep pace. In fact, one of the reasons for the attachment to the simplicity of our indicators may be an inverse reaction to the sheer and bewildering volume of information most of us are bombarded by on a daily basis. […] The lesson for a world of Big Data is that in an environment with excessive information, people may gravitate toward answers that simplify reality rather than embrace the sheer complexity of it." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Statistics are meaningless unless they exist in some context. One reason why the indicators have become more central and potent over time is that the longer they have been kept, the easier it is to find useful patterns and points of reference." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Statistics are what humans do with the data they assemble; they are constructs meant to make sense of information. But the raw material is itself equally valuable, and rarely do we make sufficient use of it." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Statistics represents the fusion of mathematics with the collection and analysis of data." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"The concept that an economy (1) is characterized by regular cycles that (2) follow familiar patterns (3) illuminated by a series of statistics that (4) determine where we are in that cycle has become part and parcel of how we view the world." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"The indicators - through no particular fault of anyone in particular - have not kept up with the changing world. As these numbers have become more deeply embedded in our culture as guides to how we are doing, we rely on a few big averages that can never be accurate pictures of complicated systems for the very reason that they are too simple and that they are averages. And we have neither the will nor the resources to invent or refine our current indicators enough to integrate all of these changes." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"The search for better numbers, like the quest for new technologies to improve our lives, is certainly worthwhile. But the belief that a few simple numbers, a few basic averages, can capture the multifaceted nature of national and global economic systems is a myth. Rather than seeking new simple numbers to replace our old simple numbers, we need to tap into both the power of our information age and our ability to construct our own maps of the world to answer the questions we need answering." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"We don’t need new indicators that replace old simple numbers with new simple numbers. We need instead bespoke indicators, tailored to the specific needs and specific questions of governments, businesses, communities, and individuals." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"When statisticians, trained in math and probability theory, try to assess likely outcomes, they demand a plethora of data points. Even then, they recognize that unless it’s a very simple and controlled action such as flipping a coin, unforeseen variables can exert significant influence." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Yet our understanding of the world is still framed by our leading indicators. Those indicators define the economy, and what they say becomes the answer to the simple question 'Are we doing well?'" (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

28 November 2006

🎯Piethein Strengholt - Collected Quotes

"For advanced analytics, a well-designed data pipeline is a prerequisite, so a large part of your focus should be on automation. This is also the most difficult work. To be successful, you need to stitch everything together." (Piethein Strengholt, "Data Management at Scale: Best Practices for Enterprise Architecture", 2020)

"One of the patterns from domain-driven design is called bounded context. Bounded contexts are used to set the logical boundaries of a domain’s solution space for better managing complexity. It’s important that teams understand which aspects, including data, they can change on their own and which are shared dependencies for which they need to coordinate with other teams to avoid breaking things. Setting boundaries helps teams and developers manage the dependencies more efficiently." (Piethein Strengholt, "Data Management at Scale: Best Practices for Enterprise Architecture", 2020)

"The logical boundaries are typically explicit and enforced on areas with clear and higher cohesion. These domain dependencies can sit on different levels, such as specific parts of the application, processes, associated database designs, etc. The bounded context, we can conclude, is polymorphic and can be applied to many different viewpoints. Polymorphic means that the bounded context size and shape can vary based on viewpoint and surroundings. This also means you need to be explicit when using a bounded context; otherwise it remains pretty vague." (Piethein Strengholt, "Data Management at Scale: Best Practices for Enterprise Architecture", 2020)

"The transformation of a monolithic application into a distributed application creates many challenges for data management." (Piethein Strengholt, "Data Management at Scale: Best Practices for Enterprise Architecture", 2020)

"A domain aggregate is a cluster of domain objects that can be treated as a single unit. When you have a collection of objects of the same format and type that are used together, you can model them as a single object, simplifying their usage for other domains." (Piethein Strengholt, "Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric" 2nd Ed., 2023)

"Decentralization involves risks, because the more you spread out activities across the organization, the harder it gets to harmonize strategy and align and orchestrate planning, let alone foster the culture and recruit the talent needed to properly manage your data." (Piethein Strengholt, "Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric" 2nd Ed., 2023)

"Enterprises have difficulties in interpreting new concepts like the data mesh and data fabric, because pragmatic guidance and experiences from the field are missing. In addition to that, the data mesh fully embraces a decentralized approach, which is a transformational change not only for the data architecture and technology, but even more so for organization and processes. This means the transformation cannot only be led by IT; it’s a business transformation as well." (Piethein Strengholt, "Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric" 2nd Ed., 2023)

"The data fabric is an approach that addresses today’s data management and scalability challenges by adding intelligence and simplifying data access using self-service. In contrast to the data mesh, it focuses more on the technology layer. It’s an architectural vision using unified metadata with an end-to-end integrated layer (fabric) for easily accessing, integrating, provisioning, and using data."  (Piethein Strengholt, "Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric" 2nd Ed., 2023)

"The data mesh is an exciting new methodology for managing data at large. The concept foresees an architecture in which data is highly distributed and a future in which scalability is achieved by federating responsibilities. It puts an emphasis on the human factor and addressing the challenges of managing the increasing complexity of data architectures." (Piethein Strengholt, "Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric" 2nd Ed., 2023)

27 November 2006

🔢Jordan Morrow - Collected Quotes

"A data visualization, or dashboard, is great for summarizing or describing what has gone on in the past, but if people don’t know how to progress beyond looking just backwards on what has happened, then they cannot diagnose and find the ‘why’ behind it." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Along with the important information that executives need to be data literate, there is one other key role they play: executives drive data literacy learning and initiatives at the organization." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Data fluency, as defined in this book, is the ability to speak and understand the language of data; it is essentially an ability to communicate with and about data. In different cases around the world, the term data fluency has sometimes been used interchangeably with data literacy. That is not the approach of this book. This book looks to define data literacy as the ability to read, work with, analyze, and communicate with data. Data fluency is the ability to speak and understand the language of data." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Data literacy empowers us to know the usage of data and how an algorithm can potentially be misleading, biased, and so forth; data literacy empowers us with the right type of skepticism that is needed to question everything." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Data literacy is for the masses, and data visualization is powerful to simplify what could be very complicated." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Data literacy is not a change in an individual’s abilities, talents, or skills within their careers, but more of an enhancement and empowerment of the individual to succeed with data. When it comes to data and analytics succeeding in an organization’s culture, the increase in the workforces’ skills with data literacy will help individuals to succeed with the strategy laid in front of them. In this way, organizations are not trying to run large change management programs; the process is more of an evolution and strengthening of individual’s talents with data. When we help individuals do more with data, we in turn help the organization’s culture do more with data." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"[...] data literacy is the ability to read, work with, analyze, and communicate with data." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Data science is, in reality, something that has been around for a very long time. The desire to utilize data to test, understand, experiment, and prove out hypotheses has been around for ages. To put it simply: the use of data to figure things out has been around since a human tried to utilize the information about herds moving about and finding ways to satisfy hunger. The topic of data science came into popular culture more and more as the advent of ‘big data’ came to the forefront of the business world." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Data scientists are advanced in their technical skills. They like to do coding, statistics, and so forth. In its purest form, data science is where an individual uses the scientific method on data." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Data visualization is a simplified approach to studying data." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Ensure you build into your data literacy strategy learning on data quality. If the individuals who are using and working with data do not understand the purpose and need for data quality, we are not sitting in a strong position for great and powerful insight. What good will the insight be, if the data has no quality within the model?" (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"I agree that data visualizations should be visually appealing, driving and utilizing the appeal and power for individuals to utilize it effectively, but sometimes this can take too much time, taking it away from more valuable uses in data. Plus, if the data visualization is not moving the needle of a business goal or objective, how effective is that visualization?" (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021

"I think sometimes organizations are looking at tools or the mythical and elusive data driven culture to be the strategy. Let me emphasize now: culture and tools are not strategies; they are enabling pieces." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"In the world of data and analytics, people get enamored by the nice, shiny object. We are pulled around by the wind of the latest technology, but in so doing we are pulled away from the sound and intelligent path that can lead us to data and analytical success. The data and analytical world is full of examples of overhyped technology or processes, thinking this thing will solve all of the data and analytical needs for an individual or organization. Such topics include big data or data science. These two were pushed into our minds and down our throats so incessantly over the past decade that they are somewhat of a myth, or people finally saw the light. In reality, both have a place and do matter, but they are not the only solution to your data and analytical needs. Unfortunately, though, organizations bit into them, thinking they would solve everything, and were left at the alter, if you will, when it came time for the marriage of data and analytical success with tools." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"One main reason descriptive analytics is so prevalent is the lack of data literacy skills that exist in the world. If one thinks about it, if you do not have a good understanding of how to use data, then how are you going to be good at the four levels of analytics?" (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Overall [...] everyone also has a need to analyze data. The ability to analyze data is vital in its understanding of product launch success. Everyone needs the ability to find trends and patterns in the data and information. Everyone has a need to ‘discover or reveal (something) through detailed examination’, as our definition says. Not everyone needs to be a data scientist, but everyone needs to drive questions and analysis. Everyone needs to dig into the information to be successful with diagnostic analytics. This is one of the biggest keys of data literacy: analyzing data." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Pure data science is the use of data to test, hypothesize, utilize statistics and more, to predict, model, build algorithms, and so forth. This is the technical part of the puzzle. We need this within each organization. By having it, we can utilize the power that these technical aspects bring to data and analytics. Then, with the power to communicate effectively, the analysis can flow throughout the needed parts of an organization." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Statistics is a field of probabilities and sometimes probabilities do not go the way we want." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"The process of asking, acquiring, analyzing, integrating, deciding, and iterating should become second nature to you. This should be a part of how you work on a regular basis with data literacy. Again, without a decision, what is the purpose of data literacy? Data literacy should lead you as an individual, and organizations, to make smarter decisions." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"The reality is, the majority of a workforce doesn’t need to be data scientists, they just need comfort with data literacy." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"When it comes to data literacy learning, there is one key aspect to ensure the program and project works and is successful: the role of leadership. It’s unlikely a project will succeed if you fail to secure the full buy-in from those in charge." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"When we are empowered with skills in data literacy, we have the ability to understand where our data is going, how it is being utilized, and so forth. Then, we can make smarter, data literacy informed decisions with regards to how we log in, create accounts and so forth. Data literacy gives a direct empowerment towards our personal data usage." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

🔢Mike Fleckenstein - Collected Quotes

"A big part of data governance should be about helping people (business and technical) get their jobs done by providing them with resources to answer their questions, such as publishing the names of data stewards and authoritative sources and other metadata, and giving people a way to raise, and if necessary escalate, data issues that are hindering their ability to do their jobs. Data governance helps answer some basic data management questions." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018)

"A data lake is a storage repository that holds a very large amount of data, often from diverse sources, in native format until needed. In some respects, a data lake can be compared to a staging area of a data warehouse, but there are key differences. Just like a staging area, a data lake is a conglomeration point for raw data from diverse sources. However, a staging area only stores new data needed for addition to the data warehouse and is a transient data store. In contrast, a data lake typically stores all possible data that might be needed for an undefined amount of analysis and reporting, allowing analysts to explore new data relationships. In addition, a data lake is usually built on commodity hardware and software such as Hadoop, whereas traditional staging areas typically reside in structured databases that require specialized servers." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018)

"Data governance presents a clear shift in approach, signals a dedicated focus on data management, distinctly identifies accountability for data, and improves communication through a known escalation path for data questions and issues. In fact, data governance is central to data management in that it touches on essentially every other data management function. In so doing, organizational change will be brought to a group is newly - and seriously - engaging in any aspect of data management." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018)

"Data is owned by the enterprise, not by systems or individuals. The enterprise should recognize and formalize the responsibilities of roles, such as data stewards, with specific accountabilities for managing data. A data governance framework and guidelines must be developed to allow data stewards to coordinate with their peers and to communicate and escalate issues when needed. Data should be governed cooperatively to ensure that the interests of data stewards and users are represented and also that value to the enterprise is maximized." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018)

"In truth, all three of these perspectives - process, technology, and data - are needed to create a good data strategy. Each type of person approaches things differently and brings different perspectives to the table. Think of this as another aspect of diversity. Just as a multicultural team and a team with different educational backgrounds will produce a better result, so will a team that includes people with process, technology and data perspectives." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018)

"Lack of trust is closely associated with uncertainty about the quality of the data, such as its sourcing, content definition, or content accuracy. The issue is not only that the data source has quality issues, but that the issues that it may or may not have are unknown." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018)

"The desire to collect as much data as possible must be balanced with an approximation of which data sources are useful to address a business issue. It is worth mentioning that often the value of internal data is high. Most internal data has been cleansed and transformed to suit the mission. It should not be overlooked simply because of the excitement of so much other available data." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018) 

"Typically, a data steward is responsible for a data domain (or part of a domain) across its life cycle. He or she supports that data domain across an entire business process rather than for a specific application or a project. In this way, data governance provides the end user with a go-to resource for data questions and requests. When formally applied, data governance also holds managers and executives accountable for data issues that cannot be resolved at lower levels. Thus, it establishes an escalation path beginning with the end user. Most important, data governance determines the level - local, departmental or enterprise - at which specific data is managed. The higher the value of a particular data asset, the more rigorous its data governance." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018)

26 November 2006

🎯Cindi Howson - Collected Quotes

"A common misconception about BI standardization is the assumption that all users must use the same tool. It would be a mistake to pursue this strategy. Instead, successful BI companies use the right tool for the right user. For a senior executive, the right tool might be a dashboard. For a power user, it might be a business query tool. For a call center agent, it might be a custom application or a BI gadget embedded in an operational application."(Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"A key secret to making BI a killer application within your company is to provide a business intelligence environment that is flexible enough to adapt to a changing business environment at the pace of the business environment - fast and with frequent change." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"A key sign of successful business intelligence is the degree to which it impacts business performance." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Achieving a high level of data quality is hard and is affected significantly by organizational and ownership issues. In the short term, bandaging problems rather than addressing the root causes is often the path of least resistance." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Attracting the best people and keeping the BI team motivated is only possible when the importance of BI is recognized by senior management. When it’s not, the best BI people will leave." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Business intelligence tools can only present the facts. Removing biases and other errors in decision making are dynamics of company culture that affect how well business intelligence is used." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Communicate loudly and widely where there are data quality problems and the associated risks with deploying BI tools on top of bad data. Also advise the different stakeholders on what can be done to address data quality problems - systematically and organizationally. Complaining without providing recommendations fixes nothing." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Data quality is such an important issue, and yet one that is not well understood or that excites business users. It’s often perceived as being a problem for IT to handle when it’s not: it’s for the business to own and correct." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Depending on the extent of the data quality issues, be careful about where you deploy BI. Without a reasonable degree of confidence in the data quality, BI should be kept in the hands of knowledge workers and not extended to frontline workers and certainly not to customers and suppliers. Deploy BI in this limited fashion as data quality issues are gradually exposed, understood, and ultimately, addressed. Don’t wait for every last data quality issue to be resolved; if you do, you will never deliver any BI capabilities, business users will never see the problem, and quality will never improve." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Even if you have previously tried to engage tech-wary users and were met with a lackluster response, try again. Technical and information literacy is evolutionary. BI tools have gotten significantly easier to use with more interface options to suit diverse user requirements, even for users with less affinity for information technology." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Knowledge workers and BI experts must continually evaluate the reports, dashboards, alerts, and other mechanisms for disseminating factual information to ensure the design facilitates insight." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"I would argue that every BI deployment needs an OLAP component; not only is it necessary to facilitate analysis, but also it can significantly reduce the number of reports either IT developers or business users have to create." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"If you give users with low data literacy access to a business query tool and they create incorrect queries because they didn’t understand the different ways revenue could be calculated, the BI tool will be perceived as delivering bad data." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Successful BI companies start with a vision - whether it’s to improve air travel, improve patient care, or drive synergies. The business sees an opportunity to exploit the data to fulfill a broader vision. The detail requirements are not precisely known. Creativity and exploration are necessary ingredients to unlock these business opportunities and fulfill those visions." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Successful business intelligence is influenced by both technical aspects and organizational aspects. In general, companies rate organizational aspects (such as executive level sponsorship) as having a higher impact on success than technical aspects. And yet, even if you do everything right from an organizational perspective, if you don’t have high quality, relevant data, your BI initiative will fail." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"The data architecture is the most important technical aspect of your business intelligence initiative. Fail to build an information architecture that is flexible, with consistent, timely, quality data, and your BI initiative will fail. Business users will not trust the information, no matter how powerful and pretty the BI tools. However, sometimes it takes displaying that messy data to get business users to understand the importance of data quality and to take ownership of a problem that extends beyond business intelligence, to the source systems and to the organizational structures that govern a company’s data." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"The frustration and divide between the business and IT has ramifications far beyond business intelligence. Yet given the distinct aspect of this technology, lack of partnership has a more profound effect in BI’s success. As both sides blame one another, a key secret to reducing blame and increasing understanding is to recognize how these two sides are different." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"The problem is when biases and inaccurate data also get filtered into the gut. In this case, the gut-feel decision making should be supported with objective data, or errors in decision making may occur." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"There is one crucial aspect of extending the reach of business intelligence that has nothing to do with technology and that is Relevance. Understanding what information someone needs to do a job or to complete a task is what makes business intelligence relevant to that person. Much of business intelligence thus far has been relevant to power users and senior managers but not to front/line workers, customers, and suppliers." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

🎯Margaret Y Chu - Collected Quotes

"An organization needs to know the condition and quality of its data to be more effective in fixing them and making them blissful. Unfortunately, pride, shame, and a fear of looking incompetent all play a part when people are asked to openly discuss dirty data issues. Because data are an asset, some people are unwilling to share their data. They think this gives them control and power over others. The role of politics in the organization is the dirty secret of dirty data." (Margaret Y Chu, "Blissful Data", 2004)

"Blissful data consist of information that is accurate, meaningful, useful, and easily accessible to many people in an organization. These data are used by the organization’s employees to analyze information and support their decision-making processes to strategic action. It is easy to see that organizations that have reached their goal of maximum productivity with blissful data can triumph over their competition. Thus, blissful data provide a competitive advantage." (Margaret Y Chu, "Blissful Data", 2004)

"Business rules should be simple and owned and defined by the business; they are declarative, indivisible, expressed in clear, concise language, and business oriented." (Margaret Y Chu, "Blissful Data", 2004)

"Clear goals, multiple strategies, clear roles and responsibilities, boldness, teamwork, speed, flexibility, the ability to change, managing risk, and seizing opportunities when they arise are important characteristics in gaining objectives." (Margaret Y Chu, "Blissful Data", 2004)

"[…] dirt and stains are more noticeable on white or light-colored clothing. In the same way, dirty data and data quality issues have existed for a long time. But due to the inherent nature of operational data these issues have not been as visible or immense enough to affect the bottom line. Just as dark clothing hides spills and stains, dirty data have been hidden or ignored in operational data for decades." (Margaret Y Chu, "Blissful Data", 2004)

"Gauging the quality of the operational data becomes an important first step in predicting potential dirty data issues for an organization. But many organizations are reluctant to commit the time and expense to assess their data. Some organizations wait until dirty data issues blow up in their faces. The greater the pain being experienced, the bigger the commitment to improving data quality." (Margaret Y Chu, "Blissful Data", 2004)

"[...] incomplete, inaccurate, and invalid data can cause problems for an organization. These problems are not only embarrassing and awkward but will also cause the organization to lose customers, new opportunities, and market share." (Margaret Y Chu, "Blissful Data", 2004)

"Let’s define dirty data as: ‘… data that are incomplete, invalid, or inaccurate’. In other words, dirty data are simply data that are wrong. […] Incomplete or inaccurate data can result in bad decisions being made. Thus, dirty data are the opposite of blissful data. Problems caused by dirty data are significant; be wary of their pitfalls."  (Margaret Y Chu, "Blissful Data", 2004)

"Organizations must know and understand the current organizational culture to be successful at implementing change. We know that it is the organization’s culture that drives its people to action; therefore, management must understand what motivates their people to attain goals and objectives. Only by understanding the current organizational culture will it be possible to begin to try and change it." (Margaret Y Chu, "Blissful Data", 2004)

"Processes must be implemented to prevent bad data from entering the system as well as propagating to other systems. That is, dirty data must be intercepted at its source. The operational systems are often the source of informational data; thus dirty data must be fixed at the operational data level. Implementing the right processes to cleanse data is, however, not easy." (Margaret Y Chu, "Blissful Data", 2004)

"So business rules are just like house rules. They are policies of an organization and contain one or more assertions that define or constrain some aspect of the business. Their purpose is to provide a structure and guideline to control or influence the behavior of the organization. Further, business rules represent the business and guide the decisions that are made by the people in the organization." (Margaret Y Chu, "Blissful Data", 2004)

"Vision and mission statements are important, but they are not an organization’s culture; they are its goals. A vision is the ideal they are striving to achieve. There may be a huge gap between the ideal and the current state of actions and behaviors."(Margaret Y Chu, "Blissful Data", 2004)

"What management notices and rewards is the best indication of the organization’s culture." (Margaret Y Chu, "Blissful Data", 2004)

🔢James Serra - Collected Quotes

"A common data model (CDM) is a standardized structure for storing and organizing data that is typically used when building a data warehouse solution. It provides a consistent way to represent data within tables and relationships between tables, making it easy for any system or application to understand the data." (James Serra, "Deciphering Data Architectures", 2024)

"A data architecture defines a high-level architectural approach and concept to follow, outlines a set of technologies to use, and states the flow of data that will be used to build your data solution to capture big data. [...] Data architecture refers to the overall design and organization of data within an information system." (James Serra, "Deciphering Data Architectures", 2024)

"A data mesh is a decentralized data architecture with four specific characteristics. First, it requires independent teams within designated domains to own their analytical data. Second, in a data mesh, data is treated and served as a product to help the data consumer to discover, trust, and utilize it for whatever purpose they like. Third, it relies on automated infrastructure provisioning. And fourth, it uses governance to ensure that all the independent data products are secure and follow global rules." (James Serra, "Deciphering Data Architectures", 2024)

"At its core, a data fabric is an architectural framework, designed to be employed within one or more domains inside a data mesh. The data mesh, however, is a holistic concept, encompassing technology, strategies, and methodologies." (James Serra, "Deciphering Data Architectures", 2024)

"Be aware that data product is not the same thing as data as a product. Data as a product describes the idea that data owners treat data as a fully contained product that they are responsible for, rather than a byproduct of a process that others manage, and should make the data available to other domains and consumers. Data product refers to the architecture of implementing data as a product." (James Serra, "Deciphering Data Architectures", 2024)

"Choosing the right data ingestion strategy is a significant business decision that partially determines how well your organization can leverage its data for business decision making and operations. The stakes are high; the wrong strategy can lead to poor data quality, performance issues, increased costs, and even regulatory compliance breaches." (James Serra, "Deciphering Data Architectures", 2024)

"Data governance is the overall management of data in an organization. It involves establishing policies and procedures for collecting, storing, securing, transforming, and reporting data." (James Serra, "Deciphering Data Architectures", 2024)

"Delta Lake is a transactional storage software layer that runs on top of an existing data lake and adds RDW-like features that improve the lake’s reliability, security, and performance. Delta Lake itself is not storage. In most cases, it’s easy to turn a data lake into a Delta Lake; all you need to do is specify, when you are storing data to your data lake, that you want to save it in Delta Lake format (as opposed to other formats, like CSV or JSON)." (James Serra, "Deciphering Data Architectures", 2024)

"It is very important to understand that data mesh is a concept, not a technology. It is all about an organizational and cultural shift within companies. The technology used to build a data mesh could follow the modern data warehouse, data fabric, or data lakehouse architecture - or domains could even follow different architectures." (James Serra, "Deciphering Data Architectures", 2024)

"The data fabric architecture is an evolution of the modern data warehouse (MDW) architecture: an advanced layer built onto the MDW to enhance data accessibility, security, discoverability, and availability. [...] The most important aspect of the data fabric philosophy is that a data fabric solution can consume any and all data within the organization." (James Serra, "Deciphering Data Architectures", 2024)

"The goal of any data architecture solution you build should be to make it quick and easy for any end user, no matter what their technical skills are, to query the data and to create reports and dashboards." (James Serra, "Deciphering Data Architectures", 2024)

"The term data lakehouse is a portmanteau (blend) of data lake and data warehouse. [...] The concept of a lakehouse is to get rid of the relational data warehouse and use just one repository, a data lake, in your data architecture." (James Serra, "Deciphering Data Architectures", 2024)

"With all the hype, you would think building a data mesh is the answer to all of these 'problems' with data warehousing. The truth is that while data warehouse projects do fail, it is rarely because they can’t scale enough to handle big data or because the architecture or the technology isn’t capable. Failure is almost always because of problems with the people and/or the process, or that the organization chose the completely wrong technology." (James Serra, "Deciphering Data Architectures", 2024)

25 November 2006

🔢Arkady Maydanchik - Collected Quotes

"Data cleansing is dangerous mainly because data quality problems are usually complex and interrelated. Fixing one problem may create many others in the same or other related data elements." (Arkady Maydanchik, "Data Quality Assessment", 2007)

"Data quality program is a collection of initiatives with the common objective of maximizing data quality and minimizing negative impact of the bad data. [...] objective of any data quality program is to ensure that data quality docs not deteriorate during conversion and consolidation projects, Ideally, we would like to do even more and use the opportunity to improve data quality since data cleansing is much easier to perform before conversion than afterwards." (Arkady Maydanchik, "Data Quality Assessment", 2007)

"Databases rarely begin their life empty. More often the starting point in their lifecycle is a data conversion from some previously exiting data source. And by a cruel twist of fate, it is usually a rather violent beginning. Data conversion usually takes the better half of new system implementation effort and almost never goes smoothly." (Arkady Maydanchik, "Data Quality Assessment", 2007)

"[...] data conversion is the most difficult part of any system implementation. The error rate in a freshly populated new database is often an order of magnitude above that of the old system from which the data is converted. As a major source of the data problems, data conversion must be treated with the utmost respect it deserves." (Arkady Maydanchik, "Data Quality Assessment", 2007)

"Equally critical is to include data quality definition and acceptable quality benchmarks into the conversion specifications. No product design skips quality specifications. including quality metrics and benchmarks. Yet rare data conversion follows suit. As a result, nobody knows how successful the conversion project was until data errors get exposed in the subsequent months and years. The solution is to perform comprehensive data quality assessment of the target data upon conversion and compare the results with pre-defined benchmarks." (Arkady Maydanchik, "Data Quality Assessment", 2007)

"More and more data is exchanged between the systems through real-time (or near real-time) interfaces. As soon as the data enters one database, it triggers procedures necessary to send transactions to Other downstream databases. The advantage is immediate propagation of data to all relevant databases. Data is less likely to be out-of-sync. [...] The basic problem is that data is propagated too fast. There is little time to verify that the data is accurate. At best, the validity of individual attributes is usually checked. Even if a data problem can be identified. there is often nobody at the other end of the line to react. The transaction must be either accepted or rejectcd (whatever the consequences). If data is rejected, it may be lost forever!" (Arkady Maydanchik, "Data Quality Assessment", 2007)

"Much data in databases has a long history. It might have come from old 'legacy' systems or have been changed several times in the past. The usage of data fields and value codes changes over time. The same value in the same field will mean totally different thing in different records. Knowledge or these facts allows experts to use the data properly. Without this knowledge, the data may bc used literally and with sad consequences. The same is about data quality. Data users in the trenches usually know good data from bad and can still use it efficiently. They know where to look and what to check. Without these experts, incorrect data quality assumptions are often made and poor data quality becomes exposed." (Arkady Maydanchik, "Data Quality Assessment", 2007)

"The big part of the challenge is that data quality does not improve by itself or as a result of general IT advancements. Over the years, the onus of data quality improvement was placed on modern database technologies and better information systems. [...] In reality, most IT processes affect data quality negatively, Thus, if we do nothing, data quality will continuously deteriorate to the point where the data will become a huge liability." (Arkady Maydanchik, "Data Quality Assessment", 2007)

"The corporate data universe consists of numerous databases linked by countless real-time and batch data feeds. The data continuously move about and change. The databases are endlessly redesigned and upgraded, as are the programs responsible for data exchange. The typical result of this dynamic is that information systems get better, while data deteriorates. This is very unfortunate since it is the data quality that determines the intrinsic value of the data to the business and consumers. Information technology serves only as a magnifier for this intrinsic value. Thus, high quality data combined with effective technology is a great asset, but poor quality data combined with effective technology is an equally great liability." (Arkady Maydanchik, "Data Quality Assessment", 2007)

"The greatest challenge in data conversion is that actual content and structure of the source data is rarely understood. More often data transformation algorithms rely on the theoretical data definitions and data models, Since this information is usually incomplete, outdated, and incorrect, the converted data look nothing like what is expected. Thus, data quality plummets. The solution is to precede conversion with extensive data profiling and analysis. In fact, data quality after conversion is in direct (or even exponential) relation with the amount of knowledge about actual data you possess. Lack of in-depth analysis will guarantee significant loss of data quality." (Arkady Maydanchik, "Data Quality Assessment", 2007)

"The main tool of a data quality assessment professional is a data quality rule - a constraint that validates a data element or a relationship between several data elements and can be implemented in a computer program. [...] The solution relies on the design and implementation of hundreds and thousands of such data quality rules, and then using them to identify all data inconsistencies. Miraculously, a well-designed and fine-tuned collection of rules will identify a majority Of data errors in a fraction or time compared with manual validation. In fact, it never takes more than a few months to design and implement the rules and produce comprehensive error reports, What is even better, the same setup can be reused over and over again to reassess data quality periodically with minimal effort." (Arkady Maydanchik, "Data Quality Assessment", 2007)

"Using data quality rules brings comprehensive data quality assessment from fantasy world to reality. However, it is by no means simple, and it takes a skillful skipper to navigate through the powerful currents and maelstroms along the way. Considering the volume and structural complexity of a typical database, designing a comprehensive set of data quality rules is a daunting task. The number Of rules will often reach hundreds or even thousands. When some rules are missing, the results of the data quality assessment can be completely jeopardized, Thus the first challenge is to design all rules and make sure that they indeed identify all or most errors." (Arkady Maydanchik, "Data Quality Assessment", 2007)

"While we might attempt to identify and correct most data errors, as well as try to prevent others from entering the database, the data quality will never be perfect. Perfection is practically unattainable in data quality as with the quality of most other products. In truth, it is also unnecessary since at some point improving data quality becomes more expensive than leaving it alone. The more efficient our data quality program, the higher level of quality we will achieve- but never will it reach 100%. However, accepting imperfection is not the same as ignoring it. Knowledge of the data limitations and imperfections can help use the data wisely and thus save time and money, The challenge, of course, is making this knowledge organized and easily accessible to the target users. The solution is a comprehensive integrated data quality meta data warehouse." (Arkady Maydanchik, "Data Quality Assessment", 2007)

24 November 2006

🔢Rupa Mahanti - Collected Quotes

"A data model is a formal organized representation of real-world entities, focused on the definition of an object and its associated attributes and relationships between the entities. Data models should be designed consistently and coherently. They should not only meet requirements, but should also enable data consumers to better understand the data." (Rupa Mahanti, "Data Quality: Dimensions, Measurement, Strategy, Management, and Governance", 2019)

"Bad data are expensive: my best estimate is that it costs a typical company 20% of revenue. Worse, they dilute trust - who would trust an exciting new insight if it is based on poor data! And worse still, sometimes bad data are simply dangerous; look at the damage brought on by the financial crisis, which had its roots in bad data." (Rupa Mahanti, "Data Quality: Dimensions, Measurement, Strategy, Management, and Governance", 2019)

"Conformity, or validity, means the data comply with a set of internal or external standards or guidelines or standard data definitions, including metadata definitions. Comparison between the data items and metadata enables measuring the degree of conformity." (Rupa Mahanti, "Data Quality: Dimensions, Measurement, Strategy, Management, and Governance", 2019)

"Data-intensive projects generally involve at least one person who understands all the nuances of the application, process, and source and target data. These are the people who also know about all the abnormalities in the data and the workarounds to deal with them, and are the experts. This is especially true in the case of legacy systems that store and use data in a manner it should not be used. The knowledge is not documented anywhere and is usually inside the minds of the people. When the experts leave, with no one having a true understanding of the data, the data are not used properly and everything goes haywire." (Rupa Mahanti, "Data Quality: Dimensions, Measurement, Strategy, Management, and Governance", 2019)

"Data are collections of facts, such as numbers, words, measurements, observations, or descriptions of real-world objects, phenomena, or events and their attributes. Data are qualitative when they contain descriptive information, and quantitative when they contain numerical information." (Rupa Mahanti, "Data Quality: Dimensions, Measurement, Strategy, Management, and Governance", 2019)

"Data migration generally involves the transfer of data from an existing data source to a new database or to a new schema within the same database. [...] Data migration projects deal with the migration of data from one data structure to another data structure, or data transformed from one platform to another platform with modified data structure." (Rupa Mahanti, "Data Quality: Dimensions, Measurement, Strategy, Management, and Governance", 2019)

"Data quality is the capability of data to satisfy the stated business, system, and technical requirements of an enterprise. Data quality is an insight into or an evaluation of data’s fitness to serve their purpose in a given context. Data quality is accomplished when a business uses data that are complete, relevant, and timely. The general definition of data quality is 'fitness for use', or more specifically, to what extent some data successfully serve the purposes of the user." (Rupa Mahanti, "Data Quality: Dimensions, Measurement, Strategy, Management, and Governance", 2019)

"Lack of a standard process to address business requirements and business process improvements, poorly designed and implemented business processes that result in lack of training, coaching, and communication in the use of the process, and unclear definition of subprocess or process ownership, roles, and responsibilities have an adverse impact on data quality." (Rupa Mahanti, "Data Quality: Dimensions, Measurement, Strategy, Management, and Governance", 2019)

"The degree of data quality excellence that should be attained and sustained is driven by the criticality of the data, the business need and the cost and time to achieve the defined degree of data quality." (Rupa Mahanti, "Data Quality: Dimensions, Measurement, Strategy, Management, and Governance", 2019)

"To understand why data quality is important, we need to understand the categorization of data, the current quality of data and how is it different from the quality of manufacturing processes, the business impact of bad data and cost of poor data quality, and possible causes of data quality issues." (Rupa Mahanti, "Data Quality: Dimensions, Measurement, Strategy, Management, and Governance", 2019)

23 November 2006

🔢Neera Bhansali - Collected Quotes

"Are data quality and data governance the same thing? They share the same goal, essentially striving for the same outcome of optimizing data and information results for business purposes. Data governance plays a very important role in achieving high data quality. It deals primarily with orchestrating the efforts of people, processes, objectives, technologies, and lines of business in order to optimize outcomes around enterprise data assets. This includes, among other things, the broader cross-functional oversight of standards, architecture, business processes, business integration, and risk and compliance. Data governance is an organizational structure that oversees the compliance and standards of enterprise data." (Neera Bhansali, "Data Governance: Creating Value from Information Assets", 2014)

"Data governance is about putting people in charge of fixing and preventing data issues and using technology to help aid the process. Any time data is synchronized, merged, and exchanged, there have to be ground rules guiding this. Data governance serves as the method to organize the people, processes, and technologies for data-driven programs like data quality; they are a necessary part of any data quality effort." (Neera Bhansali, "Data Governance: Creating Value from Information Assets", 2014)

"Data governance is the process of creating and enforcing standards and policies concerning data. [...] The governance process isn't a transient, short-term project. The governance process is a continuing enterprise-focused program." (Neera Bhansali, "Data Governance: Creating Value from Information Assets", 2014)

"Having data quality as a focus is a business philosophy that aligns strategy, business culture, company information, and technology in order to manage data to the benefit of the enterprise. Data quality is an elusive subject that can defy measurement and yet be critical enough to derail a single IT project, strategic initiative, or even an entire company." (Neera Bhansali, "Data Governance: Creating Value from Information Assets", 2014)

"Understanding an organization's current processes and issues is not enough to build an effective data governance program. To gather business, functional, and technical requirements, understanding the future vision of the business or organization is important. This is followed with the development of a visual prototype or logical model, independent of products or technology, to demonstrate the data governance process. This business-driven model results in a definition of enterprise-wide data governance based on key standards and processes. These processes are independent of the applications and of the tools and technologies required to implement them. The business and functional requirements, the discovery of business processes, along with the prototype or model, provide an impetus to address the "hard" issues in the data governance process." (Neera Bhansali, "Data Governance: Creating Value from Information Assets", 2014)

🔢Saurabh Gupta - Collected Quotes

"A data warehouse follows a pre-built static structure to model source data. Any changes at the structural and configuration level must go through a stringent business review process and impact analysis. Data lakes are very agile. Consumption or analytical layer can be modified to fit in the model requirements. Consumers of a data lake are not constant; therefore, schema and modeling lies at the liberty of analysts and scientists." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data in the data lake should never get disposed. Data driven strategy must define steps to version the data and handle deletes and updates from the source systems." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data governance policies must not enforce constraints on data - Data governance intends to control the level of democracy within the data lake. Its sole purpose of existence is to maintain the quality level through audits, compliance, and timely checks. Data flow, either by its size or quality, must not be constrained through governance norms. [...] Effective data governance elevates confidence in data lake quality and stability, which is a critical factor to data lake success story. Data compliance, data sharing, risk and privacy evaluation, access management, and data security are all factors that impact regulation." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data Lake induces accessibility and catalyzes availability. It warrants data discovery platforms to soak the data trends at a horizontal scale and produce visual insights. It largely cuts down the time that goes into data preparation and exhaustive data analysis." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data Lake is a single window snapshot of all enterprise data in its raw format, be it structured, semi-structured, or unstructured. Starting from curating the data ingestion pipeline to the transformation layer for analytical consumption, every aspect of data gets addressed in a data lake ecosystem. It is supposed to hold enormous volumes of data of varied structures." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data lake is an ecosystem for the realization of big data analytics. What makes data lake a huge success is its ability to contain raw data in its native format on a commodity machine and enable a variety of data analytics models to consume data through a unified analytical layer. While the data lake remains highly agile and data-centric, the data governance council governs the data privacy norms, data exchange policies, and the ensures quality and reliability of data lake." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data swamp, on the other hand, presents the devil side of a lake. A data lake in a state of anarchy is nothing but turns into a data swamp. It lacks stable data governance practices, lacks metadata management, and plays weak on ingestion framework. Uncontrolled and untracked access to source data may produce duplicate copies of data and impose pressure on storage systems." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data warehousing, as we are aware, is the traditional approach of consolidating data from multiple source systems and combining into one store that would serve as the source for analytical and business intelligence reporting. The concept of data warehousing resolved the problems of data heterogeneity and low-level integration. In terms of objectives, a data lake is no different from a data warehouse. Both are primary advocates of terms like 'single source of truth' and 'central data repository'." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Metadata is the key to effective data governance. Metadata in this context is the data that defines the structure and attributes of data. This could mean data types, data privacy attributes, scale, and precision. In general, quality of data is directly proportional to the amount and depth of metadata provided. Without metadata, consumers will have to depend on other sources and mechanisms." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"The quality of data that flows within a data pipeline is as important as the functionality of the pipeline. If the data that flows within the pipeline is not a valid representation of the source data set(s), the pipeline doesn’t serve any real purpose. It’s very important to incorporate data quality checks within different phases of the pipeline. These checks should verify the correctness of data at every phase of the pipeline. There should be clear isolation between checks at different parts of the pipeline. The checks include checks like row count, structure, and data type validation." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

22 November 2006

🎯William H Inmon - Collected Quotes

"There are four levels of data in the architected environment - the operational level, the atomic (or the data warehouse) level, the departmental (or the data mart) level, and the individual level. These different levels of data are the basis of a larger architecture called the corporate information factory (CIF). The operational level of data holds application-oriented primitive data only and primarily serves the high-performance transaction-processing community. The data-warehouse level of data holds integrated, historical primitive data that cannot be updated. In addition, some derived data is found there. The departmental or data mart level of data contains derived data almost exclusively. The departmental or data mart level of data is shaped by end-user requirements into a form specifically suited to the needs of the department. And the individual level of data is where much heuristic analysis is done." (William H Inmon, "Building the Data Warehouse" 4th Ed., 2005)

"To interpret and understand information over time, a whole new dimension of context is required. While content of information remains important, the comparison and understanding of information over time mandates that context be an equal partner to content. And in years past, context has been an undiscovered, unexplored dimension of information." (William H Inmon, "Building the Data Warehouse" 4th Ed., 2005)

"When management receives the conflicting reports, it is forced to make decisions based on politics and personalities because neither source is more or less credible. This is an example of the crisis of data credibility in the naturally evolving architecture." (William H Inmon, "Building the Data Warehouse" 4th Ed., 2005)

"An interesting aspect of KPIs are that they change over time. At one moment in time the organization is interested in profitability. There will be one set of KPIs that measure profitability. At another moment in time the organization is interested in market share. There will be another set of KPIs that measure market share. As the focus of the corporation changes over time, so do the KPIs that measure that focus." (William H Inmon & Daniel Linstedt, "Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault", 2015)

"Both the ODS and a data warehouse contain subject-oriented, integrated information. In that regard they are similar. But an ODS contains data that can be individually updated, deleted, or added. And a data warehouse contains nonvolatile data. A data warehouse contains snapshots of data. Once the snapshot is taken, the data in the data warehouse does not change. So when it comes to volatility, a data warehouse and an ODS are very different." (William H Inmon & Daniel Linstedt, "Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault", 2015)

"In general, analytic processing is known as 'heuristic' processing. In heuristic processing the requirements for analysis are discovered by the results of the current iteration of processing. […] In heuristic processing you start with some requirements. You build a system to analyze those requirements. Then, after you have results, you sit back and rethink your requirements after you have had time to reflect on the results that have been achieved. You then restate the requirements and redevelop and reanalyze again. Each time you go through the redevelopment exercise is called an 'iteration'. You continue the process of building different iterations of processing until such time as you achieve the results that satisfy the organization that is sponsoring the exercise." (William H Inmon & Daniel Linstedt, "Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault", 2015)

"There are, however, many problems with independent data marts. Independent data marts: (1) Do not have data that can be reconciled with other data marts (2) Require their own independent integration of raw data (3) Do not provide a foundation that can be built on whenever there are future analytical needs." (William H Inmon & Daniel Linstedt, "Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault", 2015)

"There is then a real mismatch between the volume of data and the business value of data. For people who are examining repetitive data and hoping to find massive business value there, there is most likely disappointment in their future. But for people looking for business value in nonrepetitive data, there is a lot to look forward to." (William H Inmon & Daniel Linstedt, "Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault", 2015)

"A defining characteristic of the data lakehouse architecture is allowing direct access to data as files while retaining the valuable properties of a data warehouse. Just do both!" (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"At first, we threw all of this data into a pit called the 'data lake'. But we soon discovered that merely throwing data into a pit was a pointless exercise. To be useful - to be analyzed - data needed to (1) be related to each other and (2) have its analytical infrastructure carefully arranged and made available to the end user. Unless we meet these two conditions, the data lake turns into a swamp, and swamps start to smell after a while. [...] In a data swamp, data just sits there are no one uses it. In the data swamp, data just rots over time." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"Data privacy, data confidentiality, and data protection are sometimes incorrectly diluted with security. For example, data privacy is related to, but not the same as, data security. Data security is concerned with assuring the confidentiality, integrity, and availability of data. Data privacy focuses on how and to what extent businesses may collect and process information about individuals." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"Data visualization adds credibility to any message. [...] Data visualizations are incredibly cold mediums because they require a lot of interpretation and participation from the audience. While boring numbers are authoritative, data visualization is inclusive. [...] Data visualizations absorb the viewer in the chart and communicate the author’s credibility through active participation. Like a good teacher, they walk the reader through the thought process and convince him/her effortlessly."

"Data visualization‘s key responsibilities and challenges include the obligation to earn your audience’s attention - do not take it for granted." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"In general, a data or data set contains its sensitivity or controversial nature only if it is linked or related to an individual’s personal information. Else an isolated, abandoned, or unrelated sensitive or controversial attribute has no significance."

"It is dangerous to do an analysis and merge data with very different quality profiles. As a general rule, the veracity of merged data is only as good as the worst data that has been merged. [...] Not knowing the quality of the data being analyzed jeopardizes the entire analysis." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"Once you combine the data lake along with analytical infrastructure, the entire infrastructure can be called a data lakehouse. [...] The data lake without the analytical infrastructure simply becomes a data swamp. And a data swamp does no one any good." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"The data lakehouse architecture presents an opportunity comparable to the one seen during the early years of the data warehouse market. The unique ability of the lakehouse to manage data in an open environment, blend all varieties of data from all parts of the enterprise, and combine the data science focus of the data lake with the end user analytics of the data warehouse will unlock incredible value for organizations. [...] "The lakehouse architecture equally makes it natural to manage and apply models where the data lives." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"Raw data without appropriate visualization is like dumped construction raw materials at a building construction site. The finished house is the actual visuals created from those data like raw materials." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"With the data lakehouse, it is possible to achieve a level of analytics and machine learning that is not feasible or possible any other way. But like all architectural structures, the data lakehouse requires an understanding of architecture and an ability to plan and create a blueprint." (Bill Inmon et al, "Building the Data Lakehouse", 2021)
