30 November 2006

🎯Zachary Karabell - Collected Quotes

"Culture is fuzzy, easy to caricature, amenable to oversimplifications, and often used as a catchall when all other explanations fail." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Defining an indicator as lagging, coincident, or leading is connected to another vital notion: the business cycle. Indicators are lagging or leading based on where economists believe we are in the business cycle: whether we are heading into a recession or emerging from one." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"[…] economics is a profession grounded in the belief that 'the economy' is a machine and a closed system. The more clearly that machine is understood, the more its variables are precisely measured, the more we will be able to manage and steer it as we choose, avoiding the frenetic expansions and sharp contractions. With better indicators would come better policy, and with better policy, states would be less likely to fall into depression and risk collapse." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"[…] humans make mistakes when they try to count large numbers in complicated systems. They make even greater errors when they attempt - as they always do - to reduce complicated systems to simple numbers." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"In the absence of clear information - in the absence of reliable statistics - people did what they had always done: filtered available information through the lens of their worldview." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Most people do not relate to or retain columns of numbers, however much those numbers reflect something that they care about deeply. Statistics can be cold and dull." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Our needs going forward will be best served by how we make use of not just this data but all data. We live in an era of Big Data. The world has seen an explosion of information in the past decades, so much so that people and institutions now struggle to keep pace. In fact, one of the reasons for the attachment to the simplicity of our indicators may be an inverse reaction to the sheer and bewildering volume of information most of us are bombarded by on a daily basis. […] The lesson for a world of Big Data is that in an environment with excessive information, people may gravitate toward answers that simplify reality rather than embrace the sheer complexity of it." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Statistics are meaningless unless they exist in some context. One reason why the indicators have become more central and potent over time is that the longer they have been kept, the easier it is to find useful patterns and points of reference." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Statistics are what humans do with the data they assemble; they are constructs meant to make sense of information. But the raw material is itself equally valuable, and rarely do we make sufficient use of it." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Statistics represents the fusion of mathematics with the collection and analysis of data." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"The concept that an economy (1) is characterized by regular cycles that (2) follow familiar patterns (3) illuminated by a series of statistics that (4) determine where we are in that cycle has become part and parcel of how we view the world." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"The indicators - through no particular fault of anyone in particular - have not kept up with the changing world. As these numbers have become more deeply embedded in our culture as guides to how we are doing, we rely on a few big averages that can never be accurate pictures of complicated systems for the very reason that they are too simple and that they are averages. And we have neither the will nor the resources to invent or refine our current indicators enough to integrate all of these changes." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"The search for better numbers, like the quest for new technologies to improve our lives, is certainly worthwhile. But the belief that a few simple numbers, a few basic averages, can capture the multifaceted nature of national and global economic systems is a myth. Rather than seeking new simple numbers to replace our old simple numbers, we need to tap into both the power of our information age and our ability to construct our own maps of the world to answer the questions we need answering." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"We don’t need new indicators that replace old simple numbers with new simple numbers. We need instead bespoke indicators, tailored to the specific needs and specific questions of governments, businesses, communities, and individuals." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"When statisticians, trained in math and probability theory, try to assess likely outcomes, they demand a plethora of data points. Even then, they recognize that unless it’s a very simple and controlled action such as flipping a coin, unforeseen variables can exert significant influence." (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

"Yet our understanding of the world is still framed by our leading indicators. Those indicators define the economy, and what they say becomes the answer to the simple question 'Are we doing well?'" (Zachary Karabell, "The Leading Indicators: A short history of the numbers that rule our world", 2014)

28 November 2006

🎯Piethein Strengholt - Collected Quotes

"For advanced analytics, a well-designed data pipeline is a prerequisite, so a large part of your focus should be on automation. This is also the most difficult work. To be successful, you need to stitch everything together." (Piethein Strengholt, "Data Management at Scale: Best Practices for Enterprise Architecture", 2020)

"One of the patterns from domain-driven design is called bounded context. Bounded contexts are used to set the logical boundaries of a domain’s solution space for better managing complexity. It’s important that teams understand which aspects, including data, they can change on their own and which are shared dependencies for which they need to coordinate with other teams to avoid breaking things. Setting boundaries helps teams and developers manage the dependencies more efficiently." (Piethein Strengholt, "Data Management at Scale: Best Practices for Enterprise Architecture", 2020)

"The logical boundaries are typically explicit and enforced on areas with clear and higher cohesion. These domain dependencies can sit on different levels, such as specific parts of the application, processes, associated database designs, etc. The bounded context, we can conclude, is polymorphic and can be applied to many different viewpoints. Polymorphic means that the bounded context size and shape can vary based on viewpoint and surroundings. This also means you need to be explicit when using a bounded context; otherwise it remains pretty vague." (Piethein Strengholt, "Data Management at Scale: Best Practices for Enterprise Architecture", 2020)

"The transformation of a monolithic application into a distributed application creates many challenges for data management." (Piethein Strengholt, "Data Management at Scale: Best Practices for Enterprise Architecture", 2020)

"A domain aggregate is a cluster of domain objects that can be treated as a single unit. When you have a collection of objects of the same format and type that are used together, you can model them as a single object, simplifying their usage for other domains." (Piethein Strengholt, "Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric" 2nd Ed., 2023)

"Decentralization involves risks, because the more you spread out activities across the organization, the harder it gets to harmonize strategy and align and orchestrate planning, let alone foster the culture and recruit the talent needed to properly manage your data." (Piethein Strengholt, "Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric" 2nd Ed., 2023)

"Enterprises have difficulties in interpreting new concepts like the data mesh and data fabric, because pragmatic guidance and experiences from the field are missing. In addition to that, the data mesh fully embraces a decentralized approach, which is a transformational change not only for the data architecture and technology, but even more so for organization and processes. This means the transformation cannot only be led by IT; it’s a business transformation as well." (Piethein Strengholt, "Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric" 2nd Ed., 2023)

"The data fabric is an approach that addresses today’s data management and scalability challenges by adding intelligence and simplifying data access using self-service. In contrast to the data mesh, it focuses more on the technology layer. It’s an architectural vision using unified metadata with an end-to-end integrated layer (fabric) for easily accessing, integrating, provisioning, and using data."  (Piethein Strengholt, "Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric" 2nd Ed., 2023)

"The data mesh is an exciting new methodology for managing data at large. The concept foresees an architecture in which data is highly distributed and a future in which scalability is achieved by federating responsibilities. It puts an emphasis on the human factor and addressing the challenges of managing the increasing complexity of data architectures." (Piethein Strengholt, "Data Management at Scale: Modern Data Architecture with Data Mesh and Data Fabric" 2nd Ed., 2023)

27 November 2006

🔢Jordan Morrow - Collected Quotes

"A data visualization, or dashboard, is great for summarizing or describing what has gone on in the past, but if people don’t know how to progress beyond looking just backwards on what has happened, then they cannot diagnose and find the ‘why’ behind it." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Along with the important information that executives need to be data literate, there is one other key role they play: executives drive data literacy learning and initiatives at the organization." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Data fluency, as defined in this book, is the ability to speak and understand the language of data; it is essentially an ability to communicate with and about data. In different cases around the world, the term data fluency has sometimes been used interchangeably with data literacy. That is not the approach of this book. This book looks to define data literacy as the ability to read, work with, analyze, and communicate with data. Data fluency is the ability to speak and understand the language of data." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Data literacy empowers us to know the usage of data and how an algorithm can potentially be misleading, biased, and so forth; data literacy empowers us with the right type of skepticism that is needed to question everything." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Data literacy is for the masses, and data visualization is powerful to simplify what could be very complicated." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Data literacy is not a change in an individual’s abilities, talents, or skills within their careers, but more of an enhancement and empowerment of the individual to succeed with data. When it comes to data and analytics succeeding in an organization’s culture, the increase in the workforces’ skills with data literacy will help individuals to succeed with the strategy laid in front of them. In this way, organizations are not trying to run large change management programs; the process is more of an evolution and strengthening of individual’s talents with data. When we help individuals do more with data, we in turn help the organization’s culture do more with data." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"[...] data literacy is the ability to read, work with, analyze, and communicate with data." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Data science is, in reality, something that has been around for a very long time. The desire to utilize data to test, understand, experiment, and prove out hypotheses has been around for ages. To put it simply: the use of data to figure things out has been around since a human tried to utilize the information about herds moving about and finding ways to satisfy hunger. The topic of data science came into popular culture more and more as the advent of ‘big data’ came to the forefront of the business world." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Data scientists are advanced in their technical skills. They like to do coding, statistics, and so forth. In its purest form, data science is where an individual uses the scientific method on data." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Data visualization is a simplified approach to studying data." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Ensure you build into your data literacy strategy learning on data quality. If the individuals who are using and working with data do not understand the purpose and need for data quality, we are not sitting in a strong position for great and powerful insight. What good will the insight be, if the data has no quality within the model?" (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"I agree that data visualizations should be visually appealing, driving and utilizing the appeal and power for individuals to utilize it effectively, but sometimes this can take too much time, taking it away from more valuable uses in data. Plus, if the data visualization is not moving the needle of a business goal or objective, how effective is that visualization?" (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021

"I think sometimes organizations are looking at tools or the mythical and elusive data driven culture to be the strategy. Let me emphasize now: culture and tools are not strategies; they are enabling pieces." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"In the world of data and analytics, people get enamored by the nice, shiny object. We are pulled around by the wind of the latest technology, but in so doing we are pulled away from the sound and intelligent path that can lead us to data and analytical success. The data and analytical world is full of examples of overhyped technology or processes, thinking this thing will solve all of the data and analytical needs for an individual or organization. Such topics include big data or data science. These two were pushed into our minds and down our throats so incessantly over the past decade that they are somewhat of a myth, or people finally saw the light. In reality, both have a place and do matter, but they are not the only solution to your data and analytical needs. Unfortunately, though, organizations bit into them, thinking they would solve everything, and were left at the alter, if you will, when it came time for the marriage of data and analytical success with tools." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"One main reason descriptive analytics is so prevalent is the lack of data literacy skills that exist in the world. If one thinks about it, if you do not have a good understanding of how to use data, then how are you going to be good at the four levels of analytics?" (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Overall [...] everyone also has a need to analyze data. The ability to analyze data is vital in its understanding of product launch success. Everyone needs the ability to find trends and patterns in the data and information. Everyone has a need to ‘discover or reveal (something) through detailed examination’, as our definition says. Not everyone needs to be a data scientist, but everyone needs to drive questions and analysis. Everyone needs to dig into the information to be successful with diagnostic analytics. This is one of the biggest keys of data literacy: analyzing data." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Pure data science is the use of data to test, hypothesize, utilize statistics and more, to predict, model, build algorithms, and so forth. This is the technical part of the puzzle. We need this within each organization. By having it, we can utilize the power that these technical aspects bring to data and analytics. Then, with the power to communicate effectively, the analysis can flow throughout the needed parts of an organization." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"Statistics is a field of probabilities and sometimes probabilities do not go the way we want." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"The process of asking, acquiring, analyzing, integrating, deciding, and iterating should become second nature to you. This should be a part of how you work on a regular basis with data literacy. Again, without a decision, what is the purpose of data literacy? Data literacy should lead you as an individual, and organizations, to make smarter decisions." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"The reality is, the majority of a workforce doesn’t need to be data scientists, they just need comfort with data literacy." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"When it comes to data literacy learning, there is one key aspect to ensure the program and project works and is successful: the role of leadership. It’s unlikely a project will succeed if you fail to secure the full buy-in from those in charge." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

"When we are empowered with skills in data literacy, we have the ability to understand where our data is going, how it is being utilized, and so forth. Then, we can make smarter, data literacy informed decisions with regards to how we log in, create accounts and so forth. Data literacy gives a direct empowerment towards our personal data usage." (Jordan Morrow, "Be Data Literate: The data literacy skills everyone needs to succeed", 2021)

🔢Mike Fleckenstein - Collected Quotes

"A big part of data governance should be about helping people (business and technical) get their jobs done by providing them with resources to answer their questions, such as publishing the names of data stewards and authoritative sources and other metadata, and giving people a way to raise, and if necessary escalate, data issues that are hindering their ability to do their jobs. Data governance helps answer some basic data management questions." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018)

"A data lake is a storage repository that holds a very large amount of data, often from diverse sources, in native format until needed. In some respects, a data lake can be compared to a staging area of a data warehouse, but there are key differences. Just like a staging area, a data lake is a conglomeration point for raw data from diverse sources. However, a staging area only stores new data needed for addition to the data warehouse and is a transient data store. In contrast, a data lake typically stores all possible data that might be needed for an undefined amount of analysis and reporting, allowing analysts to explore new data relationships. In addition, a data lake is usually built on commodity hardware and software such as Hadoop, whereas traditional staging areas typically reside in structured databases that require specialized servers." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018)

"Data governance presents a clear shift in approach, signals a dedicated focus on data management, distinctly identifies accountability for data, and improves communication through a known escalation path for data questions and issues. In fact, data governance is central to data management in that it touches on essentially every other data management function. In so doing, organizational change will be brought to a group is newly - and seriously - engaging in any aspect of data management." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018)

"Data is owned by the enterprise, not by systems or individuals. The enterprise should recognize and formalize the responsibilities of roles, such as data stewards, with specific accountabilities for managing data. A data governance framework and guidelines must be developed to allow data stewards to coordinate with their peers and to communicate and escalate issues when needed. Data should be governed cooperatively to ensure that the interests of data stewards and users are represented and also that value to the enterprise is maximized." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018)

"In truth, all three of these perspectives - process, technology, and data - are needed to create a good data strategy. Each type of person approaches things differently and brings different perspectives to the table. Think of this as another aspect of diversity. Just as a multicultural team and a team with different educational backgrounds will produce a better result, so will a team that includes people with process, technology and data perspectives." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018)

"Lack of trust is closely associated with uncertainty about the quality of the data, such as its sourcing, content definition, or content accuracy. The issue is not only that the data source has quality issues, but that the issues that it may or may not have are unknown." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018)

"The desire to collect as much data as possible must be balanced with an approximation of which data sources are useful to address a business issue. It is worth mentioning that often the value of internal data is high. Most internal data has been cleansed and transformed to suit the mission. It should not be overlooked simply because of the excitement of so much other available data." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018) 

"Typically, a data steward is responsible for a data domain (or part of a domain) across its life cycle. He or she supports that data domain across an entire business process rather than for a specific application or a project. In this way, data governance provides the end user with a go-to resource for data questions and requests. When formally applied, data governance also holds managers and executives accountable for data issues that cannot be resolved at lower levels. Thus, it establishes an escalation path beginning with the end user. Most important, data governance determines the level - local, departmental or enterprise - at which specific data is managed. The higher the value of a particular data asset, the more rigorous its data governance." (Mike Fleckenstein & Lorraine Fellows, "Modern Data Strategy", 2018)

26 November 2006

🎯Cindi Howson - Collected Quotes

"A common misconception about BI standardization is the assumption that all users must use the same tool. It would be a mistake to pursue this strategy. Instead, successful BI companies use the right tool for the right user. For a senior executive, the right tool might be a dashboard. For a power user, it might be a business query tool. For a call center agent, it might be a custom application or a BI gadget embedded in an operational application."(Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"A key secret to making BI a killer application within your company is to provide a business intelligence environment that is flexible enough to adapt to a changing business environment at the pace of the business environment - fast and with frequent change." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"A key sign of successful business intelligence is the degree to which it impacts business performance." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Achieving a high level of data quality is hard and is affected significantly by organizational and ownership issues. In the short term, bandaging problems rather than addressing the root causes is often the path of least resistance." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Attracting the best people and keeping the BI team motivated is only possible when the importance of BI is recognized by senior management. When it’s not, the best BI people will leave." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Business intelligence tools can only present the facts. Removing biases and other errors in decision making are dynamics of company culture that affect how well business intelligence is used." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Communicate loudly and widely where there are data quality problems and the associated risks with deploying BI tools on top of bad data. Also advise the different stakeholders on what can be done to address data quality problems - systematically and organizationally. Complaining without providing recommendations fixes nothing." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Data quality is such an important issue, and yet one that is not well understood or that excites business users. It’s often perceived as being a problem for IT to handle when it’s not: it’s for the business to own and correct." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Depending on the extent of the data quality issues, be careful about where you deploy BI. Without a reasonable degree of confidence in the data quality, BI should be kept in the hands of knowledge workers and not extended to frontline workers and certainly not to customers and suppliers. Deploy BI in this limited fashion as data quality issues are gradually exposed, understood, and ultimately, addressed. Don’t wait for every last data quality issue to be resolved; if you do, you will never deliver any BI capabilities, business users will never see the problem, and quality will never improve." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Even if you have previously tried to engage tech-wary users and were met with a lackluster response, try again. Technical and information literacy is evolutionary. BI tools have gotten significantly easier to use with more interface options to suit diverse user requirements, even for users with less affinity for information technology." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Knowledge workers and BI experts must continually evaluate the reports, dashboards, alerts, and other mechanisms for disseminating factual information to ensure the design facilitates insight." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"I would argue that every BI deployment needs an OLAP component; not only is it necessary to facilitate analysis, but also it can significantly reduce the number of reports either IT developers or business users have to create." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"If you give users with low data literacy access to a business query tool and they create incorrect queries because they didn’t understand the different ways revenue could be calculated, the BI tool will be perceived as delivering bad data." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Successful BI companies start with a vision - whether it’s to improve air travel, improve patient care, or drive synergies. The business sees an opportunity to exploit the data to fulfill a broader vision. The detail requirements are not precisely known. Creativity and exploration are necessary ingredients to unlock these business opportunities and fulfill those visions." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"Successful business intelligence is influenced by both technical aspects and organizational aspects. In general, companies rate organizational aspects (such as executive level sponsorship) as having a higher impact on success than technical aspects. And yet, even if you do everything right from an organizational perspective, if you don’t have high quality, relevant data, your BI initiative will fail." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"The data architecture is the most important technical aspect of your business intelligence initiative. Fail to build an information architecture that is flexible, with consistent, timely, quality data, and your BI initiative will fail. Business users will not trust the information, no matter how powerful and pretty the BI tools. However, sometimes it takes displaying that messy data to get business users to understand the importance of data quality and to take ownership of a problem that extends beyond business intelligence, to the source systems and to the organizational structures that govern a company’s data." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"The frustration and divide between the business and IT has ramifications far beyond business intelligence. Yet given the distinct aspect of this technology, lack of partnership has a more profound effect in BI’s success. As both sides blame one another, a key secret to reducing blame and increasing understanding is to recognize how these two sides are different." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"The problem is when biases and inaccurate data also get filtered into the gut. In this case, the gut-feel decision making should be supported with objective data, or errors in decision making may occur." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

"There is one crucial aspect of extending the reach of business intelligence that has nothing to do with technology and that is Relevance. Understanding what information someone needs to do a job or to complete a task is what makes business intelligence relevant to that person. Much of business intelligence thus far has been relevant to power users and senior managers but not to front/line workers, customers, and suppliers." (Cindi Howson, "Successful Business Intelligence: Secrets to making BI a killer App", 2008)

🎯Margaret Y Chu - Collected Quotes

"An organization needs to know the condition and quality of its data to be more effective in fixing them and making them blissful. Unfortunately, pride, shame, and a fear of looking incompetent all play a part when people are asked to openly discuss dirty data issues. Because data are an asset, some people are unwilling to share their data. They think this gives them control and power over others. The role of politics in the organization is the dirty secret of dirty data." (Margaret Y Chu, "Blissful Data", 2004)

"Blissful data consist of information that is accurate, meaningful, useful, and easily accessible to many people in an organization. These data are used by the organization’s employees to analyze information and support their decision-making processes to strategic action. It is easy to see that organizations that have reached their goal of maximum productivity with blissful data can triumph over their competition. Thus, blissful data provide a competitive advantage." (Margaret Y Chu, "Blissful Data", 2004)

"Business rules should be simple and owned and defined by the business; they are declarative, indivisible, expressed in clear, concise language, and business oriented." (Margaret Y Chu, "Blissful Data", 2004)

"Clear goals, multiple strategies, clear roles and responsibilities, boldness, teamwork, speed, flexibility, the ability to change, managing risk, and seizing opportunities when they arise are important characteristics in gaining objectives." (Margaret Y Chu, "Blissful Data", 2004)

"[…] dirt and stains are more noticeable on white or light-colored clothing. In the same way, dirty data and data quality issues have existed for a long time. But due to the inherent nature of operational data these issues have not been as visible or immense enough to affect the bottom line. Just as dark clothing hides spills and stains, dirty data have been hidden or ignored in operational data for decades." (Margaret Y Chu, "Blissful Data", 2004)

"Gauging the quality of the operational data becomes an important first step in predicting potential dirty data issues for an organization. But many organizations are reluctant to commit the time and expense to assess their data. Some organizations wait until dirty data issues blow up in their faces. The greater the pain being experienced, the bigger the commitment to improving data quality." (Margaret Y Chu, "Blissful Data", 2004)

"[...] incomplete, inaccurate, and invalid data can cause problems for an organization. These problems are not only embarrassing and awkward but will also cause the organization to lose customers, new opportunities, and market share." (Margaret Y Chu, "Blissful Data", 2004)

"Let’s define dirty data as: ‘… data that are incomplete, invalid, or inaccurate’. In other words, dirty data are simply data that are wrong. […] Incomplete or inaccurate data can result in bad decisions being made. Thus, dirty data are the opposite of blissful data. Problems caused by dirty data are significant; be wary of their pitfalls."  (Margaret Y Chu, "Blissful Data", 2004)

"Organizations must know and understand the current organizational culture to be successful at implementing change. We know that it is the organization’s culture that drives its people to action; therefore, management must understand what motivates their people to attain goals and objectives. Only by understanding the current organizational culture will it be possible to begin to try and change it." (Margaret Y Chu, "Blissful Data", 2004)

"Processes must be implemented to prevent bad data from entering the system as well as propagating to other systems. That is, dirty data must be intercepted at its source. The operational systems are often the source of informational data; thus dirty data must be fixed at the operational data level. Implementing the right processes to cleanse data is, however, not easy." (Margaret Y Chu, "Blissful Data", 2004)

"So business rules are just like house rules. They are policies of an organization and contain one or more assertions that define or constrain some aspect of the business. Their purpose is to provide a structure and guideline to control or influence the behavior of the organization. Further, business rules represent the business and guide the decisions that are made by the people in the organization." (Margaret Y Chu, "Blissful Data", 2004)

"Vision and mission statements are important, but they are not an organization’s culture; they are its goals. A vision is the ideal they are striving to achieve. There may be a huge gap between the ideal and the current state of actions and behaviors."(Margaret Y Chu, "Blissful Data", 2004)

"What management notices and rewards is the best indication of the organization’s culture." (Margaret Y Chu, "Blissful Data", 2004)

🔢James Serra - Collected Quotes

"A common data model (CDM) is a standardized structure for storing and organizing data that is typically used when building a data warehouse solution. It provides a consistent way to represent data within tables and relationships between tables, making it easy for any system or application to understand the data." (James Serra, "Deciphering Data Architectures", 2024)

"A data architecture defines a high-level architectural approach and concept to follow, outlines a set of technologies to use, and states the flow of data that will be used to build your data solution to capture big data. [...] Data architecture refers to the overall design and organization of data within an information system." (James Serra, "Deciphering Data Architectures", 2024)

"A data mesh is a decentralized data architecture with four specific characteristics. First, it requires independent teams within designated domains to own their analytical data. Second, in a data mesh, data is treated and served as a product to help the data consumer to discover, trust, and utilize it for whatever purpose they like. Third, it relies on automated infrastructure provisioning. And fourth, it uses governance to ensure that all the independent data products are secure and follow global rules." (James Serra, "Deciphering Data Architectures", 2024)

"At its core, a data fabric is an architectural framework, designed to be employed within one or more domains inside a data mesh. The data mesh, however, is a holistic concept, encompassing technology, strategies, and methodologies." (James Serra, "Deciphering Data Architectures", 2024)

"Be aware that data product is not the same thing as data as a product. Data as a product describes the idea that data owners treat data as a fully contained product that they are responsible for, rather than a byproduct of a process that others manage, and should make the data available to other domains and consumers. Data product refers to the architecture of implementing data as a product." (James Serra, "Deciphering Data Architectures", 2024)

"Choosing the right data ingestion strategy is a significant business decision that partially determines how well your organization can leverage its data for business decision making and operations. The stakes are high; the wrong strategy can lead to poor data quality, performance issues, increased costs, and even regulatory compliance breaches." (James Serra, "Deciphering Data Architectures", 2024)

"Data governance is the overall management of data in an organization. It involves establishing policies and procedures for collecting, storing, securing, transforming, and reporting data." (James Serra, "Deciphering Data Architectures", 2024)

"Delta Lake is a transactional storage software layer that runs on top of an existing data lake and adds RDW-like features that improve the lake’s reliability, security, and performance. Delta Lake itself is not storage. In most cases, it’s easy to turn a data lake into a Delta Lake; all you need to do is specify, when you are storing data to your data lake, that you want to save it in Delta Lake format (as opposed to other formats, like CSV or JSON)." (James Serra, "Deciphering Data Architectures", 2024)

"It is very important to understand that data mesh is a concept, not a technology. It is all about an organizational and cultural shift within companies. The technology used to build a data mesh could follow the modern data warehouse, data fabric, or data lakehouse architecture - or domains could even follow different architectures." (James Serra, "Deciphering Data Architectures", 2024)

"The data fabric architecture is an evolution of the modern data warehouse (MDW) architecture: an advanced layer built onto the MDW to enhance data accessibility, security, discoverability, and availability. [...] The most important aspect of the data fabric philosophy is that a data fabric solution can consume any and all data within the organization." (James Serra, "Deciphering Data Architectures", 2024)

"The goal of any data architecture solution you build should be to make it quick and easy for any end user, no matter what their technical skills are, to query the data and to create reports and dashboards." (James Serra, "Deciphering Data Architectures", 2024)

"The term data lakehouse is a portmanteau (blend) of data lake and data warehouse. [...] The concept of a lakehouse is to get rid of the relational data warehouse and use just one repository, a data lake, in your data architecture." (James Serra, "Deciphering Data Architectures", 2024)

"With all the hype, you would think building a data mesh is the answer to all of these 'problems' with data warehousing. The truth is that while data warehouse projects do fail, it is rarely because they can’t scale enough to handle big data or because the architecture or the technology isn’t capable. Failure is almost always because of problems with the people and/or the process, or that the organization chose the completely wrong technology." (James Serra, "Deciphering Data Architectures", 2024)

25 November 2006

🔢Arkady Maydanchik - Collected Quotes

"Data cleansing is dangerous mainly because data quality problems are usually complex and interrelated. Fixing one problem may create many others in the same or other related data elements." (Arkady Maydanchik, "Data Quality Assessment", 2007)

"Data quality program is a collection of initiatives with the common objective of maximizing data quality and minimizing negative impact of the bad data. [...] objective of any data quality program is to ensure that data quality docs not deteriorate during conversion and consolidation projects, Ideally, we would like to do even more and use the opportunity to improve data quality since data cleansing is much easier to perform before conversion than afterwards." (Arkady Maydanchik, "Data Quality Assessment", 2007)

"Databases rarely begin their life empty. More often the starting point in their lifecycle is a data conversion from some previously exiting data source. And by a cruel twist of fate, it is usually a rather violent beginning. Data conversion usually takes the better half of new system implementation effort and almost never goes smoothly." (Arkady Maydanchik, "Data Quality Assessment", 2007)

"[...] data conversion is the most difficult part of any system implementation. The error rate in a freshly populated new database is often an order of magnitude above that of the old system from which the data is converted. As a major source of the data problems, data conversion must be treated with the utmost respect it deserves." (Arkady Maydanchik, "Data Quality Assessment", 2007)

"Equally critical is to include data quality definition and acceptable quality benchmarks into the conversion specifications. No product design skips quality specifications. including quality metrics and benchmarks. Yet rare data conversion follows suit. As a result, nobody knows how successful the conversion project was until data errors get exposed in the subsequent months and years. The solution is to perform comprehensive data quality assessment of the target data upon conversion and compare the results with pre-defined benchmarks." (Arkady Maydanchik, "Data Quality Assessment", 2007)

"More and more data is exchanged between the systems through real-time (or near real-time) interfaces. As soon as the data enters one database, it triggers procedures necessary to send transactions to Other downstream databases. The advantage is immediate propagation of data to all relevant databases. Data is less likely to be out-of-sync. [...] The basic problem is that data is propagated too fast. There is little time to verify that the data is accurate. At best, the validity of individual attributes is usually checked. Even if a data problem can be identified. there is often nobody at the other end of the line to react. The transaction must be either accepted or rejectcd (whatever the consequences). If data is rejected, it may be lost forever!" (Arkady Maydanchik, "Data Quality Assessment", 2007)

"Much data in databases has a long history. It might have come from old 'legacy' systems or have been changed several times in the past. The usage of data fields and value codes changes over time. The same value in the same field will mean totally different thing in different records. Knowledge or these facts allows experts to use the data properly. Without this knowledge, the data may bc used literally and with sad consequences. The same is about data quality. Data users in the trenches usually know good data from bad and can still use it efficiently. They know where to look and what to check. Without these experts, incorrect data quality assumptions are often made and poor data quality becomes exposed." (Arkady Maydanchik, "Data Quality Assessment", 2007)

"The big part of the challenge is that data quality does not improve by itself or as a result of general IT advancements. Over the years, the onus of data quality improvement was placed on modern database technologies and better information systems. [...] In reality, most IT processes affect data quality negatively, Thus, if we do nothing, data quality will continuously deteriorate to the point where the data will become a huge liability." (Arkady Maydanchik, "Data Quality Assessment", 2007)

"The corporate data universe consists of numerous databases linked by countless real-time and batch data feeds. The data continuously move about and change. The databases are endlessly redesigned and upgraded, as are the programs responsible for data exchange. The typical result of this dynamic is that information systems get better, while data deteriorates. This is very unfortunate since it is the data quality that determines the intrinsic value of the data to the business and consumers. Information technology serves only as a magnifier for this intrinsic value. Thus, high quality data combined with effective technology is a great asset, but poor quality data combined with effective technology is an equally great liability." (Arkady Maydanchik, "Data Quality Assessment", 2007)

"The greatest challenge in data conversion is that actual content and structure of the source data is rarely understood. More often data transformation algorithms rely on the theoretical data definitions and data models, Since this information is usually incomplete, outdated, and incorrect, the converted data look nothing like what is expected. Thus, data quality plummets. The solution is to precede conversion with extensive data profiling and analysis. In fact, data quality after conversion is in direct (or even exponential) relation with the amount of knowledge about actual data you possess. Lack of in-depth analysis will guarantee significant loss of data quality." (Arkady Maydanchik, "Data Quality Assessment", 2007)

"The main tool of a data quality assessment professional is a data quality rule - a constraint that validates a data element or a relationship between several data elements and can be implemented in a computer program. [...] The solution relies on the design and implementation of hundreds and thousands of such data quality rules, and then using them to identify all data inconsistencies. Miraculously, a well-designed and fine-tuned collection of rules will identify a majority Of data errors in a fraction or time compared with manual validation. In fact, it never takes more than a few months to design and implement the rules and produce comprehensive error reports, What is even better, the same setup can be reused over and over again to reassess data quality periodically with minimal effort." (Arkady Maydanchik, "Data Quality Assessment", 2007)

"Using data quality rules brings comprehensive data quality assessment from fantasy world to reality. However, it is by no means simple, and it takes a skillful skipper to navigate through the powerful currents and maelstroms along the way. Considering the volume and structural complexity of a typical database, designing a comprehensive set of data quality rules is a daunting task. The number Of rules will often reach hundreds or even thousands. When some rules are missing, the results of the data quality assessment can be completely jeopardized, Thus the first challenge is to design all rules and make sure that they indeed identify all or most errors." (Arkady Maydanchik, "Data Quality Assessment", 2007)

"While we might attempt to identify and correct most data errors, as well as try to prevent others from entering the database, the data quality will never be perfect. Perfection is practically unattainable in data quality as with the quality of most other products. In truth, it is also unnecessary since at some point improving data quality becomes more expensive than leaving it alone. The more efficient our data quality program, the higher level of quality we will achieve- but never will it reach 100%. However, accepting imperfection is not the same as ignoring it. Knowledge of the data limitations and imperfections can help use the data wisely and thus save time and money, The challenge, of course, is making this knowledge organized and easily accessible to the target users. The solution is a comprehensive integrated data quality meta data warehouse." (Arkady Maydanchik, "Data Quality Assessment", 2007)

24 November 2006

🔢Rupa Mahanti - Collected Quotes

"A data model is a formal organized representation of real-world entities, focused on the definition of an object and its associated attributes and relationships between the entities. Data models should be designed consistently and coherently. They should not only meet requirements, but should also enable data consumers to better understand the data." (Rupa Mahanti, "Data Quality: Dimensions, Measurement, Strategy, Management, and Governance", 2019)

"Bad data are expensive: my best estimate is that it costs a typical company 20% of revenue. Worse, they dilute trust - who would trust an exciting new insight if it is based on poor data! And worse still, sometimes bad data are simply dangerous; look at the damage brought on by the financial crisis, which had its roots in bad data." (Rupa Mahanti, "Data Quality: Dimensions, Measurement, Strategy, Management, and Governance", 2019)

"Conformity, or validity, means the data comply with a set of internal or external standards or guidelines or standard data definitions, including metadata definitions. Comparison between the data items and metadata enables measuring the degree of conformity." (Rupa Mahanti, "Data Quality: Dimensions, Measurement, Strategy, Management, and Governance", 2019)

"Data-intensive projects generally involve at least one person who understands all the nuances of the application, process, and source and target data. These are the people who also know about all the abnormalities in the data and the workarounds to deal with them, and are the experts. This is especially true in the case of legacy systems that store and use data in a manner it should not be used. The knowledge is not documented anywhere and is usually inside the minds of the people. When the experts leave, with no one having a true understanding of the data, the data are not used properly and everything goes haywire." (Rupa Mahanti, "Data Quality: Dimensions, Measurement, Strategy, Management, and Governance", 2019)

"Data are collections of facts, such as numbers, words, measurements, observations, or descriptions of real-world objects, phenomena, or events and their attributes. Data are qualitative when they contain descriptive information, and quantitative when they contain numerical information." (Rupa Mahanti, "Data Quality: Dimensions, Measurement, Strategy, Management, and Governance", 2019)

"Data migration generally involves the transfer of data from an existing data source to a new database or to a new schema within the same database. [...] Data migration projects deal with the migration of data from one data structure to another data structure, or data transformed from one platform to another platform with modified data structure." (Rupa Mahanti, "Data Quality: Dimensions, Measurement, Strategy, Management, and Governance", 2019)

"Data quality is the capability of data to satisfy the stated business, system, and technical requirements of an enterprise. Data quality is an insight into or an evaluation of data’s fitness to serve their purpose in a given context. Data quality is accomplished when a business uses data that are complete, relevant, and timely. The general definition of data quality is 'fitness for use', or more specifically, to what extent some data successfully serve the purposes of the user." (Rupa Mahanti, "Data Quality: Dimensions, Measurement, Strategy, Management, and Governance", 2019)

"Lack of a standard process to address business requirements and business process improvements, poorly designed and implemented business processes that result in lack of training, coaching, and communication in the use of the process, and unclear definition of subprocess or process ownership, roles, and responsibilities have an adverse impact on data quality." (Rupa Mahanti, "Data Quality: Dimensions, Measurement, Strategy, Management, and Governance", 2019)

"The degree of data quality excellence that should be attained and sustained is driven by the criticality of the data, the business need and the cost and time to achieve the defined degree of data quality." (Rupa Mahanti, "Data Quality: Dimensions, Measurement, Strategy, Management, and Governance", 2019)

"To understand why data quality is important, we need to understand the categorization of data, the current quality of data and how is it different from the quality of manufacturing processes, the business impact of bad data and cost of poor data quality, and possible causes of data quality issues." (Rupa Mahanti, "Data Quality: Dimensions, Measurement, Strategy, Management, and Governance", 2019)

23 November 2006

🔢Neera Bhansali - Collected Quotes

"Are data quality and data governance the same thing? They share the same goal, essentially striving for the same outcome of optimizing data and information results for business purposes. Data governance plays a very important role in achieving high data quality. It deals primarily with orchestrating the efforts of people, processes, objectives, technologies, and lines of business in order to optimize outcomes around enterprise data assets. This includes, among other things, the broader cross-functional oversight of standards, architecture, business processes, business integration, and risk and compliance. Data governance is an organizational structure that oversees the compliance and standards of enterprise data." (Neera Bhansali, "Data Governance: Creating Value from Information Assets", 2014)

"Data governance is about putting people in charge of fixing and preventing data issues and using technology to help aid the process. Any time data is synchronized, merged, and exchanged, there have to be ground rules guiding this. Data governance serves as the method to organize the people, processes, and technologies for data-driven programs like data quality; they are a necessary part of any data quality effort." (Neera Bhansali, "Data Governance: Creating Value from Information Assets", 2014)

"Data governance is the process of creating and enforcing standards and policies concerning data. [...] The governance process isn't a transient, short-term project. The governance process is a continuing enterprise-focused program." (Neera Bhansali, "Data Governance: Creating Value from Information Assets", 2014)

"Having data quality as a focus is a business philosophy that aligns strategy, business culture, company information, and technology in order to manage data to the benefit of the enterprise. Data quality is an elusive subject that can defy measurement and yet be critical enough to derail a single IT project, strategic initiative, or even an entire company." (Neera Bhansali, "Data Governance: Creating Value from Information Assets", 2014)

"Understanding an organization's current processes and issues is not enough to build an effective data governance program. To gather business, functional, and technical requirements, understanding the future vision of the business or organization is important. This is followed with the development of a visual prototype or logical model, independent of products or technology, to demonstrate the data governance process. This business-driven model results in a definition of enterprise-wide data governance based on key standards and processes. These processes are independent of the applications and of the tools and technologies required to implement them. The business and functional requirements, the discovery of business processes, along with the prototype or model, provide an impetus to address the "hard" issues in the data governance process." (Neera Bhansali, "Data Governance: Creating Value from Information Assets", 2014)

🔢Saurabh Gupta - Collected Quotes

"A data warehouse follows a pre-built static structure to model source data. Any changes at the structural and configuration level must go through a stringent business review process and impact analysis. Data lakes are very agile. Consumption or analytical layer can be modified to fit in the model requirements. Consumers of a data lake are not constant; therefore, schema and modeling lies at the liberty of analysts and scientists." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data in the data lake should never get disposed. Data driven strategy must define steps to version the data and handle deletes and updates from the source systems." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data governance policies must not enforce constraints on data - Data governance intends to control the level of democracy within the data lake. Its sole purpose of existence is to maintain the quality level through audits, compliance, and timely checks. Data flow, either by its size or quality, must not be constrained through governance norms. [...] Effective data governance elevates confidence in data lake quality and stability, which is a critical factor to data lake success story. Data compliance, data sharing, risk and privacy evaluation, access management, and data security are all factors that impact regulation." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data Lake induces accessibility and catalyzes availability. It warrants data discovery platforms to soak the data trends at a horizontal scale and produce visual insights. It largely cuts down the time that goes into data preparation and exhaustive data analysis." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data Lake is a single window snapshot of all enterprise data in its raw format, be it structured, semi-structured, or unstructured. Starting from curating the data ingestion pipeline to the transformation layer for analytical consumption, every aspect of data gets addressed in a data lake ecosystem. It is supposed to hold enormous volumes of data of varied structures." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data lake is an ecosystem for the realization of big data analytics. What makes data lake a huge success is its ability to contain raw data in its native format on a commodity machine and enable a variety of data analytics models to consume data through a unified analytical layer. While the data lake remains highly agile and data-centric, the data governance council governs the data privacy norms, data exchange policies, and the ensures quality and reliability of data lake." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data swamp, on the other hand, presents the devil side of a lake. A data lake in a state of anarchy is nothing but turns into a data swamp. It lacks stable data governance practices, lacks metadata management, and plays weak on ingestion framework. Uncontrolled and untracked access to source data may produce duplicate copies of data and impose pressure on storage systems." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Data warehousing, as we are aware, is the traditional approach of consolidating data from multiple source systems and combining into one store that would serve as the source for analytical and business intelligence reporting. The concept of data warehousing resolved the problems of data heterogeneity and low-level integration. In terms of objectives, a data lake is no different from a data warehouse. Both are primary advocates of terms like 'single source of truth' and 'central data repository'." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"Metadata is the key to effective data governance. Metadata in this context is the data that defines the structure and attributes of data. This could mean data types, data privacy attributes, scale, and precision. In general, quality of data is directly proportional to the amount and depth of metadata provided. Without metadata, consumers will have to depend on other sources and mechanisms." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

"The quality of data that flows within a data pipeline is as important as the functionality of the pipeline. If the data that flows within the pipeline is not a valid representation of the source data set(s), the pipeline doesn’t serve any real purpose. It’s very important to incorporate data quality checks within different phases of the pipeline. These checks should verify the correctness of data at every phase of the pipeline. There should be clear isolation between checks at different parts of the pipeline. The checks include checks like row count, structure, and data type validation." (Saurabh Gupta et al, "Practical Enterprise Data Lake Insights", 2018)

22 November 2006

🎯William H Inmon - Collected Quotes

"There are four levels of data in the architected environment - the operational level, the atomic (or the data warehouse) level, the departmental (or the data mart) level, and the individual level. These different levels of data are the basis of a larger architecture called the corporate information factory (CIF). The operational level of data holds application-oriented primitive data only and primarily serves the high-performance transaction-processing community. The data-warehouse level of data holds integrated, historical primitive data that cannot be updated. In addition, some derived data is found there. The departmental or data mart level of data contains derived data almost exclusively. The departmental or data mart level of data is shaped by end-user requirements into a form specifically suited to the needs of the department. And the individual level of data is where much heuristic analysis is done." (William H Inmon, "Building the Data Warehouse" 4th Ed., 2005)

"To interpret and understand information over time, a whole new dimension of context is required. While content of information remains important, the comparison and understanding of information over time mandates that context be an equal partner to content. And in years past, context has been an undiscovered, unexplored dimension of information." (William H Inmon, "Building the Data Warehouse" 4th Ed., 2005)

"When management receives the conflicting reports, it is forced to make decisions based on politics and personalities because neither source is more or less credible. This is an example of the crisis of data credibility in the naturally evolving architecture." (William H Inmon, "Building the Data Warehouse" 4th Ed., 2005)

"An interesting aspect of KPIs are that they change over time. At one moment in time the organization is interested in profitability. There will be one set of KPIs that measure profitability. At another moment in time the organization is interested in market share. There will be another set of KPIs that measure market share. As the focus of the corporation changes over time, so do the KPIs that measure that focus." (William H Inmon & Daniel Linstedt, "Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault", 2015)

"Both the ODS and a data warehouse contain subject-oriented, integrated information. In that regard they are similar. But an ODS contains data that can be individually updated, deleted, or added. And a data warehouse contains nonvolatile data. A data warehouse contains snapshots of data. Once the snapshot is taken, the data in the data warehouse does not change. So when it comes to volatility, a data warehouse and an ODS are very different." (William H Inmon & Daniel Linstedt, "Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault", 2015)

"In general, analytic processing is known as 'heuristic' processing. In heuristic processing the requirements for analysis are discovered by the results of the current iteration of processing. […] In heuristic processing you start with some requirements. You build a system to analyze those requirements. Then, after you have results, you sit back and rethink your requirements after you have had time to reflect on the results that have been achieved. You then restate the requirements and redevelop and reanalyze again. Each time you go through the redevelopment exercise is called an 'iteration'. You continue the process of building different iterations of processing until such time as you achieve the results that satisfy the organization that is sponsoring the exercise." (William H Inmon & Daniel Linstedt, "Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault", 2015)

"There are, however, many problems with independent data marts. Independent data marts: (1) Do not have data that can be reconciled with other data marts (2) Require their own independent integration of raw data (3) Do not provide a foundation that can be built on whenever there are future analytical needs." (William H Inmon & Daniel Linstedt, "Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault", 2015)

"There is then a real mismatch between the volume of data and the business value of data. For people who are examining repetitive data and hoping to find massive business value there, there is most likely disappointment in their future. But for people looking for business value in nonrepetitive data, there is a lot to look forward to." (William H Inmon & Daniel Linstedt, "Data Architecture: A Primer for the Data Scientist: Big Data, Data Warehouse and Data Vault", 2015)

"A defining characteristic of the data lakehouse architecture is allowing direct access to data as files while retaining the valuable properties of a data warehouse. Just do both!" (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"At first, we threw all of this data into a pit called the 'data lake'. But we soon discovered that merely throwing data into a pit was a pointless exercise. To be useful - to be analyzed - data needed to (1) be related to each other and (2) have its analytical infrastructure carefully arranged and made available to the end user. Unless we meet these two conditions, the data lake turns into a swamp, and swamps start to smell after a while. [...] In a data swamp, data just sits there are no one uses it. In the data swamp, data just rots over time." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"Data privacy, data confidentiality, and data protection are sometimes incorrectly diluted with security. For example, data privacy is related to, but not the same as, data security. Data security is concerned with assuring the confidentiality, integrity, and availability of data. Data privacy focuses on how and to what extent businesses may collect and process information about individuals." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"Data visualization adds credibility to any message. [...] Data visualizations are incredibly cold mediums because they require a lot of interpretation and participation from the audience. While boring numbers are authoritative, data visualization is inclusive. [...] Data visualizations absorb the viewer in the chart and communicate the author’s credibility through active participation. Like a good teacher, they walk the reader through the thought process and convince him/her effortlessly."

"Data visualization‘s key responsibilities and challenges include the obligation to earn your audience’s attention - do not take it for granted." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"In general, a data or data set contains its sensitivity or controversial nature only if it is linked or related to an individual’s personal information. Else an isolated, abandoned, or unrelated sensitive or controversial attribute has no significance."

"It is dangerous to do an analysis and merge data with very different quality profiles. As a general rule, the veracity of merged data is only as good as the worst data that has been merged. [...] Not knowing the quality of the data being analyzed jeopardizes the entire analysis." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"Once you combine the data lake along with analytical infrastructure, the entire infrastructure can be called a data lakehouse. [...] The data lake without the analytical infrastructure simply becomes a data swamp. And a data swamp does no one any good." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"The data lakehouse architecture presents an opportunity comparable to the one seen during the early years of the data warehouse market. The unique ability of the lakehouse to manage data in an open environment, blend all varieties of data from all parts of the enterprise, and combine the data science focus of the data lake with the end user analytics of the data warehouse will unlock incredible value for organizations. [...] "The lakehouse architecture equally makes it natural to manage and apply models where the data lives." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"Raw data without appropriate visualization is like dumped construction raw materials at a building construction site. The finished house is the actual visuals created from those data like raw materials." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

"With the data lakehouse, it is possible to achieve a level of analytics and machine learning that is not feasible or possible any other way. But like all architectural structures, the data lakehouse requires an understanding of architecture and an ability to plan and create a blueprint." (Bill Inmon et al, "Building the Data Lakehouse", 2021)

21 November 2006

🔢Angelika Klidas - Collected Quotes

"Also, remember that data literacy is not just a set of technical skills. There is an equal need and weight for soft skills and business skills. This can be misleading for some technical resources within an organization, as those technical resources may believe they are data literate by default as they are data architects or data analysts. They have the existing technical skills, but maybe they do not have any deep proficiencies in other skills such as communicating with data, challenging assumptions, and mitigating bias, or perhaps they do not have an open mindset to be open to different perspectives." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"Critical thinking is part of data literacy; it is the ability to question the logic of arguments or assumptions and examine evidence in order to determine whether a claim is true, false, or uncertain." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"Current decision-making in business suffers from insight gaps. Organizations invest in data and analytics, hoping that will provide them with insights that they can use to make decisions, but in reality, there are many challenges and obstacles that get in the way of that process. One of the biggest challenges is that these organizations tend to focus on technology and hard skills only. They are definitely important, but you will not automatically get insights and better decisions with hard skills alone. Using data to make better data-informed decisions requires not only hard skills but also soft skills as well as mindsets." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"Data literacy is not achieved by mastering a uniform set of competencies that applies to everyone. Those that are relevant to each individual can vary significantly depending on how they interact with data and which part of the data process they are involved in." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"Decision-makers are constantly provided data in the form of numbers or insights, or similar. The challenge is that we tend to believe every number or piece of data we hear, especially when it comes from a trusted source. However, even if the source is trusted and the data is correct, insights from the data are created when we put it in context and apply meaning to it. This means that we may have put incorrect meaning to the data and then made decisions based on that, which is not ideal. This is why anyone involved in the process needs to have the skills to think critically about the data, to try to understand the context, and to understand the complexity of the situation where the answer is not limited to just one specific thing. Critical thinking allows individuals to assess limitations of what was presented, as well as mitigate any cognitive bias that they may have." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"Data literacy is something that affects everyone and every organization. The more people who can debate, analyze, work with, and use data in their daily roles, the better data-informed decision-making will be." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"It is also important to note that data literacy is not about expecting to or becoming an expert; rather, it is a journey that must begin somewhere." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"Organizations must have a plan and vision for data literacy, which they then communicate to all employees. They will need to develop and foster a culture that embraces data literacy and data-informed decisions. They will need to provide employees with access to various learning content specific to data literacy. Along their journey, they will need to make sure they benchmark and measure progress toward their vision and celebrate successes along the way." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

"People can get confused or experience anxiety when they have to work with data and analytics. As we data literacy geeks say, people shouldn’t be pushed to work with data and analytics - they should do this because they want to. [...] Visualizations that are not understood present another risk. To be successful with data and analytics, we need visualizations that are presented in a clear meaningful way. If we do not take care of the data literacy levels within an organization, we might lose our public. Therefore, it is necessary to think of the risk of overwhelming our readers/viewers." (Angelika Klidas & Kevin Hanegan, "Data Literacy in Practice", 2022)

20 November 2006

🔢Zach Gemignani - Collected Quotes

"A culture of data fluency needs to be built on a shared understanding of the data sources, data analysis, key metrics, and data products. It requires employees to be on the same page about how data is used and why it is important." (Zach Gemignani et al, "Data Fluency", 2014)

"Any presentation of data, whether a simple calculated metric or a complex predictive model, is going to have a set of assumptions and choices that the producer has made to get to the output. The more that these can be made explicit, the more the audience of the data will be open to accepting the message offered by the presenter." (Zach Gemignani et al, "Data Fluency", 2014)

"Broadly defined, data means events that are captured and made available for analysis. A data source is a consistent record of these events. And a data product translates this record of events into something that can easily be understood. [...] Data products can be organized and characterized by a series of continuums that describe the nature of the data and how it is presented." (Zach Gemignani et al, "Data Fluency", 2014)

"[...] communicating with data is less often about telling a specific story and more like starting a guided conversation. It is a dialogue with the audience rather than a monologue. While some data presentations may share the linear approach of a traditional story, other data products (analytical tools, in particular) give audiences the flexibility for exploration. In our experience, the best data products combine a little of both: a clear sense of direction defined by the author with the ability for audiences to focus on the information that is most relevant to them. The attributes of the traditional story approach combined with the self-exploration approach leads to the guided safari analogy." (Zach Gemignani et al, "Data Fluency", 2014)

"Creating a data fluent organization doesn’t just happen. It starts with people who love using data as a tool to improve their job performance - people who have learned to converse with others in the language of data. It needs people who expect and demand better, more useful data products from themselves and others. It starts with you." (Zach Gemignani et al, "Data Fluency", 2014)

"Data alone isn’t valuable. In fact, it can be expensive in time and resources to manage and maintain. The analysis of this data is closer to something that is valuable. A clearly communicated analysis starts to transform a reflection of the world into knowledge in the minds of people. Even so, knowledge alone does not make your organization better. It is the decisions and actions of people - based on this data-sourced knowledge - that is the goal. But these decisions are seldom made in a vacuum. In most organizations, decisions are a collaborative, social experience. People come together to discuss options, review their knowledge of the situation, and arrive at a path to go down. Herein is one of the great powers of effective data products: They can shape and guide these discussions. Conclusions are seldom clear-cut, even when there is data to support a direction." (Zach Gemignani et al, "Data Fluency", 2014)

"Data captures actions and characteristics of the real world and transforms them into something that can be examined and explored after the fact." (Zach Gemignani et al, "Data Fluency", 2014)

"Data visualizations are designed to emphasize patterns and deviations in data. In fact, each specific chart type is well suited to highlighting particular forms of insight. A skilled author of data products will choose the right visualization to emphasize a message. The data, chart, and supporting descriptions should work in harmony to point out what is interesting. The reader simply goes along for the ride." (Zach Gemignani et al, "Data Fluency", 2014)

"Goals associated with a few, well-understood key metrics is a powerful combination. For both internal and external stakeholders, there is a strong alignment between organization mission, vision, goals, and tracking of progress. The efforts of everyone can be directed at these measurable goals, and people will focus on the processes that can impact these metrics." (Zach Gemignani et al, "Data Fluency", 2014)

"In fact, the analogy to storytelling is limited when applied to communicating with data. Data visualization has fundamental characteristics missing from traditional storytelling. For example, interactive data visualizations let audiences explore information to find insights that resonate with them. Visualizations take shape based to a large extent on the underlying data. And as this data changes, the emphasis and message of the visualization is likely to change." (Zach Gemignani et al, "Data Fluency", 2014)

"Metrics can serve two purposes: identifying problems and measuring performance. When the goal is to identify problems and pinpoint areas of operational inefficiency and ineffectiveness, defining the right metric requires a bit of detective work. It requires you to uncover the data residue of a problem and to determine what evidence can be found and how exactly it shows up. When the goal is to measure performance, the right success metrics focus on measures that can be controlled and where improvement in the metric is an unambiguously good thing." (Zach Gemignani et al, "Data Fluency", 2014)

"Most discussions of decision making assume that only senior executives make decisions or that only senior executives' decisions matter. This is a dangerous mistake. Decisions are made at every level of the organization, beginning with individual professional contributors and frontline supervisors. These apparently low-level decisions are extremely important in a knowledge-based organization." (Zach Gemignani et al, "Data Fluency", 2014)

"The most common mistake in ineffective data products is an inability to make difficult decisions about what information is most important. [...] Often information gets included in data products for reasons that are superfluous to the purpose, audience, and message - reasons that cater the product to someone influential or use information that has been included historically. The bar should be higher." (Zach Gemignani et al, "Data Fluency", 2014)

"We have an inbuilt ability to manipulate visual metaphors in ways we cannot do with the things and concepts they stand for—to use them as malleable, conceptual Tetris blocks or modeling clay that we can more easily squeeze, stack, and reorder. And then - whammo! - a pattern emerges, and we’ve arrived someplace we would never have gotten by any other means." (Zach Gemignani et al, "Data Fluency", 2014)

19 November 2006

🎯Stephen Few - Collected Quotes

"An effective dashboard is the product not of cute gauges, meters, and traffic lights, but rather of informed design: more science than art, more simplicity than dazzle. It is, above all else, about communication." (Stephen Few, "Information Dashboard Design", 2006)

"Most dashboards fail to communicate efficiently and effectively, not because of inadequate technology (at least not primarily), but because of poorly designed implementations. No matter how great the technology, a dashboard's success as a medium of communication is a product of design, a result of a display that speaks clearly and immediately. Dashboards can tap into the tremendous power of visual perception to communicate, but only if those who implement them understand visual perception and apply that understanding through design principles and practices that are aligned with the way people see and think." (Stephen Few, "Information Dashboard Design", 2006)

"A signal is a useful message that resides in data. Data that isn’t useful is noise. […] When data is expressed visually, noise can exist not only as data that doesn’t inform but also as meaningless non-data elements of the display (e.g. irrelevant attributes, such as a third dimension of depth in bars, color variation that has no significance, and artificial light and shadow effects)." (Stephen Few, "Signal: Understanding What Matters in a World of Noise", 2015)

"Apart from the secondary benefits of digital data, which are many, such as faster and cheaper information collection and distribution, the primary benefit is better decision making based on evidence. Despite our intellectual powers, when we allow our minds to become disconnected from reliable information about the world, we tend to screw up and make bad decisions." (Stephen Few, "Signal: Understanding What Matters in a World of Noise", 2015)

"Data contain descriptions. Some are true, some are not. Some are useful, most are not. Skillful use of data requires that we learn to pick out the pieces that are true and useful. [...] To find signals in data, we must learn to reduce the noise - not just the noise that resides in the data, but also the noise that resides in us. It is nearly impossible for noisy minds to perceive anything but noise in data." (Stephen Few, "Signal: Understanding What Matters in a World of Noise", 2015)

"Signals always point to something. In this sense, a signal is not a thing but a relationship. Data becomes useful knowledge of something that matters when it builds a bridge between a question and an answer. This connection is the signal." (Stephen Few, "Signal: Understanding What Matters in a World of Noise", 2015)

"The term data, unlike the related terms facts and evidence, does not connote truth. Data is descriptive, but data can be erroneous. We tend to distinguish data from information. Data is a primitive or atomic state (as in ‘raw data’). It becomes information only when it is presented in context, in a way that informs. This progression from data to information is not the only direction in which the relationship flows, however; information can also be broken down into pieces, stripped of context, and stored as data. This is the case with most of the data that’s stored in computer systems. Data that’s collected and stored directly by machines, such as sensors, becomes information only when it’s reconnected to its context."  (Stephen Few, "Signal: Understanding What Matters in a World of Noise", 2015)

"Everything that informs us of something useful that we didn't already know is a potential signal. If it matters and deserves a response, its potential is actualized." (Stephen Few)

"One of the great purposes of education today is to help us filter the data, to reduce it to what's true and useful." (Stephen Few)

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.