11 November 2006

🎯🏭🗒️Sonia Mezzetta - Collected Quotes

"A data architecture needs to have the robustness and ability to support multiple data management and operational models to provide the necessary business value and agility to support an enterprise’s business strategy and capabilities." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"A data strategy must align with the business goals and overall framework of how data will be used and managed within an organization. It needs to include standards for how data will be discovered, integrated, accessed, shared, and protected. It needs to address how data will meet regulatory compliance policies, Master Data Management, and data democratization. There needs to be an assurance that both data and metadata have a quality control framework in place to achieve data trust. A data strategy needs to have a clear path on how an organization will accomplish data monetization." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"A data strategy is a living document that needs to be continuously updated to align with business goals. It should have a clear maintenance process with frequent reviews and identification of authors and stakeholders that will contribute to the data strategy. This also includes the handling of exceptions to a data strategy process for any one-off decisions in special circumstances. A data strategy document must always be easily assessable, to the point, and understandable." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Apply DataOps principles to the development and delivery of data. DataOps is a best practice framework that accelerates the development of data and quality across its entire life cycle with high efficiency and quality. This is especially important when integrating data across distributed complex systems and environments." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Automated data orchestration is a key DataOps principle. An example of orchestration can take ETL jobs and a Python script to ingest and transform data based on a specific sequence from different source systems. It can handle the versioning of data to avoid breaking existing data consumption pipelines already in place." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Data Fabric’s building blocks represent groupings of different components and characteristics. They are high-level blocks that describe a package of capabilities that address specific business needs. The building blocks are Data Governance and its knowledge layer, Data Integration, and Self-Service." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Data Fabric is a composable architecture made up of different tools, technologies, and systems. It has an active metadata and event-driven design that automates Data Integration while achieving interoperability. Data Governance, Data Privacy, Data Protection, and Data Security are paramount to its design and to enable Self-Service data sharing. The following figure summarizes the different characteristics that constitute a Data Fabric design." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Data Fabric focuses on Self-Service data access via active metadata leveraging a composable set of tools and technologies. It offers the ability to discover, understand, and access data across hybrid and multi-cloud data landscapes with automation and Data Governance. It is primarily process and technology centric with flexibility in supporting diverse organizational models. On the other hand, Data Mesh is organizationally and process driven. It requires a technical implementation approach to execute its design. Data Mesh is at a higher level and Data Fabric is at a lower level. Data Fabric is capable of fulfilling Data Mesh’s key principles." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Data Fabric is a distributed and composable architecture that is metadata and event driven. It’s use case agnostic and excels in managing and governing distributed data. It integrates dispersed data with automation, strong Data Governance, protection, and security. Data Fabric focuses on the Self-Service delivery of governed data." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Data Fabric is a distributed data architecture that connects scattered data across tools and systems with the objective of providing governed access to fit-for-purpose data at speed. Data Fabric focuses on Data Governance, Data Integration, and Self-Service data sharing. It leverages a sophisticated active metadata layer that captures knowledge derived from data and its operations, data relationships, and business context. Data Fabric continuously analyzes data management activities to recommend value-driven improvements. Data Fabric works with both centralized and decentralized data systems and supports diverse operational models." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"[Data Fabric] is not a single technology, such as data virtualization. […] It is not a single tool like a data catalog and it doesn’t have to be a single data storage system like a data warehouse. It represents a diverse set of tools, technologies, and storage systems that work together in a connected ecosystem via a distributed data architecture, with active metadata as the glue. It doesn’t just support centralized data management but also federated and decentralized data management. It excels in connecting distributed data. Data Fabric is not the same as Data Mesh. They are different data architectures that tackle the complexities of distributed data management using different but complementary approaches." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Data Fabric supports a federated, decentralized, or centralized organization. To participate in Data Fabric, metadata is contributed in an automated manner and knowledge is populated from it to propel data management. Data Fabric is different from a Data Mesh design in that it supports decentralized, federated, and centralized organizations. Data Fabric’s objectives are to help an organization to evolve to a more mature level of data management by leveraging active metadata, which is a core prerequisite." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Data Mesh is a design concept based on federated data and business domains. It applies product management thinking to data management with the outcome being Data Products. It’s technology agnostic and calls for a domain-centric organization with federated Data Governance." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Establish an organization’s data maturity level and progress toward ongoing improvement. An organization needs to first understand what its current data maturity level is to determine the areas of improvement to create a forward-looking plan. A data maturity assessment offers a position on the current data maturity that serves as an indicator of the health of an organization. A data maturity assessment can be used as a tool to drive continuous improvement by measuring progress. The key thing here is to always strive for continuous improvement to achieve success." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"I emphasize this point as there are views in the industry that Data Fabric is a centralized storage architecture, which is not the case from my point of view. A Data Fabric architecture is driven by the needs and direction of the business architecture." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Manage data as a strategic asset that evolves into a data product. The premise here is to stop managing data as a byproduct and create an ecosystem that manages data as a valuable strategic asset that can evolve into a data product. Data producers are accountable for managing the life cycle of data from creation to end of life and ensuring it creates business value along the way for data consumers. This requires data that is governed, trusted, protected, secure, and easily accessible. Move data from technical data assets to Data Products by operationalizing data for high scale sharing." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

"Where Data Mesh differs from Data Fabric is that it has fixed requirements for the Self-Service platform focused on organizing and managing Data Products by business domain. Another difference is Data Fabric supports managing data as an asset and as a product. A Data Product can be composed of assets that have been governed and managed in a Data Fabric architecture. Data Fabric does not have these fixed requirements, although it inherently supports isolating data and Data Governance enforcement via metadata by business domain. You can think of a Data Mesh Self-Service data platform as supporting separate, independent companies (business domains), although the key criteria are that it does not create data silos and attains data sharing across these companies in a secure, quick, and easy manner. In Data Mesh, Data Products are created and managed by federated business domains and a data platform requires capabilities that enable data and policy federation. This is where a Data Fabric solution can also address Data Mesh’s requirements." (Sonia Mezzetta, "Principles of Data Fabric: Become a data-driven organization by implementing Data Fabric solutions efficiently", 2023)

🔢Charles D Tupper - Collected Quotes

"An architecture is the response to the integrated collections of models and views within the problem area being examined." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

"An architecture represents combined perspectives in a structured format that is easily viewable and explains the context of the area being analyzed to all those viewing it." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

"Analyzing and defining an area must be done prior to doing any activity within that area. Without understanding all that must be done, incorrect assumptions can be reached. Short-term vision may handicap future development. Inappropriate scoping may produce artificial boundaries where there should be none." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

"Data architecture allows strategic development of flexible modular designs by insulating the data from the business as well as the technology process." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

"Methodologies provide guidelines for the application development process. They specify analysis and design techniques as well as the stages in which they occur. They also develop event sequencing. Lastly, they specify milestones and work products that must be created and the appropriate documentation that should be generated." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

"Data architectures are the heart of business functionality. Given the proper data architecture, all possible functions can be completed within the enterprise easily and expeditiously." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

"Processes that use data change far more frequently than the data structures themselves." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

"The enterprise architecture delineates the data according to the inherent structure within the organization rather than by organizational function or use. In this manner it makes the data dependent on business objects but independent of business processes." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

"Using architecture leads to foundational stability, not rigidity. As long as the appropriate characteristics are in place to ensure positive architectural evolution, the architecture will remain a living construct. Well-developed architectures are frameworks that evolve as the business evolves." (Charles D Tupper, "Data Architecture: From Zen to Reality", 2011)

10 November 2006

🔢Pearl Zhu - Collected Quotes

"A good strategy tells you not only what specifically needs to accomplish, but WHY." (Pearl Zhu, "Digitizing Boardroom: The Multifaceted Aspects of Digital Ready Boards", 2016)

"Agile is more a 'direction', than an 'end'. Transforming to Agile culture means the business knows the direction they want to go on." (Pearl Zhu, "Digital Agility: The Rocky Road from Doing Agile to Being Agile", 2016)

"Breaking rules is indeed an important part of creativity. Innovation needs a level of guidance." (Pearl Zhu,  "Digitizing Boardroom: The Multifaceted Aspects of Digital Ready Boards", 2016)

"Good governance is less about structure and rules than being focused, effective and accountable." (Pearl Zhu,  "Digitizing Boardroom: The Multifaceted Aspects of Digital Ready Boards", 2016)

"Governance is not about maximization, but about optimization." (Pearl Zhu, "Digitizing Boardroom: The Multifaceted Aspects of Digital Ready Boards", 2016)

"Selecting the right measure and measuring things right are both art and science. And KPIs influence management behavior as well as business culture." (Pearl Zhu, "CIO Master: Unleash the Digital Potential of It", 2016)

"Setting the right priorities or having superior time management skill means knowing the difference between 'must have', and 'nice to have'." (Pearl Zhu, "Thinkingaire: 100 Game Changing Digital Mindsets to Compete for the Future", 2016)

"The art of questioning is to ignite innovative thinking; the science of questioning is to frame system thinking, with the progressive pursuit of better solutions." (Pearl Zhu, "Leadership Master: Five Digital Trends to Leap Leadership Maturity", 2016)

"The 'result' of micromanagement is perhaps tangible in the short run, but more often causes damage for the long term." (Pearl Zhu, "Change Insight: Change as an Ongoing Capability to Fuel Digital Transformation", 2016)

"Using two-dimensional lenses to perceive the multi-faceted world can limit your ability to observe the world more objectively." (Pearl Zhu, "Thinkingaire: 100 Game Changing Digital Mindsets to Compete for the Future", 2016)

"A performance dashboard is a practical tool to improve management effectiveness and efficiency, not just a pretty retrospective picture in an annual report." (Pearl Zhu, "Performance Master: Take a Holistic Approach to Unlock Digital Performance", 2017)

"A 'roadmap' is simply a plan for moving or transitioning, from one state to another. A roadmap provides the direction to the future." (Pearl Zhu, "Digital Capability: Building Lego Like Capability Into Business Competency", 2017)

"A well-defined set of digital rules are not for limiting innovation, but for setting the frame of relevance and guide through changes and digital transformation." (Pearl Zhu, "100 Digital Rules: Setting Guidelines to Explore Digital New Normal", 2017)

"Building a comprehensive problem-solving framework is about leveraging a structured methodology that allows you to frame problems systematically and solve problems creatively." (Pearl Zhu, "Problem Solving Master: Frame Problems Systematically and Solve Problem Creatively", 2017)

"Decision makers with emotional excellence have the ability to dispassionately examine alternatives via fact finding, analysis, structured planning, objective evaluations, and comparison." (Pearl Zhu, "Decision Master: The Art and Science of Decision Making", 2017)

"Decision making is an art only until the person understands the science." (Pearl Zhu, "Decision Master: The Art and Science of Decision Making", 2017)

"Decision maturity is to ensure the right decisions have been made by the right people at the right time to solve the right problems." (Pearl Zhu, "Decision Master: The Art and Science of Decision Making", 2017)

"Digital synchronization and strategic alignment occur when all parts of the choir sing their respective parts in harmony to achieve a higher purpose." (Pearl Zhu, "12 CIO Personas: The Digital CIO's Situational Leadership Practices", 2017)

"Digitalization implies the full-scale changes in the way business is conducted so that it’s a multi-dimensional planning and orchestration." (Pearl Zhu, "Digital Capability: Building Lego Like Capability Into Business Competency", 2017)

"Framing the right problem is equally or even more important than solving it." (Pearl Zhu, “Change, Creativity and Problem-Solving”, 2017)

"Most organizations fail to manage performance effectively because they fail to look into the system holistically." (Pearl Zhu, "Performance Master: Take a Holistic Approach to Unlock Digital Performance", 2017)

"The science of decision-making is to make sure there is an effective decision process in place." (Pearl Zhu, "Decision Master: The Art and Science of Decision Making", 2017)

"It is important to strengthen the weakest link, to ensure all important business elements integrated and knitted into ongoing organizational capabilities and unique business competency." (Pearl Zhu, "Digital Capability: Building Lego Like Capability Into Business Competency", 2017)

"The simplicity and the complexity are just the opposite ends of the same spectrum." (Pearl Zhu, "Digital Gaps: Bridging Multiple Gaps to Run Cohesive Digital Business", 2017)

"We are moving slowly into an era where Big Data is the starting point, not the end." (Pearl Zhu, "Digital Master: Debunk the Myths of Enterprise Digital Maturity", 2017)

"You can’t improve what you are not managing, you can’t manage what you are not measuring, and you can’t measure what you are not focusing." (Pearl Zhu, "Digital Capability: Building Lego Like Capability Into Business Competency", 2017)

"A business ecosystem is just like the natural ecosystem; first, needs to be understood, then, needs to be well planned, and also needs to be thoughtfully renewed as well." (Pearl Zhu, "Digital Maturity: Take a Journey of a Thousand Miles from Functioning to Delight", 2018)

"A seamless digital transformation requires a vision to convey 'WHY', a solid strategy to clarify 'WHAT', and a technical specification to articulate 'HOW' you want to transform radically." (Pearl Zhu, "Digital Maturity: Take a Journey of a Thousand Miles from Functioning to Delight", 2018)

"An organizational structure carries inherent capabilities as to what can be achieved within its frame." (Pearl Zhu, Digital Maturity: Take a Journey of a Thousand Miles from Functioning to Delight, 2018)

"Change Management is a journey, not just a one-time project, riding ahead of change curve takes both strategy and methodology." (Pearl Zhu, "The Change Agent CIO: The CIO’s Dynamic Role of Leading Digitalization", 2018)

"Coherence improves business flow; resilience makes business robust and anti-fragile." (Pearl Zhu, "Digital Hybridity: How to Strike the Right Balance for Digital Paradigm Shift", 2018)

"Going digital is more like a journey than a destination. Predicting and preparing the next level of digitalization is an iterative learning and doing continuum." (Pearl Zhu, "Digital Maturity: Take a Journey of a Thousand Miles from Functioning to Delight", 2018)

"Ideally, the two structures - hierarchy, and relationship structure wrap around each other to ensure responsibility, to keep information flow and the creation of power." (Pearl Zhu, "Digital Maturity: Take a Journey of a Thousand Miles from Functioning to Delight", 2018)

"Taking the multidimensional hybrid models for going digital is all about how to strike the right balance of reaping quick wins and focusing on the long-term strategic goals." (Pearl Zhu, "Digital Hybridity: How to Strike the Right Balance for Digital Paradigm Shift", 2018)

"The most effective digital workplace is one where collaboration and sharing are the norms." (Pearl Zhu, "Digital Maturity: Take a Journey of a Thousand Miles from Functioning to Delight", 2018)

🎯Rukmani Gopalan - Collected Quotes

"A cloud data warehouse is an enterprise data warehouse offered as a managed service (PaaS) on public clouds with optimized integrations for data ingestion, analytics processing, and BI analytics." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022) 

"Churn refers to rapidly changing the activities and your plan when they are in flux - this is disruptive to your organization and slows your progress. Change refers to an inevitable movement in requirements and helps you plan for and execute this movement thoughtfully." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022) 

"Data mesh relies on a distributed architecture that consists of domains. Each domain is an independent unit of data and its associated storage and compute components. When an organization contains various product units, each with its own data needs, each product team owns a domain that is operated and governed independently by the product team. […] Data mesh has a unique value proposition, not just offering scale of infrastructure and scenarios but also helping shift the organization’s culture around data," (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022)

"If there is one thing I strongly recommend, it is to invest in a cloud data lake and start collecting and processing data that you believe is useful to your organization today." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022) 

"It’s true that data and data strategy are critical to the organization; however, it’s also true that data by itself is a means to the end of business or customer impact unless you’re a provider of data or data-related services." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022) 

"Plan for customer impact, and prepare to learn and fine-tune as you progress. Make choices based on the impact they offer to customers, and stay consistent in your implementation while keeping open-minded for learnings. Especially if you are an early adopter of a technology, you can help develop the technology with the provider and thus get ample support from the technology provider in return. Similarly, identify highly motivated early adopters within your customer base and offer to develop your solution with them." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022) 

"Real-time stream processing refers to the ingestion, processing, and consumption of data with a specific focus on speed, targeting near real time - that is, almost instantaneous results. […] Real-time stream processing pipelines involve data that is arriving from its source at very high velocity; in other words, it is data that is streaming into the system, just like rain or a waterfall." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022) 

"The lakehouse provides a key advantage over the modern data warehouse by eliminating the need to have two places to store the same data. [...] Data lakehouses offer the key benefit of being able to run performant BI/SQL-based scenarios directly on the data lake, right alongside the other exploratory data science and machine learning scenarios." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022) 

"The promise of a cloud data lake architecture lies in the boundless diversity of scenarios that it enables." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022) 

"The very simple definition of cloud data lake storage is a service available as a cloud offering that can serve as a central repository for all kinds of data (structured, unstructured, and semistructured) and can support data and transactions at a large scale." (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022)

"When it comes to data lakes, some things usually stay constant: the storage and processing patterns. Change could come in any of the following ways: Adding new components and processing or consumption patterns to respond to new requirements. […] Optimizing existing architecture for better cost or performance" (Rukmani Gopalan, "The Cloud Data Lake: A Guide to Building Robust Cloud Data Architecture", 2022)

08 November 2006

🔢Robert Hawker - Collected Quotes

"[...] a conceptual data model [...] is system-agnostic and is a diagrammatic business representation of how different types of data are associated with one another in the organization." (Robert Hawker, "Practical Data Quality", 2023)

"A data quality rule is logic that is applied to each row of a dataset, which can determine whether the row of data is correct or incorrect. Correct data is deemed to have passed the rule, and incorrect data is deemed to have failed the rule – hence, the term failed data [...]" (Robert Hawker, "Practical Data Quality", 2023)

"Correction of data in the secondary source is not recommended. However, it is important to recognize that sometimes, secondary source fixes are required." (Robert Hawker, "Practical Data Quality", 2023)

"Data discovery is the process where an organization obtains an understanding of which data matters the most and identifies challenges with that data. The outcome of data discovery is that the scope of a data quality initiative should be clear and data quality rules can be defined." (Robert Hawker, "Practical Data Quality", 2023)

"Data profiling assesses a set of data and provides information on the values, the length of strings, the level of completeness, and the distribution patterns of each column." (Robert Hawker, "Practical Data Quality", 2023)

"Data quality rules are only effective if they are tightly scoped. Generic rules tend to produce a lot of unwanted failed records, and business users start to ignore the results. Once business users lose faith in what they see from a data quality tool, it is hard to restore engagement." (Robert Hawker, "Practical Data Quality", 2023)

"Every data quality initiative is different, and senior stakeholders at different organizations will have different needs." (Robert Hawker, "Practical Data Quality", 2023)

"If an organization had a single overall data quality key performance indicator (KPI), then it might be appropriate to put a greater weighting on those rules which would impact regulatory compliance. A lack of regulatory compliance is a risk to the very existence of organizations like these, and therefore, a greater weighting might be needed." (Robert Hawker, "Practical Data Quality", 2023)

"It rarely makes sense to aim for what people might consider perfect data (every record is complete, accurate, and up to date). The investment required is usually prohibitive, and the gains made for the last 1% of data quality improvement effort become far too marginal." (Robert Hawker, "Practical Data Quality", 2023)

"In truth, no one knows how much bad data quality costs a company – even companies with mature data quality initiatives in place, who are measuring hundreds of data points for their quality struggle to accurately measure quantitative impact. This is often a deal-breaker for senior leaders when trying to get approval for a budget for data quality work. Data quality initiatives often seek substantial budgets and are up against projects with more tangible benefits." (Robert Hawker, "Practical Data Quality", 2023)

"Momentum is important in data quality initiatives. If an issue is problematic, even where the priority is high, it can be better to move on to an issue that can be progressed efficiently." (Robert Hawker, "Practical Data Quality", 2023)

"Most data quality issues will re-occur if the root cause is not fully understood [...]" (Robert Hawker, "Practical Data Quality", 2023)

"Organizations will always only have a limited amount of resources available to remediate data. It will almost certainly not be possible to tackle all the issues at the same time. Therefore, prioritization is key to ensuring that the most value is generated from the available resources." (Robert Hawker, "Practical Data Quality", 2023)

"Successful organizations try to put a holistic data culture in place. Everyone is educated on the basics of looking after data and the importance of having good data. They consider what they have learned when performing their day-to-day tasks. This is often referred to as the promotion of good data literacy." (Robert Hawker, "Practical Data Quality", 2023)

"The biggest mistake that can be made in a data quality initiative is focusing on the wrong data. If you fix data that does not impact a critical business process or drive important decisions, your initiative simply will not make the difference that you want it to." (Robert Hawker, "Practical Data Quality", 2023)

"The data should be monitored in the source, it should be corrected in the source, and it should then feed the secondary source(s) with high-quality data that can be used without workarounds. The reduction in workarounds will make the data engineers, scientists, and data visualization specialists much more productive." (Robert Hawker, "Practical Data Quality", 2023)

"The level of data quality in an organization is the extent to which data can be used for its intended purposes."  (Robert Hawker, "Practical Data Quality", 2023)

"Start with a business strategy. Too many organizations start their data quality initiative by looking at the details of the data and trying to see 'what is wrong with it'. The right approach is to understand what the business is trying to achieve and to work out where data issues might impede this. It ensures that data quality work will be truly impactful." (Robert Hawker, "Practical Data Quality", 2023)

04 November 2006

🔢Dhanurjay "DJ" Patil - Collected Quotes

"[...] a good definition of a data product is a product that facilitates an end goal through the use of data. It’s tempting to think of a data product purely as a data problem. After all, there’s nothing more fun than throwing a lot of technical expertise and fancy algorithmic work at a difficult problem." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"As data scientists, we prefer to interact with the raw data. We know how to import it, transform it, mash it up with other data sources, and visualize it. Most of your customers can’t do that. One of the biggest challenges of developing a data product is figuring out how to give data back to the user. Giving back too much data in a way that’s overwhelming and paralyzing is 'data vomit'. It’s natural to build the product that you would want, but it’s very easy to overestimate the abilities of your users. The product you want may not be the product they want." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"By giving data back to the user, you can create both engagement and revenue. We’re far enough into the data game that most users have realized that they’re not the customer, they’re the product. Their role in the system is to generate data, either to assist in ad targeting or to be sold to the highest bidder, or both." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"Data Jujitsu: the art of using multiple data elements in clever ways to solve iterative problems that, when combined, solve a data problem that might otherwise be intractable." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"Generalizing beyond advertising, when building any data product in which the data is obfuscated (where there isn’t a clear relationship between the user and the result), you can compromise on precision, but not on recall. But when the data is exposed, focus on high precision." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)
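
The precision/recall trade-off mentioned in the quote above can be made concrete with a minimal sketch. The function name `precision_recall` and the sample item labels are hypothetical, chosen only for illustration; they do not come from Patil's text.

```python
def precision_recall(shown, relevant):
    """Return (precision, recall) for the items a product shows to a user.

    precision: share of shown items that are actually relevant
    recall:    share of relevant items that were actually shown
    """
    shown, relevant = set(shown), set(relevant)
    true_positives = len(shown & relevant)
    precision = true_positives / len(shown) if shown else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four items are shown, three items are truly relevant.
shown_items = ["a", "b", "c", "d"]
relevant_items = ["a", "b", "e"]
precision, recall = precision_recall(shown_items, relevant_items)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four shown items are relevant (precision 0.50) and two of the three relevant items were shown (recall about 0.67). Read against the quote, an exposed result list would try to raise the first number, while an obfuscated one could tolerate a lower precision as long as recall stays high.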

"Ideas for data products tend to start simple and become complex; if they start complex, they become impossible." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"In many applications, a design treatment that gives the user control over the outcome can go far to create interactions that leave the user feeling good." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"Smart data scientists don’t just solve big, hard problems; they also have an instinct for making big problems small." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"The best way to avoid data vomit is to focus on actionability of data. That is, what action do you want the user to take? If you want them to be impressed with the number of things that you can do with the data, then you’re likely producing data vomit. If you’re able to lead them to a clear set of actions, then you’ve built a product with a clear focus." (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"The key aspect of making a data product is putting the 'product' first and 'data' second. Saying it another way, data is one mechanism by which you make the product user-focused. With all products, you should ask yourself the following three questions: (1) What do you want the user to take away from this product? (2) What action do you want the user to take because of the product? (3) How should the user feel during and after using your product?" (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"You can give your data product a better chance of success by carefully setting the users’ expectations. [...] One under-appreciated facet of designing data products is how the user feels after using the product. Does he feel good? Empowered? Or disempowered and dejected?" (Dhanurjay Patil, "Data Jujitsu: The Art of Turning Data into Product", 2012)

"Data is such an incredible lever arm for change, we need to make sure that the change that is coming, is the one we all want to see." (Dhanurjay Patil, "A Code of Ethics for Data Science", 2016)

01 November 2006

🎯Clay Helberg - Collected Quotes

"Another key element in making informative graphs is to avoid confounding design variation with data variation. This means that changes in the scale of the graphic should always correspond to changes in the data being represented." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995) 

"Another trouble spot with graphs is multidimensional variation. This occurs where two-dimensional figures are used to represent one-dimensional values. What often happens is that the size of the graphic is scaled both horizontally and vertically according to the value being graphed. However, this results in the area of the graphic varying with the square of the underlying data, causing the eye to read an exaggerated effect in the graph." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995) 
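
The squared-area effect described above can be checked with a few lines of arithmetic. This is a minimal sketch; the function name `apparent_ratio` and the sample values 10 and 30 are assumptions made for illustration.

```python
def apparent_ratio(value_a, value_b):
    """Compare two values drawn as figures scaled in both width and height.

    Returns (linear_ratio, area_ratio): the true ratio of the data and the
    ratio of the drawn areas, which grows with the square of the data.
    """
    linear_ratio = value_b / value_a
    area_ratio = linear_ratio ** 2  # width and height are both scaled
    return linear_ratio, area_ratio


# Hypothetical data: the second value is three times the first.
linear_ratio, area_ratio = apparent_ratio(10, 30)
print(f"data ratio: {linear_ratio:.0f}x, drawn area ratio: {area_ratio:.0f}x")
```

A value that is three times larger ends up drawn with nine times the area, which is the exaggeration the quote warns about.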

"It may be helpful to consider some aspects of statistical thought which might lead many people to be distrustful of it. First of all, statistics requires the ability to consider things from a probabilistic perspective, employing quantitative technical concepts such as 'confidence', 'reliability', 'significance'. This is in contrast to the way non-mathematicians often cast problems: logical, concrete, often dichotomous conceptualizations are the norm: right or wrong, large or small, this or that." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995) 

"[...] many non-mathematicians hold quantitative data in a sort of awe. They have been led to believe that numbers are, or at least should be, unquestionably correct." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995)

"Most statistical models assume error free measurement, at least of independent (predictor) variables. However, as we all know, measurements are seldom if ever perfect. Particularly when dealing with noisy data such as questionnaire responses or processes which are difficult to measure precisely, we need to pay close attention to the effects of measurement errors. Two characteristics of measurement which are particularly important in psychological measurement are reliability and validity." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995) 

"Remember that a p-value merely indicates the probability of a particular set of data being generated by the null model - it has little to say about the size of a deviation from that model (especially in the tails of the distribution, where large changes in effect size cause only small changes in p-values)." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995)
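
The remark about the tails of the distribution can be illustrated with a rough sketch. It assumes a two-sided test on a standard normal test statistic and uses the z-score as a stand-in for a standardized effect at a fixed sample size; both are simplifying assumptions, and the function name `two_sided_p` is hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


# Same step of 2 in the test statistic, near the centre and far in the tail.
centre_change = two_sided_p(0.0) - two_sided_p(2.0)
tail_change = two_sided_p(4.0) - two_sided_p(6.0)

for z in (0.0, 2.0, 4.0, 6.0):
    print(f"z={z:.1f}  p={two_sided_p(z):.3g}")
print(f"change in p from z=0 to z=2: {centre_change:.3g}")
print(f"change in p from z=4 to z=6: {tail_change:.3g}")
```

Under these assumptions the same increase in the statistic moves the p-value by almost the whole unit interval near the centre, but by less than 0.0001 in the tail, so the p-value alone says little about how large the deviation is.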

"There are a number of ways that statistical techniques can be misapplied to problems in the real world. Three of the most common hazards are designing experiments with insufficient power, ignoring measurement error, and performing multiple comparisons." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995)
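
The multiple-comparisons hazard named above can be quantified with a short sketch. It assumes the tests are independent and each uses the same significance level; the function name `familywise_error_rate` and the choice of 20 tests at 0.05 are illustrative assumptions. The Bonferroni adjustment shown is one common correction, not one prescribed by the quoted text.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across n independent tests,
    each run at significance level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


alpha, n_tests = 0.05, 20

# Uncorrected: every test is run at the nominal level.
uncorrected = familywise_error_rate(alpha, n_tests)

# Bonferroni: divide the nominal level by the number of tests.
corrected = familywise_error_rate(alpha / n_tests, n_tests)

print(f"uncorrected family-wise error rate: {uncorrected:.3f}")
print(f"Bonferroni-corrected error rate:    {corrected:.3f}")
```

With 20 independent tests at the 0.05 level, the chance of at least one spurious "significant" result is roughly 64 percent, while the corrected procedure keeps it just below 5 percent.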

"We can consider three broad classes of statistical pitfalls. The first involves sources of bias. These are conditions or circumstances which affect the external validity of statistical results. The second category is errors in methodology, which can lead to inaccurate or invalid results. The third class of problems concerns interpretation of results, or how statistical results are applied (or misapplied) to real world issues." (Clay Helberg, "Pitfalls of Data Analysis (or How to Avoid Lies and Damned Lies)", 1995) 

31 October 2006

⛩️Daniel Jackson - Collected Quotes

"A commitment to simplicity of design means addressing the essence of design - the abstractions on which software is built - explicitly and up front. Abstractions are articulated, explained, reviewed and examined deeply, in isolation from the details of the implementation. This doesn’t imply a waterfall process, in which all design and specification precedes all coding. But developers who have experienced the benefits of this separation of concerns are reluctant to rush to code, because they know that an hour spent on designing abstractions can save days of refactoring." (Daniel Jackson, "Software Abstractions", 2006)

"A language for describing software abstractions is more than just a logic. You need ways to organize a model, to build larger models from smaller ones, and to factor out components that can be used more than once. There are also small syntactic details - such as shorthands for declarations - that make a language usable in practice. And finally, there’s the need to communicate with an analysis tool, by indicating which analyses are to be performed." (Daniel Jackson, "Software Abstractions", 2006)

"A model diagram declares some sets and binary relations, and imposes some basic constraints on them. A diagram is a good way to convey the outline of a model, but diagrams aren’t expressive enough to include detailed constraints." (Daniel Jackson, "Software Abstractions", 2006) 

"Abstractions matter to users too. Novice users want programs whose abstractions are simple and easy to understand; experts want abstractions that are robust and general enough to be combined in new ways. When good abstractions are missing from the design, or erode as the system evolves, the resulting program grows barnacles of complexity. The user is then forced to master a mass of spurious details, to develop workarounds, and to accept frequent, inexplicable failures." (Daniel Jackson, "Software Abstractions", 2006)

"An assertion is a constraint that is intended to follow from the facts of the model. […] Typically, assertions play two different roles. Some express mundane properties that aren’t interesting in their own right; they’re written purely to detect flaws in the model. It’s surprising how effective even a few such assertions can be in uncovering subtle flaws. […] Other assertions express truly essential properties, and are sometimes more fundamental than the facts of the model." (Daniel Jackson, "Software Abstractions", 2006)

"Analysis brings software abstractions to life in three ways. First, it encourages you as you explore, by giving you concrete examples that reinforce intuition and suggest new scenarios. Second, it keeps you honest, by helping you to check as you go along that what you write down means what you think it means. And third, it can reveal subtle flaws that you might not have discovered until much later (or not at all)." (Daniel Jackson, "Software Abstractions", 2006)

"An abstraction is not a module, or an interface, class, or method; it is a structure, pure and simple - an idea reduced to its essential form. Since the same idea can be reduced to different forms, abstractions are always, in a sense, inventions, even if the ideas they reduce existed before in the world outside the software. The best abstractions, however, capture their underlying ideas so naturally and convincingly that they seem more like discoveries." (Daniel Jackson, "Software Abstractions", 2006)

"Software is built on abstractions. Pick the right ones, and programming will flow naturally from design; modules will have small and simple interfaces; and new functionality will more likely fit in without extensive reorganization […] Pick the wrong ones, and programming will be a series of nasty surprises: interfaces will become baroque and clumsy as they are forced to accommodate unanticipated interactions, and even the simplest of changes will be hard to make." (Daniel Jackson, "Software Abstractions", 2006)
