SQL Troubles: April 2017

28 April 2017

⛏️Data Management: Completeness (Definitions)

"A characteristic of information quality that measures the degree to which there is a value in a field; synonymous with fill rate. Assessed in the data quality dimension of Data Integrity Fundamentals." (Danette McGilvray, "Executing Data Quality Projects", 2008)

"Containing by a composite data all components necessary to full description of the states of a considered object or process." (Juliusz L Kulikowski, "Data Quality Assessment", 2009)

"An inherent quality characteristic that is a measure of the extent to which an attribute has values for all instances of an entity class." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"Completeness is a dimension of data quality. As used in the DQAF, completeness implies having all the necessary or appropriate parts; being entire, finished, total. A dataset is complete to the degree that it contains required attributes and a sufficient number of records, and to the degree that attributes are populated in accord with data consumer expectations. For data to be complete, at least three conditions must be met: the dataset must be defined so that it includes all the attributes desired (width); the dataset must contain the desired amount of data (depth); and the attributes must be populated to the extent desired (density). Each of these secondary dimensions of completeness can be measured differently." (Laura Sebastian-Coleman, "Measuring Data Quality for Ongoing Improvement ", 2012)

"Completeness is defined as a measure of the presence of core source data elements that, exclusive of derived fields, must be present in order to complete a given business process." (Rajesh Jugulum, "Competing with High Quality Data", 2014)

"Complete existence of all values or attributes of a record that are necessary." (Boris Otto & Hubert Österle, "Corporate Data Quality", 2015)

"The degree to which all data has been delivered or stored and no values are missing. Examples are empty or missing records." (Piethein Strengholt, "Data Management at Scale", 2020)

"The degree to which elements that should be contained in the model are indeed there." (Panos Alexopoulos, "Semantic Modeling for Data", 2020)

"The degree of data representing all properties and instances of the real-world context." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data is considered 'complete' when it fulfills expectations of comprehensiveness." (Precisely) [source]

"The degree to which all required measures are known. Values may be designated as “missing” in order not to have empty cells, or missing values may be replaced with default or interpolated values. In the case of default or interpolated values, these must be flagged as such to distinguish them from actual measurements or observations. Missing, default, or interpolated values do not imply that the dataset has been made complete." (CODATA)

27 April 2017

⛏️Data Management: Availability (Definitions)

"Corresponds to the information that should be available when necessary and in the appropriate format." (José M Gaivéo, "Security of ICTs Supporting Healthcare Activities", 2013)

"A property by which the data is available all the time during the business hours. In cloud computing domain, the data availability by the cloud service provider holds a crucial importance." (Sumit Jaiswal et al, "Security Challenges in Cloud Computing", 2015)

"Availability: the ability of the data user to access the data at the desired point in time." (Boris Otto & Hubert Österle, "Corporate Data Quality", 2015)

"It is one of the main aspects of the information security. It means data should be available to its legitimate user all the time whenever it is requested by them. To guarantee availability data is replicated at various nodes in the network. Data must be reliably available." (Omkar Badve et al, "Reviewing the Security Features in Contemporary Security Policies and Models for Multiple Platforms", 2016)

"Timely, reliable access to data and information services for authorized users." (Maurice Dawson et al, "Battlefield Cyberspace: Exploitation of Hyperconnectivity and Internet of Things", 2017)

"A set of principles and metrics that assures the reliability and constant access to data for the authorized individuals or groups." (Gordana Gardašević et al, "Cybersecurity of Industrial Internet of Things", 2020)

"Ensuring the conditions necessary for easy retrieval and use of information and system resources, whenever necessary, with strict conditions of confidentiality and integrity." (Alina Stanciu et al, "Cyberaccounting for the Leaders of the Future", 2020)

"The state when data are in the place needed by the user, at the time the user needs them, and in the form needed by the user." (CODATA)

"The state that exists when data can be accessed or a requested service provided within an acceptable period of time." (NISTIR 4734)

"Timely, reliable access to information by authorized entities." (NIST SP 800-57 Part 1)

25 April 2017

⛏️Data Management: Data Products (Definitions)

"In the case of data mesh, a data product is an architectural quantum. It is the smallest unit of architecture that can be independently deployed and managed." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"A data product is a data asset that should be trusted, reusable, and accessible. The data product is developed, owned, and managed by a domain, and each domain needs to make sure that its data products are accessible to other domains and their data consumers." (Marthe Mengen, 2024) [source]

"A data product is a self-contained, independently deployable unit of data that delivers business value." (James Serra, "Deciphering Data Architectures", 2024)

"A collection of optimized data or data-related assets that are packaged for reuse and distribution with controlled access. Data products contain data as well as models, dashboards, and other computational asset types. Unlike data assets in governance catalogs, data products are managed as products with multiple purposes to provide business value." (IBM)

"A data product, in general terms, is any tool or application that processes data and generates results. […] Data products have one primary objective: to manage, organize and make sense of the vast amount of data that organizations collect and generate. It’s the users’ job to put the insights to use that they gain from these data products, take actions and make better decisions based on these insights." (Sisense) [source]

"A data product is a product built around data, containing everything required to complete a specific task or objective using that underlying data." (Opendatasoft)

"A data product is digital information that can be purchased." (Techtarget) [source]

"A key concept in data mesh architecture, Data Products are independent units of data managed by a specific domain team. They are responsible for defining, publishing, and maintaining their data assets while ensuring high-quality data that meets the needs of its consumers." (DataHub)

[Data product specification:] "Detailed description of a data set or data set series together with additional information that will enable it to be created, supplied to and used by another party" (ISO 19131)

"Data set or data set series that conforms to a data product specification" (ISO 19131)

12 April 2017

⛏️Data Management: Accessibility (Definitions)

"Capable of being reached, capable of being used or seen." (Martin J Eppler, "Managing Information Quality" 2nd Ed., 2006)

"The degree to which data can be obtained and used." (Danette McGilvray, "Executing Data Quality Projects", 2008)

"The opportunity to find, as well as the ease and convenience associated with locating, information. Often, this is related to the physical location of the individual seeking the information and the physical location of the information in a book or journal." (Jimmie L Joseph & David P Cook, "Medical Ethical and Policy Issues Arising from RIA", 2008)

"An inherent quality characteristic that is a measure of the ability to access data when it is required." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"The ability to readily obtain data when needed." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Accessibility refers to the difficulty level for users to obtain data. Accessibility is closely linked with data openness, the higher the data openness degree, the more data types obtained, and the higher the degree of accessibility." (Li Cai & Yangyong Zhu, "The Challenges of Data Quality and Data Quality Assessment in the Big Data Era", 2015) [source]

"It is the state of each user to have access to any information at any time." (ihsan Eken & Basak Gezmen, "Accessibility for Everyone in Health Communication Mobile Application Usage", 2020)

"Data accessibility measures the extent to which government data are provided in open and re-usable formats, with their associated metadata." (OECD)

⛏️Data Management: Data Virtualization (Definitions)

"The concept of letting data stay 'where it lives; and developing a hardware and software architecture that exposes the data to various business processes and organizations. The goal of virtualization is to shield developers and users from the complexity of the underlying data structures." (Jill Dyché & Evan Levy, "Customer Data Integration: Reaching a Single Version of the Truth", 2006)

"The ability to easily select and combine data fragments from many different locations dynamically and in any way into a single data structure while also maintaining its semantic accuracy." (Michael M David & Lee Fesperman, "Advanced SQL Dynamic Data Modeling and Hierarchical Processing", 2013)

"The process of retrieving and manipulating data without requiring details of how the data formatted or where the data is located" (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"A data integration process used to gain more insights. Usually it involves databases, applications, file systems, websites, big data techniques, and so on." (Jason Williamson, "Getting a Big Data Job For Dummies", 2015)

"Data virtualization is an approach that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted at source or where it is physically located, and can provide a single customer view (or single view of any other entity) of the overall data. Some database vendors provide a database (virtual) query layer, which is also called a data virtualization layer. This layer abstracts the database and optimizes the data for better read performance. Another reason to abstract is to intercept queries for better security. An example is Amazon Athena." (Piethein Strengholt, "Data Management at Scale", 2020)

"A data integration process in order to gain more insights. Usually it involves databases, applications, file systems, websites, big data techniques, etc.)." (Analytics Insight)

"The integration and transformation of data in real time or near real time from disparate data sources in multicloud and hybrid cloud, to support business intelligence, reporting, analytics, and other workloads." (Forrester)

⛏️Data Management: Data Lineage (Definitions)

"A mechanism for recording information to determine the source of any piece of data, and the transformations applied to that data using Data Transformation Services (DTS). Data lineage can be tracked at the package and row levels of a table and provides a complete audit trail for information stored in a data warehouse. Data lineage is available only for packages stored in Microsoft Repository." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"This information is used by Data Transformation Services (DTS) when it works in conjunction with Meta Data Services. This information records the history of package execution and data transformations for each piece of data." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"This is also called data provenance. It deals with the origin of data; it is all about documenting where data is, how it has been derived, and how it flows so you can manage and secure it appropriately as it is further processed by applications." (Martin Oberhofer et al, "Enterprise Master Data Management", 2008)

"This provides the functionality to determine where data comes from, how it is transformed, and where it is going. Data lineage metadata traces the lifecycle of information between systems, including the operations that are performed on the data." (Martin Oberhofer et al, "The Art of Enterprise Information Architecture", 2010)

"Data lineage refers to a set of identifiable points that can be used to understand details of data movement and transformation (e.g., transactional source field names, file names, data processing job names, programming rules, target table fields). Lineage describes the movement of data through systems from its origin or provenance to its use in a particular application. Lineage is related to both the data chain and the information life cycle. Most people concerned with the lineage of data want to understand two aspects of it: the data’s origin and the ways in which the data has changed since it was originally created. Change can take place within one system or between systems." (Laura Sebastian-Coleman, "Measuring Data Quality for Ongoing Improvement ", 2012)

⛏️Data Management: Data Federation (Definitions)

"Data access to a variety of data stores, using consistent rules and definitions that enable all the data stores to be treated as a single resource." (Judith Hurwitz et al, "Service Oriented Architecture For Dummies 2nd Ed.", 2009)

"Technology that joins data from different sources, operational or analytic, around an organization. Data federation allows users to have a single view of disparate data without having to understand the details of the individual data sources." (Tony Fisher, "The Data Asset", 2009)

"A method of transparently joining or linking data from multiple physical locations and/or multiple platforms." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Data access to a variety of data stores, using consistent rules and definitions that enable all the data stores to be treated as a single resource. " (Marcia Kaufman et al, "Big Data For Dummies", 2013)

"Data federation technology is software that provides an organization with the ability to aggregate data from disparate sources in a virtual database so it can be used for business intelligence (BI) or other analysis." (Techtarget)

"Process where data is collected from distinct databases without ever copying or transforming the original data." (Solutions Review)

06 April 2017

⛏️Data Management: Data Mesh (Definitions)

"Data Mesh is a sociotechnical approach to share, access and manage analytical data in complex and large-scale environments - within or across organizations." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"A data mesh is an architectural concept in data engineering that gives business domains (divisions/departments) within a large organization ownership of the data they produce. The centralized data management team then becomes the organization’s data governance team." (Margaret Rouse, 2023) [source]

"Data Mesh is a design concept based on federated data and business domains. It applies product management thinking to data management with the outcome being Data Products. It’s technology agnostic and calls for a domain-centric organization with federated Data Governance." (Sonia Mezzetta, "Principles of Data Fabric", 2023)

"A data mesh is a decentralized data architecture with four specific characteristics. First, it requires independent teams within designated domains to own their analytical data. Second, in a data mesh, data is treated and served as a product to help the data consumer to discover, trust, and utilize it for whatever purpose they like. Third, it relies on automated infrastructure provisioning. And fourth, it uses governance to ensure that all the independent data products are secure and follow global rules."(James Serra, "Deciphering Data Architectures", 2024)

"A data mesh is a federated data architecture that emphasizes decentralizing data across business functions or domains such as marketing, sales, human resources, and more. It facilitates organizing and managing data in a logical way to facilitate the more targeted and efficient use and governance of the data across organizations." (Arshad Ali & Bradley Schacht, "Learn Microsoft Fabric", 2024)

"To explain a data mesh in one sentence, a data mesh is a centrally managed network of decentralized data products. The data mesh breaks the central data lake into decentralized islands of data that are owned by the teams that generate the data. The data mesh architecture proposes that data be treated like a product, with each team producing its own data/output using its own choice of tools arranged in an architecture that works for them. This team completely owns the data/output they produce and exposes it for others to consume in a way they deem fit for their data." (Aniruddha Deswandikar,"Engineering Data Mesh in Azure Cloud", 2024)

"A data mesh is a decentralized data architecture that organizes data by a specific business domain - for example, marketing, sales, customer service and more - to provide more ownership to the producers of a given data set." (IBM) [source]

"A data mesh is a new approach to designing data architectures. It takes a decentralized approach to data storage and management, having individual business domains retain ownership over their datasets rather than flowing all of an organization’s data into a centrally owned data lake." (Alteryx) [source]

"A Data Mesh is a solution architecture for the specific goal of building business-focused data products without preference or specification of the technology involved." (Gartner)

"A data mesh is an architectural framework that solves advanced data security challenges through distributed, decentralized ownership." (AWS) [source]

"Data mesh defines a platform architecture based on a decentralized network. The data mesh distributes data ownership and allows domain-specific teams to manage data independently." (TIBCO) [source]

"Data mesh refers to a data architecture where data is owned and managed by the teams that use it. A data mesh decentralizes data ownership to business domains–such as finance, marketing, and sales–and provides them a self-serve data platform and federated computational governance." (Qlik) [source]

SQL Troubles

Pages