28 April 2017

Data Quality Dimensions: Completeness (Definitions)

"A characteristic of information quality that measures the degree to which there is a value in a field; synonymous with fill rate. Assessed in the data quality dimension of Data Integrity Fundamentals." (Danette McGilvray, "Executing Data Quality Projects", 2008)

"Containing by a composite data all components necessary to full description of the states of a considered object or process." (Juliusz L Kulikowski, "Data Quality Assessment", 2009)

"An inherent quality characteristic that is a measure of the extent to which an attribute has values for all instances of an entity class." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"Completeness is a dimension of data quality. As used in the DQAF, completeness implies having all the necessary or appropriate parts; being entire, finished, total. A dataset is complete to the degree that it contains required attributes and a sufficient number of records, and to the degree that attributes are populated in accord with data consumer expectations. For data to be complete, at least three conditions must be met: the dataset must be defined so that it includes all the attributes desired (width); the dataset must contain the desired amount of data (depth); and the attributes must be populated to the extent desired (density). Each of these secondary dimensions of completeness can be measured differently." (Laura Sebastian-Coleman, "Measuring Data Quality for Ongoing Improvement ", 2012)

"Completeness is defined as a measure of the presence of core source data elements that, exclusive of derived fields, must be present in order to complete a given business process." (Rajesh Jugulum, "Competing with High Quality Data", 2014)

"Complete existence of all values or attributes of a record that are necessary." (Boris Otto & Hubert Österle, "Corporate Data Quality", 2015)

"The degree to which all data has been delivered or stored and no values are missing. Examples are empty or missing records." (Piethein Strengholt, "Data Management at Scale", 2020)

"The degree to which elements that should be contained in the model are indeed there." (Panos Alexopoulos, "Semantic Modeling for Data", 2020)

"Data is considered 'complete' when it fulfills expectations of comprehensiveness." (Precisely) [source]

"The degree to which all required measures are known. Values may be designated as “missing” in order not to have empty cells, or missing values may be replaced with default or interpolated values. In the case of default or interpolated values, these must be flagged as such to distinguish them from actual measurements or observations. Missing, default, or interpolated values do not imply that the dataset has been made complete." (CODATA)

27 April 2017

Data Quality Dimensions: Availability (Definitions)

"Corresponds to the information that should be available when necessary and in the appropriate format." (José M Gaivéo, "Security of ICTs Supporting Healthcare Activities", 2013)

"A property by which the data is available all the time during the business hours. In cloud computing domain, the data availability by the cloud service provider holds a crucial importance." (Sumit Jaiswal et al, "Security Challenges in Cloud Computing", 2015) 

"Availability: the ability of the data user to access the data at the desired point in time." (Boris Otto & Hubert Österle, "Corporate Data Quality", 2015)

"It is one of the main aspects of the information security. It means data should be available to its legitimate user all the time whenever it is requested by them. To guarantee availability data is replicated at various nodes in the network. Data must be reliably available." (Omkar Badve et al, "Reviewing the Security Features in Contemporary Security Policies and Models for Multiple Platforms", 2016)

"Timely, reliable access to data and information services for authorized users." (Maurice Dawson et al, "Battlefield Cyberspace: Exploitation of Hyperconnectivity and Internet of Things", 2017)

"A set of principles and metrics that assures the reliability and constant access to data for the authorized individuals or groups." (Gordana Gardašević et al, "Cybersecurity of Industrial Internet of Things", 2020)

"Ensuring the conditions necessary for easy retrieval and use of information and system resources, whenever necessary, with strict conditions of confidentiality and integrity." (Alina Stanciu et al, "Cyberaccounting for the Leaders of the Future", 2020)

"The state when data are in the place needed by the user, at the time the user needs them, and in the form needed by the user." (CODATA)

"The state that exists when data can be accessed or a requested service provided within an acceptable period of time." (NISTIR 4734)

"Timely, reliable access to information by authorized entities." (NIST SP 800-57 Part 1)

25 April 2017

Data Management: Data Products (Definitions)

"In the case of data mesh, a data product is an architectural quantum. It is the smallest unit of architecture that can be independently deployed and managed." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"A data product is a data asset that should be trusted, reusable, and accessible. The data product is developed, owned, and managed by a domain, and each domain needs to make sure that its data products are accessible to other domains and their data consumers." (Marthe Mengen, 2024) [source] 

"A data product is a self-contained, independently deployable unit of data that delivers business value." (James Serra, "Deciphering Data Architectures", 2024)

"A collection of optimized data or data-related assets that are packaged for reuse and distribution with controlled access. Data products contain data as well as models, dashboards, and other computational asset types. Unlike data assets in governance catalogs, data products are managed as products with multiple purposes to provide business value." (IBM)

"A data product, in general terms, is any tool or application that processes data and generates results. […] Data products have one primary objective: to manage, organize and make sense of the vast amount of data that organizations collect and generate. It’s the users’ job to put the insights to use that they gain from these data products, take actions and make better decisions based on these insights." (Sisense) [source]

"A data product is a product built around data, containing everything required to complete a specific task or objective using that underlying data." (Opendatasoft)

"A data product is digital information that can be purchased." (Techtarget) [source]

"A key concept in data mesh architecture, Data Products are independent units of data managed by a specific domain team. They are responsible for defining, publishing, and maintaining their data assets while ensuring high-quality data that meets the needs of its consumers." (DataHub)

[Data product specification:] "Detailed description of a data set or data set series together with additional information that will enable it to be created, supplied to and used by another party" (ISO 19131)

"Data set or data set series that conforms to a data product specification" (ISO 19131)

12 April 2017

Data Management: Accessibility (Definitions)

"Capable of being reached, capable of being used or seen." (Martin J Eppler, "Managing Information Quality" 2nd Ed., 2006)

"The degree to which data can be obtained and used." (Danette McGilvray, "Executing Data Quality Projects", 2008)

"The opportunity to find, as well as the ease and convenience associated with locating, information. Often, this is related to the physical location of the individual seeking the information and the physical location of the information in a book or journal." (Jimmie L Joseph & David P Cook, "Medical Ethical and Policy Issues Arising from RIA", 2008)

"An inherent quality characteristic that is a measure of the ability to access data when it is required." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"The ability to readily obtain data when needed." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Accessibility refers to the difficulty level for users to obtain data. Accessibility is closely linked with data openness, the higher the data openness degree, the more data types obtained, and the higher the degree of accessibility." (Li Cai & Yangyong Zhu, "The Challenges of Data Quality and Data Quality Assessment in the Big Data Era", 2015) [source]

"It is the state of each user to have access to any information at any time." (ihsan Eken & Basak Gezmen, "Accessibility for Everyone in Health Communication Mobile Application Usage", 2020)

"Data accessibility measures the extent to which government data are provided in open and re-usable formats, with their associated metadata." (OECD)

Data Management: Data Virtualization (Definitions)

"The concept of letting data stay 'where it lives; and developing a hardware and software architecture that exposes the data to various business processes and organizations. The goal of virtualization is to shield developers and users from the complexity of the underlying data structures." (Jill Dyché & Evan Levy, "Customer Data Integration: Reaching a Single Version of the Truth", 2006)

"The ability to easily select and combine data fragments from many different locations dynamically and in any way into a single data structure while also maintaining its semantic accuracy." (Michael M David & Lee Fesperman, "Advanced SQL Dynamic Data Modeling and Hierarchical Processing", 2013)

"The process of retrieving and manipulating data without requiring details of how the data formatted or where the data is located" (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"A data integration process used to gain more insights.  Usually it involves databases, applications, file systems, websites, big data techniques, and so on." (Jason Williamson, "Getting a Big Data Job For Dummies", 2015)

"Data virtualization is an approach that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted at source or where it is physically located, and can provide a single customer view (or single view of any other entity) of the overall data. Some database vendors provide a database (virtual) query layer, which is also called a data virtualization layer. This layer abstracts the database and optimizes the data for better read performance. Another reason to abstract is to intercept queries for better security. An example is Amazon Athena." (Piethein Strengholt, "Data Management at Scale", 2020)

"A data integration process in order to gain more insights. Usually it involves databases, applications, file systems, websites, big data techniques, etc.)." (Analytics Insight)

 "The integration and transformation of data in real time or near real time from disparate data sources in multicloud and hybrid cloud, to support business intelligence, reporting, analytics, and other workloads." (Forrester)

Data Management: Data Lineage (Definitions)

 "A mechanism for recording information to determine the source of any piece of data, and the transformations applied to that data using Data Transformation Services (DTS). Data lineage can be tracked at the package and row levels of a table and provides a complete audit trail for information stored in a data warehouse. Data lineage is available only for packages stored in Microsoft Repository." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"This information is used by Data Transformation Services (DTS) when it works in conjunction with Meta Data Services. This information records the history of package execution and data transformations for each piece of data." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"This is also called data provenance. It deals with the origin of data; it is all about documenting where data is, how it has been derived, and how it flows so you can manage and secure it appropriately as it is further processed by applications." (Martin Oberhofer et al, "Enterprise Master Data Management", 2008)

"This provides the functionality to determine where data comes from, how it is transformed, and where it is going. Data lineage metadata traces the lifecycle of information between systems, including the operations that are performed on the data." (Martin Oberhofer et al, "The Art of Enterprise Information Architecture", 2010)

"Data lineage refers to a set of identifiable points that can be used to understand details of data movement and transformation (e.g., transactional source field names, file names, data processing job names, programming rules, target table fields). Lineage describes the movement of data through systems from its origin or provenance to its use in a particular application. Lineage is related to both the data chain and the information life cycle. Most people concerned with the lineage of data want to understand two aspects of it: the data’s origin and the ways in which the data has changed since it was originally created. Change can take place within one system or between systems." (Laura Sebastian-Coleman, "Measuring Data Quality for Ongoing Improvement ", 2012)

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.