Showing posts with label ontology. Show all posts
Showing posts with label ontology. Show all posts

06 June 2013

🎓Knowledge Management: Ontology (Definitions)

"A data model that represents the entities that are defined and evaluated by its own attributes, and organized according to a hierarchy and a semantic. Ontologies are used for representing knowledge on the whole of a specific domain or on of it." (Gervásio Iwens et al, "Programming Body Sensor Networks", 2008)

"An ontology specifies a conceptualization, that is, a structure of related concepts for a given domain." (Troels Andreasen & Henrik Bulskov, "Query Expansion by Taxonomy", 2008)

"A semantic structure useful to standardize and provide rigorous definitions of the terminology used in a domain and to describe the knowledge of the domain. It is composed of a controlled vocabulary, which describes the concepts of the considered domain, and a semantic network, which describes the relations among such concepts. Each concept is connected to other concepts of the domain through semantic relations that specify the knowledge of the domain. A general concept can be described by several terms that can be synonyms or characteristic of different domains in which the concept exists. For this reason the ontologies tend to have a hierarchical structure, with generic concepts/terms at the higher levels of the hierarchy and specific concepts/terms at the lover levels, connected by different types of relations." (Mario Ceresa, "Clinical and Biomolecular Ontologies for E-Health", Handbook of Research on Distributed Medical Informatics and E-Health, 2009)

"In the context of knowledge sharing, the chapter uses the term ontology to mean a specification of conceptual relations. An ontology is the concepts and relationships that can exist for an agent or a community of agents. The chapter refers to designing ontologies for the purpose of enabling knowledge sharing and re-use." (Ivan Launders, "Socio-Technical Systems and Knowledge Representation", 2009)

 "The systematic description of a given phenomenon, which often includes a controlled vocabulary and relationships, captures nuances in meaning and enables knowledge sharing and reuse. Typically, ontology defines data entities, data attributes, relations and possible functions and operations." (Mark Olive, "SHARE: A European Healthgrid Roadmap", 2009)

"Those things that exist are those things that have a formal representation within the context of a machine. Knowledge commits to an ontology if it adheres to the structure, vocabulary and semantics intrinsic to a particular ontology i.e. it conforms to the ontology definition. A formal ontology in computer science is a logical theory that represents a conceptualization of real world concepts." (Philip D. Smart, "Semantic Web Rule Languages for Geospatial Ontologies", 2009)

"A formal representation of a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to define the domain." (Yong Yu et al, "Social Tagging: Properties and Applications", 2010)

"Is set of well-defined concepts describing a specific domain." (Hak-Lae Kim et al, "Representing and Sharing Tagging Data Using the Social Semantic Cloud of Tags", 2010)

"An ontology is a 'formal, explicit specification of a shared conceptualisation'. It is composed of concepts and relations structured into hierarchies (i.e. they are linked together by using the Specialisation/Generalisation relationship). A heavyweight ontology is a lightweight ontology (i.e. an ontology simply based on a hierarchy of concepts and a hierarchy of relations) enriched with axioms used to fix the semantic interpretation of concepts and relations." (Francky Trichet et al, "OSIRIS: Ontology-Based System for Semantic Information Retrieval and Indexation Dedicated to Community and Open Web Spaces", 2011)

"The set of the things that can be dealt with in a particular domain, together with their relationships." (Steven Woods et al, "Knowledge Dissemination in Portals", 2011) 

"In semantic web and related technologies, an ontology (aka domain ontology) is a set of taxonomies together with typed relationships connecting concepts from the taxonomies and, possibly, sets of integrity rules and constraints defining classes and relationships." (Marcus Spies & Said Tabet, "Emerging Standards and Protocols for Governance, Risk, and Compliance Management", 2012)

"High-level knowledge and data representation structure. Ontologies provide a formal frame to represent the knowledge related with a complex domain, as a qualitative model of the system. Ontologies can be used to represent the structure of a domain by means of defining concepts and properties that relate them." (Lenka Lhotska et al, "Interoperability of Medical Devices and Information Systems", 2013)

"(a) In computer science and information science, an ontology formally represents knowledge as a set of concepts within a domain, and the relationships between pairs of concepts. It can be used to model a domain and support reasoning about concepts. (b) In philosophy, ontology is the study of the nature of being, becoming, existence , or reality , as well as the basic categories of being and their relations. Traditionally listed as a part of the major branch of philosophy known as metaphysics, ontology deals with questions concerning what entities exist or can be said to exist, and how such entities can be grouped, related within a hierarchy, and subdivided according to similarities and differences." (Ronald J Lofaro, "Knowledge Engineering Methodology with Examples", 2015)

"It is a shared structure which classify and organizes all the entities of a given domain." (T R Gopalakrishnan Nair, "Intelligent Knowledge Systems", 2015)

"The study of how things relate. Used in big data to analyze seemingly unrelated data to discover insights." (Jason Williamson, "Getting a Big Data Job For Dummies", 2015)

"An ontology is a formal, explicit specification of a shared conceptualization." (Fu Zhang et al, "A Review of Answering Queries over Ontologies Based on Databases", 2016)

18 January 2010

🗄️Data Management: Data Quality Dimensions (Part V: Consistency)

Data Management
Data Management Series

IEEE defines consistency in general as "the degree of uniformity, standardization, and freedom from contradiction among the documents or parts of a system or component" [4]. In respect to data, consistency can be defined thus as the degree of uniformity and standardization of data values among systems or data repositories (aka cross-system consistencies), records within the same repository (aka cross-record consistency), or within the same record at different points in time (temporal consistency) (see [2], [3]). 

Unfortunately, uniformity, standardization and freedom can be considered as data quality dimensions as well and they might even have broader scope than the one provided by consistency. Moreover, the definition requires further definitions for the concept to be understood, which is not ideal. 

Simply put, consistency refers to the extent (data) values are consistent in notation, respectively the degree to which the data values across different contexts match. For example, one system uses "Free on Board" while another systems uses "FOB" to refer to the point obligations, costs, and risk involved in the delivery of goods shift from the seller to the buyer. The two systems refer to the same value by two different notations. When the two systems are integrated, because of the different values used, the rules defined in the target system might make the record fail because it is expecting another value. Conversely, other systems may import the value as it is, leading thus to two values used in parallel for the same meaning. This can happen not only to reference data, but also to master data, for example when a value deviates slightly from the expected value (e.g. misspelled). A more special case is when one of the systems uses case sensitive values (usually the target system, though there can be also bidirectional data integrations).

One solution for such situations would be to "standardize" the values across systems, though not all systems allow to easily change the values once they have been set. Another solution would be to create a mapping as part the integration, though to maintain such mappings for many cases is suboptimal, but in the end, it might be the only solution. Further systems can be impacted by these issues as well (e.g. data warehouses, data marts).

It's recommended to use a predefined list of values (LOV) - a data dictionary, an ontology or any other type of knowledge representation form that can be used to 'enforce' data consistency. ‘Enforce’ is maybe not the best term because the two data sets could be disconnected from each other, being in Users’ responsibility to ensure the overall consistency, or the two data sets could be integrated using specific techniques. In many cases is checked the consistency of the values taken by one attribute against an existing LOV, though for example for data formed from multiple segments (e.g. accounts) each segment might need to be checked against a specific data set or rule generator, such mechanisms implying multi-attribute mappings or associational rules that specify the possible values.

As highlighted also by [1], there are two aspects of consistency: the structural consistency in which two or more values can be distinct in notation but have the same meaning (e.g. missing vs. n/a), and semantic consistency in which each value has a unique meaning (e.g. only n/a is allowed to highlight missing values). It should be targeted to have the data semantically consistent, to avoid confusion, accidental exclusion of data during filtering or reporting. More and more organizations are investing in ontologies, they allow ensuring the semantic consistency of concepts/entities, though for most of the cases simple single or multi-attribute lists of values are enough.


Written: Jan-2010, Last Reviewed: Mar-2024

References:
[1] Chapman A.D. (2005) "Principles of Data Quality", version 1.0. Report for the Global Biodiversity Information Facility, Copenhagen
[2] David Loshin (2009) Master Data Management
[3] IEEE (1990) "IEEE Standard Glossary of Software Engineering Terminology"
[4] DAMA International (2010) "The DAMA Dictionary of Data Management" 1st Ed.

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.