SQL Troubles

01 May 2017

⛏️Data Management: Hash (Definitions)

"A number (often a 32-bit integer) that is derived from column values using a lossy compression algorithm. DBMSs occasionally use hashing to speed up access, but indexes are a more common mechanism." (Peter Gulutzan & Trudy Pelzer, "SQL Performance Tuning", 2002)

"A set of characters generated by running text data through certain algorithms. Often used to create digital signatures and compare changes in content." (Tom Petrocelli, "Data Protection and Information Lifecycle Management", 2005)

"Hash, a mathematical method for creating a numeric signature based on content; these days, often unique and based on public key encryption technology." (Bo Leuf, "The Semantic Web: Crafting infrastructure for agency", 2006)

[hash code:] "An integer calculated from an object. Identical objects have the same hash code. Generated by a hash method." (Michael Fitzgerald, "Learning Ruby", 2007)

"An unordered collection of data where keys and values are mapped. Compare with array." (Michael Fitzgerald, "Learning Ruby", 2007)

"A cryptographic hash is a fixed-size bit string that is generated by applying a hash function to a block of data. Secure cryptographic hash functions are collision-free, meaning there is a very small possibility of generating the same hash for two different blocks of data. A secure cryptographic hash function should also be one-way, meaning it is infeasible to retrieve the original text from the hash." (Michael Coles & Rodney Landrum, "Expert SQL Server 2008 Encryption", 2008)

"A hash is the result of applying a mathematical function or transformation on data to generate a smaller 'fingerprint' of the data. Generally, the most useful hash functions are one-way collision-free hashes that guarantee a high level of uniqueness in their results." (Michael Coles, "Pro T-SQL 2008 Programmer's Guide", 2008)

"The output of a hash function." (Mark S Merkow & Lakshmikanth Raghavan, "Secure and Resilient Software Development", 2010)

"A number based on the hash value of a string." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"1.Data allocated in an algorithmically randomized fashion in an attempt to evenly distribute data and smooth access patterns. 2.Verb. To calculate a hash key for data." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A hash is the result of applying a mathematical function or transformation on data to generate a smaller 'fingerprint' of the data. Generally, the most useful hash functions are one-way collision-free hashes that guarantee a high level of uniqueness in their results." (Jay Natarajan et al, "Pro T-SQL 2012 Programmer's Guide" 3rd Ed., 2012)

"An unordered association of key/value pairs, stored such that you can easily use a string key to look up its associated data value. This glossary is like a hash, where the word to be defined is the key and the definition is the value. A hash is also sometimes septisyllabically called an “associative array”, which is a pretty good reason for simply calling it a 'hash' instead." (Jon Orwant et al, "Programming Perl" 4th Ed., 2012)

"In a hash cluster, a unique numeric ID that identifies a bucket. Oracle Database uses a hash function that accepts an infinite number of hash key values as input and sorts them into a finite number of buckets. Each hash value maps to the database block address for the block that stores the rows corresponding to the hash key value (department 10, 20, 30, and so on)." (Oracle, "Database SQL Tuning Guide Glossary", 2013)

"The result of applying a mathematical function or transformation to data to generate a smaller 'fingerprint' of the data. Generally, the most useful hash functions are one-way, collision-free hashes that guarantee a high level of uniqueness in their results." (Miguel Cebollero et al, "Pro T-SQL Programmer’s Guide" 4th Ed., 2015)

[hash code:] "The output of the hash function that is associated with the input object" (Nell Dale et al, "Object-Oriented Data Structures Using Java" 4th Ed., 2016)

"A numerical value produced by a mathematical function, which generates a fixed-length value typically much smaller than the input to the function. The function is many to one, but generally, for all practical purposes, each file or other data block input to a hash function yields a unique hash value." (William Stallings, "Effective Cybersecurity: A Guide to Using Best Practices and Standards", 2018)

"The number generated by a hash function to indicate the position of a given item in a hash table." (IEEE 610.5-1990)

28 April 2017

⛏️Data Management: Completeness (Definitions)

"A characteristic of information quality that measures the degree to which there is a value in a field; synonymous with fill rate. Assessed in the data quality dimension of Data Integrity Fundamentals." (Danette McGilvray, "Executing Data Quality Projects", 2008)

"Containing by a composite data all components necessary to full description of the states of a considered object or process." (Juliusz L Kulikowski, "Data Quality Assessment", 2009)

"An inherent quality characteristic that is a measure of the extent to which an attribute has values for all instances of an entity class." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"Completeness is a dimension of data quality. As used in the DQAF, completeness implies having all the necessary or appropriate parts; being entire, finished, total. A dataset is complete to the degree that it contains required attributes and a sufficient number of records, and to the degree that attributes are populated in accord with data consumer expectations. For data to be complete, at least three conditions must be met: the dataset must be defined so that it includes all the attributes desired (width); the dataset must contain the desired amount of data (depth); and the attributes must be populated to the extent desired (density). Each of these secondary dimensions of completeness can be measured differently." (Laura Sebastian-Coleman, "Measuring Data Quality for Ongoing Improvement ", 2012)

"Completeness is defined as a measure of the presence of core source data elements that, exclusive of derived fields, must be present in order to complete a given business process." (Rajesh Jugulum, "Competing with High Quality Data", 2014)

"Complete existence of all values or attributes of a record that are necessary." (Boris Otto & Hubert Österle, "Corporate Data Quality", 2015)

"The degree to which all data has been delivered or stored and no values are missing. Examples are empty or missing records." (Piethein Strengholt, "Data Management at Scale", 2020)

"The degree to which elements that should be contained in the model are indeed there." (Panos Alexopoulos, "Semantic Modeling for Data", 2020)

"The degree of data representing all properties and instances of the real-world context." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"Data is considered 'complete' when it fulfills expectations of comprehensiveness." (Precisely) [source]

"The degree to which all required measures are known. Values may be designated as “missing” in order not to have empty cells, or missing values may be replaced with default or interpolated values. In the case of default or interpolated values, these must be flagged as such to distinguish them from actual measurements or observations. Missing, default, or interpolated values do not imply that the dataset has been made complete." (CODATA)

27 April 2017

⛏️Data Management: Availability (Definitions)

"Corresponds to the information that should be available when necessary and in the appropriate format." (José M Gaivéo, "Security of ICTs Supporting Healthcare Activities", 2013)

"A property by which the data is available all the time during the business hours. In cloud computing domain, the data availability by the cloud service provider holds a crucial importance." (Sumit Jaiswal et al, "Security Challenges in Cloud Computing", 2015)

"Availability: the ability of the data user to access the data at the desired point in time." (Boris Otto & Hubert Österle, "Corporate Data Quality", 2015)

"It is one of the main aspects of the information security. It means data should be available to its legitimate user all the time whenever it is requested by them. To guarantee availability data is replicated at various nodes in the network. Data must be reliably available." (Omkar Badve et al, "Reviewing the Security Features in Contemporary Security Policies and Models for Multiple Platforms", 2016)

"Timely, reliable access to data and information services for authorized users." (Maurice Dawson et al, "Battlefield Cyberspace: Exploitation of Hyperconnectivity and Internet of Things", 2017)

"A set of principles and metrics that assures the reliability and constant access to data for the authorized individuals or groups." (Gordana Gardašević et al, "Cybersecurity of Industrial Internet of Things", 2020)

"Ensuring the conditions necessary for easy retrieval and use of information and system resources, whenever necessary, with strict conditions of confidentiality and integrity." (Alina Stanciu et al, "Cyberaccounting for the Leaders of the Future", 2020)

"The state when data are in the place needed by the user, at the time the user needs them, and in the form needed by the user." (CODATA)

"The state that exists when data can be accessed or a requested service provided within an acceptable period of time." (NISTIR 4734)

"Timely, reliable access to information by authorized entities." (NIST SP 800-57 Part 1)

25 April 2017

⛏️Data Management: Data Products (Definitions)

"In the case of data mesh, a data product is an architectural quantum. It is the smallest unit of architecture that can be independently deployed and managed." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"A data product is a data asset that should be trusted, reusable, and accessible. The data product is developed, owned, and managed by a domain, and each domain needs to make sure that its data products are accessible to other domains and their data consumers." (Marthe Mengen, 2024) [source]

"A data product is a self-contained, independently deployable unit of data that delivers business value." (James Serra, "Deciphering Data Architectures", 2024)

"A collection of optimized data or data-related assets that are packaged for reuse and distribution with controlled access. Data products contain data as well as models, dashboards, and other computational asset types. Unlike data assets in governance catalogs, data products are managed as products with multiple purposes to provide business value." (IBM)

"A data product, in general terms, is any tool or application that processes data and generates results. […] Data products have one primary objective: to manage, organize and make sense of the vast amount of data that organizations collect and generate. It’s the users’ job to put the insights to use that they gain from these data products, take actions and make better decisions based on these insights." (Sisense) [source]

"A data product is a product built around data, containing everything required to complete a specific task or objective using that underlying data." (Opendatasoft)

"A data product is digital information that can be purchased." (Techtarget) [source]

"A key concept in data mesh architecture, Data Products are independent units of data managed by a specific domain team. They are responsible for defining, publishing, and maintaining their data assets while ensuring high-quality data that meets the needs of its consumers." (DataHub)

[Data product specification:] "Detailed description of a data set or data set series together with additional information that will enable it to be created, supplied to and used by another party" (ISO 19131)

"Data set or data set series that conforms to a data product specification" (ISO 19131)

12 April 2017

⛏️Data Management: Accessibility (Definitions)

"Capable of being reached, capable of being used or seen." (Martin J Eppler, "Managing Information Quality" 2nd Ed., 2006)

"The degree to which data can be obtained and used." (Danette McGilvray, "Executing Data Quality Projects", 2008)

"The opportunity to find, as well as the ease and convenience associated with locating, information. Often, this is related to the physical location of the individual seeking the information and the physical location of the information in a book or journal." (Jimmie L Joseph & David P Cook, "Medical Ethical and Policy Issues Arising from RIA", 2008)

"An inherent quality characteristic that is a measure of the ability to access data when it is required." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"The ability to readily obtain data when needed." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Accessibility refers to the difficulty level for users to obtain data. Accessibility is closely linked with data openness, the higher the data openness degree, the more data types obtained, and the higher the degree of accessibility." (Li Cai & Yangyong Zhu, "The Challenges of Data Quality and Data Quality Assessment in the Big Data Era", 2015) [source]

"It is the state of each user to have access to any information at any time." (ihsan Eken & Basak Gezmen, "Accessibility for Everyone in Health Communication Mobile Application Usage", 2020)

"Data accessibility measures the extent to which government data are provided in open and re-usable formats, with their associated metadata." (OECD)

⛏️Data Management: Data Virtualization (Definitions)

"The concept of letting data stay 'where it lives; and developing a hardware and software architecture that exposes the data to various business processes and organizations. The goal of virtualization is to shield developers and users from the complexity of the underlying data structures." (Jill Dyché & Evan Levy, "Customer Data Integration: Reaching a Single Version of the Truth", 2006)

"The ability to easily select and combine data fragments from many different locations dynamically and in any way into a single data structure while also maintaining its semantic accuracy." (Michael M David & Lee Fesperman, "Advanced SQL Dynamic Data Modeling and Hierarchical Processing", 2013)

"The process of retrieving and manipulating data without requiring details of how the data formatted or where the data is located" (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"A data integration process used to gain more insights. Usually it involves databases, applications, file systems, websites, big data techniques, and so on." (Jason Williamson, "Getting a Big Data Job For Dummies", 2015)

"Data virtualization is an approach that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted at source or where it is physically located, and can provide a single customer view (or single view of any other entity) of the overall data. Some database vendors provide a database (virtual) query layer, which is also called a data virtualization layer. This layer abstracts the database and optimizes the data for better read performance. Another reason to abstract is to intercept queries for better security. An example is Amazon Athena." (Piethein Strengholt, "Data Management at Scale", 2020)

"A data integration process in order to gain more insights. Usually it involves databases, applications, file systems, websites, big data techniques, etc.)." (Analytics Insight)

"The integration and transformation of data in real time or near real time from disparate data sources in multicloud and hybrid cloud, to support business intelligence, reporting, analytics, and other workloads." (Forrester)

⛏️Data Management: Data Lineage (Definitions)

"A mechanism for recording information to determine the source of any piece of data, and the transformations applied to that data using Data Transformation Services (DTS). Data lineage can be tracked at the package and row levels of a table and provides a complete audit trail for information stored in a data warehouse. Data lineage is available only for packages stored in Microsoft Repository." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"This information is used by Data Transformation Services (DTS) when it works in conjunction with Meta Data Services. This information records the history of package execution and data transformations for each piece of data." (Anthony Sequeira & Brian Alderman, "The SQL Server 2000 Book", 2003)

"This is also called data provenance. It deals with the origin of data; it is all about documenting where data is, how it has been derived, and how it flows so you can manage and secure it appropriately as it is further processed by applications." (Martin Oberhofer et al, "Enterprise Master Data Management", 2008)

"This provides the functionality to determine where data comes from, how it is transformed, and where it is going. Data lineage metadata traces the lifecycle of information between systems, including the operations that are performed on the data." (Martin Oberhofer et al, "The Art of Enterprise Information Architecture", 2010)

"Data lineage refers to a set of identifiable points that can be used to understand details of data movement and transformation (e.g., transactional source field names, file names, data processing job names, programming rules, target table fields). Lineage describes the movement of data through systems from its origin or provenance to its use in a particular application. Lineage is related to both the data chain and the information life cycle. Most people concerned with the lineage of data want to understand two aspects of it: the data’s origin and the ways in which the data has changed since it was originally created. Change can take place within one system or between systems." (Laura Sebastian-Coleman, "Measuring Data Quality for Ongoing Improvement ", 2012)

⛏️Data Management: Data Federation (Definitions)

"Data access to a variety of data stores, using consistent rules and definitions that enable all the data stores to be treated as a single resource." (Judith Hurwitz et al, "Service Oriented Architecture For Dummies 2nd Ed.", 2009)

"Technology that joins data from different sources, operational or analytic, around an organization. Data federation allows users to have a single view of disparate data without having to understand the details of the individual data sources." (Tony Fisher, "The Data Asset", 2009)

"A method of transparently joining or linking data from multiple physical locations and/or multiple platforms." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Data access to a variety of data stores, using consistent rules and definitions that enable all the data stores to be treated as a single resource. " (Marcia Kaufman et al, "Big Data For Dummies", 2013)

"Data federation technology is software that provides an organization with the ability to aggregate data from disparate sources in a virtual database so it can be used for business intelligence (BI) or other analysis." (Techtarget)

"Process where data is collected from distinct databases without ever copying or transforming the original data." (Solutions Review)

06 April 2017

⛏️Data Management: Data Mesh (Definitions)

"Data Mesh is a sociotechnical approach to share, access and manage analytical data in complex and large-scale environments - within or across organizations." (Zhamak Dehghani, "Data Mesh: Delivering Data-Driven Value at Scale", 2021)

"A data mesh is an architectural concept in data engineering that gives business domains (divisions/departments) within a large organization ownership of the data they produce. The centralized data management team then becomes the organization’s data governance team." (Margaret Rouse, 2023) [source]

"Data Mesh is a design concept based on federated data and business domains. It applies product management thinking to data management with the outcome being Data Products. It’s technology agnostic and calls for a domain-centric organization with federated Data Governance." (Sonia Mezzetta, "Principles of Data Fabric", 2023)

"A data mesh is a decentralized data architecture with four specific characteristics. First, it requires independent teams within designated domains to own their analytical data. Second, in a data mesh, data is treated and served as a product to help the data consumer to discover, trust, and utilize it for whatever purpose they like. Third, it relies on automated infrastructure provisioning. And fourth, it uses governance to ensure that all the independent data products are secure and follow global rules."(James Serra, "Deciphering Data Architectures", 2024)

"A data mesh is a federated data architecture that emphasizes decentralizing data across business functions or domains such as marketing, sales, human resources, and more. It facilitates organizing and managing data in a logical way to facilitate the more targeted and efficient use and governance of the data across organizations." (Arshad Ali & Bradley Schacht, "Learn Microsoft Fabric", 2024)

"To explain a data mesh in one sentence, a data mesh is a centrally managed network of decentralized data products. The data mesh breaks the central data lake into decentralized islands of data that are owned by the teams that generate the data. The data mesh architecture proposes that data be treated like a product, with each team producing its own data/output using its own choice of tools arranged in an architecture that works for them. This team completely owns the data/output they produce and exposes it for others to consume in a way they deem fit for their data." (Aniruddha Deswandikar,"Engineering Data Mesh in Azure Cloud", 2024)

"A data mesh is a decentralized data architecture that organizes data by a specific business domain - for example, marketing, sales, customer service and more - to provide more ownership to the producers of a given data set." (IBM) [source]

"A data mesh is a new approach to designing data architectures. It takes a decentralized approach to data storage and management, having individual business domains retain ownership over their datasets rather than flowing all of an organization’s data into a centrally owned data lake." (Alteryx) [source]

"A Data Mesh is a solution architecture for the specific goal of building business-focused data products without preference or specification of the technology involved." (Gartner)

"A data mesh is an architectural framework that solves advanced data security challenges through distributed, decentralized ownership." (AWS) [source]

"Data mesh defines a platform architecture based on a decentralized network. The data mesh distributes data ownership and allows domain-specific teams to manage data independently." (TIBCO) [source]

"Data mesh refers to a data architecture where data is owned and managed by the teams that use it. A data mesh decentralizes data ownership to business domains–such as finance, marketing, and sales–and provides them a self-serve data platform and federated computational governance." (Qlik) [source]

20 March 2017

⛏️Data Management: Data Structure (Definitions)

"A logical relationship among data elements that is designed to support specific data manipulation functions (trees, lists, and tables)." (William H Inmon, "Building the Data Warehouse", 2005)

"Data stored in a computer in a way that (usually) allows efficient retrieval of the data. Arrays and hashes are examples of data structures." (Michael Fitzgerald, "Learning Ruby", 2007)

"A data structure in computer science is a way of storing data to be used efficiently." (Sahar Shabanah, "Computer Games for Algorithm Learning", 2011)

"Data structure is a general term referring to how data is organized. In modeling, it refers more specifically to the model itself. Tables are referred to as 'structures'." (Laura Sebastian-Coleman, "Measuring Data Quality for Ongoing Improvement ", 2012)

[probabilistic *] "A data structure which exploits randomness to boost its efficiency, for example skip lists and Bloom filters. In the case of Bloom filters, the results of certain operations may be incorrect with a small probability." (Wei-Chih Huang & William J Knottenbelt, "Low-Overhead Development of Scalable Resource-Efficient Software Systems", 2014)
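
A minimal Bloom filter sketch shows the "may be incorrect with a small probability" behaviour mentioned above. The bit-array size, the number of hash functions, and the use of seeded SHA-256 digests are assumptions chosen for readability, not tuned parameters.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, hash_count: int = 3):
        self.size_bits = size_bits
        self.hash_count = hash_count
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive several bit positions by prefixing the item with a seed.
        for seed in range(self.hash_count):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for position in self._positions(item):
            self.bits[position // 8] |= 1 << (position % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[position // 8] & (1 << (position % 8))
            for position in self._positions(item)
        )


bloom = BloomFilter()
for word in ["array", "hash", "tree"]:
    bloom.add(word)
print(bloom.might_contain("hash"))
```

An item that was added always reports `True`. An item that was never added usually reports `False`, but can report `True` when its bit positions happen to have been set by other items.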

"A collection of methods for storing and organizing sets of data in order to facilitate access to them. More formally data structures are concise implementations of abstract data types, where an abstract data type is a set of objects together with a collection of operations on the elements of the set." (Ioannis Kouris et al, "Indexing and Compressing Text", 2015)

"A representation of the logical relationship between elements of data." (Adam Gordon, "Official (ISC)2 Guide to the CISSP CBK" 4th Ed., 2015)

"Is a schematic organization of data and relationship to express a reality of interest, usually represented in a diagrammatic form." (Maria T Artese Isabella Gagliardi, "UNESCO Intangible Cultural Heritage Management on the Web", 2015)

"The implementation of a composite data field in an abstract data type" (Nell Dale & John Lewis, "Computer Science Illuminated" 6th Ed., 2015)

"A way of organizing data so that it can be efficiently accessed and updated." (Vasileios Zois et al, "Querying of Time Series for Big Data Analytics", 2016)

"A particular way of storing information, allowing to a high level approach on the software implementation." (Katia Tannous & Fillipe de Souza Silva, "Particle Shape Analysis Using Digital Image Processing", 2018)

"It is a particular way of organizing data in a computer so that they can be used efficiently." (Edgar C Franco et al, "Implementation of an Intelligent Model Based on Machine Learning in the Application of Macro-Ergonomic Methods...", 2019)

"Way information is represented and stored." (Shalin Hai-Jew, "Methods for Analyzing and Leveraging Online Learning Data", 2019)

"A physical or logical relationship among a collection of data elements." (IEEE 610.5-1990)

⛏️Data Management: Data Sharing (Definitions)

"The ability to share individual pieces of data transparently from a database across different applications." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"Exchange of data and/or meta-data in a situation involving the use of open, freely available data formats, where process patterns are known and standard, and where not limited by privacy and confidentiality regulations." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Data sharing involves one entity sending data to another entity, usually with the understanding that the other entity will store and use the data. This process may involve free or purchased data, and it may be done willingly, or in compliance with regulations, laws, or court orders." (Jules H Berman, "Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information", 2013)

"The ability of subsystems or application programs to access data directly and to change it while maintaining data integrity." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"The ability of two or more DB2 subsystems to directly access and change a single set of data." (BMC)

⛏️Data Management: Information Overload (Definitions)

"A state in which information can no longer be internalized productively by the individual due to time constraints or the large volume of received information." (Martin J Eppler, "Managing Information Quality" 2nd Ed., 2006)

"Phenomena related to the inability to absorb and manage effectively large amounts of information, creating inefficiencies, stress, and frustration. It has been exacerbated by advances in the generation, storage, and electronic communication of information." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"A situation where relevant information becomes buried in a mass of irrelevant information" (Josep C Morales, "Information Disasters in Networked Organizations", 2008)

"A situation where individuals have access to so much information that it becomes impossible for them to function effectively, sometimes leading to where nothing gets done and the user gives the impression of being a rabbit caught in the glare of car headlights." Alan Pritchard, "Information-Rich Learning Concepts", 2009)

"is the situation when the information processing requirements exceed the information processing capacities." (Jeroen ter Heerdt & Tanya Bondarouk, "Information Overload in the New World of Work: Qualitative Study into the Reasons", 2009)

"Refers to an excess amount of information, making it difficult for individuals to effectively absorb and use information; increases the likelihood of poor decisions." (Leslie G Eldenburg & Susan K Wolcott, "Cost Management" 2nd Ed., 2011)

"The inability to cope with or process ever-growing amounts of data into our lives." (Linda Volonino & Efraim Turban, "Information Technology for Management" 8th Ed., 2011)

"The state where the rate or amount of input to a system or person outstrips the capacity or speed of processing that input successfully." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The state in which a huge influx of information interferes with understanding an issue, making good decisions, and performance on the job." (Carol A. Brown, "Economic Impact of Information and Communication Technology in Higher Education", 2014)

"The difficulty a person can have understanding an issue and making decisions that can be caused by the presence of too much information." (Li Chen, "Mobile Technostress", Encyclopedia of Mobile Phone Behavior, 2015)

"Occurs when excess of information suffocates businesses and causes employees to suffer mental anguish and physical illness. Information overload causes high levels of stress that can result in health problems and the breakdown of individuals’ personal relationships." (Sérgio Maravilhas & Sérgio R G Oliveira, "Entrepreneurship and Innovation: The Search for the Business Idea", 2018)

"A set of subjective and objective difficulties, mainly originating in the amount and complexity of information available and people’s inability to handle such situations." (Tibor Koltay, "Information Overload", 2021)

19 March 2017

⛏️Data Management: Encryption (Definitions)

"A method for keeping sensitive information confidential by changing data into an unreadable form." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"The encoding of data so that the plain text is transformed into something unintelligible, called cipher text." (Tom Petrocelli, "Data Protection and Information Lifecycle Management", 2005)

"Reordering of bits of data to make it unintelligible (and therefore useless) to an unauthorized third party, while still enabling the authorized user to use the data after the reverse process of decryption." (David G Hill, "Data Protection: Governance, Risk Management, and Compliance", 2009)

"To transform information from readable plain text to unreadable cipher text to prevent unintended recipients from reading the data." (Janice M Roehl-Anderson, "IT Best Practices for Financial Managers", 2010)

"The process of transforming data using an algorithm (called a cipher) to make it unreadable to anyone except those possessing special knowledge, usually referred to as a key." (Craig S Mullins, "Database Administration", 2012)

"The process of converting readable data (plaintext) into a coded form (ciphertext) to prevent it from being read by an unauthorized party." (Microsoft, "SQL Server 2012 Glossary", 2012)

"The cryptographic transformation of data to produce ciphertext." (Manish Agrawal, "Information Security and IT Risk Management", 2014)

"The process of scrambling data in such a way that it is unreadable by unauthorized users but can be unscrambled by authorized users to be readable again." (Weiss, "Auditing IT Infrastructures for Compliance, 2nd Ed", 2015)

"The transformation of plaintext into unreadable ciphertext." (Shon Harris & Fernando Maymi, "CISSP All-in-One Exam Guide, 8th Ed", 2018)

"In computer security, the process of transforming data into an unintelligible form in such a way that the original data either cannot be obtained or can be obtained only by using a decryption process." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"Encryption is about translating the data into complex codes that cannot be interpreted (decrypted) without the use of a decryption key. These keys are typically distributed and stored separately. There are two types of encryption: symmetric key encryption and public key encryption. In symmetric key encryption, the key to both encrypt and decrypt is exactly the same. Public key encryption has two different keys. One key is used to encrypt the values (the public key), and one key is used to decrypt the data (the private key)." (Piethein Strengholt, "Data Management at Scale", 2020)

"The process of encoding data in such a way to prevent unauthorized access." (AICPA)

16 March 2017

⛏️Data Management: Missing Data (Definitions)

"Noise in a bivalent testing input pattern in which one or more components have been changed from the correct value to a value midway between the correct and the incorrect value, i.e. a + 1, or a -1, has been changed to a O." (Laurene V Fausett, "Fundamentals of Neural Networks: Architectures, Algorithms, and Applications", 1994)

"Many databases have cases where not all the attribute values are known. These can be due to structural reasons (e.g., parity for males), due to changes or variations in data collection methodology, or due to nonresponses. In the latter case, it is important to distinguish between ignorable and nonignorable nonresponse. The former must be addressed even though the latter can (usually) be treated as random." (William J Raynor Jr., "The International Dictionary of Artificial Intelligence", 1999)

"Observations where one or more variables contain no value." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"data are said to be missing when there is no information for one or more pattern on one or more features in a research study." (Pedro J García-Laencina et al, "Classification with Incomplete Data", 2010)

"Missing data, also known as lost data, is the data that is lost in an inner join when rows of the tables being joined do not match with any other rows. Missing data can also occur with one-sided joins on the side that is not being preserved. This definition ignores all the other reasons for missing data." (Michael M David & Lee Fesperman, "Advanced SQL Dynamic Data Modeling and Hierarchical Processing", 2013)

"It refers that no data value is stored for the variable in the observation." (Liang-Ting Tsai et al, "Weighting Imputation for Categorical Data", 2014)

"Observations which were planned and are missing." (OECD)

"In statistics, missing data, or missing values, occur when no data value is stored for the variable in an observation. Missing data are a common occurrence and can have a significant effect on the conclusions that can be drawn from the data." (Wikipedia)

15 March 2017

⛏️Data Management: Data Conversion (Definitions)

"The function to translate data from one format to another" (Yang Xiang & Daxin Tian, "Multi-Core Supported Deep Packet Inspection", 2010)

"1.In systems, the migration from the use of one application to another. 2.In data management, the process of preparing, reengineering, cleansing and transforming data and loading it into a new target data structure. Typically, the term is used to describe a one-time event as part of a new database implementation. However, it is sometimes used to describe an ongoing operational procedure." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"(1)The process of changing data structure, format, or contents to comply with some rule or measurement requirement. (2)The process of changing data contents stored in one system so that it can be stored in another system, or used by an application." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The process of automatically reading data in one file format and emitting the same data in a different format, thus making the data accessible to a wider range of applications." (Open Data Handbook)

"To change data from one form of representation to another; for example, to convert data from an ASCII representation to an EBCDIC representation." (IEEE 610.5-1990)