20 May 2017

⛏️Data Management: Data Scrubbing (Definitions)

"The process of making data consistent, either manually, or automatically using programs." (Microsoft Corporation, "Microsoft SQL Server 7.0 System Administration Training Kit", 1999)

Processing data to remove or repair inconsistencies." (Rod Stephens, "Beginning Database Design Solutions", 2008)

"The process of building a data warehouse out of data coming from multiple online transaction processing (OLTP) systems." (Microsoft, "SQL Server 2012 Glossary", 2012)

"A term that is very similar to data deidentification and is sometimes used improperly as a synonym for data deidentification. Data scrubbing refers to the removal, from data records, of identifying information (i.e., information linking the record to an individual) plus any other information that is considered unwanted. This may include any personal, sensitive, or private information contained in a record, any incriminating or otherwise objectionable language contained in a record, and any information irrelevant to the purpose served by the record." (Jules H Berman, "Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information", 2013)

"The process of removing corrupt, redundant, and inaccurate data in the data governance process. (Robert F Smallwood, Information Governance: Concepts, Strategies, and Best Practices, 2014)

"Data Cleansing (or Data Scrubbing) is the action of identifying and then removing or amending any data within a database that is: incorrect, incomplete, duplicated." (experian) [source]

"Data cleansing, or data scrubbing, is the process of detecting and correcting or removing inaccurate data or records from a database. It may also involve correcting or removing improperly formatted or duplicate data or records. Such data removed in this process is often referred to as 'dirty data'. Data cleansing is an essential task for preserving data quality." (Teradata) [source]

"Data scrubbing, also called data cleansing, is the process of amending or removing data in a database that is incorrect, incomplete, improperly formatted, or duplicated." (Techtarget) [source]

"Part of the process of building a data warehouse out of data coming from multiple online transaction processing (OLTP) systems." (Microsoft Technet)

"The process of filtering, merging, decoding, and translating source data to create validated data for the data warehouse." (Information Management)

05 May 2017

⛏️Data Management: Data Steward (Definitions)

"A person with responsibility to improve the accuracy, reliability, and security of an organization’s data; also works with various groups to clearly define and standardize data." (Margaret Y Chu, "Blissful Data ", 2004)

"Critical players in data governance councils. Comfortable with technology and business problems, data stewards seek to speak up for their business units when an organization-wide decision will not work for that business unit. Yet they are not turf protectors, instead seeking solutions that will work across an organization. Data stewards are responsible for communication between the business users and the IT community." (Tony Fisher, "The Data Asset", 2009)

"A business leader and/or subject matter expert designated as accountable for: a) the identification of operational and Business Intelligence data requirements within an assigned subject area, b) the quality of data names, business definitions, data integrity rules, and domain values within an assigned subject area, c) compliance with regulatory requirements and conformance to internal data policies and data standards, d) application of appropriate security controls, e) analyzing and improving data quality, and f) identifying and resolving data related issues. Data stewards are often categorized as executive data stewards, business data stewards, or coordinating data stewards." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

[business data steward:] "A knowledge worker, business leader, and recognized subject matter expert assigned accountability for the data specifications and data quality of specifically assigned business entities, subject areas or databases, but with less responsibility for data governance than a coordinating data steward or an executive data steward." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The person responsible for maintaining a data element in a metadata registry." (Microsoft, "SQL Server 2012 Glossary, 2012)

"The term stewardship is “the management or care of another person’s property” (NOAD). Data stewards are individuals who are responsible for the care and management of data. This function is carried out in different ways based on the needs of particular organizations." (Laura Sebastian-Coleman, "Measuring Data Quality for Ongoing Improvement ", 2012)

"The person responsible for maintaining a data element in a metadata registry." (Microsoft, SQL Server 2012 Glossary, 2012)

"An individual comfortable with both technology and business problems. Stewards are responsible for communicating between the business users and the IT community." (Jim Davis & Aiman Zeid, "Business Transformation: A Roadmap for Maximizing Organizational Insights", 2014)

"A role in the data governance organization that is responsible for the development of a uniform data model for business objects used across boundaries. The data steward is also often responsible for the development of master data management and ensures compliance with the governance rules." (Boris Otto & Hubert Österle, "Corporate Data Quality", 2015)

"A natural person assigned the responsibility to catalog, define, and monitor changes to critical data. Example: The data steward for finance critical data is Dan." (Gregory Lampshire, "The Data and Analytics Playbook", 2016)

"A person responsible for managing data content, quality, standards, and controls within an organization or function." (Jonathan Ferrar et al, "The Power of People", 2017)

"A data steward is a job role that involves planning, implementing and managing the sourcing, use and maintenance of data assets in an organization. Data stewards enable an organization to take control and govern all the types and forms of data and their associated libraries or repositories." (Richard T Herschel, "Business Intelligence", 2019)

03 May 2017

⛏️Data Management: Hashing (Definitions)

"A technique for providing fast access to data based on a key value by determining the physical storage location of that data." (Jan L Harrington, "Relational Database Dessign: Clearly Explained" 2nd Ed., 2002)

"A mathematical technique for assigning a unique number to each record in a file." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

"A technique that transforms a key value via an algorithm to a physical storage location to enable quick direct access to data. The algorithm is typically referred to as a randomizer, because the goal of the hashing routine is to spread the key values evenly throughout the physical storage." (Craig S Mullins, "Database Administration", 2012)

"A mathematical technique in which an infinite set of input values is mapped to a finite set of output values, called hash values. Hashing is useful for rapid lookups of data in a hash table." (Oracle, "Database SQL Tuning Guide Glossary", 2013)

"An algorithm converts data values into an address" (Daniel Linstedt & W H Inmon, "Data Architecture: A Primer for the Data Scientist", 2014)

"The technique used for ordering and accessing elements in a collection in a relatively constant amount of time by manipulating the element’s key to identify the element’s location in the collection" (Nell Dale et al, "Object-Oriented Data Structures Using Java" 4th Ed., 2016)

"The application of an algorithm to a search key to derive a physical storage location." (George Tillmann, "Usage-Driven Database Design: From Logical Data Modeling through Physical Schmea Definition", 2017)

"Hashing is the process of mapping data values to fixed-size hash values (hashes). Common hashing algorithms are Message Digest 5 (MD5) and Secure Hashing Algorithm (SHA). It’s impossible to turn a hash value back into the original data value." (Piethein Strengholt, "Data Management at Scale", 2020)

"A mathematical technique in which an infinite set of input values is mapped to a finite set of output values, called hash values. Hashing is useful for rapid lookups of data in a hash table." (Oracle, "Oracle Database Concepts")

"A process used to convert data into a string of numbers and letters." (AICPA)

"A technique for arranging a set of items, in which a hash function is applied to the key of each item to determine its hash value. The hash value identifies each item's primary position in a hash table, and if this position is already occupied, the item is inserted either in an overflow table or in another available position in the table." (IEEE 610.5-1990)

01 May 2017

⛏️Data Management: Hash (Definitions)

"A number (often a 32-bit integer) that is derived from column values using a lossy compression algorithm. DBMSs occasionally use hashing to speed up access, but indexes are a more common mechanism." (Peter Gulutzan & Trudy Pelzer, "SQL Performance Tuning", 2002)

"A set of characters generated by running text data through certain algorithms. Often used to create digital signatures and compare changes in content." (Tom Petrocelli, "Data Protection and Information Lifecycle Management", 2005)

"Hash, a mathematical method for creating a numeric signature based on content; these days, often unique and based on public key encryption technology." (Bo Leuf, "The Semantic Web: Crafting infrastructure for agency", 2006)

[hash code:] "An integer calculated from an object. Identical objects have the same hash code. Generated by a hash method." (Michael Fitzgerald, "Learning Ruby", 2007)

"An unordered collection of data where keys and values are mapped. Compare with array." (Michael Fitzgerald, "Learning Ruby", 2007)

"A cryptographic hash is a fixed-size bit string that is generated by applying a hash function to a block of data. Secure cryptographic hash functions are collision-free, meaning there is a very small possibility of generating the same hash for two different blocks of data. A secure cryptographic hash function should also be one-way, meaning it is infeasible to retrieve the original text from the hash." (Michael Coles & Rodney Landrum, "Expert SQL Server 2008 Encryption", 2008)

"A hash is the result of applying a mathematical function or transformation on data to generate a smaller 'fingerprint' of the data. Generally, the most useful hash functions are one-way collision-free hashes that guarantee a high level of uniqueness in their results." (Michael Coles, "Pro T-SQL 2008 Programmer's Guide", 2008)

"The output of a hash function." (Mark S Merkow & Lakshmikanth Raghavan, "Secure and Resilient Software Development", 2010)

"A number based on the hash value of a string." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"1.Data allocated in an algorithmically randomized fashion in an attempt to evenly distribute data and smooth access patterns. 2.Verb. To calculate a hash key for data." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A hash is the result of applying a mathematical function or transformation on data to generate a smaller 'fingerprint' of the data. Generally, the most useful hash functions are one-way collision-free hashes that guarantee a high level of uniqueness in their results." (Jay Natarajan et al, "Pro T-SQL 2012 Programmer's Guide" 3rd Ed., 2012)

"An unordered association of key/value pairs, stored such that you can easily use a string key to look up its associated data value. This glossary is like a hash, where the word to be defined is the key and the definition is the value. A hash is also sometimes septisyllabically called an “associative array”, which is a pretty good reason for simply calling it a 'hash' instead." (Jon Orwant et al, "Programming Perl" 4th Ed., 2012)

"In a hash cluster, a unique numeric ID that identifies a bucket. Oracle Database uses a hash function that accepts an infinite number of hash key values as input and sorts them into a finite number of buckets. Each hash value maps to the database block address for the block that stores the rows corresponding to the hash key value (department 10, 20, 30, and so on)." (Oracle, "Database SQL Tuning Guide Glossary", 2013)

"The result of applying a mathematical function or transformation to data to generate a smaller 'fingerprint' of the data. Generally, the most useful hash functions are one-way, collision-free hashes that guarantee a high level of uniqueness in their results." (Miguel Cebollero et al, "Pro T-SQL Programmer’s Guide" 4th Ed., 2015)

[hash code:] "The output of the hash function that is associated with the input object" (Nell Dale et al, "Object-Oriented Data Structures Using Java" 4th Ed., 2016)

"A numerical value produced by a mathematical function, which generates a fixed-length value typically much smaller than the input to the function. The function is many to one, but generally, for all practical purposes, each file or other data block input to a hash function yields a unique hash value." (William Stallings, "Effective Cybersecurity: A Guide to Using Best Practices and Standards", 2018)

"The number generated by a hash function to indicate the position of a given item in a hash table." (IEEE 610.5-1990)

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.