04 February 2017

⛏️Data Management: Data Matching (Definitions)

[deterministic matching:] "Deterministic matching algorithms compare and match records according to hard-coded business rules according to their precision. For instance, a rule can be set up that stipulates that every “Bill” be matched with a 'William'." (Jill Dyché & Evan Levy, "Customer Data Integration: Reaching a Single Version of the Truth", 2006)

[probabilistic matching:] "Uses statistical algorithms to deduce the best match between two records. Probabilistic matching usually tracks statistical confidence that two records refer to the same customer." (Evan Levy & Jill Dyché, "Customer Data Integration", 2006)

"A feature of data cleansing tools or the process that matches, or links, associated records through a user-defined or common algorithm." (Danette McGilvray, "Executing Data Quality Projects", 2008)

"Data-matching involves bringing together data from disparate services or sources, comparing it, and eliminating duplicate data. There are two types of algorithms that are used in data matching: (1) deterministic algorithms, which strictly use match criteria and weighting to determine the results, and (2) probabilistic algorithms, which use statistical models to adjust the matching based on the frequency of values found in the data." (Allen Dreibelbis et al, "Enterprise Master Data Management", 2008)

"A highly specialized set of technologies that allows users to derive a high-confidence value of the party identification that can be used to construct a total view of a party from multiple party records." (Alex Berson & Lawrence Dubov, "Master Data Management and Data Governance", 2010)

[deterministic matching:] "A type of matching that relies on defined patterns and rules for assigning weights and scores for determining similarity." (DAMA International, "The DAMA Dictionary of Data Management" 1st Ed., 2010)

[probabilistic matching:]"A type of matching that relies on statistical analysis of a sample data set to project results on the full data set." (DAMA International, "The DAMA Dictionary of Data Management" 1st Ed., 2010)

[fuzzy matching:] "A technique of decomposing words into component parts and comparing the parts to find an acceptable level of correspondence." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Matching is a technique for statistical control of confounding. In the simplest form, individuals from the two study groups are paired on the basis of similar values of one or more covariates. Matching can be viewed as a special case of stratification in which each stratum consists of only two individuals." (Herbert I Weisberg, "Bias and Causation: Models and Judgment for Valid Comparisons", 2010)

"The process of comparing rows in data sets to determine which rows describe the same thing and are therefore either complimentary or redundant." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.