06 February 2018

Data Science: Data Profiling (Definitions)

"A process focused on generating data metrics and measuring data quality. The data metrics can be collected at the column level, e.g., value frequency, nullity measurements, and uniqueness/match quality measurements; at the table level, e.g., primary key violations; or cross-table relationships, e.g., foreign key violations." (Alex Berson & Lawrence Dubov, "Master Data Management and Customer Data Integration for a Global Enterprise", 2007)
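The column-level, table-level and cross-table metrics named in this definition can be illustrated with a minimal Python sketch. It assumes the data is already loaded in memory as lists of dictionaries, with None standing for a missing value. The customers/orders tables and the function names are hypothetical. A real profiling tool would more likely run equivalent queries directly against the database.

```python
from collections import Counter

# Hypothetical sample tables; None marks a missing value.
customers = [
    {"customer_id": 1, "country": "DE"},
    {"customer_id": 2, "country": "US"},
    {"customer_id": 2, "country": None},  # duplicate primary key, missing country
    {"customer_id": 3, "country": "US"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},
    {"order_id": 12, "customer_id": 7},  # refers to a customer that does not exist
]


def column_profile(rows, column):
    """Column level: value frequency, nullity and uniqueness."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    return {
        "frequency": Counter(non_null),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "is_unique": len(non_null) == len(set(non_null)),
    }


def primary_key_violations(rows, key):
    """Table level: key values that are null or occur more than once."""
    counts = Counter(row.get(key) for row in rows)
    duplicates = sorted(k for k, n in counts.items() if k is not None and n > 1)
    return {"null_keys": counts.get(None, 0), "duplicate_keys": duplicates}


def foreign_key_violations(child_rows, fk, parent_rows, pk):
    """Cross-table level: child rows whose foreign key has no matching parent."""
    parent_keys = {row.get(pk) for row in parent_rows}
    return [row for row in child_rows
            if row.get(fk) is not None and row.get(fk) not in parent_keys]


print(column_profile(customers, "country"))
print(primary_key_violations(customers, "customer_id"))  # {'null_keys': 0, 'duplicate_keys': [2]}
print(foreign_key_violations(orders, "customer_id", customers, "customer_id"))
# [{'order_id': 12, 'customer_id': 7}]
```

Keeping the checks as separate functions mirrors the three levels in the definition. Each one can be run on its own, and column_profile can be repeated for every column of a table.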

"A set of techniques for searching through data looking for potential errors and anomalies, such as similar data with different spellings, data outside boundaries and missing values." (Keith Gordon, "Principles of Data Management", 2007)
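The three kinds of anomalies mentioned here (missing values, data outside boundaries, similar data with different spellings) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample values are made up, and the 0 to 120 range for ages is an arbitrary business rule. Similarity is measured with difflib.SequenceMatcher from the standard library. This is only one of several possible measures, with edit distance and phonetic keys being common alternatives. The 0.8 threshold is also an assumption and would need tuning on real data.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample columns; None marks a missing value.
cities = ["London", "Londn", "Paris", None, "paris ", "Berlin"]
ages = [34, 29, -3, None, 212, 45]


def missing_values(values):
    """Positions of missing values (None or blank strings)."""
    return [i for i, v in enumerate(values)
            if v is None or (isinstance(v, str) and not v.strip())]


def out_of_bounds(values, low, high):
    """Positions of non-missing values outside the inclusive range [low, high]."""
    return [i for i, v in enumerate(values)
            if v is not None and not (low <= v <= high)]


def similar_spellings(values, threshold=0.8):
    """Pairs of distinct strings whose similarity ratio reaches the threshold."""
    distinct = sorted({v for v in values if isinstance(v, str)})
    pairs = []
    for a, b in combinations(distinct, 2):
        # Compare case- and whitespace-insensitively.
        ratio = SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b))
    return pairs


print(missing_values(cities))        # [3]
print(out_of_bounds(ages, 0, 120))   # [2, 4]
print(similar_spellings(cities))     # [('Londn', 'London'), ('Paris', 'paris ')]
```

Pairwise comparison grows quadratically with the number of distinct values. On large columns, candidates are usually narrowed down first, for example by grouping on a shared prefix.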

"Data profiling (and analysis services) provides functionality to understand the quality, structure, and relationships of data across enterprise systems, from which data cleansing and standardization rules can be determined for improving the overall data quality and consistency." (Martin Oberhofer et al., "Enterprise Master Data Management", 2008)

"A process for looking at the data within the source systems and understanding the data elements and the anomalies." (Tony Fisher, "The Data Asset", 2009)

"An approach to data quality analysis, using statistics to show patterns of usage, and patterns of contents, and automated as much as possible. Some profiling activities must be done manually, but most can be automated." (DAMA International, "The DAMA Dictionary of Data Management", 2011)
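A common way to automate the profiling of "patterns of contents" is to reduce each value to a mask, so that values with the same format collapse into the same pattern. The Python sketch below assumes a simple convention: letters become 'A', digits become '9', and all other characters are kept. The phone-number sample is invented.

```python
from collections import Counter

# Hypothetical phone-number column with mixed formats; None marks a missing value.
phones = ["555-0101", "555-0199", "5550142", "(555) 0123", "555-01A7", None]


def value_pattern(value):
    """Map digits to '9' and letters to 'A', keeping other characters as-is."""
    return "".join(
        "9" if ch.isdigit() else "A" if ch.isalpha() else ch
        for ch in value
    )


def pattern_profile(values):
    """Count how often each pattern occurs among the non-missing values."""
    return Counter(value_pattern(v) for v in values if v is not None)


for pattern, count in pattern_profile(phones).most_common():
    print(f"{pattern!r}: {count}")
```

On this sample, the pattern 999-9999 occurs twice and every other pattern occurs once. Rare patterns such as 999-99A9, with a letter where a digit is expected, are the values worth a closer look.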

"Data profiling is used to assess the existing state of data quality. It is also used to understand the duplicates in the master data or the gaps in linkages. It can be used to understand the scope of data enrichment to enhance the value of customer data assets." (Saumya Chaki, "Enterprise Information Management in Practice", 2015)

"An automated method of analyzing large amounts of data to determine its quality and integrity." (Gregory Lampshire, "The Data and Analytics Playbook", 2016)

"Data profiling assesses a set of data and provides information on the values, the length of strings, the level of completeness, and the distribution patterns of each column." (Robert Hawker, "Practical Data Quality", 2023)
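A per-column summary of completeness and string lengths, in the spirit of this definition, might look like the following Python sketch. The rows and column names are hypothetical. Completeness is assumed here to mean the share of values that are not None.

```python
from statistics import mean

# Hypothetical table as a list of rows; None marks a missing value.
rows = [
    {"name": "Alice", "postcode": "10115"},
    {"name": "Bob", "postcode": None},
    {"name": "Christine", "postcode": "1011"},
    {"name": None, "postcode": "10117"},
]


def length_and_completeness(rows, column):
    """Completeness ratio and min/max/mean string length for one column."""
    values = [row.get(column) for row in rows]
    present = [str(v) for v in values if v is not None]
    lengths = [len(v) for v in present]
    return {
        "completeness": len(present) / len(values) if values else 0.0,
        "min_length": min(lengths, default=None),
        "max_length": max(lengths, default=None),
        "mean_length": mean(lengths) if lengths else None,
    }


for column in ("name", "postcode"):
    print(column, length_and_completeness(rows, column))
```

On the sample, both columns are 75% complete. The minimum postcode length of 4, against a maximum of 5, is the kind of signal that hints at a truncated value. Whether it is actually an error depends on the rules that apply to the data.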

"The process of examining the data available in different data sources and collecting statistics and information about this data. Data profiling helps to assess the quality level of the data according to a defined goal." (Talend)

"Data profiling, a critical first step in data migration, automates the identification of problematic data and metadata and enables companies to correct inconsistencies, redundancies and inaccuracies in corporate databases." (Information Management)

"Data profiling is the act of examining, cleansing and analyzing an existing data source to generate actionable summaries. Proper techniques of data profiling verify the accuracy and validity of data, leading to better data-driven decision making that customers can use to their advantage." (Snowflake)

About Me

IT Professional with more than 24 years of experience in IT, in areas such as full life-cycle Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.