15 February 2018

Data Science: Data Preparation (Definitions)

Data preparation: "The process which involves checking or logging the data in; checking the data for accuracy; entering the data into the computer; transforming the data; and developing and documenting a database structure that integrates the various measures. This process includes preparation and assignment of appropriate metadata to describe the product in human readable code/format." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"Data Preparation describes a range of processing activities that take place in order to transform a source of data into a format, quality and structure suitable for further analysis or processing. It is often referred to as Data Pre-Processing due to the fact it is an activity that organises the data for a follow-on processing stage." (Experian) [source]

"Data preparation [also] called data wrangling, it’s everything that is concerned with the process of getting your data in good shape for analysis. It’s a critical part of the machine learning process." (RapidMiner) [source]

"Data preparation is an iterative-agile process for exploring, combining, cleaning and transforming raw data into curated datasets for self-service data integration, data science, data discovery, and BI/analytics." (Gartner)

"Data preparation is the process of cleaning and transforming raw data prior to processing and analysis. It is an important step prior to processing and often involves reformatting data, making corrections to data and the combining of data sets to enrich data." (Talend) [source]
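The activities these definitions keep naming (cleaning, reformatting, correcting and combining data) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the records, the field names, the `customer_regions` lookup and the helper functions are hypothetical and do not come from any of the quoted sources.

```python
from datetime import datetime

# Hypothetical raw records with the usual problems: stray whitespace,
# inconsistent casing, mixed date formats, unparseable values, duplicates.
raw_orders = [
    {"order_id": "1", "customer": " alice ", "amount": "10.50", "date": "2018-02-15"},
    {"order_id": "2", "customer": "BOB", "amount": "n/a", "date": "15/02/2018"},
    {"order_id": "2", "customer": "BOB", "amount": "n/a", "date": "15/02/2018"},
    {"order_id": "3", "customer": "Carol", "amount": "7", "date": "2018-02-16"},
]

# Hypothetical second data set used to enrich the orders.
customer_regions = {"alice": "EU", "carol": "US"}


def parse_date(text):
    """Reformat: accept two assumed date layouts, return a date or None."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def parse_amount(text):
    """Correct: turn numeric strings into floats, unparseable values into None."""
    try:
        return float(text)
    except ValueError:
        return None


def prepare(orders, regions):
    """Clean, transform and combine raw rows into a curated list of dicts."""
    seen_ids = set()
    prepared = []
    for row in orders:
        order_id = int(row["order_id"])
        if order_id in seen_ids:  # clean: drop duplicate orders
            continue
        seen_ids.add(order_id)
        customer = row["customer"].strip().lower()  # clean: normalise names
        prepared.append({
            "order_id": order_id,
            "customer": customer,
            "amount": parse_amount(row["amount"]),
            "date": parse_date(row["date"]),
            "region": regions.get(customer),  # combine: enrich from lookup
        })
    return prepared


curated = prepare(raw_orders, customer_regions)
```

Missing or unparseable values are kept as `None` rather than guessed, so that a later analysis step can decide how to handle them.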

