05 May 2018

Data Science: Clustering (Definitions)

"Grouping of similar patterns together. In this text the term 'clustering' is used only for unsupervised learning problems in which the desired groupings are not known in advance." (Laurene V Fausett, "Fundamentals of Neural Networks: Architectures, Algorithms, and Applications", 1994)

"The process of grouping similar input patterns together using an unsupervised training algorithm." (Joseph P Bigus, "Data Mining with Neural Networks: Solving Business Problems from Application Development to Decision Support", 1996)

"Clustering attempts to identify groups of observations with similar characteristics." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"The process of organizing objects into groups whose members are similar in some way. A cluster is therefore a collection of objects, which are 'similar' between them and are 'dissimilar' to the objects belonging to other clusters." (Juan R González et al, "Nature-Inspired Cooperative Strategies for Optimization", 2008)

"Grouping the nodes of an ad hoc network such that each group is a self-organized entity having a cluster-head which is responsible for formation and management of its cluster." (Prayag Narula, "Evolutionary Computing Approach for Ad-Hoc Networks", 2009)

"The process of assigning individual data items into groups (called clusters) so that items from the same cluster are more similar to each other than items from different clusters. Often similarity is assessed according to a distance measure." (Alfredo Vellido & Iván Olie, "Clustering and Visualization of Multivariate Time Series", 2010)

"Verb. To output a smaller data set based on grouping criteria of common attributes." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The process of partitioning the data attributes of an entity or table into subsets or clusters of similar attributes, based on subject matter or characteristic (domain)." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A data mining technique that analyzes data to group records together according to their location within the multidimensional attribute space." (SQL Server 2012 Glossary, "Microsoft", 2012)

"Clustering aims to partition data into groups called clusters. Clustering is usually unsupervised in the sense that the training data is not labeled. Some clustering algorithms require a guess for the number of clusters, while other algorithms don't." (Ivan Idris, "Python Data Analysis", 2014)

"Form of data analysis that groups observations to clusters. Similar observations are grouped in the same cluster, whereas dissimilar observations are grouped in different clusters. As opposed to classification, there is not a class attribute and no predefined classes exist." (Efstathios Kirkos, "Composite Classifiers for Bankruptcy Prediction", 2014)

"Organization of data in some semantically meaningful way such that each cluster contains related data while the unrelated data are assigned to different clusters. The clusters may not be predefined." (Sanjiv K Bhatia & Jitender S Deogun, "Data Mining Tools: Association Rules", 2014)

"Techniques for organizing data into groups of similar cases." (Meta S Brown, "Data Mining For Dummies", 2014)

[cluster analysis:] "A technique that identifies homogenous subgroups or clusters of subjects or study objects." (K  N Krishnaswamy et al, "Management Research Methodology: Integration of Principles, Methods and Techniques", 2016)

"Clustering is a classification technique where similar kinds of objects are grouped together. The similarity between the objects maybe determined in different ways depending upon the use case. Therefore, clustering in measurement space may be an indicator of similarity of image regions, and may be used for segmentation purposes." (Shiwangi Chhawchharia, "Improved Lymphocyte Image Segmentation Using Near Sets for ALL Detection", 2016)

"Clustering techniques share the goal of creating meaningful categories from a collection of items whose properties are hard to directly perceive and evaluate, which implies that category membership cannot easily be reduced to specific property tests and instead must be based on similarity. The end result of clustering is a statistically optimal set of categories in which the similarity of all the items within a category is larger than the similarity of items that belong to different categories." (Robert J Glushko, "The Discipline of Organizing: Professional Edition" 4th Ed., 2016)

[cluster analysis:]"A statistical technique for finding natural groupings in data; it can also be used to assign new cases to groupings or categories." (Jonathan Ferrar et al, "The Power of People", 2017)

"Clustering or cluster analysis is a set of techniques of multivariate data analysis aimed at selecting and grouping homogeneous elements in a data set. Clustering techniques are based on measures relating to the similarity between the elements. In many approaches this similarity, or better, dissimilarity, is designed in terms of distance in a multidimensional space. Clustering algorithms group items on the basis of their mutual distance, and then the belonging to a set or not depends on how the element under consideration is distant from the collection itself." (Crescenzio Gallo, "Building Gene Networks by Analyzing Gene Expression Profiles", 2018)

"Unsupervised learning or clustering is a way of discovering hidden structure in unlabeled data. Clustering algorithms aim to discover latent patterns in unlabeled data using features to organize instances into meaningfully dissimilar groups." (Benjamin Bengfort et al, "Applied Text Analysis with Python: Enabling Language-Aware Data Products with Machine Learning", 2018)

"The term clustering refers to the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more similar (in some sense or another) to each other than to those in other clusters." (Satyadhyan Chickerur et al, "Forecasting the Demand of Agricultural Crops/Commodity Using Business Intelligence Framework", 2019)

"In the machine learning context, clustering is the task of grouping examples into related groups. This is generally an unsupervised task, that is, the algorithm does not use preexisting labels, though there do exist some supervised clustering algorithms." (Alex Thomas, "Natural Language Processing with Spark NLP", 2020)

"A cluster is a group of data objects which have similarities among them. It's a group of the same or similar elements gathered or occurring closely together." (Hari K Kondaveeti et al, "Deep Learning Applications in Agriculture: The Role of Deep Learning in Smart Agriculture", 2021)

"Clustering describes an unsupervised machine learning technique for identifying structures among unstructured data. Clustering algorithms group sets of similar objects into clusters, and are widely used in areas including image analysis, information retrieval, and bioinformatics." (Accenture)

"Describes an unsupervised machine learning technique for identifying structures among unstructured data. Clustering algorithms group sets of similar objects into clusters, and are widely used in areas including image analysis, information retrieval, and bioinformatics." (Accenture)

"The process of identifying objects that are similar to each other and cluster them in order to understand the differences as well as the similarities within the data." (Analytics Insight)
