11 May 2018

Data Science: K-Means Algorithm (Definitions)

"A top-down grouping method where the number of clusters is defined prior to grouping." (Glenn J Myatt, "Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining", 2006)

"An algorithm used to assign K centers to represent the clustering of N points (K< N). The points are iteratively adjusted so that each of the N points is assigned to one of the K clusters, and each of the K clusters is the mean of its assigned points." (Robert Nisbet et al, "Handbook of statistical analysis and data mining applications", 2009)

"The k-means algorithm is an algorithm to cluster n objects based on attributes into k partitions, k = n. The algorithm minimizes the total intra-cluster variance or the squared error function." (Dimitrios G Tsalikakis et al, "Segmentation of Cardiac Magnetic Resonance Images", 2009)

"The k-means algorithm assigns any number of data objects to one of k clusters." (Jules H Berman, "Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information", 2013)

"The clustering algorithm that divides a dataset into k groups such that the members in each group are as similar as possible, that is, closest to one another." (David Natingga, "Data Science Algorithms in a Week" 2nd Ed., 2018)

"K-Means is a technique for clustering. It works by randomly placing K points, called centroids, and iteratively moving them to minimize the squared distance of elements of a cluster to their centroid." (Alex Thomas, "Natural Language Processing with Spark NLP", 2020)

"It is an iterative algorithm that partition the hole data set into K non overlaping subsets (Clusters). Each data point belongs to only one subset." (Aman Tyagi, "Healthcare-Internet of Things and Its Components: Technologies, Benefits, Algorithms, Security, and Challenges", 2021)

[Non-scalable K-means:] "A Microsoft Clustering algorithm method that uses a distance measure to assign a data point to its closest cluster." (Microsoft Technet)

"An algorithm that places each value in the cluster with the nearest mean, and in which clusters are formed by minimizing the within-cluster deviation from the mean." (Microsoft, "SSAS Glossary")

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.