26 November 2018

🔭Data Science: Clustering (Just the Quotes)

"To the untrained eye, randomness appears as regularity or tendency to cluster." (William Feller, "An Introduction to Probability Theory and its Applications", 1950) 

"In scientific information, then, we find that subjects – the themes and topics on which books and articles are written – cluster into fields, each of which can be analysed into its characteristic set of facets of terms." (Brian C Vickery, "Classification and indexing in science", 1958)

"In comparison with Predicate Calculus encoding is of factual knowledge, semantic nets seem more natural and understandable. This is due to the one-to-one correspondence between nodes and the concepts they denote, to the clustering about a particular node of propositions about a particular thing, and to the visual immediacy of 'interrelationships' between concepts, i.e., their connections via sequences of propositional links." (Lenhart K Schubert, "Extending the Expressive Power of Semantic Networks", Artificial Intelligence 7, 1976)

"Cyberspace. A consensual hallucination experienced daily by billions of legitimate operators, in every nation, by children being taught mathematical concepts. [...] A graphic representation of data abstracted from banks of every computer in the human system. Unthinkable complexity. Lines of light ranged in the nonspace of the mind, clusters and constellations of data." (William Gibson, "Neuromancer", 1984)

"While a small domain (consisting of fifty or fewer objects) can generally be analyzed as a unit, large domains must be partitioned to make the analysis a manageable task. To make such a partitioning, we take advantage of the fact that objects on an information model tend to fall into clusters: groups of objects that are interconnected with one another by many relationships. By contrast, relatively few relationships connect objects in different clusters." (Stephen J. Mellor, "Object-Oriented Systems Analysis: Modeling the World In Data", 1988) 

"Randomness is a difficult notion for people to accept. When events come in clusters and streaks, people look for explanations and patterns. They refuse to believe that such patterns - which frequently occur in random data - could equally well be derived from tossing a coin. So it is in the stock market as well." (Burton G Malkiel, "A Random Walk Down Wall Street", 1989)

"Many of the basic functions performed by neural networks are mirrored by human abilities. These include making distinctions between items (classification), dividing similar things into groups (clustering), associating two or more things (associative memory), learning to predict outcomes based on examples (modeling), being able to predict into the future (time-series forecasting), and finally juggling multiple goals and coming up with a good- enough solution (constraint satisfaction)." (Joseph P Bigus,"Data Mining with Neural Networks: Solving business problems from application development to decision support", 1996)

"While classification is important, it can certainly be overdone. Making too fine a distinction between things can be as serious a problem as not being able to decide at all. Because we have limited storage capacity in our brain (we still haven't figured out how to add an extender card), it is important for us to be able to cluster similar items or things together. Not only is clustering useful from an efficiency standpoint, but the ability to group like things together (called chunking by artificial intelligence practitioners) is a very important reasoning tool. It is through clustering that we can think in terms of higher abstractions, solving broader problems by getting above all of the nitty-gritty details." (Joseph P Bigus,"Data Mining with Neural Networks: Solving business problems from application development to decision support", 1996)

"Random events often come like the raisins in a box of cereal - in groups, streaks, and clusters. And although Fortune is fair in potentialities, she is not fair in outcomes." (Leonard Mlodinow, "The Drunkard’s Walk: How Randomness Rules Our Lives", 2008)

"Granular computing is a general computation theory for using granules such as subsets, classes, objects, clusters, and elements of a universe to build an efficient computational model for complex applications with huge amounts of data, information, and knowledge. Granulation of an object a leads to a collection of granules, with a granule being a clump of points (objects) drawn together by indiscernibility, similarity, proximity, or functionality. In human reasoning and concept formulation, the granules and the values of their attributes are fuzzy rather than crisp. In this perspective, fuzzy information granulation may be viewed as a mode of generalization, which can be applied to any concept, method, or theory." (Salvatore Greco et al, "Granular Computing and Data Mining for Ordered Data: The Dominance-Based Rough Set Approach", 2009)

"With the ever increasing amount of empirical information that scientists from all disciplines are dealing with, there exists a great need for robust, scalable and easy to use clustering techniques for data abstraction, dimensionality reduction or visualization to cope with and manage this avalanche of data."  (Jörg Reichardt, "Structure in Complex Networks", 2009)

"Data clusters are everywhere, even in random data. Someone who looks for an explanation will inevitably find one, but a theory that fits a data cluster is not persuasive evidence. The found explanation needs to make sense and it needs to be tested with uncontaminated data." (Gary Smith, "Standard Deviations", 2014)

"Cluster analysis refers to the grouping of observations so that the objects within each cluster share similar properties, and properties of all clusters are independent of each other. Cluster algorithms usually optimize by maximizing the distance among clusters and minimizing the distance between objects in a cluster. Cluster analysis does not complete in a single iteration but goes through several iterations until the model converges. Model convergence means that the cluster memberships of all objects converge and don’t change with every new iteration." (Danish Haroon, "Python Machine Learning Case Studies", 2017)

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.