31 December 2006

Jay W Forrester - Collected Quotes

"[System dynamics] is an approach that should help in important top-management problems [...] The solutions to small problems yield small rewards. Very often the most important problems are but little more difficult to handle than the unimportant. Many [people] predetermine mediocre results by setting initial goals too low. The attitude must be one of enterprise design. The expectation should be for major improvement [...] The attitude that the goal is to explain behavior; which is fairly common in academic circles, is not sufficient. The goal should be to find management policies and organizational structures that lead to greater success." (Jay W Forrester, "Industrial Dynamics", 1961)

"In complex systems cause and effect are often not closely related in either time or space. The structure of a complex system is not a simple feedback loop where one system state dominates the behavior. The complex system has a multiplicity of interacting feedback loops. Its internal rates of flow are controlled by nonlinear relationships. The complex system is of high order, meaning that there are many system states (or levels). It usually contains positive-feedback loops describing growth processes as well as negative, goal-seeking loops. In the complex system the cause of a difficulty may lie far back in time from the symptoms, or in a completely different and remote part of the system. In fact, causes are usually found, not in prior events, but in the structure and policies of the system." (Jay W Forrester, "Urban dynamics", 1969)

"Like all systems, the complex system is an interlocking structure of feedback loops [...] This loop structure surrounds all decisions public or private, conscious or unconscious. The processes of man and nature, of psychology and physics, of medicine and engineering all fall within this structure [...]" (Jay W Forrester, "Urban Dynamics", 1969)

"The structure of a complex system is not a simple feedback loop where one system state dominates the behavior. The complex system has a multiplicity of interacting feedback loops. Its internal rates of flow are controlled by non-linear relationships. The complex system is of high order, meaning that there are many system states (or levels). It usually contains positive-feedback loops describing growth processes as well as negative, goal-seeking loops." (Jay F Forrester, "Urban Dynamics", 1969)

"To model the dynamic behavior of a system, four hierarchies of structure should be recognized: closed boundary around the system; feedback loops as the basic structural elements within the boundary; level variables representing accumulations within the feedback loops; rate variables representing activity within the feedback loops." (Jay W Forrester, "Urban Dynamics", 1969)

"First, social systems are inherently insensitive to most policy changes that people choose in an effort to alter the behavior of systems. In fact, social systems draw attention to the very points at which an attempt to intervene will fail. Human intuition develops from exposure to simple systems. In simple systems, the cause of a trouble is close in both time and space to symptoms of the trouble. If one touches a hot stove, the burn occurs here and now; the cause is obvious. However, in complex dynamic systems, causes are often far removed in both time and space from the symptoms. True causes may lie far back in time and arise from an entirely different part of the system from when and where the symptoms occur. However, the complex system can mislead in devious ways by presenting an apparent cause that meets the expectations derived from simple systems." (Jay W Forrester, "Counterintuitive Behavior of Social Systems", 1995)

"Second, social systems seem to have a few sensitive influence points through which behavior can be changed. These high-influence points are not where most people expect. Furthermore, when a high-influence policy is identified, the chances are great that a person guided by intuition and judgment will alter the system in the wrong direction." (Jay W Forrester, "Counterintuitive Behavior of Social Systems", 1995)

"System dynamics models are not derived statistically from time-series data. Instead, they are statements about system structure and the policies that guide decisions. Models contain the assumptions being made about a system. A model is only as good as the expertise which lies behind its formulation. A good computer model is distinguished from a poor one by the degree to which it captures the essence of a system that it represents. Many other kinds of mathematical models are limited because they will not accept the multiple-feedback-loop and nonlinear nature of real systems." (Jay W Forrester, "Counterintuitive Behavior of Social Systems", 1995)

"Third, social systems exhibit a conflict between short-term and long-term consequences of a policy change. A policy that produces improvement in the short run is usually one that degrades a system in the long run. Likewise, policies that produce long-run improvement may initially depress behavior of a system. This is especially treacherous. The short run is more visible and more compelling. Short-run pressures speak loudly for immediate attention. However, sequences of actions all aimed at short-run improvement can eventually burden a system with long-run depressants so severe that even heroic short-run measures no longer suffice. Many problems being faced today are the cumulative result of short-run measures taken in prior decades." (Jay W Forrester, "Counterintuitive Behavior of Social Systems", 1995)

"No plea about inadequacy of our understanding of the decision-making processes can excuse us from estimating decision making criteria. To omit a decision point is to deny its presence - a mistake of far greater magnitude than any errors in our best estimate of the process." (Jay W Forrester, "Perspectives on the modelling process", 2000)

27 December 2006

Programming: Data Structure (Just the Quotes)

"The programmer's primary weapon in the never-ending battle against slow system is to change the intramodular structure. Our first response should be to reorganize the modules' data structures." (Fred Brooks, "The Mythical Man-Month: Essays on Software Engineering", 1975)

"The representation of knowledge in symbolic form is a matter that has pre-occupied the world of documentation since its origin. The problem is now relevant in many situations other than documents and indexes. The structure of records and files in databases: data structures in computer programming; the syntactic and semantic structure of natural language; knowledge representation in artificial intelligence; models of human memory: in all these fields it is necessary to decide how knowledge may be represented so that the representations may be manipulated." (Brian C Vickery, "Concepts of documentation", 1978)

"Rule 4. Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures." (Rob Pike, "Notes on Programming in C" , 1989)

"Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming." (Rob Pike, "Notes on Programming in C", 1989)

"If a programmer designs a program, only half the job is done if they have only designed the data structures. They also have to design the procedures for operating on the structures. (Specifically, a programmer designs abstract data types.) Without the appropriate procedures for operating on data structures, a computer would literally get lost in the structures, even supposing it could start executing anything sensible." (Yin L Theng et al," 'Lost in hhyperspace': Psychological problem or bad design?", 1996)

"Often you'll see the same three or four data items together in lots of places: fields in a couple of classes, parameters in many method signatures. Bunches of data that hang around together really ought to be made into their own object." (Kent Beck, "Refactoring: Improving the Design of Existing Code", 1999)

"Smart data structures and dumb code works a lot better than the other way around." (Eric S Raymond, "The Cathedral & the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary", 2001)

"In fact, I'm a huge proponent of designing your code around the data, rather than the other way around, and I think it's one of the reasons git has been fairly successful. […] I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships." (Linus Torvalds, [email] 2006)

"Computation at its root consists of a data structure (for input, output, and perhaps something being stored in between) and some process. One cannot talk about the process without describing the data structure. More importantly, different data structures enable certain computations to be done easily, whereas other data structures support other computations. Thus, the choice of data structure (representation) helps explain why a problem-solver does or does not successfully engage in a given process (cognition/behavior) or perhaps why a process takes as long or as short as it does." (Christian D Schunn et al, "Complex Visual Data Analysis, Uncertainty, and Representation", 2007)

"One of the essential parts of a formal training in programming is a long and demanding study of the large collection of algorithms that have already been discovered and analyzed, together with the Data Structures (carefully tailored, seemingly unnatural ways of organizing data for effective access) that go with them. As with any other engineering profession, it is impossible to do a good job without a thorough knowledge of what has been tried before. If a programmer starts the job fully armed with what is already known, they will have some chance of finding something new. Inventiveness is important: not all problems have been seen before. A programmer who does not already know the standard algorithms and data structures is doomed to nothing more than rediscovering the basics." (Robert Plant & Stephen Murrell, "An Executive’s Guide to Information Technology: Principles, Business Models, and Terminology", 2007)

"A modeling language is usually based on some kind of computational model, such as a state machine, data flow, or data structure. The choice of this model, or a combination of many, depends on the modeling target. Most of us make this choice implicitly without further thinking: some systems call for capturing dynamics and thus we apply for example state machines, whereas other systems may be better specified by focusing on their static structures using feature diagrams or component diagrams. For these reasons a variety of modeling languages are available." (Steven Kelly & Juha-Pekka Tolvanen, "Domain-specific Modeling", 2008)

"Clearly, the search for a dividing line between code and data is fruitless—and not particularly flattering to our egos. Let’s abandon any attempt to find a higher truth here, and settle for a pragmatic definition. If a piece of generated text simply instantiates and provides values for a data structure, it’s data; otherwise, it’s code." (Steven Kelly & Juha-Pekka Tolvanen, "Domain-specific Modeling", 2008)

"Generally, the craft of programming is the factoring of a set of requirements into a a set of functions and data structures." (Douglas Crockford, "JavaScript: The Good Parts", 2008)

"If the data structure can’t be explained on a beer coaster, it’s too complex." (Felix von Leitner, "Source Code Optimization", 2009)

23 December 2006

IT: Computing (Just the Quotes)

"Let it be remarked [...] that an important difference between the way in which we use the brain and the machine is that the machine is intended for many successive runs, either with no reference to each other, or with a minimal, limited reference, and that it can be cleared between such runs; while the brain, in the course of nature, never even approximately clears out its past records. Thus the brain, under normal circumstances, is not the complete analogue of the computing machine but rather the analogue of a single run on such a machine." (Norbert Wiener, "Cybernetics: Or Control and Communication in the Animal and the Machine", 1948)

"There are two types of systems engineering - basis and applied. [...] Systems engineering is, obviously, the engineering of a system. It usually, but not always, includes dynamic analysis, mathematical models, simulation, linear programming, data logging, computing, optimating, etc., etc. It connotes an optimum method, realized by modern engineering techniques. Basic systems engineering includes not only the control system but also all equipment within the system, including all host equipment for the control system. Applications engineering is - and always has been - all the engineering required to apply the hardware of a hardware manufacturer to the needs of the customer. Such applications engineering may include, and always has included where needed, dynamic analysis, mathematical models, simulation, linear programming, data logging, computing, and any technique needed to meet the end purpose - the fitting of an existing line of production hardware to a customer's needs. This is applied systems engineering." (Instruments and Control Systems Vol. 31, 1958)

"The mathematical and computing techniques for making programmed decisions replace man but they do not generally simulate him." (Herbert A Simon, "Management and Corporations 1985", 1960)

"There is the very real danger that a number of problems which could profitably be subjected to analysis, and so treated by simpler and more revealing techniques. will instead be routinely shunted to the computing machines [...] The role of computing machines as a mathematical tool is not that of a panacea for all computational ills." (Richard E Bellman & Paul Brock, "On the Concepts of a Problem and Problem-Solving", American Mathematical Monthly 67, 1960)

"The purpose of computing is insight, not numbers." (Richard W Hamming, "Numerical Methods for Scientists and Engineers", 1962)

"Another thing I must point out is that you cannot prove a vague theory wrong. If the guess that you make is poorly expressed and rather vague, and the method that you use for figuring out the consequences is a little vague - you are not sure, and you say, 'I think everything's right because it's all due to so and so, and such and such do this and that more or less, and I can sort of explain how this works' […] then you see that this theory is good, because it cannot be proved wrong! Also if the process of computing the consequences is indefinite, then with a little skill any experimental results can be made to look like the expected consequences." (Richard P Feynman, "The Character of Physical Law", 1965)

"Computational reducibility may well be the exception rather than the rule: Most physical questions may be answerable only through irreducible amounts of computation. Those that concern idealized limits of infinite time, volume, or numerical precision can require arbitrarily long computations, and so be formally undecidable." (Stephen Wolfram, Undecidability and intractability in theoretical physics", Physical Review Letters 54 (8), 1985)

"We distinguish diagrammatic from sentential paper-and-pencil representations of information by developing alternative models of information-processing systems that are informationally equivalent and that can be characterized as sentential or diagrammatic. Sentential representations are sequential, like the propositions in a text. Diagrammatic representations are indexed by location in a plane. Diagrammatic representations also typically display information that is only implicit in sentential representations and that therefore has to be computed, sometimes at great cost, to make it explicit for use. We then contrast the computational efficiency of these representations for solving several. illustrative problems in mathematics and physics." (Herbert A Simon, "Why a diagram is (sometimes) worth ten thousand words", 1987)

"Neural computing is the study of cellular networks that have a natural property for storing experimental knowledge. Such systems bear a resemblance to the brain in the sense that knowledge is acquired through training rather than programming and is retained due to changes in node functions. The knowledge takes the form of stable states or cycles of states in the operation of the net. A central property of such nets is to recall these states or cycles in response to the presentation of cues." (Igor Aleksander & Helen Morton, "Neural computing architectures: the design of brain-like machines", 1989)

"Beauty is more important in computing than anywhere else in technology because software is so complicated. Beauty is the ultimate defense against complexity." (David Gelernter, "Machine Beauty: Elegance And The Heart Of Technolog", 1998)

"As systems became more varied and more complex, we find that no single methodology suffices to deal with them. This is particularly true of what may be called information intelligent systems - systems which form the core of modern technology. To conceive, design, analyze and use such systems we frequently have to employ the totality of tools that are available. Among such tools are the techniques centered on fuzzy logic, neurocomputing, evolutionary computing, probabilistic computing and related methodologies. It is this conclusion that formed the genesis of the concept of soft computing." (Lotfi A Zadeh, "The Birth and Evolution of Fuzzy Logic: A personal perspective", 1999)

"In science, it is a long-standing tradition to deal with perceptions by converting them into measurements. But what is becoming increasingly evident is that, to a much greater extent than is generally recognized, conversion of perceptions into measurements is infeasible, unrealistic or counter-productive. With the vast computational power at our command, what is becoming feasible is a counter-traditional move from measurements to perceptions. […] To be able to compute with perceptions it is necessary to have a means of representing their meaning in a way that lends itself to computation." (Lotfi A Zadeh, "The Birth and Evolution of Fuzzy Logic: A personal perspective", 1999)

"Why was progress in computing technology so fast compared with the lack of progress in space travel? The reason is very simple: computing technology is only now approaching scientific limits such as quantum uncertainty and the speed of light, while space technology has already run into its limits that derive from the basic principles of physics and chemistry." (Mordechai Ben-Ari, "Just a Theory: Exploring the Nature of Science", 2005)

"Granular computing is a general computation theory for using granules such as subsets, classes, objects, clusters, and elements of a universe to build an efficient computational model for complex applications with huge amounts of data, information, and knowledge. Granulation of an object a leads to a collection of granules, with a granule being a clump of points (objects) drawn together by indiscernibility, similarity, proximity, or functionality. In human reasoning and concept formulation, the granules and the values of their attributes are fuzzy rather than crisp. In this perspective, fuzzy information granulation may be viewed as a mode of generalization, which can be applied to any concept, method, or theory." (Salvatore Greco et al, "Granular Computing and Data Mining for Ordered Data: The Dominance-Based Rough Set Approach", 2009)

17 December 2006

Stephen J Mellor - Collected Quotes

"When partitioning a domain, we divide the information model so that the clusters remain intact. [...] Each section of the information model then becomes a separate subsystem. Note that when the information model is partitioned into subsystems, each object is assigned to exactly one subsystem." (Stephen J Mellor, "Object-Oriented Systems Analysis: Modeling the World In Data", 1988) 

"While a small domain (consisting of fifty or fewer objects) can generally be analyzed as a unit, large domains must be partitioned to make the analysis a manageable task. To make such a partitioning, we take advantage of the fact that objects on an information model tend to fall into clusters: groups of objects that are interconnected with one another by many relationships. By contrast, relatively few relationships connect objects in different clusters." (Stephen J Mellor, "Object-Oriented Systems Analysis: Modeling the World In Data", 1988)

"Executable UML is at the next higher layer of abstraction, abstracting away both specific programming languages and decisions about the organization of the software so that a specification built in Executable UML can be deployed in various software environments without change." (Stephen J Mellor, "Executable UML: A Foundation for Model-Driven Architecture", 2002)

"Executable UML is designed to produce a comprehensive and comprehensible model of a solution without making decisions about the organization of the software implementation. It is a highly abstract thinking tool to aid in the formalization of knowledge, a way of thinking about and describing the concepts that make up an abstract solution to a client problem." (Stephen J Mellor, "Executable UML: A Foundation for Model-Driven Architecture", 2002)

"In the bad old days before MDA, (conceptual) models served only to facilitate communication between customers and developers and act as blueprints for construction. Nowadays, MDA establishes the infrastructure for defining and executing transformations between models of various kinds." (Stephen J Mellor, "Executable UML: A Foundation for Model-Driven Architecture", 2002)

"We build models to increase productivity, under the justified assumption that it's cheaper to manipulate the model than the real thing. Models then enable cheaper exploration and reasoning about some universe of discourse. One important application of models is to understand a real, abstract, or hypothetical problem domain that a computer system will reflect. This is done by abstraction, classification, and generalization of subject-matter entities into an appropriate set of classes and their behavior." (Stephen J Mellor, "Executable UML: A Foundation for Model-Driven Architecture", 2002)

"What's the point of having metamodels, and why should you care? Because models must be stated in a way that yields a common understanding among all involved parties, we need a way to specify exactly what a model means. Metamodels allow you to do just that: They specify the concepts of the language you're using to specify a model." (Stephen J Mellor, "MDA Distilled. Principles of Model-Driven Architecture", 2003)

08 December 2006

Fred C Schweppe - Collected Quotes

"A bias can be considered a limiting case of a nonwhite disturbance as a constant is the most time-correlated process possible." (Fred C Schweppe, "Uncertain dynamic systems", 1973)

"Changes of variables can be helpful for iterative and parametric solutions even if they do not linearize the problem. For example, a change of variables may change the 'shape' of J(x) into a more suitable form. Unfortunately there seems to be no general way to choose the 'right' change of variables. Success depends on the particular problem and the engineer's insight. However, the possibility of a change of variables should always be considered." (Fred C Schweppe, "Uncertain dynamic systems", 1973)

"Decision-making problems (hypothesis testing) involve situations where it is desired to make a choice among various alternative decisions (hypotheses). Such problems can be viewed as generalized state estimation problems where the definition of state has simply been expanded." (Fred C Schweppe, "Uncertain dynamic systems", 1973)

"Hypothesis testing can introduce the need for multiple models for the multiple hypotheses and, if appropriate, a priori probabilities. The one modeling aspect of hypothesis testing that has no estimation counterpart is the problem of specifying the hypotheses to be considered. Often this is a critical step which influences both performance and the difficulty of implementation." (Fred C Schweppe, "Uncertain dynamic systems", 1973)

"Modeling is definitely the most important and critical problem. If the mathematical model is not valid, any subsequent analysis, estimation, or control study is meaningless. The development of the model in a convenient form can greatly reduce the complexity of the actual studies." (Fred C Schweppe, "Uncertain dynamic systems", 1973)

"Pattern recognition can be viewed as a special case of hypothesis testing. In pattern recognition, an observation z is to be used to decide what pattern caused it. Each possible pattern can be viewed as one hypothesis. The main problem in pattern recognition is the development of models for the z corresponding to each pattern (hypothesis)." (Fred C Schweppe, "Uncertain dynamic systems", 1973)

"System theory is a tool which engineers use to help them design the 'best' system to do the job that must be done. A dominant characteristic of system theory is the interest in the analysis and design (synthesis) of systems from an input-output point of view. System theory uses mathematical manipulation of a mathematical model to help design the actual system." (Fred C Schweppe, "Uncertain dynamic systems", 1973)

"The biggest (and sometimes insurmountable) problem is usually to use the available data (information, measurements, etc.) to find out what the system is actually doing (i.e., to estimate its state). If the system's state can be estimated to some reasonable accuracy, the desired control is often obvious (or can be obtained by the use of deterministic control theory)." (Fred C Schweppe, "Uncertain dynamic systems", 1973)

"The choice of model is often the most critical aspect of a design and development engineering job, but it is impossible to give explicit rules or techniques." (Fred C Schweppe, "Uncertain dynamic systems", 1973)

"The power and beauty of stochastic approximation theory is that it provides simple, easy to implement gain sequences which guarantee convergence without depending (explicitly) on knowledge of the function to be minimized or the noise properties. Unfortunately, convergence is usually extremely slow. This is to be expected, as 'good performance' cannot be expected if no (or very little) knowledge of the nature of the problem is built into the algorithm. In other words, the strength of stochastic approximation (simplicity, little a priori knowledge) is also its weakness." (Fred C Schweppe, "Uncertain dynamic systems", 1973)

"The pseudo approach to uncertainty modeling refers to the use of an uncertainty model instead of using a deterministic model which is actually (or at least theoretically) available. The uncertainty model may be desired because it results in a simpler analysis, because it is too difficult (expensive) to gather all the data necessary for an exact model, or because the exact model is too complex to be included in the computer." (Fred C Schweppe, "Uncertain dynamic systems", 1973)

"[A] system is represented by a mathematical model which may take many forms, such as algebraic equations, finite state machines, difference equations, ordinary differential equations, partial differential equations, and functional equations. The system model may be uncertain, as the mathematical model may not be known completely." (Fred C Schweppe, "Uncertain dynamic systems", 1973)

"The term hypothesis testing arises because the choice as to which process is observed is based on hypothesized models. Thus hypothesis testing could also be called model testing. Hypothesis testing is sometimes called decision theory. The detection theory of communication theory is a special case." (Fred C Schweppe, "Uncertain dynamic systems", 1973)

03 December 2006

Harold Koontz - Collected Quotes

"[...] authority - the right by which superiors are able to require conformity of subordinates to decisions - is the basis for responsibility and the force that binds organization together. The process of organizing encompasses grouping of activities for purposes of management and specification of authority relationships between superiors and subordinates and horizontally between managers. Consequently, authority and responsibility relationships come into being in all associative undertakings where the superior-subordinate link exists. It is these relationships that create the basic character of the managerial job." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"Although organization charts are useful, necessary, and often revealing tools, they are subject to many important limitations. In the first place, a chart shows only formal authority relationships and omits the many significant informal and informational relationships that exist in a living organization. Moreover, it does not picture how much authority exists at any point in the organization." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"[...] authority for given tasks is limited to that for which an individual may properly he. held responsible." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"Authority delegations from a superior to a subordinate may be made in large or small degree. The tendency to delegate much authority through the echelons of an organization structure is referred tojas decentralization of authority. On the other hand, authority is said to be centralized wherever a manager tends not to delegate authority to his subordinates." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"Authority is, of course, completely centralized when a manager delegates none, and it is possible to think of the reverse situation - an infinite delegation of authority in which no manager retains any authority other than the implicit power to recover delegated authority. But this kind of delegation is obviously impracticable, since, at some point in the organization structure, delegations must stop." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"Essential to organization planning, then, is the search for an ideal form of organization to reflect the basic goals of the enterprise. This entails not only charting the main lines of organization and reflecting the organizational philosophy of the enterprise leaders (e.g., shall authority be as centralized as possible, or should the company try to break its operations down into semiautonomous product or territorial divisions?), but also a sketching out of authority relationships throughout the structure." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"If charts do not reflect actual organization and if the organization is intended to be as charted, it is the job of effective management to see that actual organization conforms with that desired. Organization charts cannot supplant good organizing, nor can a chart take the place of spelling out authority relationships clearly and completely, of outlining duties of managers and their subordinates, and of defining responsibilities." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"One of the tools for making organization principles work is the organization chart. Any organization which exists can be charted, for a chart is nothing more than an indication of how departments are tied together along their principal lines of authority." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"Responsibility cannot be delegated. While a manager may delegate to a subordinate authority to accomplish a service and the subordinate in turn delegate a portion of the authority received, none of these superiors delegates any of his responsibility. Responsibility, being an obligation to perform, is owed to one's superior, and no subordinate reduces his responsibility by assigning the duty to another. Authority may be delegated, but responsibility is created by the subordinate's acceptance of his assignment." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"Since a chart maps lines of authority, sometimes the mere charting of an organization will show inconsistencies and complexities and lead to their correction. A chart also acts as a guide for managers and new personnel in an organization, revealing how they tie into the entire structure. Charts are, therefore, not only evidences of organization planning but also road maps for decision making, and training devices for those who would learn how a company is organized." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"The essence of managership is the achievement of coordination among people. Coordination is a complex concept, including principles by which harmonious enterprise activity can be accomplished and the many techniques for achieving the greatest synchronized effort." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"The principle of direct contact! states that coordination must be achieved through interpersonal, horizontal relationships of people in an enterprise. People exchange ideas, ideals, prejudices, and purposes through direct personal communication much more efficiently than by any other method, and, with the understanding gained in this way, they find ways to achieve both common and personal goals." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"The primary purpose of delegation of authority is to make organization possible and to make it effective in accomplishing enterprise objectives and efficient in attaining them with the least cost of time and materials. Thus delegation - the vesting of a subordinate with a portion of his superior's authority - has as its principal purpose the creation of managerial jobs." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"Viewed internally with respect to the enterprise, responsibility may be defined as the obligation of a subordinate, to whom a superior has assigned a duty, to perform the service required. The essence of responsibility is, then, obligation. It has no meaning except as it is applied to a person." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"While good charting will attempt, as far as possible, to make levels on the chart conform to levels of importance in the business enterprise, it cannot always do so. This problem can be handled by clearly spelling out authority relationships." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"Another approach to management theory, undertaken by a growing and scholarly group, might be referred to as the decision theory school. This group concentrates on rational approach to decision-the selection from among possible alternatives of a course of action or of an idea. The approach of this school may be to deal with the decision itself, or to the persons or organizational group making the decision, or to an analysis of the decision process. Some limit themselves fairly much to the economic rationale of the decision, while others regard anything which happens in an enterprise the subject of their analysis, and still others expand decision theory to cover the psychological and sociological aspect and environment of decisions and decision-makers." (Harold Koontz, "The Management Theory Jungle," 1961)

"Every organization structure, even a poor one, can be charted, for a chart merely indicates how departments are tied together along the principal lines of authority. It is therefore somewhat surprising to find top managers occasionally taking pride in the fact that they do not have an organization chart or, if they do have one, feeling that the chart should be kept a secret." (Harold Koontz, "Principles of management", 1968)

"Management is defined here as the accomplishment of desired objectives by establishing an environment favorable to performance by people operating in organized groups." (Harold Koontz, "Principles of Management", 1968)

"Organization charts are subject to important limitations. A chart shows only formal authority relationships and omits the many significant informal and informational relationships." (Harold Koontz & Heinz Weihrich, "Essentials Of Management", 2006)

29 November 2006

Phil Simon - Collected Quotes

"Data science is an iterative process. It starts with a hypothesis (or several hypotheses) about the system we’re studying, and then we analyze the information. The results allow us to reject our initial hypotheses and refine our understanding of the data. When working with thousands of fields and millions of rows, it’s important to develop intuitive ways to reject bad hypotheses quickly." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"It’s a mistake to think of data and data visualizations as static terms. They are the very antitheses of stasis." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"Just because data is visualized doesn’t necessarily mean that it is accurate, complete, or indicative of the right course of action. Exhibiting a healthy skepticism is almost always a good thing." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"Metadata serves as a strong and increasingly important complement to both structured and unstructured data. Even if you can easily visualize and interpret primary source data, it behooves you to also collect, analyze, and visualize its metadata. Incorporating metadata may very well enhance your understanding of the source data." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"The term linked data describes the practice of exposing, sharing, and connecting pieces of data, information, and knowledge on the semantic Web. Both humans and machines benefit when previously unconnected data is connected." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"There are myriad questions that we can ask from data today. As such, it’s impossible to write enough reports or design a functioning dashboard that takes into account every conceivable contingency and answers every possible question." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"To be sure, data doesn’t always need to be visualized, and many data visualizations just plain suck. Look around you. It’s not hard to find truly awful representations of information. Some work in concept but fail because they are too busy; they confuse people more than they convey information [...]. Visualization for the sake of visualization is unlikely to produce desired results - and this goes double in an era of Big Data. Bad is still bad, even and especially at a larger scale." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"Visual Organizations benefit from routinely visualizing many different types and sources of data. Doing so allows them to garner a better understanding of what’s happening and why. Equipped with this knowledge, employees are able to ask better questions and make better business decisions." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"We acquire more information through our visual system than we do through all our other senses combined. We understand things better and quicker when we see them." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"We are all becoming more comfortable with data. Data visualization is no longer just something we have to do at work. Increasingly, we want to do it as consumers and as citizens. Put simply, visualizing helps us understand what’s going on in our lives - and how to solve problems." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

"While critical, the arrival of Big Data is far from the only data-related trend to take root over the past decade. The arrival of Big Data is one of the key factors explaining the rise of the Visual Organization." (Phil Simon, "The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions", 2014)

Stephen J Gould - Collected Quotes

"Facts do not ‘speak for themselves’; they are read in the light of theory. Creative thought, in science as much as in the arts, is the motor of changing opinion. Science is a quintessentially human activity, not a mechanized, robot-like accumulation of objective information, leading by laws of logic to inescapable interpretation." (Stephen J Gould, "Ever Since Darwin: Reflections in Natural History", 1977)

"Science, since people must do it, is a socially embedded activity. It progresses by hunch, vision, and intuition. Much of its change through time does not record a closer approach to absolute truth, but the alteration of cultural contexts that influence it so strongly. Facts are not pure and unsullied bits of information; culture also influences what we see and how we see it. Theories, moreover, are not inexorable inductions from facts. The most creative theories are often imaginative visions imposed upon facts; the source of imagination is also strongly cultural." (Stephen J Gould, "The Mismeasure of Man", 1980)

"Facts and theories are different things, not rungs in a hierarchy of increasing certainty. Facts are the world's data. Theories are structures of ideas that explain and interpret facts. Facts do not go away while scientists debate rival theories for explaining them." (Stephen J Gould "Evolution as Fact and Theory", 1981)

"Perhaps randomness is not merely an adequate description for complex causes that we cannot specify. Perhaps the world really works this way, and many events are uncaused in any conventional sense of the word." (Stephen J Gould, "Hen's Teeth and Horse's Toes", 1983)

"The progress of science requires more than new data; it needs novel frameworks and contexts. And where do these fundamentally new views of the world arise? They are not simply discovered by pure observation; they require new modes of thought. And where can we find them, if old modes do not even include the right metaphors? The nature of true genius must lie in the elusive capacity to construct these new modes from apparent darkness. The basic chanciness and unpredictability of science must also reside in the inherent difficulty of such a task." (Stephen J Gould, "The Flamingo's Smile: Reflections in Natural History", 1985)

"We often think, naïvely, that missing data are the primary impediments to intellectual progress - just find the right facts and all problems will dissipate. But barriers are often deeper and more abstract in thought. We must have access to the right metaphor, not only to the requisite information. Revolutionary thinkers are not, primarily, gatherers of facts, but weavers of new intellectual structures." (Stephen J Gould, "The Flamingo's Smile: Reflections in Natural History", 1985)

"Numbers have undoubted powers to beguile and benumb, but critics must probe behind numbers to the character of arguments and the biases that motivate them." (Stephen J Gould, "An Urchin in the Storm: Essays About Books and Ideas", 1987)

"But our ways of learning about the world are strongly influenced by the social preconceptions and biased modes of thinking that each scientist must apply to any problem. The stereotype of a fully rational and objective ‘scientific method’, with individual scientists as logical (and interchangeable) robots, is self-serving mythology." (Stephen J Gould, "This View of Life: In the Mind of the Beholder", "Natural History", Vol. 103, No. 2, 1994)

"Misunderstanding of probability may be the greatest of all impediments to scientific literacy." (Stephen J Gould, "Dinosaur in a  Haystack: Reflections in natural  history", 1995)

"Theories rarely arise as patient inferences forced by accumulated facts. Theories are mental constructs potentiated by complex external prods (including, in idealized cases, a commanding push from empirical reality)." (Stephen J Gould, "Leonardo's Mountain of Clams and the Diet of Worms", 1998) 

"The human mind delights in finding pattern - so much so that we often mistake coincidence or forced analogy for profound meaning. No other habit of thought lies so deeply within the soul of a small creature trying to make sense of a complex world not constructed for it." (Stephen J Gould, "The Flamingo's Smile: Reflections in Natural History", 2010)

28 November 2006

Nassim N Taleb - Collected Quotes

"A mistake is not something to be determined after the fact, but in the light of the information until that point." (Nassim N Taleb, "Fooled by Randomness", 2001)

"Probability is not about the odds, but about the belief in the existence of an alternative outcome, cause, or motive." (Nassim N Taleb, "Fooled by Randomness", 2001)

"A Black Swan is a highly improbable event with three principal characteristics: It is unpredictable; it carries a massive impact; and, after the fact, we concoct an explanation that makes it appear less random, and more predictable, than it was. […] The Black Swan idea is based on the structure of randomness in empirical reality. [...] the Black Swan is what we leave out of simplification." (Nassim N Taleb, "The Black Swan" , 2007)

"Prediction, not narration, is the real test of our understanding of the world." (Nassim N Taleb, "The Black Swan", 2007)

"The inability to predict outliers implies the inability to predict the course of history.” (Nassim N Taleb, “The Black Swan”, 2007)

"While in theory randomness is an intrinsic property, in practice, randomness is incomplete information." (Nassim N Taleb, "The Black Swan", 2007)

"The higher the dimension, in other words, the higher the number of possible interactions, and the more disproportionally difficult it is to understand the macro from the micro, the general from the simple units. This disproportionate increase of computational demands is called the curse of dimensionality." (Nassim N Taleb, "Skin in the Game: Hidden Asymmetries in Daily Life", 2018)

26 November 2006

Margaret Y Chu - Collected Quotes

"An organization needs to know the condition and quality of its data to be more effective in fixing them and making them blissful. Unfortunately, pride, shame, and a fear of looking incompetent all play a part when people are asked to openly discuss dirty data issues. Because data are an asset, some people are unwilling to share their data. They think this gives them control and power over others. The role of politics in the organization is the dirty secret of dirty data." (Margaret Y Chu, "Blissful Data", 2004)

"Blissful data consist of information that is accurate, meaningful, useful, and easily accessible to many people in an organization. These data are used by the organization’s employees to analyze information and support their decision-making processes to strategic action. It is easy to see that organizations that have reached their goal of maximum productivity with blissful data can triumph over their competition. Thus, blissful data provide a competitive advantage." (Margaret Y Chu, "Blissful Data", 2004)

"Business rules should be simple and owned and defined by the business; they are declarative, indivisible, expressed in clear, concise language, and business oriented." (Margaret Y Chu, "Blissful Data", 2004)

"Clear goals, multiple strategies, clear roles and responsibilities, boldness, teamwork, speed, flexibility, the ability to change, managing risk, and seizing opportunities when they arise are important characteristics in gaining objectives." (Margaret Y Chu, "Blissful Data", 2004)

"[…] dirt and stains are more noticeable on white or light-colored clothing. In the same way, dirty data and data quality issues have existed for a long time. But due to the inherent nature of operational data these issues have not been as visible or immense enough to affect the bottom line. Just as dark clothing hides spills and stains, dirty data have been hidden or ignored in operational data for decades." (Margaret Y Chu, "Blissful Data", 2004)

"Gauging the quality of the operational data becomes an important first step in predicting potential dirty data issues for an organization. But many organizations are reluctant to commit the time and expense to assess their data. Some organizations wait until dirty data issues blow up in their faces. The greater the pain being experienced, the bigger the commitment to improving data quality." (Margaret Y Chu, "Blissful Data", 2004)

"[...] incomplete, inaccurate, and invalid data can cause problems for an organization. These problems are not only embarrassing and awkward but will also cause the organization to lose customers, new opportunities, and market share." (Margaret Y Chu, "Blissful Data", 2004)

"Let’s define dirty data as: ‘… data that are incomplete, invalid, or inaccurate’. In other words, dirty data are simply data that are wrong. […] Incomplete or inaccurate data can result in bad decisions being made. Thus, dirty data are the opposite of blissful data. Problems caused by dirty data are significant; be wary of their pitfalls."  (Margaret Y Chu, "Blissful Data", 2004)

"Organizations must know and understand the current organizational culture to be successful at implementing change. We know that it is the organization’s culture that drives its people to action; therefore, management must understand what motivates their people to attain goals and objectives. Only by understanding the current organizational culture will it be possible to begin to try and change it." (Margaret Y Chu, "Blissful Data", 2004)

"Processes must be implemented to prevent bad data from entering the system as well as propagating to other systems. That is, dirty data must be intercepted at its source. The operational systems are often the source of informational data; thus dirty data must be fixed at the operational data level. Implementing the right processes to cleanse data is, however, not easy." (Margaret Y Chu, "Blissful Data", 2004)

"So business rules are just like house rules. They are policies of an organization and contain one or more assertions that define or constrain some aspect of the business. Their purpose is to provide a structure and guideline to control or influence the behavior of the organization. Further, business rules represent the business and guide the decisions that are made by the people in the organization." (Margaret Y Chu, "Blissful Data", 2004)

"Vision and mission statements are important, but they are not an organization’s culture; they are its goals. A vision is the ideal they are striving to achieve. There may be a huge gap between the ideal and the current state of actions and behaviors."(Margaret Y Chu, "Blissful Data", 2004)

"What management notices and rewards is the best indication of the organization’s culture." (Margaret Y Chu, "Blissful Data", 2004)

25 November 2006

Darrell Huff - Collected Quotes

"Another thing to watch out for is a conclusion in which a correlation has been inferred to continue beyond the data with which it has been demonstrated." (Darrell Huff, "How to Lie with Statistics", 1954)

"Extrapolations are useful, particularly in the form of soothsaying called forecasting trends. But in looking at the figures or the charts made from them, it is necessary to remember one thing constantly: The trend to now may be a fact, but the future trend represents no more than an educated guess. Implicit in it is 'everything else being equal' and 'present trends continuing'. And somehow everything else refuses to remain equal." (Darrell Huff, "How to Lie with Statistics", 1954)

"If you can't prove what you want to prove, demonstrate something else and pretend that they are the something. In the daze that follows the collision of statistics with the human mind, hardly anybody will notice the difference." (Darrell Huff, "How to Lie with Statistics", 1954)

"Keep in mind that a correlation may be real and based on real cause and effect - and still be almost worthless in determining action in any single case." (Darrell Huff, "How to Lie with Statistics", 1954)

"Only when there is a substantial number of trials involved is the law of averages a useful description or prediction." (Darrell Huff, "How to Lie with Statistics", 1954)

"Percentages offer a fertile field for confusion. And like the ever-impressive decimal they can lend an aura of precision to the inexact. […] Any percentage figure based on a small number of cases is likely to be misleading. It is more informative to give the figure itself. And when the percentage is carried out to decimal places, you begin to run the scale from the silly to the fraudulent." (Darrell Huff, "How to Lie with Statistics", 1954)

"Place little faith in an average or a graph or a trend when those important figures are missing." (Darrell Huff, "How to Lie with Statistics", 1954)

"Sometimes the big ado is made about a difference that is mathematically real and demonstrable but so tiny as to have no importance. This is in defiance of the fine old saying that a difference is a difference only if it makes a difference." (Darrell Huff, "How to Lie with Statistics", 1954)

"The fact is that, despite its mathematical base, statistics is as much an art as it is a science. A great many manipulations and even distortions are possible within the bounds of propriety. Often the statistician must choose among methods, a subjective process, and find the one that he will use to represent the facts." (Darrell Huff, "How to Lie with Statistics", 1954)

"The purely random sample is the only kind that can be examined with entire confidence by means of statistical theory, but there is one thing wrong with it. It is so difficult and expensive to obtain for many uses that sheer cost eliminates it." (Darrell Huff, "How to Lie with Statistics", 1954)

"The secret language of statistics, so appealing in a fact-minded culture, is employed to sensationalize, inflate, confuse, and oversimplify. Statistical methods and statistical terms are necessary in reporting the mass data of social and economic trends, business conditions, 'opinion' polls, the census. But without writers who use the words with honesty and understanding and readers who know what they mean, the result can only be semantic nonsense." (Darrell Huff, "How to Lie with Statistics", 1954)

"There are often many ways of expressing any figure. […] The method is to choose the one that sounds best for the purpose at hand and trust that few who read it will recognize how imperfectly it reflects the situation." (Darrell Huff, "How to Lie with Statistics", 1954)

"To be worth much, a report based on sampling must use a representative sample, which is one from which every source of bias has been removed." (Darrell Huff, "How to Lie with Statistics", 1954)

"When numbers in tabular form are taboo and words will not do the work well, as is often the case, there is one answer left: Draw a picture. About the simplest kind of statistical picture, or graph, is the line variety. It is very useful for showing trends, something practically everybody is interested in showing or knowing about or spotting or deploring or forecasting." (Darrell Huff, "How to Lie with Statistics", 1954)

"When you are told that something is an average you still don't know very much about it unless you can find out which of the common kinds of average it is-mean, median, or mode. [...] The different averages come out close together when you deal with data, such as those having to do with many human characteristics, that have the grace to fall close to what is called the normal distribution. If you draw a curve to represent it you get something shaped like a bell, and mean, median, and mode fall at the same point." (Darell Huff, "How to Lie with Statistics", 1954)

"When you find somebody - usually an interested party - making a fuss about a correlation, look first of all to see if it is not one of this type, produced by the stream of events, the trend of the times." (Darell Huff, "How to Lie with Statistics", 1954)

John D Kelleher - Collected Quotes

"A predictive model overfits the training set when at least some of the predictions it returns are based on spurious patterns present in the training data used to induce the model. Overfitting happens for a number of reasons, including sampling variance and noise in the training set. The problem of overfitting can affect any machine learning algorithm; however, the fact that decision tree induction algorithms work by recursively splitting the training data means that they have a natural tendency to segregate noisy instances and to create leaf nodes around these instances. Consequently, decision trees overfit by splitting the data on irrelevant features that only appear relevant due to noise or sampling variance in the training data. The likelihood of overfitting occurring increases as a tree gets deeper because the resulting predictions are based on smaller and smaller subsets as the dataset is partitioned after each feature test in the path." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies", 2015)

"Decision trees are also discriminative models. Decision trees are induced by recursively partitioning the feature space into regions belonging to the different classes, and consequently they define a decision boundary by aggregating the neighboring regions belonging to the same class. Decision tree model ensembles based on bagging and boosting are also discriminative models." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies", 2015)

"Decision trees are also considered nonparametric models. The reason for this is that when we train a decision tree from data, we do not assume a fixed set of parameters prior to training that define the tree. Instead, the tree branching and the depth of the tree are related to the complexity of the dataset it is trained on. If new instances were added to the dataset and we rebuilt the tree, it is likely that we would end up with a (potentially very) different tree." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies", 2015)

"It is important to remember that predictive data analytics models built using machine learning techniques are tools that we can use to help make better decisions within an organization and are not an end in themselves. It is paramount that, when tasked with creating a predictive model, we fully understand the business problem that this model is being constructed to address and ensure that it does address it." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015)

"There are two kinds of mistakes that an inappropriate inductive bias can lead to: underfitting and overfitting. Underfitting occurs when the prediction model selected by the algorithm is too simplistic to represent the underlying relationship in the dataset between the descriptive features and the target feature. Overfitting, by contrast, occurs when the prediction model selected by the algorithm is so complex that the model fits to the dataset too closely and becomes sensitive to noise in the data."(John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies", 2015)

"The main advantage of decision tree models is that they are interpretable. It is relatively easy to understand the sequences of tests a decision tree carried out in order to make a prediction. This interpretability is very important in some domains. [...] Decision tree models can be used for datasets that contain both categorical and continuous descriptive features. A real advantage of the decision tree approach is that it has the ability to model the interactions between descriptive features. This arises from the fact that the tests carried out at each node in the tree are performed in the context of the results of the tests on the other descriptive features that were tested at the preceding nodes on the path from the root. Consequently, if there is an interaction effect between two or more descriptive features, a decision tree can model this."  (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies", 2015)

"Tree pruning identifies and removes subtrees within a decision tree that are likely to be due to noise and sample variance in the training set used to induce it. In cases where a subtree is deemed to be overfitting, pruning the subtree means replacing the subtree with a leaf node that makes a prediction based on the majority target feature level (or average target feature value) of the dataset created by merging the instances from all the leaf nodes in the subtree. Obviously, pruning will result in decision trees being created that are not consistent with the training set used to build them. In general, however, we are more interested in creating prediction models that generalize well to new data rather than that are strictly consistent with training data, so it is common to sacrifice consistency for generalization capacity." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies", 2015)

"When datasets are small, a parametric model may perform well because the strong assumptions made by the model - if correct - can help the model to avoid overfitting. However, as the size of the dataset grows, particularly if the decision boundary between the classes is very complex, it may make more sense to allow the data to inform the predictions more directly. Obviously the computational costs associated with nonparametric models and large datasets cannot be ignored. However, support vector machines are an example of a nonparametric model that, to a large extent, avoids this problem. As such, support vector machines are often a good choice in complex domains with lots of data." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies", 2015)

"When we find data quality issues due to valid data during data exploration, we should note these issues in a data quality plan for potential handling later in the project. The most common issues in this regard are missing values and outliers, which are both examples of noise in the data." (John D Kelleher et al, "Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, worked examples, and case studies", 2015)

"A neural network consists of a set of neurons that are connected together. A neuron takes a set of numeric values as input and maps them to a single output value. At its core, a neuron is simply a multi-input linear-regression function. The only significant difference between the two is that in a neuron the output of the multi-input linear-regression function is passed through another function that is called the activation function." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"Data scientists should have some domain expertise. Most data science projects begin with a real-world, domain-specific problem and the need to design a data-driven solution to this problem. As a result, it is important for a data scientist to have enough domain expertise that they understand the problem, why it is important, an dhow a data science solution to the problem might fit into an organization’s processes. This domain expertise guides the data scientist as she works toward identifying an optimized solution." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"One of the biggest myths is the belief that data science is an autonomous process that we can let loose on our data to find the answers to our problems. In reality, data science requires skilled human oversight throughout the different stages of the process. [...] The second big myth of data science is that every data science project needs big data and needs to use deep learning. In general, having more data helps, but having the right data is the more important requirement. [...] A third data science myth is that modern data science software is easy to use, and so data science is easy to do. [...] The last myth about data science [...] is the belief that data science pays for itself quickly. The truth of this belief depends on the context of the organization. Adopting data science can require significant investment in terms of developing data infrastructure and hiring staff with data science expertise. Furthermore, data science will not give positive results on every project." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"One of the most important skills for a data scientist is the ability to frame a real-world problem as a standard data science task." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"Presenting data in a graphical format makes it much easier to see and understand what is happening with the data. Data visualization applies to all phases of the data science process."  (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"The goal of data science is to improve decision making by basing decisions on insights extracted from large data sets. As a field of activity, data science encompasses a set of principles, problem definitions, algorithms, and processes for extracting nonobvious and useful patterns from large data sets. It is closely related to the fields of data mining and machine learning, but it is broader in scope." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"The patterns that we extract using data science are useful only if they give us insight into the problem that enables us to do something to help solve the problem." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"The promise of data science is that it provides a way to understand the world through data." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"Using data science, we can uncover the important patterns in a data set, and these patterns can reveal the important attributes in the domain. The reason why data science is used in so many domains is that it doesn’t matter what the problem domain is: if the right data are available and the problem can be clearly defined, then data science can help."  (John D Kelleher & Brendan Tierney, "Data Science", 2018)

"We humans are reasonably good at defining rules that check one, two, or even three attributes (also commonly referred to as features or variables), but when we go higher than three attributes, we can start to struggle to handle the interactions between them. By contrast, data science is often applied in contexts where we want to look for patterns among tens, hundreds, thousands, and, in extreme cases, millions of attributes." (John D Kelleher & Brendan Tierney, "Data Science", 2018)

Daniel J Levitin - Collected Quotes

"A well-designed graph clearly shows you the relevant end points of a continuum. This is especially important if you’re documenting some actual or projected change in a quantity, and you want your readers to draw the right conclusions. […]" (Daniel J Levitin, "Weaponized Lies", 2017)

"Collecting data through sampling therefore becomes a never-ending battle to avoid sources of bias. [...] While trying to obtain a random sample, researchers sometimes make errors in judgment about whether every person or thing is equally likely to be sampled." (Daniel J Levitin, "Weaponized Lies", 2017)

"GIGO is a famous saying coined by early computer scientists: garbage in, garbage out. At the time, people would blindly put their trust into anything a computer output indicated because the output had the illusion of precision and certainty. If a statistic is composed of a series of poorly defined measures, guesses, misunderstandings, oversimplifications, mismeasurements, or flawed estimates, the resulting conclusion will be flawed." (Daniel J Levitin, "Weaponized Lies", 2017)

"How do you know when a correlation indicates causation? One way is to conduct a controlled experiment. Another is to apply logic. But be careful - it’s easy to get bogged down in semantics." (Daniel J Levitin, "Weaponized Lies", 2017)

"In statistics, the word 'significant' means that the results passed mathematical tests such as t-tests, chi-square tests, regression, and principal components analysis (there are hundreds). Statistical significance tests quantify how easily pure chance can explain the results. With a very large number of observations, even small differences that are trivial in magnitude can be beyond what our models of change and randomness can explain. These tests don’t know what’s noteworthy and what’s not - that’s a human judgment." (Daniel J Levitin, "Weaponized Lies", 2017)

"Infographics are often used by lying weasels to shape public opinion, and they rely on the fact that most people won’t study what they’ve done too carefully." (Daniel J Levitin, "Weaponized Lies", 2017)

"Just because there’s a number on it, it doesn’t mean that the number was arrived at properly. […] There are a host of errors and biases that can enter into the collection process, and these can lead millions of people to draw the wrong conclusions. Although most of us won’t ever participate in the collection process, thinking about it, critically, is easy to learn and within the reach of all of us." (Daniel J Levitin, "Weaponized Lies", 2017)

"Many of us feel intimidated by numbers and so we blindly accept the numbers we’re handed. This can lead to bad decisions and faulty conclusions. We also have a tendency to apply critical thinking only to things we disagree with. In the current information age, pseudo-facts masquerade as facts, misinformation can be indistinguishable from true information, and numbers are often at the heart of any important claim or decision. Bad statistics are everywhere." (Daniel J Levitin, "Weaponized Lies", 2017)

"Measurements must be standardized. There must be clear, replicable, and precise procedures for collecting data so that each person who collects it does it in the same way." (Daniel J Levitin, "Weaponized Lies", 2017)

"Most of us have difficulty figuring probabilities and statistics in our heads and detecting subtle patterns in complex tables of numbers. We prefer vivid pictures, images, and stories. When making decisions, we tend to overweight such images and stories, compared to statistical information. We also tend to misunderstand or misinterpret graphics." (Daniel J Levitin, "Weaponized Lies", 2017)

"One kind of probability - classic probability - is based on the idea of symmetry and equal likelihood […] In the classic case, we know the parameters of the system and thus can calculate the probabilities for the events each system will generate. […] A second kind of probability arises because in daily life we often want to know something about the likelihood of other events occurring […]. In this second case, we need to estimate the parameters of the system because we don’t know what those parameters are. […] A third kind of probability differs from these first two because it’s not obtained from an experiment or a replicable event - rather, it expresses an opinion or degree of belief about how likely a particular event is to occur. This is called subjective probability […]." (Daniel J Levitin, "Weaponized Lies", 2017)

"One way to lie with statistics is to compare things - datasets, populations, types of products - that are different from one another, and pretend that they’re not. As the old idiom says, you can’t compare apples with oranges." (Daniel J Levitin, "Weaponized Lies", 2017)

"Probabilities allow us to quantify future events and are an important aid to rational decision making. Without them, we can become seduced by anecdotes and stories." (Daniel J Levitin, "Weaponized Lies", 2017)

"Samples give us estimates of something, and they will almost always deviate from the true number by some amount, large or small, and that is the margin of error. […] The margin of error does not address underlying flaws in the research, only the degree of error in the sampling procedure. But ignoring those deeper possible flaws for the moment, there is another measurement or statistic that accompanies any rigorously defined sample: the confidence interval." (Daniel J Levitin, "Weaponized Lies", 2017)

"Statistics, because they are numbers, appear to us to be cold, hard facts. It seems that they represent facts given to us by nature and it’s just a matter of finding them. But it’s important to remember that people gather statistics. People choose what to count, how to go about counting, which of the resulting numbers they will share with us, and which words they will use to describe and interpret those numbers. Statistics are not facts. They are interpretations. And your interpretation may be just as good as, or better than, that of the person reporting them to you." (Daniel J Levitin, "Weaponized Lies", 2017)

"The margin of error is how accurate the results are, and the confidence interval is how confident you are that your estimate falls within the margin of error." (Daniel J Levitin, "Weaponized Lies", 2017)

"The most accurate but least interpretable form of data presentation is to make a table, showing every single value. But it is difficult or impossible for most people to detect patterns and trends in such data, and so we rely on graphs and charts. Graphs come in two broad types: Either they represent every data point visually (as in a scatter plot) or they implement a form of data reduction in which we summarize the data, looking, for example, only at means or medians." (Daniel J Levitin, "Weaponized Lies", 2017)

"To be any good, a sample has to be representative. A sample is representative if every person or thing in the group you’re studying has an equally likely chance of being chosen. If not, your sample is biased. […] The job of the statistician is to formulate an inventory of all those things that matter in order to obtain a representative sample. Researchers have to avoid the tendency to capture variables that are easy to identify or collect data on - sometimes the things that matter are not obvious or are difficult to measure." (Daniel J Levitin, "Weaponized Lies", 2017)

"We are a storytelling species, and a social species, easily swayed by the opinions of others. We have three ways to acquire information: We can discover it ourselves, we can absorb it implicitly, or we can be told it explicitly. Much of what we know about the world falls in this last category - somewhere along the line, someone told us a fact or we read about it, and so we know it only second-hand. We rely on people with expertise to tell us." (Daniel J Levitin, "Weaponized Lies", 2017)

"We use the word probability in different ways to mean different things. It’s easy to get swept away thinking that a person means one thing when they mean another, and that confusion can cause us to draw the wrong conclusion." (Daniel J Levitin, "Weaponized Lies", 2017) 

David S Salsburg - Collected Quotes

"A good estimator has to be more than just consistent. It also should be one whose variance is less than that of any other estimator. This property is called minimum variance. This means that if we run the experiment several times, the 'answers' we get will be closer to one another than 'answers' based on some other estimator." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"All methods of dealing with big data require a vast number of mind-numbing, tedious, boring mathematical steps." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"An estimate (the mathematical definition) is a number derived from observed values that is as close as we can get to the true parameter value. Useful estimators are those that are 'better' in some sense than any others." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"Correlation is not equivalent to cause for one major reason. Correlation is well defined in terms of a mathematical formula. Cause is not well defined." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"Estimators are functions of the observed values that can be used to estimate specific parameters. Good estimators are those that are consistent and have minimum variance. These properties are guaranteed if the estimator maximizes the likelihood of the observations." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"One final warning about the use of statistical models (whether linear or otherwise): The estimated model describes the structure of the data that have been observed. It is unwise to extend this model very far beyond the observed data." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"The central limit conjecture states that most errors are the result of many small errors and, as such, have a normal distribution. The assumption of a normal distribution for error has many advantages and has often been made in applications of statistical models." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"The degree to which one variable can be predicted from another can be calculated as the correlation between them. The square of the correlation (R^2) is the proportion of the variance of one that can be 'explained' by knowledge of the other." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"The elements of this cloud of uncertainty (the set of all possible errors) can be described in terms of probability. The center of the cloud is the number zero, and elements of the cloud that are close to zero are more probable than elements that are far away from that center. We can be more precise in this definition by defining the cloud of uncertainty in terms of a mathematical function, called the probability distribution." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"The lack of variability is often a hallmark of faked data. […] The failure of faked data to have sufficient variability holds as long as the liar does not know this. If the liar knows this, his best approach is to start with real data and use it cleverly to adapt it to his needs." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"There are other problems with Big Data. In any large data set, there are bound to be inconsistencies, misclassifications, missing data - in other words, errors, blunders, and possibly lies. These problems with individual items occur in any data set, but they are often hidden in a large mass of numbers even when these numbers are generated out of computer interactions." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"There is a constant battle between the cold abstract absolutes of pure mathematics and, the sometimes sloppy way in which mathematical methods are applied in science." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"Two clouds of uncertainty may have the same center, but one may be much more dispersed than the other. We need a way of looking at the scatter about the center. We need a measure of the scatter. One such measure is the variance. We take each of the possible values of error and calculate the squared difference between that value and the center of the distribution. The mean of those squared differences is the variance." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"What properties should a good statistical estimator have? Since we are dealing with probability, we start with the probability that our estimate will be very close to the true value of the parameter. We want that probability to become greater and greater as we get more and more data. This property is called consistency. This is a statement about probability. It does not say that we are sure to get the right answer. It says that it is highly probable that we will be close to the right answer." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"When we use algebraic notation in statistical models, the problem becomes more complicated because we cannot 'observe' a probability and know its exact number. We can only estimate probabilities on the basis of observations." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

24 November 2006

Joel Best - Collected Quotes

"All human knowledge - including statistics - is created  through people's actions; everything we know is shaped by our language, culture, and society. Sociologists call this the social construction of knowledge. Saying that knowledge is socially constructed does not mean that all we know is somehow fanciful, arbitrary, flawed, or wrong. For example, scientific knowledge can be remarkably accurate, so accurate that we may forget the people and social processes that produced it." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Any statistic based on more than a guess requires some sort of counting. Definitions specify what will be counted. Measuring involves deciding how to go about counting. We cannot begin counting until we decide how we will identify and count instances of a social problem. [...] Measurement involves choices. [...] Often, measurement decisions are hidden." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Big numbers warn us that the problem is a common one, compelling our attention, concern, and action. The media like to report statistics because numbers seem to be 'hard facts' - little nuggets of indisputable truth. [...] One common innumerate error involves not distinguishing among large numbers. [...] Because many people have trouble appreciating the differences among big numbers, they tend to uncritically accept social statistics (which often, of course, feature big numbers)." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"But people treat mutant statistics just as they do other statistics - that is, they usually accept even the most implausible claims without question. [...] And people repeat bad statistics [...] bad statistics live on; they take on lives of their own. [...] Statistics, then, have a bad reputation. We suspect that statistics may be wrong, that people who use statistics may be 'lying' - trying to manipulate us by using numbers to somehow distort the truth. Yet, at the same time, we need statistics; we depend upon them to summarize and clarify the nature of our complex society. This is particularly true when we talk about social problems." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Changing measures are a particularly common problem with comparisons over time, but measures also can cause problems of their own. [...] We cannot talk about change without making comparisons over time. We cannot avoid such comparisons, nor should we want to. However, there are several basic problems that can affect statistics about change. It is important to consider the problems posed by changing - and sometimes unchanging - measures, and it is also important to recognize the limits of predictions. Claims about change deserve critical inspection; we need to ask ourselves whether apples are being compared to apples - or to very different objects." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Clear, precise definitions are not enough. Whatever is defined must also be measured, and meaningless measurements will produce meaningless statistics." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Compound errors can begin with any of the standard sorts of bad statistics - a guess, a poor sample, an inadvertent transformation, perhaps confusion over the meaning of a complex statistic. People inevitably want to put statistics to use, to explore a number's implications. [...] The strengths and weaknesses of those original numbers should affect our confidence in the second-generation statistics." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"First, good statistics are based on more than guessing. [...] Second, good statistics are based on clear, reasonable definitions. Remember, every statistic has to define its subject. Those definitions ought to be clear and made public. [...] Third, good statistics are based on clear, reasonable measures. Again, every statistic involves some sort of measurement; while all measures are imperfect, not all flaws are equally serious. [...] Finally, good statistics are based on good samples." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"In order to interpret statistics, we need more than a checklist of common errors. We need a general approach, an orientation, a mind-set that we can use to think about new statistics that we encounter. We ought to approach statistics thoughtfully. This can be hard to do, precisely because so many people in our society treat statistics as fetishes." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Innumeracy - widespread confusion about basic mathematical ideas - means that many statistical claims about social problems don't get the critical attention they deserve. This is not simply because an innumerate public is being manipulated by advocates who cynically promote inaccurate statistics. Often, statistics about social problems originate with sincere, well-meaning people who are themselves innumerate; they may not grasp the full implications of what they are saying. Similarly, the media are not immune to innumeracy; reporters commonly repeat the figures their sources give them without bothering to think critically about them." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Knowledge is factual when evidence supports it and we have great confidence in its accuracy. What we call 'hard fact' is information supported by  strong, convincing evidence; this means evidence that, so far as we know, we cannot deny, however we examine or test it. Facts always can be questioned, but they hold up under questioning. How did people come by this information? How did they interpret it? Are other interpretations possible? The more satisfactory the answers to such questions, the 'harder' the facts." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Like definitions, measurements always involve choices. Advocates of different measures can defend their own choices and criticize those made by their opponents - so long as the various choices being made are known and understood. However, when measurement choices are kept hidden, it becomes difficult to assess the statistics based on those choices." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"No definition of a social problem is perfect, but there are two principal ways such definitions can be flawed. On the one hand, we may worry that a definition is too broad, that it encompasses more than it ought to include. That is, broad definitions identify some cases as part of the problem that we might think ought not to be included; statisticians call such cases false positives (that is, they mistakenly identify cases as part of the problem). On the other hand, a definition that is too narrow excludes cases that we might think ought to be included; these are false negatives (incorrectly identified as not being part of the problem)." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Not all statistics start out bad, but any statistic can be made worse. Numbers - even good numbers - can be misunderstood or misinterpreted. Their meanings can be stretched, twisted, distorted, or mangled. These alterations create what we can call mutant statistics - distorted versions of the original figures." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"One reason we tend to accept statistics uncritically is that we assume that numbers come from experts who know what they're doing. [...] There is a natural tendency to treat these figures as straightforward facts that cannot be questioned." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"People who create or repeat a statistic often feel they have a stake in defending the number. When someone disputes an estimate and offers a very different (often lower) figure, people may rush to defend the original estimate and attack the new number and anyone who dares to use it. [...] any estimate can be defended by challenging the motives of anyone who disputes the figure." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Statistics are not magical. Nor are they always true - or always false. Nor need they be incomprehensible. Adopting a Critical approach offers an effective way of responding to the numbers we are sure to encounter. Being Critical requires more thought, but failing to adopt a Critical mind-set makes us powerless to evaluate what others tell us. When we fail to think critically, the statistics we hear might just as well be magical." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Statisticians can calculate the probability that such random samples represent the population; this is usually expressed in terms of sampling error [...]. The real problem is that few samples are random. Even when researchers know the nature of the population, it can be time-consuming and expensive to draw a random sample; all too often, it is impossible to draw a true random sample because the population cannot be defined. This is particularly true for studies of social problems. [...] The best samples are those that come as close as possible to being random.(Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"The ease with which somewhat complex statistics can produce confusion is important, because we live in a world in which complex numbers are becoming more common. Simple statistical ideas - fractions, percentages, rates - are reasonably well understood by many people. But many social problems involve complex chains of cause and effect that can be understood only through complicated models developed by experts. [...] environment has an influence. Sorting out the interconnected causes of these problems requires relatively complicated statistical ideas - net additions, odds ratios, and the like. If we have an imperfect understanding of these ideas, and if the reporters and other people who relay the statistics to us share our confusion - and they probably do - the chances are good that we'll soon be hearing - and repeating, and perhaps making decisions on the basis of - mutated statistics." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"There are two problems with sampling - one obvious, and  the other more subtle. The obvious problem is sample size. Samples tend to be much smaller than their populations. [...] Obviously, it is possible to question results based on small samples. The smaller the sample, the less confidence we have that the sample accurately reflects the population. However, large samples aren't necessarily good samples. This leads to the second issue: the representativeness of a sample is actually far more important than sample size. A good sample accurately reflects (or 'represents') the population." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"We often hear warnings that some social problem is 'epidemic'. This expression suggests that the problem's growth is rapid, widespread, and out of control. If things are getting worse, and particularly if they're getting worse fast, we need to act." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Whenever examples substitute for definitions, there is a risk that our understanding of the problem will be distorted." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"While some social problems statistics are deliberate deceptions, many - probably the great majority - of bad statistics are the result of confusion, incompetence, innumeracy, or selective, self-righteous efforts to produce numbers that reaffirm principles and interests that their advocates consider just and right. The best response to stat wars is not to try and guess who's lying or, worse, simply to assume that the people we disagree with are the ones telling lies. Rather, we need to watch for the standard causes of bad statistics - guessing, questionable definitions or methods, mutant numbers, and inappropriate comparisons." (Joel Best, "Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"Every number has its limitations; every number is a product of choices that inevitably involve compromise. Statistics are intended to help us summarize, to get an overview of part of the world’s complexity. But some information is always sacrificed in the process of choosing what will be counted and how. Something is, in short, always missing. In evaluating statistics, we should not forget what has been lost, if only because this helps us understand what we still have." (Joel Best, "More Damned Lies and Statistics: How numbers confuse public issues", 2004)

"Good statistics are not only products of people counting; the quality of statistics also depends on people’s willingness and ability to count thoughtfully and on their decisions about what, exactly, ought to be counted so that the resulting numbers will be both accurate and meaningful." (Joel Best, "More Damned Lies and Statistics: How numbers confuse public issues", 2004)

"In much the same way, people create statistics: they choose what to count, how to go about counting, which of the resulting numbers they share with others, and which words they use to describe and interpret those figures. Numbers do not exist independent of people; understanding numbers requires knowing who counted what, why they bothered counting, and how they went about it." (Joel Best, "More Damned Lies and Statistics: How numbers confuse public issues", 2004)

"In short, some numbers are missing from discussions of social issues because certain phenomena are hard to quantify, and any effort to assign numeric values to them is subject to debate. But refusing to somehow incorporate these factors into our calculations creates its own hazards. The best solution is to acknowledge the difficulties we encounter in measuring these phenomena, debate openly, and weigh the options as best we can." (Joel Best, "More Damned Lies and Statistics: How numbers confuse public issues", 2004)

"Nonetheless, the basic principles regarding correlations between variables are not that diffcult to understand. We must look for patterns that reveal potential relationships and for evidence that variables are actually related. But when we do spot those relationships, we should not jump to conclusions about causality. Instead, we need to weigh the strength of the relationship and the plausibility of our theory, and we must always try to discount the possibility of spuriousness." (Joel Best, "More Damned Lies and Statistics : How numbers confuse public issues", 2004)

"Statistics depend on collecting information. If questions go unasked, or if they are asked in ways that limit responses, or if measures count some cases but exclude others, information goes ungathered, and missing numbers result. Nevertheless, choices regarding which data to collect and how to go about collecting the information are inevitable." (Joel Best, "More Damned Lies and Statistics: How numbers confuse public issues", 2004)

"When people use statistics, they assume - or, at least, they want their listeners to assume - that the numbers are meaningful. This means, at a minimum, that someone has actually counted something and that they have done the counting in a way that makes sense. Statistical information is one of the best ways we have of making sense of the world’s complexities, of identifying patterns amid the confusion. But bad statistics give us bad information." (Joel Best, "More Damned Lies and Statistics: How numbers confuse public issues", 2004)
