Showing posts with label controls. Show all posts
Showing posts with label controls. Show all posts

01 September 2024

Data Management: Data Governance (Part I: No Guild of Heroes)

Data Management Series
Data Management Series

Data governance appeared around 1980s as topic though it gained popularity in early 2000s [1]. Twenty years later, organizations still miss the mark, respectively fail to understand and implement it in a consistent manner. As usual, the reasons for failure are multiple and they vary from misunderstanding what governance is all about to poor implementation of methodologies and inadequate management or leadership. 

Moreover, methodologies tend to idealize the various aspects and is not what organizations need, but pragmatism. For example, data governance is not about heroes and heroism [2], which can give the impression that heroic actions are involved and is not the case! Actions for the sake of action don’t necessarily lead to change by themselves. Organizations are in general good at creating meaningless action without results, especially when people preoccupy themselves, miss or ignore the mark. Big organizations are very good at generating actions without effects. 

People do talk to each other, though they try to solve their own problems and optimize their own areas without necessarily thinking about the bigger picture. The problem is not necessarily communication or the lack of depth into business issues, people do communicate, know the issues without a business impact assessment. The challenge is usually in convincing the upper management that the effort needs to be consolidated, supported, respectively the needed resources made available. 

Probably, one of the issues with data governance is the attempt of creating another structure in the organization focused on quality, which has the chances to fail, and unfortunately does fail. Many issues appear when the structure gains weight and it becomes a separate entity instead of being the backbone of organizations. 

As soon organizations separate the data governance from the key users, management and the other important decisional people in the organization, it takes a life of its own that has the chances to diverge from the initial construct. Then, organizations need "alignment" and probably other big words to coordinate the effort. Also such constructs can work but they are suboptimal because the forces will always pull in different directions.

Making each manager and the upper management responsible for governance is probably the way to go, though they’ll need the time for it. In theory, this can be achieved when many of the issues are solved at the lower level, when automation and further aspects allow them to supervise things, rather than hiding behind every issue. 

When too much mircomanagement is involved, people tend to busy themselves with topics rather than solve the issues they are confronted with. The actual actors need to be empowered to take decisions and optimize their work when needed. Kaizen, the philosophy of continuous improvement, proved itself that it works when applied correctly. They’ll need the knowledge, skills, time and support to do it though. One of the dangers is however that this becomes a full-time responsibility, which tends to create a separate entity again.

The challenge for organizations lies probably in the friction between where they are and what they must do to move forward toward the various objectives. Moving in small rapid steps is probably the way to go, though each person must be aware when something doesn’t work as expected and react. That’s probably the most important aspect. 

So, the more functions are created that diverge from the actual organization, the higher the chances for failure. Unfortunately, failure is visible in the later phases, and thus self-awareness, self-control and other similar “qualities” are needed, like small actors that keep the system in check and react whenever is needed. Ideally, the employees are the best resources to react whenever something doesn’t work as per design. 

Previous Post <<||>> Next Post 

Resources:
[1] Wikipedia (2023) Data Management [link]
[2] Tiankai Feng (2023) How to Turn Your Data Team Into Governance Heroes [link]


14 December 2019

Governance: Control (Just the Quotes)

"To manage is to forecast and plan, to organize, to command, to coordinate and to control. To foresee and plan means examining the future and drawing up the plan of action. To organize means building up the dual structure, material and human, of the undertaking. To command means binding together, unifying and harmonizing all activity and effort. To control means seeing that everything occurs in conformity with established rule and expressed demand." (Henri Fayol, 1916)

"The concern of OR with finding an optimum decision, policy, or design is one of its essential characteristics. It does not seek merely to define a better solution to a problem than the one in use; it seeks the best solution... [It] can be characterized as the application of scientific methods, techniques, and tools to problems involving the operations of systems so as to provide those in control of the operations with optimum solutions to the problems." (C West Churchman et al, "Introduction to Operations Research", 1957)

"Management is a distinct process consisting of planning, organising, actuating and controlling; utilising in each both science and art, and followed in order to accomplish pre-determined objectives." (George R Terry, "Principles of Management", 1960)

"The term architecture is used here to describe the attributes of a system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flow and controls, the logical design, and the physical implementation." (Gene Amdahl et al, "Architecture of the IBM System", IBM Journal of Research and Development. Vol 8 (2), 1964)

"If cybernetics is the science of control, management is the profession of control." (Anthony S Beer, "Decision and Control", 1966)

"Most of our beliefs about complex organizations follow from one or the other of two distinct strategies. The closed-system strategy seeks certainty by incorporating only those variables positively associated with goal achievement and subjecting them to a monolithic control network. The open-system strategy shifts attention from goal achievement to survival and incorporates uncertainty by recognizing organizational interdependence with environment. A newer tradition enables us to conceive of the organization as an open system, indeterminate and faced with uncertainty, but subject to criteria of rationality and hence needing certainty." (James D Thompson, "Organizations in Action", 1967)

"Policy-making, decision-taking, and control: These are the three functions of management that have intellectual content." (Anthony S Beer, "Management Science" , 1968)

"The management of a system has to deal with the generation of the plans for the system, i. e., consideration of all of the things we have discussed, the overall goals, the environment, the utilization of resources and the components. The management sets the component goals, allocates the resources, and controls the system performance." (C West Churchman, "The Systems Approach", 1968)

"One difficulty in developing a good [accounting] control system is that quantitative results will differ according to the accounting principles used, and accounting principles may change." (Ernest Dale, "Readings in Management", 1970)

"To be productive the individual has to have control, to a substantial extent, over the speed, rhythm, and attention spans with which he is working […] While work is, therefore, best laid out as uniform, working is best organized with a considerable degree of diversity. Working requires latitude to change speed, rhythm, and attention span fairly often. It requires fairly frequent changes in operating routines as well. What is good industrial engineering for work is exceedingly poor human engineering for the worker." (Peter F Drucker, "Management: Tasks, Responsibilities, Practices", 1973)

"A mature science, with respect to the matter of errors in variables, is not one that measures its variables without error, for this is impossible. It is, rather, a science which properly manages its errors, controlling their magnitudes and correctly calculating their implications for substantive conclusions." (Otis D Duncan, "Introduction to Structural Equation Models", 1975)

"Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes." (Charles Goodhart, "Problems of Monetary Management: the U.K. Experience", 1975)

"When information is centralized and controlled, those who have it are extremely influential. Since information is [usually] localized in control subsystems, these subsystems have a great deal of organization influence." (Henry L Tosi & Stephen J Carroll, "Management", 1976)

"[...] when a variety of tasks have all to be performed in cooperation, synchronization, and communication, a business needs managers and a management. Otherwise, things go out of control; plans fail to turn into action; or, worse, different parts of the plans get going at different speeds, different times, and with different objectives and goals, and the favor of the 'boss' becomes more important than performance." (Peter F Drucker, "People and Performance", 1977)

"Uncontrolled variation is the enemy of quality." (W Edwards Deming, 1980)

"The key mission of contemporary management is to transcend the old models which limited the manager's role to that of controller, expert or morale booster. These roles do not produce the desired result of aligning the goals of the employees and the corporation. [...] These older models, vestiges of a bygone era, have served their function and must be replaced with a model of the manager as a developer of human resources." (Michael Durst, "Small Systems World", 1985)

"The outcome of any professional's effort depends on the ability to control working conditions." (Joseph A Raelin, "Clash of Cultures: Managers and Professionals", 1986)

"Executives have to start understanding that they have certain legal and ethical responsibilities for information under their control." (Jim Leeke, PC Week, 1987)

"Give up control even if it means the employees have to make some mistakes." (Frank Flores, Hispanic Business, 1987)

"In complex situations, we may rely too heavily on planning and forecasting and underestimate the importance of random factors in the environment. That reliance can also lead to delusions of control." (Hillel J Einhorn & Robin M. Hogarth, Harvard Business Review, 1987)

"Managers exist to plan, direct and control the project. Part of the way they control is to listen to and weigh advice. Once a decision is made, that's the way things should proceed until a new decision is reached. Erosion of management decisions by [support] people who always 'know better' undermines managers' credibility and can bring a project to grief." (Philip W Metzger, "Managing Programming People", 1987)

"To be effective, a manager must accept a decreasing degree of direct control." (Eric G Flamholtz & Yvonne Randal, "The Inner Game of Management", 1987)

"[Well-managed modern organizations] treat everyone as a source of creative input. What's most interesting is that they cannot be described as either democratically or autocratically managed. Their managers define the boundaries, and their people figure out the best way to do the job within those boundaries. The management style is an astonishing combination of direction and empowerment. They give up tight control in order to gain control over what counts: results." (Robert H Waterman, "The Renewal Factor", 1987)

"We have created trouble for ourselves in organizations by confusing control with order. This is no surprise, given that for most of its written history, leadership has been defined in terms of its control functions." (Margaret J Wheatley, "Leadership and the New Science: Discovering Order in a Chaotic World", 1992)

"Management is not founded on observation and experiment, but on a drive towards a set of outcomes. These aims are not altogether explicit; at one extreme they may amount to no more than an intention to preserve the status quo, at the other extreme they may embody an obsessional demand for power, profit or prestige. But the scientist's quest for insight, for understanding, for wanting to know what makes the system tick, rarely figures in the manager's motivation. Secondly, and therefore, management is not, even in intention, separable from its own intentions and desires: its policies express them. Thirdly, management is not normally aware of the conventional nature of its intellectual processes and control procedures. It is accustomed to confuse its conventions for recording information with truths-about-the-business, its subjective institutional languages for discussing the business with an objective language of fact and its models of reality with reality itself." (Stanford Beer, "Decision and Control", 1994)

"Without some element of governance from the top, bottom-up control will freeze when options are many. Without some element of leadership, the many at the bottom will be paralysed with choices." (Kevin Kelly, "Out of Control: The New Biology of Machines, Social Systems and the Economic World", 1995)

"Management is a set of processes that can keep a complicated system of people and technology running smoothly. The most important aspects of management include planning, budgeting, organizing, staffing, controlling, and problem solving." (John P Kotter, "Leading Change", 1996) 

"The manager [...] is understood as one who observes the causal structure of an organization in order to be able to control it [...] This is taken to mean that the manager can choose the goals of the organization and design the systems or actions to realize those goals [...]. The possibility of so choosing goals and strategies relies on the predictability provided by the efficient and formative causal structure of the organization, as does the possibility of managers staying 'in control' of their organization's development. According to this perspective, organizations become what they are because of the choices made by their managers." (Ralph D Stacey et al, "Complexity and Management: Fad or Radical Challenge to Systems Thinking?", 2000)

"Success or failure of a project depends upon the ability of key personnel to have sufficient data for decision-making. Project management is often considered to be both an art and a science. It is an art because of the strong need for interpersonal skills, and the project planning and control forms attempt to convert part of the 'art' into a science." (Harold Kerzner, "Strategic Planning for Project Management using a Project Management Maturity Model", 2001)

"The premise here is that the hierarchy lines on the chart are also the only communication conduit. Information can flow only along the lines. [...] The hierarchy lines are paths of authority. When communication happens only over the hierarchy lines, that's a priori evidence that the managers are trying to hold on to all control. This is not only inefficient but an insult to the people underneath." (Tom DeMarco, "Slack: Getting Past Burnout, Busywork, and the Myth of Total Efficiency", 2001)

"Management can be defined as the attainment of organizational goals in an effective and efficient manner through planning, organizing, staffing, directing, and controlling organizational resources." (Richard L Daft, "The Leadership Experience" 4th Ed., 2008)

"In a complex society, individuals, organizations, and states require a high degree of confidence - even if it is misplaced - in the short-term future and a reasonable degree of confidence about the longer term. In its absence they could not commit themselves to decisions, investments, and policies. Like nudging the frame of a pinball machine to influence the path of the ball, we cope with the dilemma of uncertainty by doing what we can to make our expectations of the future self-fulfilling. We seek to control the social and physical worlds not only to make them more predictable but to reduce the likelihood of disruptive and damaging shocks (e.g., floods, epidemics, stock market crashes, foreign attacks). Our fallback strategy is denial." (Richard N Lebow, "Forbidden Fruit: Counterfactuals and International Relations", 2010)

"Almost by definition, one is rarely privileged to 'control' a disaster. Yet the activity somewhat loosely referred to by this term is a substantial portion of Management, perhaps the most important part. […] It is the business of a good Manager to ensure, by taking timely action in the real world, that scenarios of disaster remain securely in the realm of Fantasy." (John Gall, "The Systems Bible: The Beginner's Guide to Systems Large and Small"[Systematics 3rd Ed.], 2011)

"Without precise predictability, control is impotent and almost meaningless. In other words, the lesser the predictability, the harder the entity or system is to control, and vice versa. If our universe actually operated on linear causality, with no surprises, uncertainty, or abrupt changes, all future events would be absolutely predictable in a sort of waveless orderliness." (Lawrence K Samuels, "Defense of Chaos", 2013)

"The problem of complexity is at the heart of mankind’s inability to predict future events with any accuracy. Complexity science has demonstrated that the more factors found within a complex system, the more chances of unpredictable behavior. And without predictability, any meaningful control is nearly impossible. Obviously, this means that you cannot control what you cannot predict. The ability ever to predict long-term events is a pipedream. Mankind has little to do with changing climate; complexity does." (Lawrence K Samuels, "The Real Science Behind Changing Climate", LewRockwell.com, August 1, 2014) 

12 August 2019

Information Security: Access Control (Definitions)

"The ability to selectively control who can get at or manipulate information in, for example, a Web server." (Tim Berners-Lee, "Weaving the Web", 1999)

"The methods by which interactions with resources are limited to collections of users or programs for the purpose of enforcing integrity, confidentiality, or availability constraints." (Kim Haase et al, "The J2EE™ Tutorial", 2002)

"Limiting access to resources according to rights granted by the system administrator, application, or policy." (Tom Petrocelli, "Data Protection and Information Lifecycle Management", 2005)

"Determining who or what can go where, when, and how." (Judith Hurwitz et al, "Service Oriented Architecture For Dummies" 2nd Ed., 2009)

"Management of who is allowed access and who is not allowed access to networks, data files, applications, or other digital resources." (Linda Volonino & Efraim Turban, "Information Technology for Management" 8th Ed, 2011)

"Any mechanism to regulate access to something, but for parallel programs this term generally applies to shared memory. The term is sometimes extended to I/O devices as well. For parallel programming, the objective is generally to provide deterministic results by preventing an object from being modified by multiple tasks simultaneously. Most often this is referred to as mutual exclusion, which includes locks, mutexes, atomic operations, and transactional memory models. This may also require some control on reading access to prevent viewing of an object in a partially modified state." (Michael McCool et al, "Structured Parallel Programming", 2012)

"Secures content and identifies who can read, create, modify, and delete content." (Charles Cooper & Ann Rockley, "Managing Enterprise Content: A Unified Content Strategy" 2nd Ed., 2012)

"A technique used to permit or deny use of data or information system resources to specific users, programs, processes, or other systems based on previously granted authorization to those resources." (Mark Rhodes-Ousley, "Information Security: The Complete Reference, Second Edition" 2nd Ed., 2013)

"The act of limiting access to information system resources only to authorized users, programs, processes, or other systems." (Manish Agrawal, "Information Security and IT Risk Management", 2014)

"The means to ensure that access to assets is authorised and restricted on business and security requirements." (David Sutton, "Information Risk Management: A practitioner’s guide", 2014)

"Are security features that control how users and systems communicate and interact with other systems and resources." (Adam Gordon, "Official (ISC)2 Guide to the CISSP CBK" 4th Ed., 2015)

"Mechanisms, controls, and methods of limiting access to resources to authorized subjects only." (Shon Harris & Fernando Maymi, "CISSP All-in-One Exam Guide" 8th Ed, 2018)

"The process of granting or denying specific requests (1) for accessing and using information and related information processing services and (2) to enter specific physical facilities. Access control ensures that access to assets is authorized and restricted based on business and security requirements." (William Stallings, "Effective Cybersecurity: A Guide to Using Best Practices and Standards", 2018)

05 August 2019

Information Security: Security Policy (Definitions)

"The active policy on the client's computer that programmatically generates a granted set of permissions from a set of requested permissions. A security policy consists of several levels that interact; by default only permissions granted by all layers are allowed to be granted." (Damien Watkins et al, "Programming in the .NET Environment", 2002)

"A collection of standards, policies, and procedures created to guarantee the security of a system and ensure auditing and compliance." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed, 2011)

"The set of decisions that govern security controls." (Mark Rhodes-Ousley, "Information Security: The Complete Reference" 2nd Ed., 2013)

"In label-based access control, a database object that is associated with one or more tables and that defines how LBAC can be used to protect those tables. The security policy defines what security labels can be used, how the security labels are compared to each other, and whether optional behaviors are used. See also label-based access control, security label." (IBM, "Informix Servers 12.1", 2014)

"A written statement describing the constraints or behavior an organization embraces regarding the information provided by its users" (Nell Dale & John Lewis, "Computer Science Illuminated" 6th Ed., 2015)

"Strategic tool used to dictate how sensitive information and resources are to be managed and protected." (Adam Gordon, "Official (ISC)2 Guide to the CISSP CBK" 4th Ed., 2015)

"Set of rules, guidelines and procedures represented in official security documents that define way in which state will protect its own national security interests." (Olivera Injac & Ramo Šendelj, "National Security Policy and Strategy and Cyber Security Risks", 2016)

"A set of rules and practices that specify or regulate how a system or an organization provides security services to protect sensitive and critical system resources." (William Stallings, "Effective Cybersecurity: A Guide to Using Best Practices and Standards", 2018)

"A statement of the rules governing the access to a system’s protected resources." (O Sami Saydjari, "Engineering Trustworthy Systems: Get Cybersecurity Design Right the First Time", 2018)

"In label-based access control, a database object that is associated with one or more tables and that defines how LBAC can be used to protect those tables. The security policy defines what security labels can be used, how the security labels are compared to each other, and whether optional behaviors are used. See also label-based access control, security label." (Sybase, "Open Server Server-Library/C Reference Manual", 2019)

"A set of criteria for the provision of security services." (CNSSI 4009-2015 NIST)

 "A set of methods for protecting a database from accidental or malicious destruction of data or damage to the database infrastructure." (Oracle)

"Security policies define the objectives and constraints for the security program. Policies are created at several levels, ranging from organization or corporate policy to specific operational constraints (e.g., remote access). In general, policies provide answers to the questions 'what' and 'why' without dealing with 'how'. Policies are normally stated in terms that are technology-independent." (NIST SP 800-82 Rev. 2)

03 August 2019

Information Security: Countermeasure (Definitions)

"A control, method, technique, or procedure that is put into place to prevent a threat agent from exploiting a vulnerability. A countermeasure is put into place to mitigate risk. Also called a safeguard or control." (Shon Harris & Fernando Maymi, "CISSP All-in-One Exam Guide" 8th Ed., 2018)

"A defensive mechanism intended to address a class of attack." (O Sami Saydjari, "Engineering Trustworthy Systems: Get Cybersecurity Design Right the First Time", 2018)

"An action, a device, a procedure, or a technique that reduces a threat, a vulnerability, or an attack by eliminating or preventing it, by minimizing the harm it can cause, or by discovering and reporting it so that corrective action can be taken." (William Stallings, "Effective Cybersecurity: A Guide to Using Best Practices and Standards", 2018)

"Countermeasures are steps that can be taken, and systems that can be implemented, to prevent internal and external threats from accessing your data and causing issues." (Michael Coles & Rodney Landrum, , "Expert SQL Server 2008 Encryption", 2008)

"Used to refer to any type of control" (ITIL)

17 July 2019

IT: Configuration Management (Definitions)

 "A discipline applying technical and administrative direction and surveillance to: identify and document the functional and physical characteristics of a configuration item, control changes to those characteristics, record and report change processing and implementation status, and verify compliance with specified requirements. (IEEE 610, 1990)

"The process of identifying and defining the configuration items in a system, controlling the release and change of these items throughout the system life cycle, recording and reporting the status of configuration items and change requests, and verifying the completeness and correctness of configuration items." (Richard D Stutzke, "Estimating Software-Intensive Systems: Projects, Products, and Processes", 2005)

"Process for the definition and management of configurations, allowing change control and change monitoring over a defined period. Configuration management allows access to individual configurations or configuration items (i.e., work products). Differences between individual configurations are readily identifiable. A configuration can be used to form a baseline; see also Baseline." (Lars Dittmann et al, "Automotive SPICE in Practice", 2008)

"A generic term that is often used to describe the whole of the activities concerned with the creation, maintenance, and control of databases and their environments." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The management of configurations, normally involving holding configuration data in a database so that the data can be managed and changed where necessary." (Marcia Kaufman et al, "Big Data For Dummies", 2013)

"Managing the items produced by the project such as requirements documents, designs, and, of course, source code. This may include controlling changes to those items so that changes don’t happen willy-nilly." (Rod Stephens, "Beginning Software Engineering", 2015)

"The detailed recording, management, and updating of the details of an information system." (Weiss, "Auditing IT Infrastructures for Compliance, 2nd Ed", 2015)

"An operational process aimed at ensuring that systems and controls are configured correctly and are responsive to the current threat and operational environments." (Shon Harris & Fernando Maymi, "CISSP All-in-One Exam Guide" 8th Ed, 2018)

"The process of controlling modifications to a system’s hardware, software, and documentation, which provides sufficient assurance that the system is protected against the introduction of improper modification before, doing, and after system implementation." (William Stallings, "Effective Cybersecurity: A Guide to Using Best Practices and Standards", 2018)

"The process of managing versions of configuration items and their coherent consistent sets, in order to control their modification and release, and to ensure their consistency, completeness, and accuracy." (Bruce P Douglass, "Real-Time Agility: The Harmony/ESW Method for Real-Time and Embedded Systems Development", 2009)

"Process responsible for maintaining information about CIs required to deliver an IT service, including their relationships" (ITIL)

24 April 2019

Project Management: The Butterflies of Project Management

Mismanagement

Expressed metaphorically as "the flap of a butterfly’s wings in Brazil set off a tornado in Texas”, in Chaos Theory the “butterfly effect” is a hypothesis rooted in Edward N Lorenz’s work on weather forecasting and used to depict the sensitive dependence on initial conditions in nonlinear processes, systems in which the change in input is not proportional to the change in output.  

Even if overstated, the flapping of wings advances the idea that a small change (the flap of wings) in the initial conditions of a system cascades to a large-scale chain of events leading to large-scale phenomena (the tornado) . The chain of events is known as the domino effect and represents the cumulative effect produced when one event sets off a chain of similar events. If the butterfly metaphor doesn’t catch up maybe it’s easier to visualize the impact as a big surfing wave – it starts small and increases in size to the degree that it can bring a boat to the shore or make an armada drown under its force. 

Projects start as narrow activities however the longer they take and the broader they become tend to accumulate force and behave like a wave, having the force to push or drawn an organization in the flood that comes with it. A project is not only a system but a complex ecosystem - aggregations of living organisms and nonliving components with complex interactions forming a unified whole with emergent behavior deriving from the structure rather than its components - groups of people tend to  self-organize, to swarm in one direction or another, much like birds do, while knowledge seems to converge from unrelated sources (aka consilience). 

 Quite often ignored, the context in which a project starts is very important, especially because these initial factors or conditions can have a considerable impact reflected in people’s perception regarding the state or outcomes of the project, perception reflected eventually also in the decisions made during the later phases of the project. The positive or negative auspices can be easily reinforced by similar events. Given the complex correlations and implications, aspects not always correct perceived and understood can have a domino effect. 

The preparations for the project start – the Business Case, setting up the project structure, communicating project’s expectation and addressing stakeholders’ expectations, the kick-off meeting, the approval of the needed resources, the knowledge available in the team, all these have a certain influence on the project. A bad start can haunt a project long time after its start, even if the project is on the right track and makes a positive impact. In reverse, a good start can shade away some mishaps on the way, however there’s also the danger that the mishaps are ignored and have greater negative impact on the project. It may look as common sense however the first image often counts and is kept in people’s memory for a long time. 

As people are higher perceptive to negative as to positive events, there are higher the chances that a multitude of negative aspects will have bigger impact on the project. It’s again something that one can address as the project progresses. It’s not necessarily about control but about being receptive to the messages around and of allowing people to give (constructive) feedback early in the project. It’s about using the positive force of a wave and turning negative flow into a positive one. 

Being aware of the importance of the initial context is just a first step toward harnessing waves or winds’ power, it takes action and leadership to pull the project in the right direction.

12 February 2019

IT: IT Governance (Definitions)

"Framework for the leadership, organizational structures and business processes, standards and compliance to these standards, which ensure that the organization’s IT supports and enables the achievement of its strategies and objectives." (Alan Calder, "IT Governance: Guidelines for Directors", 2005)

"The processes, policies, relationships, and mechanisms that ensure that information technology delivers business value while balancing risk and investment decisions. IT governance ensures accountability and provides rigor for managing IT capabilities in the context of a larger corporate governance framework." (Evan Levy & Jill Dyché, "Customer Data Integration", 2006)

"Addresses the application of governance to an IT organization and its people, processes, and information to guide the way those assets support the needs of the business. It may be characterized by assigning decision rights and measures to processes." (Tilak Mitra et al, "SOA Governance", 2008)

"IT governance is the system and structure for defining policy and monitoring and controlling the policy implementation, and managing and coordinating the procedures and resources aimed at ensuring the efficient and effective execution of services." (Anton Joha & Marijn Janssen, "The Strategic Determinants of Shared Services", 2008)

"The discipline of managing IT as a service to the business, aligning IT objectives with business goals." (Allen Dreibelbis et al, "Enterprise Master Data Management", 2008)

"An integral part of enterprise governance and consists of the leadership and organizational structures and processes that ensure the enterprise’s IT sustains and extends the organization’s strategies and objectives." (Edephonce N Nfuka & Lazar Rusu, IT Governance in the Public Sector in a Developing Country, 2009)

"(1) Locus of IT decision-making authority (narrow definition). (2) The distribution of IT decision-making rights and responsibilities among different stakeholders in the organization, and the rules and procedures for making and monitoring decisions on strategic IT concerns (comprehensive definition)." (Ryan R Peterson, "Trends in Information Technology Governance", 2009)

"Structure of relationships and processes to direct and control the IT enterprise to achieve IT’s goals by adding value while balancing risk versus return over IT and its processes." (IT Governance Institute, "IT Governance Implementation Guide, Using COBIT and Val IT", 2010)

"The discipline of tracking, managing, and steering an IS/IT landscape. Architectural governance is concerned with change processes (design governance). Operational governance looks at the operational performance of systems against contracted performance levels, the definition of operational performance levels, and the implementation of systems that ensure the effective operation of systems." (David Lyle & John G Schmidt, "Lean Integration", 2010)

"Formally established statements that direct the policies regarding IT alignment with organizational goals and allocation of resources." (Linda Volonino & Efraim Turban, "Information Technology for Management 8th Ed", 2011)

"Supervision monitoring and control of an organization's IT assets." (Linda Volonino & Efraim Turban, "Information Technology for Management" 8th Ed, 2011)

"The processes and relationships that lead to reasoned decision making in IT." (Steven Romero, "Eliminating ‘Us and Them’", 2011)

"The function of ensuring that the enterprise's IT activities match and support the organization's strategies and objectives. Governance is very often associated with budgeting, project management, and compliance activities." (Bill Holtsnider & Brian D Jaffe, "IT Manager's Handbook" 3rd Ed, 2012)

"Controls and process to improve the effectiveness of information technology; also, the primary way that stakeholders can ensure that investments in IT create business value and contribute toward meeting business objectives." (Robert F Smallwood, "Information Governance: Concepts, Strategies, and Best Practices", 2014)

"Processes used to ensure that IT resources are aligned with the goals of the organization. Organizations often use frameworks to help them with IT governance." (Darril Gibson, "Effective Help Desk Specialist Skills", 2014)

"The framework of rules and practices by which an organization structures its technology decision-making process in order to ensure alignment of the organization's business strategy with its operations." (David K Pham, "From Business Strategy to Information Technology Roadmap", 2016)

"Set of methods and techniques for reaching full alignment between business strategy and IT strategy." (Dalia S Vugec, "IT Strategic Grid: A Longitudinal Multiple Case Study", 2019)

"The processes that ensure the effective and efficient use of IT in enabling an organization to achieve its goals." (Lili Aunimo et al, "Big Data Governance in Agile and Data-Driven Software Development: A Market Entry Case in the Educational Game Industry", 2019)

"The structures, processes, and mechanisms by which the current and future use of ICT is directed and controlled." (Konstantinos Tsilionis & Yves Wautelet, "Aligning Strategic-Driven Governance of Business IT Services With Their Agile Development: A Conceptual Modeling-Based Approach", 2021)

"IT governance (ITG) is defined as the processes that ensure the effective and efficient use of IT in enabling an organization to achieve its goals." (Gartner)

"The system by which the current and future use of IT is directed and controlled, Corporate Governance of IT involves evaluating and directing the use of IT to support the organisation and monitoring this use to achieve plans." (ISO/IEC 38500)

08 January 2019

Governance: Authority (Just the Quotes)

"When the general is weak and without authority; when his orders are not clear and distinct; when there are no fixed duties assigned to officers and men, and the ranks are formed in a slovenly haphazard manner, the result is utter disorganization." (Sun Tzu, "The Art of War", cca. 5th century)

"Authority is never without hate." (Euripides, "Ion", cca. 422 BC)

"In questions of science, the authority of a thousand is not worth the humble reasoning of a single individual" (Galileo Galilei, 1632)

"Authority without wisdom is like a heavy axe without an edge, fitter to bruise than polish." (Anne Bradstreet, "Meditations Divine and Moral", 1664)

"Lawful and settled authority is very seldom resisted when it is well employed." (Samuel Johnson, "The Rambler", 1750)

"The most absolute authority is that which penetrates into a man's innermost being and concerns itself no less with his will than with his actions." (Jean-Jacques Rousseau, "On the origin of inequality", 1755)

"The wise executive never looks upon organizational lines as being settled once and for all. He knows that a vital organization must keep growing and changing with the result that its structure must remain malleable. Get the best organization structure you can devise, but do not be afraid to change it for good reason: This seems to be the sound rule. On the other hand, beware of needless change, which will only result in upsetting and frustrating your employees until they become uncertain as to what their lines of authority actually are." (Marshall E Dimock, "The Executive in Action", 1915)

"No amount of learning from books or of listening to the words of authority can be substituted for the spade-work of investigation." (Richard Gregory, "Discovery; or, The Spirit and Service of Science", 1916)

"In organization it means the graduation of duties, not according to differentiated functions, for this involves another and distinct principle of organization, but simply according to degrees of authority and corresponding responsibility." (James D Mooney, "Onward Industry!", 1931)

"It is sufficient here to observe that the supreme coordinating authority must be anterior to leadership in logical order, for it is this coordinating force which makes the organization. Leadership, on the other hand, always presupposes the organization. There can be no leader without something to lead." (James D Mooney, "Onward Industry!", 1931)

"Leadership is the form that authority assumes when it enters into process. As such it constitutes the determining principle of the entire scalar process, existing not only at the source, but projecting itself through its own action throughout the entire chain, until, through functional definition, it effectuates the formal coordination of the entire structure." (James D Mooney, "Onward Industry!", 1931)

"The staff function in organization means the service of advice or counsel, as distinguished from the function of authority or command. This service has three phases, which appear in a clearly integrated relationship. These phases are the informative, the advisory, and the supervisory." (James D Mooney, "Onward Industry!", 1931)

"Human beings are compounded of cognition and emotion and do not function well when treated as though they were merely cogs in motion.... The task of the administrator must be accomplished less by coercion and discipline, and more and more by persuasion.... Management of the future must look more to leadership and less to authority as the primary means of coordination." (Luther H Gulick, "Papers on the Science of Administration", 1937)

"A person can and will accept a communication as authoritative only when four conditions simultaneously obtain: (a) he can and does understand the communication; (b) at the time of his decision he believes that it is not inconsistent with the purpose of the organization; (c) at the time of his decision, he believes it to be compatible with his personal interest as a whole; and (d) he is able mentally and physically to comply with it." (Chester I Barnard, "The Functions of the Executive", 1938)

"The fine art of executive decision consists in not deciding questions that are not now pertinent, in not deciding prematurely, in not making decision that cannot be made effective, and in not making decisions that others should make. Not to decide questions that are not pertinent at the time is uncommon good sense, though to raise them may be uncommon perspicacity. Not to decide questions prematurely is to refuse commitment of attitude or the development of prejudice. Not to make decisions that cannot be made effective is to refrain from destroying authority. Not to make decisions that others should make is to preserve morale, to develop competence, to fix responsibility, and to preserve authority.
From this it may be seen that decisions fall into two major classes, positive decisions - to do something, to direct action, to cease action, to prevent action; and negative decisions, which are decisions not to decide. Both are inescapable; but the negative decisions are often largely unconscious, relatively nonlogical, "instinctive," "good sense." It is because of the rejections that the selection is good." (Chester I Barnard, "The Functions of the Executive", 1938)

"To hold a group or individual accountable for activities of any kind without assigning to him or them the necessary authority to discharge that responsibility is manifestly both unsatisfactory and inequitable. It is of great Importance to smooth working that at all levels authority and responsibility should be coterminous and coequal." (Lyndall Urwick, "Dynamic Administration", 1942)

"All behavior involves conscious or unconscious selection of particular actions out of all those which are physically possible to the actor and to those persons over whom he exercises influence and authority." (Herbert A Simon, "Administrative Behavior: A Study of Decision-making Processes in Administrative Organization", 1947)

"Coordination, therefore, is the orderly arrangement of group efforts, to provide unity of action in the pursuit of a common purpose. As coordination is the all inclusive principle of organization it must have its own principle and foundation in authority, or the supreme coordination power. Always, in every form of organization, this supreme authority must rest somewhere, else there would be no directive for any coordinated effort." (James D Mooney, "The Principles of Organization", 1947)

"Delegation means the conferring of a specified authority by a higher authority. In its essence it involves a dual responsibility. The one to whom responsibility is delegated becomes responsible to the superior for doing the job. but the superior remains responsible for getting the Job done. This principle of delegation is the center of all processes in formal organization. Delegation is inherent in the very nature of the relation between superior and subordinate. The moment the objective calls for the organized effort of more than one person, there is always leadership with its delegation of duties." (James D Mooney, "The Principles of Organization", 1947)

"Power on the one side, fear on the other, are always the buttresses on which irrational authority is built." (Erich Fromm, "Man for Himself: An Inquiry Into the Psychology of Ethics", 1947)

"Authority is not a quality one person 'has', in the sense that he has property or physical qualities. Authority refers to an interpersonal relation in which one person looks upon another as somebody superior to him." (Erich Fromm, "The Fear of Freedom", 1950)

"The only way for a large organization to function is to decentralize, to delegate real authority and responsibility to the man on the job. But be certain you have the right man on the job." (Robert E Wood, 1951)

"[...] authority - the right by which superiors are able to require conformity of subordinates to decisions - is the basis for responsibility and the force that binds organization together. The process of organizing encompasses grouping of activities for purposes of management and specification of authority relationships between superiors and subordinates and horizontally between managers. Consequently, authority and responsibility relationships come into being in all associative undertakings where the superior-subordinate link exists. It is these relationships that create the basic character of the managerial job." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"Although organization charts are useful, necessary, and often revealing tools, they are subject to many important limitations. In the first place, a chart shows only formal authority relationships and omits the many significant informal and informational relationships that exist in a living organization. Moreover, it does not picture how much authority exists at any point in the organization." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"[...] authority for given tasks is limited to that for which an individual may properly held responsible." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"Authority delegations from a superior to a subordinate may be made in large or small degree. The tendency to delegate much authority through the echelons of an organization structure is referred tojas decentralization of authority. On the other hand, authority is said to be centralized wherever a manager tends not to delegate authority to his subordinates." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"Authority is, of course, completely centralized when a manager delegates none, and it is possible to think of the reverse situation - an infinite delegation of authority in which no manager retains any authority other than the implicit power to recover delegated authority. But this kind of delegation is obviously impracticable, since, at some point in the organization structure, delegations must stop." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"If charts do not reflect actual organization and if the organization is intended to be as charted, it is the job of effective management to see that actual organization conforms with that desired. Organization charts cannot supplant good organizing, nor can a chart take the place of spelling out authority relationships clearly and completely, of outlining duties of managers and their subordinates, and of defining responsibilities." (Harold Koontz & Cyril O Donnell, "Principles of Management", 1955)

"It is highly important for managers to be honest and clear in describing what authority they are keeping and what role they are asking their subordinates to assume." (Robert Tannenbaum & Warren H Schmidt, Harvard Business Review, 1958)

"Formal theories of organization have been taught in management courses for many years, and there is an extensive literature on the subject. The textbook principles of organization — hierarchical structure, authority, unity of command, task specialization, division of staff and line, span of control, equality of responsibility and authority, etc. - comprise a logically persuasive set of assumptions which have had a profound influence upon managerial behavior." (Douglas McGregor, 'The Human Side of Enterprise", 1960)

"If there is a single assumption which pervades conventional organizational theory, it is that authority is the central, indispensable means of managerial control." (Douglas McGregor, "The Human Side of Enterprise", 1960)

"The ingenuity of the average worker is sufficient to outwit any system of controls devised by management." (Douglas McGregor, "The Human Side of Enterprise", 1960)

"You can delegate authority, but you can never delegate responsibility by delegating a task to someone else. If you picked the right man, fine, but if you picked the wrong man, the responsibility is yours - not his." (Richard E Krafve, The Boston Sunday Globe, 1960)

"Centralized controls are designed to ensure that the chief executive can find out how well the delegated authority and responsibility are being exercised." (Ernest Dale, "Management: Theory and practice", 1965)

"In large-scale organizations, the factual approach must be constantly nurtured by high-level executives. The more layers of authority through which facts must pass before they reach the decision maker, the greater the danger that they will be suppressed, modified, or softened, so as not to displease the 'brass"' For this reason, high-level executives must keep reaching for facts or soon they won't know what is going on. Unless they make visible efforts to seek and act on facts, major problems will not be brought to their attention, the quality of their decisions will decline, and the business will gradually get out of touch with its environment." (Marvin Bower, "The Will to Manage", 1966)

"The concept of organizational goals, like the concepts of power, authority, or leadership, has been unusually resistant to precise, unambiguous definition. Yet a definition of goals is necessary and unavoidable in organizational analysis. Organizations are established to do something; they perform work directed toward some end." (Charles Perrow, "Organizational Analysis: A Sociological View", 1970)

"[Management] has authority only as long as it performs." (Peter F Drucker, "Management: Tasks, Responsibilities, Practices", 1973)

"'Management' means, in the last analysis, the substitution of thought for brawn and muscle, of knowledge for folkways and superstition, and of cooperation for force. It means the substitution of responsibility for obedience to rank, and of authority of performance for authority of rank. (Peter F Drucker, "People and Performance", 1977)

"The key to successful leadership today is influence, not authority." (Kenneth H Blanchard, "Managing By Influence", 1986)

"Strange as it sounds, great leaders gain authority by giving it away." (James B Stockdale, "Military Ethics" 1987)

"Perhaps nothing in our society is more needed for those in positions of authority than accountability." (Larry Burkett, "Business By The Book: Complete Guide of Biblical Principles for the Workplace", 1990)

"When everything is connected to everything in a distributed network, everything happens at once. When everything happens at once, wide and fast moving problems simply route around any central authority. Therefore overall governance must arise from the most humble interdependent acts done locally in parallel, and not from a central command. " (Kevin Kelly, "Out of Control: The New Biology of Machines, Social Systems and the Economic World", 1995)

"Authority alone is like pushing from behind. What automatic reaction do you have when pushed from behind? Resistance - unless you are travelling in that direction anyway and you experience the push as helpful. When you do not know what lies ahead and you are not sure whether you want to move forward, resistance is completely understandable. [...] Authority alone pushes. Leadership pulls, because it draws people towards a vision of the future that attracts them." (Joseph O’Connor, "Leading With NLP: Essential Leadership Skills for Influencing and Managing People", 1998)

"Authority works best where you have an accepted hierarchy [...]. Then people move together because of the strong implicit accepted values that everyone shares. If you are trying to lead people who do not share similar goals and values, then authority is not enough." (Joseph O’Connor, "Leading With NLP: Essential Leadership Skills for Influencing and Managing People", 1998)

"The ultimate authority must always rest with the individual's own reason and critical analysis." (Tenzin Gyatso, "Path To Tranquility", 1998)

"The premise here is that the hierarchy lines on the chart are also the only communication conduit. Information can flow only along the lines. [...] The hierarchy lines are paths of authority. When communication happens only over the hierarchy lines, that's a priori evidence that the managers are trying to hold on to all control. This is not only inefficient but an insult to the people underneath." (Tom DeMarco, "Slack: Getting Past Burnout, Busywork, and the Myth of Total Efficiency", 2001)

"A system is a framework that orders and sequences activity within the organisation to achieve a purpose within a band of variance that is acceptable to the owner of the system.  Systems are the organisational equivalent of behaviour in human interaction. Systems are the means by which organisations put policies into action.  It is the owner of a system who has the authority to change it, hence his or her clear acceptance of the degree of variation generated by the existing system." (Catherine Burke et al, "Systems Leadership" 2nd Ed., 2018)

"Responsibility means an inevitable punishment for mistakes; authority means full power to make them." (Yegor Bugayenko, "Code Ahead", 2018)

"Control is not leadership; management is not leadership; leadership is leadership. If you seek to lead, invest at least 50% of your time in leading yourself–your own purpose, ethics, principles, motivation, conduct. Invest at least 20% leading those with authority over you and 15% leading your peers." (Dee Hock)

"Delegation of authority is one of the most important functions of a leader, and he should delegate authority to the maximum degree possible with regard to the capabilities of his people. Once he has established policy, goals, and priorities, the leader accomplishes his objectives by pushing authority right down to the bottom. Doing so trains people to use their initiative; not doing so stifles creativity and lowers morale." (Thornas H Moorer)

"Leadership means that a group, large or small, is willing to entrust authority to a person who has shown judgement, wisdom, personal appeal, and proven competence." (Walt Disney)

"The teams and staffs through which the modern commander absorbs information and exercises his authority must be a beautifully interlocked, smooth-working mechanism. Ideally, the whole should be practically a single mind." (Dwight D Eisenhower)

"While basic laws underlie command authority, the real foundation of successful leadership is the moral authority derived from professional competence and integrity. Competence and integrity are not separable." (William C Westmoreland)

30 December 2018

Data Science: Testing (Just the Quotes)

"We must trust to nothing but facts: These are presented to us by Nature, and cannot deceive. We ought, in every instance, to submit our reasoning to the test of experiment, and never to search for truth but by the natural road of experiment and observation." (Antoin-Laurent de Lavoisiere, "Elements of Chemistry", 1790)

"A law of nature, however, is not a mere logical conception that we have adopted as a kind of memoria technical to enable us to more readily remember facts. We of the present day have already sufficient insight to know that the laws of nature are not things which we can evolve by any speculative method. On the contrary, we have to discover them in the facts; we have to test them by repeated observation or experiment, in constantly new cases, under ever-varying circumstances; and in proportion only as they hold good under a constantly increasing change of conditions, in a constantly increasing number of cases with greater delicacy in the means of observation, does our confidence in their trustworthiness rise." (Hermann von Helmholtz, "Popular Lectures on Scientific Subjects", 1873)

"A discoverer is a tester of scientific ideas; he must not only be able to imagine likely hypotheses, and to select suitable ones for investigation, but, as hypotheses may be true or untrue, he must also be competent to invent appropriate experiments for testing them, and to devise the requisite apparatus and arrangements." (George Gore, "The Art of Scientific Discovery", 1878)

"The preliminary examination of most data is facilitated by the use of diagrams. Diagrams prove nothing, but bring outstanding features readily to the eye; they are therefore no substitutes for such critical tests as may be applied to the data, but are valuable in suggesting such tests, and in explaining the conclusions founded upon them." (Sir Ronald A Fisher, "Statistical Methods for Research Workers", 1925)

"A scientist, whether theorist or experimenter, puts forward statements, or systems of statements, and tests them step by step. In the field of the empirical sciences, more particularly, he constructs hypotheses, or systems of theories, and tests them against experience by observation and experiment." (Karl Popper, "The Logic of Scientific Discovery", 1934)

"Science, in the broadest sense, is the entire body of the most accurately tested, critically established, systematized knowledge available about that part of the universe which has come under human observation. For the most part this knowledge concerns the forces impinging upon human beings in the serious business of living and thus affecting man’s adjustment to and of the physical and the social world. […] Pure science is more interested in understanding, and applied science is more interested in control […]" (Austin L Porterfield, "Creative Factors in Scientific Research", 1941)

"To a scientist a theory is something to be tested. He seeks not to defend his beliefs, but to improve them. He is, above everything else, an expert at ‘changing his mind’." (Wendell Johnson, 1946)

"As usual we may make the errors of I) rejecting the null hypothesis when it is true, II) accepting the null hypothesis when it is false. But there is a third kind of error which is of interest because the present test of significance is tied up closely with the idea of making a correct decision about which distribution function has slipped furthest to the right. We may make the error of III) correctly rejecting the null hypothesis for the wrong reason." (Frederick Mosteller, "A k-Sample Slippage Test for an Extreme Population", The Annals of Mathematical Statistics 19, 1948)

"Errors of the third kind happen in conventional tests of differences of means, but they are usually not considered, although their existence is probably recognized. It seems to the author that there may be several reasons for this among which are 1) a preoccupation on the part of mathematical statisticians with the formal questions of acceptance and rejection of null hypotheses without adequate consideration of the implications of the error of the third kind for the practical experimenter, 2) the rarity with which an error of the third kind arises in the usual tests of significance." (Frederick Mosteller, "A k-Sample Slippage Test for an Extreme Population", The Annals of Mathematical Statistics 19, 1948)

"If significance tests are required for still larger samples, graphical accuracy is insufficient, and arithmetical methods are advised. A word to the wise is in order here, however. Almost never does it make sense to use exact binomial significance tests on such data - for the inevitable small deviations from the mathematical model of independence and constant split have piled up to such an extent that the binomial variability is deeply buried and unnoticeable. Graphical treatment of such large samples may still be worthwhile because it brings the results more vividly to the eye." (Frederick Mosteller & John W Tukey, "The Uses and Usefulness of Binomial Probability Paper?", Journal of the American Statistical Association 44, 1949)

"Statistics is the fundamental and most important part of inductive logic. It is both an art and a science, and it deals with the collection, the tabulation, the analysis and interpretation of quantitative and qualitative measurements. It is concerned with the classifying and determining of actual attributes as well as the making of estimates and the testing of various hypotheses by which probable, or expected, values are obtained. It is one of the means of carrying on scientific research in order to ascertain the laws of behavior of things - be they animate or inanimate. Statistics is the technique of the Scientific Method." (Bruce D Greenschields & Frank M Weida, "Statistics with Applications to Highway Traffic Analyses", 1952)

"The only relevant test of the validity of a hypothesis is comparison of prediction with experience." (Milton Friedman, "Essays in Positive Economics", 1953)

"The main purpose of a significance test is to inhibit the natural enthusiasm of the investigator." (Frederick Mosteller, "Selected Quantitative Techniques", 1954)

"The methods of science may be described as the discovery of laws, the explanation of laws by theories, and the testing of theories by new observations. A good analogy is that of the jigsaw puzzle, for which the laws are the individual pieces, the theories local patterns suggested by a few pieces, and the tests the completion of these patterns with pieces previously unconsidered." (Edwin P Hubble, "The Nature of Science and Other Lectures", 1954)

"Science is the creation of concepts and their exploration in the facts. It has no other test of the concept than its empirical truth to fact." (Jacob Bronowski, "Science and Human Values", 1956)

"Null hypotheses of no difference are usually known to be false before the data are collected [...] when they are, their rejection or acceptance simply reflects the size of the sample and the power of the test, and is not a contribution to science." (I Richard Savage, "Nonparametric statistics", Journal of the American Statistical Association 52, 1957)

"The well-known virtue of the experimental method is that it brings situational variables under tight control. It thus permits rigorous tests of hypotheses and confidential statements about causation. The correlational method, for its part, can study what man has not learned to control. Nature has been experimenting since the beginning of time, with a boldness and complexity far beyond the resources of science. The correlator’s mission is to observe and organize the data of nature’s experiments." (Lee J Cronbach, "The Two Disciplines of Scientific Psychology", The American Psychologist Vol. 12, 1957)

"A satisfactory prediction of the sequential properties of learning data from a single experiment is by no means a final test of a model. Numerous other criteria - and some more demanding - can be specified. For example, a model with specific numerical parameter values should be invariant to changes in independent variables that explicitly enter in the model." (Robert R Bush & Frederick Mosteller,"A Comparison of Eight Models?", Studies in Mathematical Learning Theory, 1959)

"One feature [...] which requires much more justification than is usually given, is the setting up of unplausible null hypotheses. For example, a statistician may set out a test to see whether two drugs have exactly the same effect, or whether a regression line is exactly straight. These hypotheses can scarcely be taken literally." (Cedric A B Smith, "Book review of Norman T. J. Bailey: Statistical Methods in Biology", Applied Statistics 9, 1960)

"The null-hypothesis significance test treats ‘acceptance’ or ‘rejection’ of a hypothesis as though these were decisions one makes. But a hypothesis is not something, like a piece of pie offered for dessert, which can be accepted or rejected by a voluntary physical action. Acceptance or rejection of a hypothesis is a cognitive process, a degree of believing or disbelieving which, if rational, is not a matter of choice but determined solely by how likely it is, given the evidence, that the hypothesis is true." (William W Rozeboom, "The fallacy of the null–hypothesis significance test", Psychological Bulletin 57, 1960)

"It is easy to obtain confirmations, or verifications, for nearly every theory - if we look for confirmations. Confirmations should count only if they are the result of risky predictions. […] A theory which is not refutable by any conceivable event is non-scientific. Irrefutability is not a virtue of a theory (as people often think) but a vice. Every genuine test of a theory is an attempt to falsify it, or refute it." (Karl R Popper, "Conjectures and Refutations: The Growth of Scientific Knowledge", 1963)

"The final test of a theory is its capacity to solve the problems which originated it." (George Dantzig, "Linear Programming and Extensions", 1963)

"The mediation of theory and praxis can only be clarified if to begin with we distinguish three functions, which are measured in terms of different criteria: the formation and extension of critical theorems, which can stand up to scientific discourse; the organization of processes of enlightenment, in which such theorems are applied and can be tested in a unique manner by the initiation of processes of reflection carried on within certain groups toward which these processes have been directed; and the selection of appropriate strategies, the solution of tactical questions, and the conduct of the political struggle. On the first level, the aim is true statements, on the second, authentic insights, and on the third, prudent decisions." (Jürgen Habermas, "Introduction to Theory and Practice", 1963)

"The null hypothesis of no difference has been judged to be no longer a sound or fruitful basis for statistical investigation. […] Significance tests do not provide the information that scientists need, and, furthermore, they are not the most effective method for analyzing and summarizing data." (Cherry A Clark, "Hypothesis Testing in Relation to Statistical Methodology", Review of Educational Research Vol. 33, 1963)

"The usefulness of the models in constructing a testable theory of the process is severely limited by the quickly increasing number of parameters which must be estimated in order to compare the predictions of the models with empirical results" (Anatol Rapoport, "Prisoner's Dilemma: A study in conflict and cooperation", 1965)

"The validation of a model is not that it is 'true' but that it generates good testable hypotheses relevant to important problems.” (Richard Levins, "The Strategy of Model Building in Population Biology”, 1966)

"Discovery always carries an honorific connotation. It is the stamp of approval on a finding of lasting value. Many laws and theories have come and gone in the history of science, but they are not spoken of as discoveries. […] Theories are especially precarious, as this century profoundly testifies. World views can and do often change. Despite these difficulties, it is still true that to count as a discovery a finding must be of at least relatively permanent value, as shown by its inclusion in the generally accepted body of scientific knowledge." (Richard J. Blackwell, "Discovery in the Physical Sciences", 1969)

"Science consists simply of the formulation and testing of hypotheses based on observational evidence; experiments are important where applicable, but their function is merely to simplify observation by imposing controlled conditions." (Henry L Batten, "Evolution of the Earth", 1971)

"A hypothesis is empirical or scientific only if it can be tested by experience. […] A hypothesis or theory which cannot be, at least in principle, falsified by empirical observations and experiments does not belong to the realm of science." (Francisco J Ayala, "Biological Evolution: Natural Selection or Random Walk", American Scientist, 1974)

"An experiment is a failure only when it also fails adequately to test the hypothesis in question, when the data it produces don't prove anything one way or the other." (Robert M Pirsig, "Zen and the Art of Motorcycle Maintenance", 1974)

"Science is systematic organisation of knowledge about the universe on the basis of explanatory hypotheses which are genuinely testable. Science advances by developing gradually more comprehensive theories; that is, by formulating theories of greater generality which can account for observational statements and hypotheses which appear as prima facie unrelated." (Francisco J Ayala, "Studies in the Philosophy of Biology: Reduction and Related Problems", 1974)

"A good scientific law or theory is falsifiable just because it makes definite claims about the world. For the falsificationist, If follows fairly readily from this that the more falsifiable a theory is the better, in some loose sense of more. The more a theory claims, the more potential opportunities there will be for showing that the world does not in fact behave in the way laid down by the theory. A very good theory will be one that makes very wide-ranging claims about the world, and which is consequently highly falsifiable, and is one that resists falsification whenever it is put to the test." (Alan F Chalmers,  "What Is This Thing Called Science?", 1976)

"Prediction can never be absolutely valid and therefore science can never prove some generalization or even test a single descriptive statement and in that way arrive at final truth." (Gregory Bateson, "Mind and Nature, A necessary unity", 1979)

"The fact must be expressed as data, but there is a problem in that the correct data is difficult to catch. So that I always say 'When you see the data, doubt it!' 'When you see the measurement instrument, doubt it!' [...]For example, if the methods such as sampling, measurement, testing and chemical analysis methods were incorrect, data. […] to measure true characteristics and in an unavoidable case, using statistical sensory test and express them as data." (Kaoru Ishikawa, Annual Quality Congress Transactions, 1981)

"All interpretations made by a scientist are hypotheses, and all hypotheses are tentative. They must forever be tested and they must be revised if found to be unsatisfactory. Hence, a change of mind in a scientist, and particularly in a great scientist, is not only not a sign of weakness but rather evidence for continuing attention to the respective problem and an ability to test the hypothesis again and again." (Ernst Mayr, "The Growth of Biological Thought: Diversity, Evolution and Inheritance", 1982)

"Theoretical scientists, inching away from the safe and known, skirting the point of no return, confront nature with a free invention of the intellect. They strip the discovery down and wire it into place in the form of mathematical models or other abstractions that define the perceived relation exactly. The now-naked idea is scrutinized with as much coldness and outward lack of pity as the naturally warm human heart can muster. They try to put it to use, devising experiments or field observations to test its claims. By the rules of scientific procedure it is then either discarded or temporarily sustained. Either way, the central theory encompassing it grows. If the abstractions survive they generate new knowledge from which further exploratory trips of the mind can be planned. Through the repeated alternation between flights of the imagination and the accretion of hard data, a mutual agreement on the workings of the world is written, in the form of natural law." (Edward O Wilson, "Biophilia", 1984)

"Models are often used to decide issues in situations marked by uncertainty. However statistical differences from data depend on assumptions about the process which generated these data. If the assumptions do not hold, the inferences may not be reliable either. This limitation is often ignored by applied workers who fail to identify crucial assumptions or subject them to any kind of empirical testing. In such circumstances, using statistical procedures may only compound the uncertainty." (David A Greedman & William C Navidi, "Regression Models for Adjusting the 1980 Census", Statistical Science Vol. 1 (1), 1986)

"Science has become a social method of inquiring into natural phenomena, making intuitive and systematic explorations of laws which are formulated by observing nature, and then rigorously testing their accuracy in the form of predictions. The results are then stored as written or mathematical records which are copied and disseminated to others, both within and beyond any given generation. As a sort of synergetic, rigorously regulated group perception, the collective enterprise of science far transcends the activity within an individual brain." (Lynn Margulis & Dorion Sagan, "Microcosmos", 1986)

"Beware of the problem of testing too many hypotheses; the more you torture the data, the more likely they are to confess, but confessions obtained under duress may not be admissible in the court of scientific opinion." (Stephen M. Stigler, "Neutral Models in Biology", 1987)

"Prediction can never be absolutely valid and therefore science can never prove some generalization or even test a single descriptive statement and in that way arrive at final truth." (Gregory Bateson, Mind and Nature: A necessary unity", 1988)

"Science doesn't purvey absolute truth. Science is a mechanism. It's a way of trying to improve your knowledge of nature. It's a system for testing your thoughts against the universe and seeing whether they match. And this works, not just for the ordinary aspects of science, but for all of life. I should think people would want to know that what they know is truly what the universe is like, or at least as close as they can get to it." (Isaac Asimov, [Interview by Bill Moyers] 1988)

"The heart of the scientific method is the problem-hypothesis-test process. And, necessarily, the scientific method involves predictions. And predictions, to be useful in scientific methodology, must be subject to test empirically." (Paul Davies, "The Cosmic Blueprint: New Discoveries in Nature's Creative Ability to, Order the Universe", 1988)

"Science doesn’t purvey absolute truth. Science is a mechanism, a way of trying to improve your knowledge of nature. It’s a system for testing your thoughts against the universe, and seeing whether they match." (Isaac Asimov, [interview with Bill Moyers in The Humanist] 1989)

"A little thought reveals a fact widely understood among statisticians: The null hypothesis, taken literally (and that’s the only way you can take it in formal hypothesis testing), is always false in the real world. [...] If it is false, even to a tiny degree, it must be the case that a large enough sample will produce a significant result and lead to its rejection. So if the null hypothesis is always false, what’s the big deal about rejecting it?" (Jacob Cohen, "Things I Have Learned (So Far)", American Psychologist, 1990)

"On this view, we recognize science to be the search for algorithmic compressions. We list sequences of observed data. We try to formulate algorithms that compactly represent the information content of those sequences. Then we test the correctness of our hypothetical abbreviations by using them to predict the next terms in the string. These predictions can then be compared with the future direction of the data sequence. Without the development of algorithmic compressions of data all science would be replaced by mindless stamp collecting - the indiscriminate accumulation of every available fact. Science is predicated upon the belief that the Universe is algorithmically compressible and the modern search for a Theory of Everything is the ultimate expression of that belief, a belief that there is an abbreviated representation of the logic behind the Universe's properties that can be written down in finite form by human beings." (John D Barrow, New Theories of Everything", 1991)

"Scientists use mathematics to build mental universes. They write down mathematical descriptions - models - that capture essential fragments of how they think the world behaves. Then they analyse their consequences. This is called 'theory'. They test their theories against observations: this is called 'experiment'. Depending on the result, they may modify the mathematical model and repeat the cycle until theory and experiment agree. Not that it's really that simple; but that's the general gist of it, the essence of the scientific method." (Ian Stewart & Martin Golubitsky, "Fearful Symmetry: Is God a Geometer?", 1992)

"The amount of understanding produced by a theory is determined by how well it meets the criteria of adequacy - testability, fruitfulness, scope, simplicity, conservatism - because these criteria indicate the extent to which a theory systematizes and unifies our knowledge." (Theodore Schick Jr.,  "How to Think about Weird Things: Critical Thinking for a New Age", 1995)

"The science of statistics may be described as exploring, analyzing and summarizing data; designing or choosing appropriate ways of collecting data and extracting information from them; and communicating that information. Statistics also involves constructing and testing models for describing chance phenomena. These models can be used as a basis for making inferences and drawing conclusions and, finally, perhaps for making decisions." (Fergus Daly et al, "Elements of Statistics", 1995)

"Science is distinguished not for asserting that nature is rational, but for constantly testing claims to that or any other affect by observation and experiment." (Timothy Ferris, "The Whole Shebang: A State-of-the Universe’s Report", 1996)

"There are two kinds of mistakes. There are fatal mistakes that destroy a theory; but there are also contingent ones, which are useful in testing the stability of a theory." (Gian-Carlo Rota, [lecture] 1996)

"Validation is the process of testing how good the solutions produced by a system are. The results produced by a system are usually compared with the results obtained either by experts or by other systems. Validation is an extremely important part of the process of developing every knowledge-based system. Without comparing the results produced by the system with reality, there is little point in using it." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"The rate of the development of science is not the rate at which you make observations alone but, much more important, the rate at which you create new things to test." (Richard Feynman, "The Meaning of It All", 1998)

"Let us regard a proof of an assertion as a purely mechanical procedure using precise rules of inference starting with a few unassailable axioms. This means that an algorithm can be devised for testing the validity of an alleged proof simply by checking the successive steps of the argument; the rules of inference constitute an algorithm for generating all the statements that can be deduced in a finite number of steps from the axioms." (Edward Beltrami, "What is Random?: Chaos and Order in Mathematics and Life", 1999)

"The greatest plus of data modeling is that it produces a simple and understandable picture of the relationship between the input variables and responses [...] different models, all of them equally good, may give different pictures of the relation between the predictor and response variables [...] One reason for this multiplicity is that goodness-of-fit tests and other methods for checking fit give a yes–no answer. With the lack of power of these tests with data having more than a small number of dimensions, there will be a large number of models whose fit is acceptable. There is no way, among the yes–no methods for gauging fit, of determining which is the better model." (Leo Breiman, "Statistical Modeling: The two cultures", Statistical Science 16(3), 2001)

"When significance tests are used and a null hypothesis is not rejected, a major problem often arises - namely, the result may be interpreted, without a logical basis, as providing evidence for the null hypothesis." (David F Parkhurst, "Statistical Significance Tests: Equivalence and Reverse Tests Should Reduce Misinterpretation", BioScience Vol. 51 (12), 2001)

"Visualizations can be used to explore data, to confirm a hypothesis, or to manipulate a viewer. [...] In exploratory visualization the user does not necessarily know what he is looking for. This creates a dynamic scenario in which interaction is critical. [...] In a confirmatory visualization, the user has a hypothesis that needs to be tested. This scenario is more stable and predictable. System parameters are often predetermined." (Usama Fayyad et al, "Information Visualization in Data Mining and Knowledge Discovery", 2002)

"There is a tendency to use hypothesis testing methods even when they are not appropriate. Often, estimation and confidence intervals are better tools. Use hypothesis testing only when you want to test a well-defined hypothesis." (Larry A Wasserman, "All of Statistics: A concise course in statistical inference", 2004)

"In science, for a theory to be believed, it must make a prediction - different from those made by previous theories - for an experiment not yet done. For the experiment to be meaningful, we must be able to get an answer that disagrees with that prediction. When this is the case, we say that a theory is falsifiable - vulnerable to being shown false. The theory also has to be confirmable, it must be possible to verify a new prediction that only this theory makes. Only when a theory has been tested and the results agree with the theory do we advance the statement to the rank of a true scientific theory." (Lee Smolin, "The Trouble with Physics", 2006)

"A type of error used in hypothesis testing that arises when incorrectly rejecting the null hypothesis, although it is actually true. Thus, based on the test statistic, the final conclusion rejects the Null hypothesis, but in truth it should be accepted. Type I error equates to the alpha (α) or significance level, whereby the generally accepted default is 5%." (Lynne Hambleton, "Treasure Chest of Six Sigma Growth Methods, Tools, and Best Practices", 2007)

"Each systems archetype embodies a particular theory about dynamic behavior that can serve as a starting point for selecting and formulating raw data into a coherent set of interrelationships. Once those relationships are made explicit and precise, the 'theory' of the archetype can then further guide us in our data-gathering process to test the causal relationships through direct observation, data analysis, or group deliberation." (Daniel H Kim, "Systems Archetypes as Dynamic Theories", The Systems Thinker Vol. 24 (1), 2013)

"In common usage, prediction means to forecast a future event. In data science, prediction more generally means to estimate an unknown value. This value could be something in the future (in common usage, true prediction), but it could also be something in the present or in the past. Indeed, since data mining usually deals with historical data, models very often are built and tested using events from the past." (Foster Provost & Tom Fawcett, "Data Science for Business", 2013)

"Another way to secure statistical significance is to use the data to discover a theory. Statistical tests assume that the researcher starts with a theory, collects data to test the theory, and reports the results - whether statistically significant or not. Many people work in the other direction, scrutinizing the data until they find a pattern and then making up a theory that fits the pattern." (Gary Smith, "Standard Deviations", 2014)

"Data clusters are everywhere, even in random data. Someone who looks for an explanation will inevitably find one, but a theory that fits a data cluster is not persuasive evidence. The found explanation needs to make sense and it needs to be tested with uncontaminated data." (Gary Smith, "Standard Deviations", 2014)

"Machine learning is a science and requires an objective approach to problems. Just like the scientific method, test-driven development can aid in solving a problem. The reason that TDD and the scientific method are so similar is because of these three shared characteristics: Both propose that the solution is logical and valid. Both share results through documentation and work over time. Both work in feedback loops." (Matthew Kirk, "Thoughtful Machine Learning", 2015)

"Science, at its core, is simply a method of practical logic that tests hypotheses against experience. Scientism, by contrast, is the worldview and value system that insists that the questions the scientific method can answer are the most important questions human beings can ask, and that the picture of the world yielded by science is a better approximation to reality than any other." (John M Greer, "After Progress: Reason and Religion at the End of the Industrial Age", 2015)

"The dialectical interplay of experiment and theory is a key driving force of modern science. Experimental data do only have meaning in the light of a particular model or at least a theoretical background. Reversely theoretical considerations may be logically consistent as well as intellectually elegant: Without experimental evidence they are a mere exercise of thought no matter how difficult they are. Data analysis is a connector between experiment and theory: Its techniques advise possibilities of model extraction as well as model testing with experimental data." (Achim Zielesny, "From Curve Fitting to Machine Learning" 2nd Ed., 2016)

"Bias is error from incorrect assumptions built into the model, such as restricting an interpolating function to be linear instead of a higher-order curve. [...] Errors of bias produce underfit models. They do not fit the training data as tightly as possible, were they allowed the freedom to do so. In popular discourse, I associate the word 'bias' with prejudice, and the correspondence is fairly apt: an apriori assumption that one group is inferior to another will result in less accurate predictions than an unbiased one. Models that perform lousy on both training and testing data are underfit." (Steven S Skiena, "The Data Science Design Manual", 2017)

"Early stopping and regularization can ensure network generalization when you apply them properly. [...] With early stopping, the choice of the validation set is also important. The validation set should be representative of all points in the training set. When you use Bayesian regularization, it is important to train the network until it reaches convergence. The sum-squared error, the sum-squared weights, and the effective number of parameters should reach constant values when the network has converged. With both early stopping and regularization, it is a good idea to train the network starting from several different initial conditions. It is possible for either method to fail in certain circumstances. By testing several different initial conditions, you can verify robust network performance." (Mark H Beale et al, "Neural Network Toolbox™ User's Guide", 2017)

"Scientists generally agree that no theory is 100 percent correct. Thus, the real test of knowledge is not truth, but utility." (Yuval N Harari, "Sapiens: A brief history of humankind", 2017)

"Variance is error from sensitivity to fluctuations in the training set. If our training set contains sampling or measurement error, this noise introduces variance into the resulting model. [...] Errors of variance result in overfit models: their quest for accuracy causes them to mistake noise for signal, and they adjust so well to the training data that noise leads them astray. Models that do much better on testing data than training data are overfit." (Steven S Skiena, "The Data Science Design Manual", 2017)

"[...] a hypothesis test tells us whether the observed data are consistent with the null hypothesis, and a confidence interval tells us which hypotheses are consistent with the data." (William C Blackwelder)

09 December 2018

Data Science: Distributions (Just the Quotes)

"If the number of experiments be very large, we may have precise information as to the value of the mean, but if our sample be small, we have two sources of uncertainty: (I) owing to the 'error of random sampling' the mean of our series of experiments deviates more or less widely from the mean of the population, and (2) the sample is not sufficiently large to determine what is the law of distribution of individuals." (William S Gosset, "The Probable Error of a Mean", Biometrika, 1908)

"We know not to what are due the accidental errors, and precisely because we do not know, we are aware they obey the law of Gauss. Such is the paradox." (Henri Poincaré, "The Foundations of Science", 1913)

"The problems which arise in the reduction of data may thus conveniently be divided into three types: (i) Problems of Specification, which arise in the choice of the mathematical form of the population. (ii) When a specification has been obtained, problems of Estimation arise. These involve the choice among the methods of calculating, from our sample, statistics fit to estimate the unknow n parameters of the population. (iii) Problems of Distribution include the mathematical deduction of the exact nature of the distributions in random samples of our estimates of the parameters, and of other statistics designed to test the validity of our specification (tests of Goodness of Fit)." (Sir Ronald A Fisher, "Statistical Methods for Research Workers", 1925)

"An inference, if it is to have scientific value, must constitute a prediction concerning future data. If the inference is to be made purely with the help of the distribution theory of statistics, the experiments that constitute evidence for the inference must arise from a state of statistical control; until that state is reached, there is no universe, normal or otherwise, and the statistician’s calculations by themselves are an illusion if not a delusion. The fact is that when distribution theory is not applicable for lack of control, any inference, statistical or otherwise, is little better than a conjecture. The state of statistical control is therefore the goal of all experimentation. (William E Deming, "Statistical Method from the Viewpoint of Quality Control", 1939)

"Normality is a myth; there never has, and never will be, a normal distribution." (Roy C Geary, "Testing for Normality", Biometrika Vol. 34, 1947)

"A good estimator will be unbiased and will converge more and more closely (in the long run) on the true value as the sample size increases. Such estimators are known as consistent. But consistency is not all we can ask of an estimator. In estimating the central tendency of a distribution, we are not confined to using the arithmetic mean; we might just as well use the median. Given a choice of possible estimators, all consistent in the sense just defined, we can see whether there is anything which recommends the choice of one rather than another. The thing which at once suggests itself is the sampling variance of the different estimators, since an estimator with a small sampling variance will be less likely to differ from the true value by a large amount than an estimator whose sampling variance is large." (Michael J Moroney, "Facts from Figures", 1951)

"Some distributions [...] are symmetrical about their central value. Other distributions have marked asymmetry and are said to be skew. Skew distributions are divided into two types. If the 'tail' of the distribution reaches out into the larger values of the variate, the distribution is said to show positive skewness; if the tail extends towards the smaller values of the variate, the distribution is called negatively skew." (Michael J Moroney, "Facts from Figures", 1951)

"[A] sequence is random if it has every property that is shared by all infinite sequences of independent samples of random variables from the uniform distribution." (Joel N Franklin, 1962)

"Mathematical statistics provides an exceptionally clear example of the relationship between mathematics and the external world. The external world provides the experimentally measured distribution curve; mathematics provides the equation (the mathematical model) that corresponds to the empirical curve. The statistician may be guided by a thought experiment in finding the corresponding equation." (Marshall J Walker, "The Nature of Scientific Thought", 1963)

"Pencil and paper for construction of distributions, scatter diagrams, and run-charts to compare small groups and to detect trends are more efficient methods of estimation than statistical inference that depends on variances and standard errors, as the simple techniques preserve the information in the original data." (William E Deming, "On Probability as Basis for Action" American Statistician Vol. 29 (4), 1975)

"When the statistician looks at the outside world, he cannot, for example, rely on finding errors that are independently and identically distributed in approximately normal distributions. In particular, most economic and business data are collected serially and can be expected, therefore, to be heavily serially dependent. So is much of the data collected from the automatic instruments which are becoming so common in laboratories these days. Analysis of such data, using procedures such as standard regression analysis which assume independence, can lead to gross error. Furthermore, the possibility of contamination of the error distribution by outliers is always present and has recently received much attention. More generally, real data sets, especially if they are long, usually show inhomogeneity in the mean, the variance, or both, and it is not always possible to randomize." (George E P Box, "Some Problems of Statistics and Everyday Life", Journal of the American Statistical Association, Vol. 74 (365), 1979)

"At the heart of probabilistic statistical analysis is the assumption that a set of data arises as a sample from a distribution in some class of probability distributions. The reasons for making distributional assumptions about data are several. First, if we can describe a set of data as a sample from a certain theoretical distribution, say a normal distribution (also called a Gaussian distribution), then we can achieve a valuable compactness of description for the data. For example, in the normal case, the data can be succinctly described by giving the mean and standard deviation and stating that the empirical (sample) distribution of the data is well approximated by the normal distribution. A second reason for distributional assumptions is that they can lead to useful statistical procedures. For example, the assumption that data are generated by normal probability distributions leads to the analysis of variance and least squares. Similarly, much of the theory and technology of reliability assumes samples from the exponential, Weibull, or gamma distribution. A third reason is that the assumptions allow us to characterize the sampling distribution of statistics computed during the analysis and thereby make inferences and probabilistic statements about unknown aspects of the underlying distribution. For example, assuming the data are a sample from a normal distribution allows us to use the t-distribution to form confidence intervals for the mean of the theoretical distribution. A fourth reason for distributional assumptions is that understanding the distribution of a set of data can sometimes shed light on the physical mechanisms involved in generating the data." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Equal variability is not always achieved in plots. For instance, if the theoretical distribution for a probability plot has a density that drops off gradually to zero in the tails (as the normal density does), then the variability of the data in the tails of the probability plot is greater than in the center. Another example is provided by the histogram. Since the height of any one bar has a binomial distribution, the standard deviation of the height is approximately proportional to the square root of the expected height; hence, the variability of the longer bars is greater." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"Symmetry is also important because it can simplify our thinking about the distribution of a set of data. If we can establish that the data are (approximately) symmetric, then we no longer need to describe the  shapes of both the right and left halves. (We might even combine the information from the two sides and have effectively twice as much data for viewing the distributional shape.) Finally, symmetry is important because many statistical procedures are designed for, and work best on, symmetric data." (John M Chambers et al, "Graphical Methods for Data Analysis", 1983)

"We will use the convenient expression 'chosen at random' to mean that the probabilities of the events in the sample space are all the same unless some modifying words are near to the words 'at random'. Usually we will compute the probability of the outcome based on the uniform probability model since that is very common in modeling simple situations. However, a uniform distribution does not imply that it comes from a random source; […]" (Richard W Hamming, "The Art of Probability for Scientists and Engineers", 1991)

"Data that are skewed toward large values occur commonly. Any set of positive measurements is a candidate. Nature just works like that. In fact, if data consisting of positive numbers range over several powers of ten, it is almost a guarantee that they will be skewed. Skewness creates many problems. There are visualization problems. A large fraction of the data are squashed into small regions of graphs, and visual assessment of the data degrades. There are characterization problems. Skewed distributions tend to be more complicated than symmetric ones; for example, there is no unique notion of location and the median and mean measure different aspects of the distribution. There are problems in carrying out probabilistic methods. The distribution of skewed data is not well approximated by the normal, so the many probabilistic methods based on an assumption of a normal distribution cannot be applied." (William S Cleveland, "Visualizing Data", 1993)

"Fitting data means finding mathematical descriptions of structure in the data. An additive shift is a structural property of univariate data in which distributions differ only in location and not in spread or shape. […] The process of identifying a structure in data and then fitting the structure to produce residuals that have the same distribution lies at the heart of statistical analysis. Such homogeneous residuals can be pooled, which increases the power of the description of the variation in the data." (William S Cleveland, "Visualizing Data", 1993)

"Many good things happen when data distributions are well approximated by the normal. First, the question of whether the shifts among the distributions are additive becomes the question of whether the distributions have the same standard deviation; if so, the shifts are additive. […] A second good happening is that methods of fitting and methods of probabilistic inference, to be taken up shortly, are typically simple and on well understood ground. […] A third good thing is that the description of the data distribution is more parsimonious." (William S Cleveland, "Visualizing Data", 1993)

"Probabilistic inference is the classical paradigm for data analysis in science and technology. It rests on a foundation of randomness; variation in data is ascribed to a random process in which nature generates data according to a probability distribution. This leads to a codification of uncertainly by confidence intervals and hypothesis tests." (William S Cleveland, "Visualizing Data", 1993)

"When distributions are compared, the goal is to understand how the distributions shift in going from one data set to the next. […] The most effective way to investigate the shifts of distributions is to compare corresponding quantiles." (William S Cleveland, "Visualizing Data", 1993)

"When the distributions of two or more groups of univariate data are skewed, it is common to have the spread increase monotonically with location. This behavior is monotone spread. Strictly speaking, monotone spread includes the case where the spread decreases monotonically with location, but such a decrease is much less common for raw data. Monotone spread, as with skewness, adds to the difficulty of data analysis. For example, it means that we cannot fit just location estimates to produce homogeneous residuals; we must fit spread estimates as well. Furthermore, the distributions cannot be compared by a number of standard methods of probabilistic inference that are based on an assumption of equal spreads; the standard t-test is one example. Fortunately, remedies for skewness can cure monotone spread as well." (William S Cleveland, "Visualizing Data", 1993)

"A normal distribution is most unlikely, although not impossible, when the observations are dependent upon one another - that is, when the probability of one event is determined by a preceding event. The observations will fail to distribute themselves symmetrically around the mean." (Peter L Bernstein, "Against the Gods: The Remarkable Story of Risk", 1996)

"Linear regression assumes that in the population a normal distribution of error values around the predicted Y is associated with each X value, and that the dispersion of the error values for each X value is the same. The assumptions imply normal and similarly dispersed error distributions." (Fred C Pampel, "Linear Regression: A primer", 2000)

"The principle of maximum entropy is employed for estimating unknown probabilities (which cannot be derived deductively) on the basis of the available information. According to this principle, the estimated probability distribution should be such that its entropy reaches maximum within the constraints of the situation, i.e., constraints that represent the available information. This principle thus guarantees that no more information is used in estimating the probabilities than available." (George J Klir & Doug Elias, "Architecture of Systems Problem Solving" 2nd Ed, 2003) 

"The principle of minimum entropy is employed in the formulation of resolution forms and related problems. According to this principle, the entropy of the estimated probability distribution, conditioned by a particular classification of the given events (e.g., states of the variable involved), is minimum subject to the constraints of the situation. This principle thus guarantees that all available information is used, as much as possible within the given constraints (e.g., required number of states), in the estimation of the unknown probabilities." (George J Klir & Doug Elias, "Architecture of Systems Problem Solving" 2nd Ed, 2003)

"In the laws of probability theory, likelihood distributions are fixed properties of a hypothesis. In the art of rationality, to explain is to anticipate. To anticipate is to explain." (Eliezer S. Yudkowsky, "A Technical Explanation of Technical Explanation", 2005)

"The central limit theorem says that, under conditions almost always satisfied in the real world of experimentation, the distribution of such a linear function of errors will tend to normality as the number of its components becomes large. The tendency to normality occurs almost regardless of the individual distributions of the component errors. An important proviso is that several sources of error must make important contributions to the overall error and that no particular source of error dominate the rest." (George E P Box et al, "Statistics for Experimenters: Design, discovery, and innovation" 2nd Ed., 2005)

"Two things explain the importance of the normal distribution: (1) The central limit effect that produces a tendency for real error distributions to be 'normal like'. (2) The robustness to nonnormality of some common statistical procedures, where 'robustness' means insensitivity to deviations from theoretical normality." (George E P Box et al, "Statistics for Experimenters: Design, discovery, and innovation" 2nd Ed., 2005)

"For some scientific data the true value cannot be given by a constant or some straightforward mathematical function but by a probability distribution or an expectation value. Such data are called probabilistic. Even so, their true value does not change with time or place, making them distinctly different from  most statistical data of everyday life." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"In error analysis the so-called 'chi-squared' is a measure of the agreement between the uncorrelated internal and the external uncertainties of a measured functional relation. The simplest such relation would be time independence. Theory of the chi-squared requires that the uncertainties be normally distributed. Nevertheless, it was found that the test can be applied to most probability distributions encountered in practice." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"To fulfill the requirements of the theory underlying uncertainties, variables with random uncertainties must be independent of each other and identically distributed. In the limiting case of an infinite number of such variables, these are called normally distributed. However, one usually speaks of normally distributed variables even if their number is finite." (Manfred Drosg, "Dealing with Uncertainties: A Guide to Error Analysis", 2007)

"Traditional statistics is strong in devising ways of describing data and inferring distributional parameters from sample. Causal inference requires two additional ingredients: a science-friendly language for articulating causal knowledge, and a mathematical machinery for processing that knowledge, combining it with data and drawing new causal conclusions about a phenomenon." (Judea Pearl, "Causal inference in statistics: An overview", Statistics Surveys 3, 2009)

"The elements of this cloud of uncertainty (the set of all possible errors) can be described in terms of probability. The center of the cloud is the number zero, and elements of the cloud that are close to zero are more probable than elements that are far away from that center. We can be more precise in this definition by defining the cloud of uncertainty in terms of a mathematical function, called the probability distribution." (David S Salsburg, "Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"It is not enough to give a single summary for a distribution - we need to have an idea of the spread, sometimes known as the variability. [...] The range is a natural choice, but is clearly very sensitive to extreme values [...] In contrast the inter-quartile range (IQR) is unaffected by extremes. This is the distance between the 25th and 75th percentiles of the data and so contains the ‘central half’ of the numbers [...] Finally the standard deviation is a widely used measure of spread. It is the most technically complex measure, but is only really appropriate for well-behaved symmetric data since it is also unduly influenced by outlying values." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"[...] the Central Limit Theorem [...] says that the distribution of sample means tends towards the form of a normal distribution with increasing sample size, almost regardless of the shape of the original data distribution." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"There is no ‘correct’ way to display sets of numbers: each of the plots we have used has some advantages: strip-charts show individual points, box-and-whisker plots are convenient for rapid visual summaries, and histograms give a good feel for the underlying shape of the data distribution." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

More quotes on "Distributions" at the-web-of-knowledge.blogspot.com
