Showing posts with label Data science. Show all posts
Showing posts with label Data science. Show all posts

23 June 2026

🖍️James G Scott - Collected Quotes

"A histogram is a great way to depict the distribution of a numerical variable. To construct one, we first partition the range of possible outcomes (here, temperatures) into a set of disjoint intervals ('bins'). Next, we count the number of cases that fall into each bin. Finally, we draw a rectangle over each bin whose height is equal to the count within each bin." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"A model is a metaphor, a description of a system that helps us to reason more clearly. Like all metaphors, models are approximations, and will never account for every last detail. A useful mantra here is: all models are wrong, but some models are useful." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"[...] always remember that the construction of an ANOVA table is inherently sequential. For example, first we add the clutter variable, which remains in the model at every subsequent step; then we add the distance variable, which remains in the model at every subsequent step; and so forth. Thus the actual question being answered at each stage of an analysis of variance is: how much variation in the response can this new variable predict, in the context of what has already been predicted by other variables in the model? This point - the importance of context in interpreting an ANOVA table - is subtle, but important." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"An obvious question is: do bootstrapped confidence intervals satisfy the frequentist coverage property? If your sample is fairly representative of the population, then the answer is a qualified yes. That is, the bootstrapping procedure yields nominal X% intervals that cover the true value 'approximately' X% of the time. Moreover, as the size of the original sample gets bigger, the quality of the approximation gets better. Alas, it is necessary to appeal to some very advanced probability theory to put both of these claims on firm footing." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"At the core of the resampling approach to statistical inference lies a simple idea. Most of the time, we can’t feasibly take repeated samples of size n from the population, to see how our estimate changes from one sample to the next. But we can repeatedly take samples of size n from the sample itself, and apply our estimator afresh to each notional sample. The idea is that the variability of the estimates across all these samples can be used to approximate our estimator’s true sampling distribution. This process - pretending that our sample is the whole population, and taking repeated samples of size n with replacement from our original sample of size n - is called bootstrap resampling, or just bootstrapping" (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"By themselves, sums of squares are hard to interpret, because they are measured in squared units of the Y variable. But their ratios are highly meaningful. In fact, the ratio of PV to TV - or what fraction of the total variation has been predicted by the model - is one of the most frequently quoted summary measures in all of statistical modeling. This ratio is called the coefficient of determination, and is usually denoted by the symbol R2 [...] The correct interpretation of R2 sometimes trips people up, and is therefore worth repeating: it is the proportion of variance in the data that can be predicted using the statistical model in question." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"[boxplots] allow you to assess variability both between and within the groups. [...] Each box shows the within-group variability, as measured by the interquartile range of the numerical variable (SAT score) for all cases in that category. The middle line within each box is the median of that category, and the differences between these medians give you a sense of the between-group variability. In this boxplot, the whiskers extend outside the box no further than 1.5 times the interquartile range. Points outside this interval are shown as individual dots." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"Good estimators are those that usually yield estimates close to the truth, with minimal variation. Therefore, we typically summarize a sampling distribution using its standard deviation, which we refer to as the standard error. In quoting the standard error of an estimator’s sampling distribution, you are saying: 'If I were to take repeated samples from the population and use this estimatorfor every sample, my estimate is typically off from the truth by about this much.' Notice again that this is a claim about a procedure, not a particular estimate. The bigger the standard error, the less stable the estimator across different samples, and the less you can trust the estimate for any particular sample." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"In fitting statistical models, we typically equate the trustworthiness of a procedure with its stability under the influence of luck, and we seek to measure the degree to which that procedure might have given a different answer if the forces of randomness had made the world look a bit different. Specifically, the question we seek to answer is: 'if our data set had been different merely due to chance, would our answer have been different, too?'" (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"Model-building requires much more than just technical knowledge of statistical ideas. It also requires care and judgment, and cannot be reduced to a flowchart, a table of formulas, or a tidy set of numerical summaries that wring every last drop of truth from a data set. There is almost never a single 'right' statistical model for some problem. But there are definitely such things as good models and bad models, and learning to tell the difference is important. Just remember: calling a model good or bad requires knowing both the tool and the task." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"[...] complexity sometimes comes at the expense of explanatory power. We must avoid building models calibrated so perfectly to past experience that they do not generalize to future cases." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"It is common to view a statistical model as nothing more than a recipe for calculating the fitted values, and to think that the residuals are just the errors made by this model. But we’ll have a richer picture if instead we view the residuals as part of the model. If you’ve ignored the variation in the residuals, then you really haven’t specified a complete forecast." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"Resampling won’t yield the true sampling distribution of an estimator, but it is often good enough for approximating the standard error (which you’ll remember is just the standard deviation of the sampling distribution). We use the term bootstrapped standard error for the standard deviation of the bootstrapped sampling distribution. The bootstrapped standard error is an estimate of the true standard error." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"Tables are almost always the best way to display categorical data sets with few classifying variables, for the simple reason that they convey a lot of information in a small space." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

"The residuals from a regression model are sometimes called 'errors'. This is especially true in experimental science, where measurements of some Y variable will be taken at different values of the X variable (called design points), and where noisy measurement instruments can introduce random errors into theobservations. But in many cases this interpretation of a residual as an error can be misleading. A regression model can still give a nonzero residual, even if there is no mistake in the measurement of the Y variable. It’s often far more illuminating to think of the residual as the part of the Y variable that it is left unpredicted by X." (James G Scott, "Statistical Modeling: A Gentle Introduction", 2017)

🤖Prompt Engineering: Large Language Modeld [LLMs] (Just the Quotes)

"Another problem that can be confusing is that LLMs seldom put out the same thing twice. [...] Traditional databases are straightforward - you ask for something specific, and you get back exactly what was stored. Search engines work similarly, finding existing information. LLMs work differently. They analyze massive amounts of text data to understand statistical patterns in language. The model processes information through multiple layers, each capturing different aspects - from simple word patterns to complex relationships between ideas." (Jeremy C Morgan, "Coding with AI: Examples in Python", 2025)

"Generative AI for coding and language tools is based on the LLM concept. A large language model is a type of neural network that processes and generates text in a humanlike way. It does this by being trained on a massive dataset of text, which allows it to learn human language patterns, as described previously. It lets LLMs translate, write, and answer questions with text. LLMs can contain natural language, source code, and  more." (Jeremy C Morgan, "Coding with AI: Examples in Python", 2025)

"Generative AI tools for coding are sometimes inaccurate. They can produce results that look good but are wrong. This is common with LLMs. They can write code or chat like a person. And sometimes, they share information that’s just plain wrong. Not just a bit off, but totally backwards or nonsense. And they say it so confidently! We call this 'hallucinating', which is a funny term, but it makes sense." (Jeremy C Morgan, "Coding with AI: Examples in Python", 2025)

"In prompt engineering, we customize the prompts or questions we give the model to get more accurate or insightful responses. The way a prompt is structured has a massive impact on how well a model understands the task at hand and, ultimately, how well it performs. Given LLMs’ versatility, prompt engineering has become an important skill for getting the most out of these models across different domains and tasks. The key is to understand how different prompt structures lead to different model behaviors. There are various strategies - ranging from simple one-shot prompting to more complex techniques like chain-of-thought prompting - that can significantly improve the effectiveness of LLMs." (Abi Aryan, "LLMOps: Managing Large Language Models in Production", 2025)

"It’s essentially a sophisticated prediction system. Instead of looking up stored answers, an LLM calculates probabilities to determine what text should come next. While these predictions are often accurate, they’re still predictions - which is why it’s crucial to verify any code or factual claims the model generates. This probabilistic nature makes LLMs powerful tools for generating text and code but also means they can make mistakes, even when seeming very confident. Understanding this helps set realistic expectations about what these tools can and cannot do reliably."  (Jeremy C Morgan, "Coding with AI: Examples in Python", 2025)

"LLMs can inadvertently produce toxic content or biased language, leak private information, or be vulnerable to jailbreak prompts. These risks carry serious legal and reputational consequences. To mitigate them, evaluation tools must integrate automated filters and classifiers that flag problematic outputs in real time, as we discussed earlier in the chapter. Metrics such as safety scores, toxicity indices, and bias measurements should be collected alongside model metadata for auditing purposes." (Abi Aryan, "LLMOps: Managing Large Language Models in Production", 2025)

"LLM deployment failures often trace back not to the model itself, but to the prompts it receives. In production environments, prompts are rarely fixed, handcrafted snippets. Instead, they are dynamically generated, assembled from templates, and parameterized based on upstream data sources or evolving user state. This dynamism introduces complexity and variability that can subtly undermine the system’s performance if not carefully managed." (Abi Aryan, "LLMOps: Managing Large Language Models in Production", 2025)

"LLMs excel at understanding context and making associations among words, phrases, and concepts to provide relevant information based on the input query or prompt. While structured knowledge bases rely on humancurated data, LLMs can  automatically extract knowledge from unstructured text. When trained on diverse textual sources, they can process a vast amount of information without explicit human intervention. However, this also introduces a challenge, as the model can learn biased or incorrect information from the training data." (Abi Aryan, "LLMOps: Managing Large Language Models in Production", 2025)

"Prompt engineering is a crucial aspect of working with large language models (LLMs) like OpenAI's GPT, Google's PaLM, and others in the space of AI and machine learning. It involves the art and science of designing inputs (prompts) in a way that maximizes the quality, relevance, and accuracy of the AI-generated output. As the capabilities of AI continue to improve, the task of crafting effective prompts has become an essential skill for anyone leveraging these tools for real-world applications, including natural language understanding, translation, summarization, code generation, and more." (Code Planet, "Python for Large Language Models", 2025)

"[...] LLMs raise serious concerns about ethics, bias and fairness, errors in reasoning, hallucinations, and misuse (e.g., misinformation and disinformation). These concerns are exacerbated by modern LLMs being both literal and figurative 'black boxes': Literal black boxes because many advanced AI systems are proprietary and the weights (trained parameters of the models) are not released to the public; and figurative black boxes because even the open-source AI models are so complicated that understanding them and developing safety guardrails has thus far proven extremely difficult." (Mike X Cohen,"50 ML Projects To Understand LLMs", 2026)

"ML is a useful - and under-utilized - framework for studying LLMs. For one thing, LLMs are literally composed of simple ML algorithms (linear weighted averages and nonlinear transformations). Furthermore, using ML techniques like regression, classification, and clustering, can help reveal how concepts like grammar rules are represented inside LLMs. And finally, many people find LLMs to be intimidatingly complicated while finding ML to be much more approachable. Thus, using ML to study LLMs involves using simple tools to understand complicated tools." (Mike X Cohen,"50 ML Projects To Understand LLMs", 2026)

28 May 2026

🔭Data Science: Chance (Just the Quotes)

"The universal cause is one thing, a particular cause another. An effect can be haphazard with respect to the plan of the second, but not of the first. For an effect is not taken out of the scope of one particular cause save by another particular cause which prevents it, as when wood dowsed with water, will not catch fire. The first cause, however, cannot have a random effect in its own order, since all particular causes are comprehended in its causality. When an effect does escape from a system of particular causality, we speak of it as fortuitous or a chance happening […]" (Thomas Aquinas, "Summa Theologica", cca. 1266-1273)

"[…] chance, that is, an infinite number of events, with respect to which our ignorance will not permit us to perceive their causes, and the chain that connects them together. Now, this chance has a greater share in our education than is imagined. It is this that places certain objects before us and, in consequence of this, occasions more happy ideas, and sometimes leads us to the greatest discoveries […]" (Claude A Helvetius, "On Mind", 1751)

"But ignorance of the different causes involved in the production of events, as well as their complexity, taken together with the imperfection of analysis, prevents our reaching the same certainty about the vast majority of phenomena. Thus there are things that are uncertain for us, things more or less probable, and we seek to compensate for the impossibility of knowing them by determining their different degrees of likelihood. So it was that we owe to the weakness of the human mind one of the most delicate and ingenious of mathematical theories, the science of chance or probability." (Pierre-Simon Laplace,Recherches, 1º, sur l'Intégration des Équations Différentielles aux Différences Finies, et sur leur Usage dans la Théorie des Hasards", 1773)

"Probability has reference partly to our ignorance, partly to our knowledge [..] The theory of chance consists in reducing all the events of the same kind to a certain number of cases equally possible, that is to say, to such as we may be equally undecided about in regard to their existence, and in determining the number of cases favorable to the event whose probability is sought. The ratio of this number to that of all cases possible is the measure of this probability, which is thus simply a fraction whose number is the number of favorable cases and whose denominator is the number of all cases possible." (Pierre-Simon Laplace, "Philosophical Essay on Probabilities", 1814)

"The facts of greatest outcome are those we think simple; may be they really are so, because they are influenced only by a small number of well-defined circumstances, may be they take on an appearance of simplicity because the various circumstances upon which they depend obey the laws of chance and so come to mutually compensate." (Henri Poincaré, "The Foundations of Science", 1913)

"The most important application of the theory of probability is to what we may call 'chance-like' or 'random' events, or occurrences. These seem to be characterized by a peculiar kind of incalculability which makes one disposed to believe - after many unsuccessful attempts - that all known rational methods of prediction must fail in their case. We have, as it were, the feeling that not a scientist but only a prophet could predict them. And yet, it is just this incalculability that makes us conclude that the calculus of probability can be applied to these events." (Karl R Popper,The Logic of Scientific Discovery", 1934)

"In relation to any experiment we may speak of this hypothesis as the null hypothesis, and it should be noted that the null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis." (Ronald Fisher,The Design of Experiments", 1935)

"The fundamental difference between engineering with and without statistics boils down to the difference between the use of a scientific method based upon the concept of laws of nature that do not allow for chance or uncertainty and a scientific method based upon the concepts of laws of probability as an attribute of nature." (Walter A Shewhart, 1940)

"If the chance of error alone were the sole basis for evaluating methods of inference, we would never reach a decision, but would merely keep increasing the sample size indefinitely." (C West Churchman, "Theory of Experimental Inference", 1948)

"It will, of course, happen but rarely that the proportions will be identical, even if no real association exists. Evidently, therefore, we need a significance test to reassure ourselves that the observed difference of proportion is greater than could reasonably be attributed to chance. The significance test will test the reality of the association, without telling us anything about the intensity of association. It will be apparent that we need two distinct things:" (a) a test of significance, to be used on the data first of all, and" (b) some measure of the intensity of the association, which we shall only be justified in using if the significance test confirms that the association is real." (Michael J Moroney,Facts from Figures", 1951)

"People have erroneous intuitions about the laws of chance. In particular, they regard a sample randomly drawn from a population as highly representative, that is, similar to the population in all essential characteristics. The prevalence of the belief and its unfortunate consequences for psychological research are illustrated by the responses of professional psychologists to a questionnaire concerning research decisions." (Amos Tversky & Daniel Kahneman,Belief in the law of small numbers", Psychological Bulletin 76(2), 1971)

"Averaging results, whether weighted or not, needs to be done with due caution and commonsense. Even though a measurement has a small quoted error it can still be, not to put too fine a point on it, wrong. If two results are in blatant and obvious disagreement, any average is meaningless and there is no point in performing it. Other cases may be less outrageous, and it may not be clear whether the difference is due to incompatibility or just unlucky chance." (Roger J Barlow,Statistics: A guide to the use of statistical methods in the physical sciences", 1989)

"[…] an honest exploratory study should indicate how many comparisons were made […] most experts agree that large numbers of comparisons will produce apparently statistically significant findings that are actually due to chance. The data torturer will act as if every positive result confirmed a major hypothesis. The honest investigator will limit the study to focused questions, all of which make biologic sense. The cautious reader should look at the number of ‘significant’ results in the context of how many comparisons were made." (James L Mills, "Data torturing", New England Journal of Medicine, 1993)

"To understand what kinds of problems are solvable by the Monte Carlo method, it is important to note that the method enables simulation of any process whose development is influenced by random factors. Second, for many mathematical problems involving no chance, the method enables us to artificially construct a probabilistic model" (or several such models), making possible the solution of the problems." (Ilya M Sobol, "A Primer for the Monte Carlo Method", 1994)

"Regression to the mean' […] says that, in any series of events where chance is involved, very good or bad performances, high or low scores, extreme events, etc. tend on the average, to be followed by more average performance or less extreme events. If we do extremely well, we're likely to do worse the next time, while if we do poorly, we're likely to do better the next time. But regression to the mean is not a natural law. Merely a statistical tendency. And it may take a long time before it happens." (Peter Bevelin,Seeking Wisdom: From Darwin to Munger",  2003)

"Each systematic error associated with a given measurement process is always of the same sign and magnitude. It persists measurement after measurement. When its existence is established, such an error is called a bias, and reasonable effort should be made to correct for it. Sometimes the observed bias is the result of the concurrence of several biases that cannot or at least have not been individually identified. One of the purposes of statistical treatment of data is to decide whether an apparently erroneous result is real and indicates a bias or whether it could happen as the result of chance variability, even in a well-behaved measurement system. There can be, of course, biases that have not been identified as such. Also, there are limits to how well one can correct for known biases, and this inadequacy must be considered when limits of uncertainty are assigned to data." (Cheryl Cihon & John K Taylor, "Statistical Techniques for Data Analysis" 2nd. ed., 2005)

"Probability is about making decisions under uncertainty - indeed, where there is no uncertainty, no decision is required, as you would simply choose the outcome that you know will occur. A 'good' or 'rational' decision favours the Cartesian principle that ‘when it is not in our power to follow what is true, we ought to follow what is most probable’. Of course, rational decisions sometimes turn out to be wrong. That does not mean that the decisions were bad - they may have been the best choices, given the information available at the time. […] In the long run, the vagaries of chance tend to even out, but in particular cases it can happen that the long shot comes in first. This is the corollary of a 'good' decision that has bad consequences - a 'bad' or 'irrational' decision that turns out to be right." (Alan Graham, "Developing Thinking in Statistics", 2006) 

"Regression toward the mean. That is, in any series of random events an extraordinary event is most likely to be followed, due purely to chance, by a more ordinary one." (Leonard Mlodinow, "The Drunkard’s Walk: How Randomness Rules Our Lives", 2008)

"In bagging, generating complementary base-learners is left to chance and to the unstability of the learning method. In boosting, we actively try to generate complementary base-learners by training the next learner boosting on the mistakes of the previous learners." (Ethem Alpaydin, "Introduction to Machine Learning" 2nd Ed, 2010)

"Be careful not to confuse clustering and stratification. Even though both of these sampling strategies involve dividing the population into subgroups, both the way in which the subgroups are sampled and the optimal strategy for creating the subgroups are different. In stratified sampling, we sample from every stratum, whereas in cluster sampling, we include only selected whole clusters in the sample. Because of this difference, to increase the chance of obtaining a sample that is representative of the population, we want to create homogeneous groups for strata and heterogeneous" (reflecting the variability in the population) groups for clusters." (Roxy Peck et al,Introduction to Statistics and Data Analysis" 4th Ed., 2012)

"The closer that sample-selection procedures approach the gold standard of random selection - for which the definition is that every individual in the population has an equal chance of appearing in the sample - the more we should trust them. If we don’t know whether a sample is random, any statistical measure we conduct may be biased in some unknown way." (Richard E Nisbett,Mindware: Tools for Smart Thinking", 2015)

"Statistical significance is a concept used by scientists and researchers to set an objective standard that can be used to determine whether or not a particular relationship 'statistically' exists in the data. Scientists test for statistical significance to distinguish between whether an observed effect is present in the data" (given a high degree of probability), or just due to chance. It is important to note that finding a statistically significant relationship tells us nothing about whether a relationship is a simple correlation or a causal one, and it also can’t tell us anything about whether some omitted factor is driving the result." (John H Johnson & Mike Gluck, "Everydata: The misinformation hidden in the little data you consume every day", 2016)

"In statistics, the word 'significant' means that the results passed mathematical tests such as t-tests, chi-square tests, regression, and principal components analysis" (there are hundreds). Statistical significance tests quantify how easily pure chance can explain the results. With a very large number of observations, even small differences that are trivial in magnitude can be beyond what our models of change and randomness can explain. These tests don’t know what’s noteworthy and what’s not - that’s a human judgment." (Daniel J Levitin,Weaponized Lies", 2017)

"To be any good, a sample has to be representative. A sample is representative if every person or thing in the group you’re studying has an equally likely chance of being chosen. If not, your sample is biased. […] The job of the statistician is to formulate an inventory of all those things that matter in order to obtain a representative sample. Researchers have to avoid the tendency to capture variables that are easy to identify or collect data on - sometimes the things that matter are not obvious or are difficult to measure." (Daniel J Levitin, "Weaponized Lies", 2017)

"This problem with adding additional variables is referred to as the curse of dimensionality. If you add enough variables into your black box, you will eventually find a combination of variables that performs well - but it may do so by chance. As you increase the number of variables you use to make your predictions, you need exponentially more data to distinguish true predictive capacity from luck." (Carl T Bergstrom & Jevin D West, "Calling Bullshit: The Art of Skepticism in a Data-Driven World", 2020)

"A well-known theorem called the 'no free lunch' theorem proves exactly what we anecdotally witness when designing and building learning systems. The theorem states that any bias-free learning system will perform no better than chance when applied to arbitrary problems. This is a fancy way of stating that designers of systems must give the system a bias deliberately, so it learns what’s intended. As the theorem states, a truly bias- free system is useless." (Erik J Larson,The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do", 2021)

"Visualizations can remove the background noise from enormous sets of data so that only the most important points stand out to the intended audience. This is particularly important in the era of big data. The more data there is, the more chance for noise and outliers to interfere with the core concepts of the data set." (Kate Strachnyi, "ColorWise: A Data Storyteller’s Guide to the Intentional Use of Color", 2023)

22 May 2026

🔭Data Science: Asymmetry (Just the Quotes)

"Some distributions [...] are symmetrical about their central value. Other distributions have marked asymmetry and are said to be skew. Skew distributions are divided into two types. If the 'tail' of the distribution reaches out into the larger values of the variate, the distribution is said to show positive skewness; if the tail extends towards the smaller values of the variate, the distribution is called negatively skew." (Michael J Moroney,Facts from Figures", 1951)

“Rut seldom is asymmetry merely the absence of symmetry. Even in asymmetric designs one feels symmetry as the norm from which one deviates under the influence of forces of non-formal character.” (Hermann Weyl, “Symmetry”, 1952)

"If a distribution were perfectly symmetrical, all symmetry-plot points would be on the diagonal line. Off-line points indicate asymmetry. Points fall above the line when distance above the median is greater than corresponding distance below the median. A consistent run of above-the-line points indicates positive skew; a run of below-the-line points indicates negative skew." (Lawrence C Hamilton,Regression with Graphics: A second course in applied statistics", 1991)

“An asymmetry in the present is understood as having originated from a past symmetry.” (Michael Leyton, “Symmetry, Causality, Mind”, 1992)

"Chaos demonstrates that deterministic causes can have random effects […] There's a similar surprise regarding symmetry: symmetric causes can have asymmetric effects. […] This paradox, that symmetry can get lost between cause and effect, is called symmetry-breaking. […] From the smallest scales to the largest, many of nature's patterns are a result of broken symmetry; […]" (Ian Stewart & Martin Golubitsky,Fearful Symmetry: Is God a Geometer?", 1992)

“Approximate symmetry is a softening of the hard dichotomy between symmetry and asymmetry. The extent of deviation from exact symmetry that can still be considered approximate symmetry will depend on the context and the application and could very well be a matter of personal taste.” (Joe Rosen, “Symmetry Rules: How Science and Nature Are Founded on Symmetry”, 2008)

"[…] in cybernetics, control is seen not as a function of one agent over something else, but as residing within circular causal networks, maintaining stabilities in a system. Circularities have no beginning, no end and no asymmetries. The control metaphor of communication, by contrast, punctuates this circularity unevenly. It privileges the conceptions and actions of a designated controller by distinguishing between messages sent in order to cause desired effects and feedback that informs the controller of successes or failures." (Klaus Krippendorff,On Communicating: Otherness, Meaning, and Information", 2009)

“[…] asymmetry can be defined only relative to symmetry, and vice versa. Asymmetric elements in paintings or buildings are most effective when superimposed against a background of symmetry.” (Alan Lightman, “The Accidental Universe: The World You Thought You Knew”, 2014)

"The higher the dimension, in other words, the higher the number of possible interactions, and the more disproportionally difficult it is to understand the macro from the micro, the general from the simple units. This disproportionate increase of computational demands is called the curse of dimensionality." (Nassim N Taleb,Skin in the Game: Hidden Asymmetries in Daily Life", 2018)

"Many statistical procedures perform more effectively on data that are normally distributed, or at least are symmetric and not excessively kurtotic" (fat-tailed), and where the mean and variance are approximately constant. Observed time series frequently require some form of transformation before they exhibit these distributional properties, for in their 'raw' form they are often asymmetric." (Terence C Mills,Applied Time Series Analysis: A practical guide to modeling and forecasting", 2019)

17 May 2026

🔭Data Science: Misconceptions (Just the Quotes)

"Science does not begin with facts; one of its tasks is to uncover the facts by removing misconceptions." (Lancelot L Whyte, "Accent on Form", 1954)

"A common misconception is that an effect exists only if it is statistically significant and that it does not exist if it is not [statistically significant]." (Jonas Ranstam, "A common misconception about p-value and its consequences", Acta Orthopaedica Scandinavica 67, 1996)

"The standard deviation (often SD) is a measure of variability. When we calculate the standard deviation of a sample, we are using it as an estimate of the variability of the population from which the sample was drawn. For data with a normal distribution, about 95% of individu als will have values within 2 standard deviations of the mean, the other 5% being equally scattered above and below these limits. Contrary to popular misconception, the standard deviation is a valid measure of variability regardless of the distribution. About 95% of observa tions of any distribution usually fall within the 2 standard deviation limits, though those outside may all be at one end. We may choose a different summary statistic, how ever, when data have a skewed distribution." (Douglas G Altman & J Martin Bland, "Statistics Notes: Standard Deviations And Standard Errors", British Medical Journal Vol. 331 (7521) 2005)

"[...] the term statistical misconception refers to any of several widely held but incorrect notions about statistical concepts, about procedures for analyzing data and about the meaning of results produced by such analyses. To illustrate, many people think that (1) normal curves are bell shaped, (2) a correlation coeffi cient should never be used to address questios of causality, and (3) the level of signifi cance dictates the probability of a Type I error. Some people, of course, have only one or two (rather than all three) of these misconceptions, and a few individuals realize that all three of those beliefs are false."(Schuyler W Huck, "Statistical Misconceptions", 2008)

"Science would be better understood if we called theories ‘misconceptions’ from the outset, instead of only after we have discovered their successors." (David Deutsch, "Beginning of Infinity", 2011)

"A popular misconception holds that the era of Big Data means the end of a need for sampling. In fact, the proliferation of data of varying quality and relevance reinforces the need for sampling as a tool to work efficiently with a variety of data, and minimize bias. Even in a Big Data project, predictive models are typically developed and piloted with samples." (Peter C Bruce & Andrew G Bruce, "Statistics for Data Scientists: 50 Essential Concepts", 2016)

"An oft-repeated rule of thumb in any sort of statistical model fitting is 'you can't fit a model with more parameters than data points'. This idea appears to be as wide-spread as it is incorrect. On the contrary, if you construct your models carefully, you can fit models with more parameters than datapoints [...]. A model with more parameters than datapoints is known as an under-determined system, and it's a common misperception that such a model cannot be solved in any circumstance. [...] this misconception, which I like to call the 'model complexity myth' [...] is not true in general, it is true in the specific case of simple linear models, which perhaps explains why the myth is so pervasive." (Jake Vanderplas", "The Model Complexity Myth", 2015)


16 May 2026

🔭Data Science: Central Tendency (Just the Quotes)

"An average value is a single value within the range of the data that is used to represent all of the values in the series. Since an average is somewhere within the range of the data, it is sometimes called a measure of central value." (Frederick E Croxton & Dudley J Cowden, "Practical Business Statistics", 1937

"A good estimator will be unbiased and will converge more and more closely (in the long run) on the true value as the sample size increases. Such estimators are known as consistent. But consistency is not all we can ask of an estimator. In estimating the central tendency of a distribution, we are not confined to using the arithmetic mean; we might just as well use the median. Given a choice of possible estimators, all consistent in the sense just defined, we can see whether there is anything which recommends the choice of one rather than another. The thing which at once suggests itself is the sampling variance of the different estimators, since an estimator with a small sampling variance will be less likely to differ from the true value by a large amount than an estimator whose sampling variance is large." (Michael J Moroney, "Facts from Figures", 1951)

"The mode would form a very poor basis for any further calculations of an arithmetical nature, for it has deliberately excluded arithmetical precision in the interests of presenting a typical result. The arithmetic average, on the other hand, excellent as it is for numerical purposes, has sacrificed its desire to be typical in favour of numerical accuracy. In such a case it is often desirable to quote both measures of central tendency.(Michael J Moroney,Facts from Figures", 1951)

"An average is sometimes called a 'measure of central tendency' because individual values of the variable usually cluster around it. Averages are useful, however, for certain types of data in which there is little or no central tendency." (William A Spirr & Charles P Bonini,Statistical Analysis for Business Decisions" 3rd Ed., 1967)

"Central tendency is the formal expression for the notion of where data is centered, best understood by most readers as 'average'. There is no one way of measuring where data are centered, and different measures provide different insights." (Charles Livingston & Paul Voakes,Working with Numbers and Statistics: A handbook for journalists", 2005)

"Distributional shape is an important attribute of data, regardless of whether scores are analyzed descriptively or inferentially. Because the degree of skewness can be summarized by means of a single number, and because computers have no difficul ty providing such measures (or estimates) of skewness, those who prepare research reports should include a numerical index of skewness every time they provide measures of central tendency and variability." (Schuyler W Huck, "Statistical Misconceptions", 2008)

"It is best to think of the various kinds of central tendency indices as falling into three categories based on the computational procedures one uses to summarize the data. One category deals with means, with techniques put into this category if scores are added together and then divided by the number of scores that are summed. The second category involves different kinds of medians, with various techniques grouped here if the goal is to find some sort of midpoint. The third category contains different kinds of modes, with these techniques focused on the frequency with which scores appear in the data." (Schuyler W Huck, "Statistical Misconceptions", 2008)

"Various measures of central tendency have been invented because the proper notion of the 'average' score can vary from study to study. Depending on the kind of data collected, the degree of skewness in the data, and the possible existence of outliers, it may be that the most appropriate measure of central tendency is found by doing something other than (1) dividing the sum of the scores by the number of scores (to get the mean), (2) calculating the midpoint in the distribution (to get the median), or (3) determining the most frequently observed score (to get the mode)." (Schuyler W Huck, "Statistical Misconceptions", 2008)

"Measures of central tendency and variability work together to give you a concise summary of your data. When you don’t have any outliers, the mean is the most common indicator of central tendency. But on its own, the mean doesn’t tell you much. It isn’t until you also take the standard deviation into account that you really have a sense of your data."  (Shonna D Watters et al, "The Practical Guide for HR Analytics: Using data to inform, transform, and empower HR decisions", 2019)

"Statistical analysis seeks to develop concise summary figures which describe a large body of quantitative data. One of the most widely used set of summary figures is known as measures of location, which are often referred to as averages, measures of central tendency or central location. The purpose for computing an average value for a set of observations is to obtain a single value which is representative of all the items and which the mind can grasp simply and quickly. The single value is the point or location around which the individual items cluster." (Lawrence J Kaplan)

15 May 2026

🔭Data Science: Centrality (Just the Quotes)

"An average value is a single value within the range of the data that is used to represent all of the values in the series. Since an average is somewhere within the range of the data, it is sometimes called a measure of central value." (Frederick E Croxton & Dudley J Cowden,Practical Business Statistics", 1937)

"Some distributions [...] are symmetrical about their central value. Other distributions have marked asymmetry and are said to be skew. Skew distributions are divided into two types. If the 'tail' of the distribution reaches out into the larger values of the variate, the distribution is said to show positive skewness; if the tail extends towards the smaller values of the variate, the distribution is called negatively skew." (Michael J Moroney,Facts from Figures", 1951)

"Numerical data, which have been recorded at intervals of time, form what is generally described as a time series. [...] The purpose of analyzing time series is not always the determination of the trend by itself. Interest may be centered on the seasonal movement displayed by the series and, in such a case, the determination of the trend is merely a stage in the process of measuring and analyzing the seasonal variation. If a regular basic or under- lying seasonal movement can be clearly established, forecasting of future movements becomes rather less a matter of guesswork and more a matter of intelligent forecasting." (Alfred R Ilersic, "Statistics", 1959)

"Dispersion or spread is the degree of the scatter or variation of the variables about a central value." (Bertram C Brookes & W F L Dick,Introduction to Statistical Method", 1969)

"Equal variability is not always achieved in plots. For instance, if the theoretical distribution for a probability plot has a density that drops off gradually to zero in the tails" (as the normal density does), then the variability of the data in the tails of the probability plot is greater than in the center. Another example is provided by the histogram. Since the height of any one bar has a binomial distribution, the standard deviation of the height is approximately proportional to the square root of the expected height; hence, the variability of the longer bars is greater." (John M Chambers et al,Graphical Methods for Data Analysis", 1983)

"There are several reasons why symmetry is an important concept in data analysis. First, the most important single summary of a set of data is the location of the center, and when data meaning of 'center' is unambiguous. We can take center to mean any of the following things, since they all coincide exactly for symmetric data, and they are together for nearly symmetric data: (l) the center of symmetry. (2) the arithmetic average or center of gravity, (3) the median or 50%. Furthermore, if data a single point of highest concentration instead of several" (that is, they are unimodal), then we can add to the list (4) point of highest concentration. When data are far from symmetric, we may have trouble even agreeing on what we mean by center; in fact, the center may become an inappropriate summary for the data." (John M Chambers et al,Graphical Methods for Data Analysis", 1983)

"A connected graph is appropriate when the time series is smooth, so that perceiving individual values is not important. A vertical line graph is appropriate when it is important to see individual values, when we need to see short-term fluctuations, and when the time series has a large number of values; the use of vertical lines allows us to pack the series tightly along the horizontal axis. The vertical line graph, however, usually works best when the vertical lines emanate from a horizontal line through the center of the data and when there are no long-term trends in the data." (William S Cleveland,The Elements of Graphing Data", 1985)

"If the sample is not representative of the population because the sample is small or biased, not selected at random, or its constituents are not independent of one another, then the bootstrap will fail. […] For a given size sample, bootstrap estimates of percentiles in the tails will always be less accurate than estimates of more centrally located percentiles. Similarly, bootstrap interval estimates for the variance of a distribution will always be less accurate than estimates of central location such as the mean or median because the variance depends strongly upon extreme values in the population." (Phillip I Good & James W Hardin,Common Errors in Statistics" (and How to Avoid Them)", 2003)

"Central tendency is the formal expression for the notion of where data is centered, best understood by most readers as 'average'. There is no one way of measuring where data are centered, and different measures provide different insights." (Charles Livingston & Paul Voakes,Working with Numbers and Statistics: A handbook for journalists", 2005)

"Mean-averages can be highly misleading when the raw data do not form a symmetric pattern around a central value but instead are skewed towards one side [...], typically with a large group of standard cases but with a tail of a few either very high" (for example, income) or low" (for example, legs) values." (David Spiegelhalter,The Art of Statistics: Learning from Data", 2019)

"The elements of this cloud of uncertainty (the set of all possible errors) can be described in terms of probability. The center of the cloud is the number zero, and elements of the cloud that are close to zero are more probable than elements that are far away from that center. We can be more precise in this definition by defining the cloud of uncertainty in terms of a mathematical function, called the probability distribution." (David S Salsburg,Errors, Blunders, and Lies: How to Tell the Difference", 2017)

"Two clouds of uncertainty may have the same center, but one may be much more dispersed than the other. We need a way of looking at the scatter about the center. We need a measure of the scatter. One such measure is the variance. We take each of the possible values of error and calculate the squared difference between that value and the center of the distribution. The mean of those squared differences is the variance." (David S Salsburg,Errors, Blunders, and Lies: How to Tell the Difference", 2017)

10 May 2026

🔭Data Science: Location (Just the Quotes)

"There are several reasons why symmetry is an important concept in data analysis. First, the most important single summary of a set of data is the location of the center, and when data meaning of 'center' is unambiguous. We can take center to mean any of the following things, since they all coincide exactly for symmetric data, and they are together for nearly symmetric data: (l) the center of symmetry. (2) the arithmetic average or center of gravity, (3) the median or 50%. Furthermore, if data a single point of highest concentration instead of several (that is, they are unimodal), then we can add to the list (4) point of highest concentration. When data are far from symmetric, we may have trouble even agreeing on what we mean by center; in fact, the center may become an inappropriate summary for the data." (John M Chambers et al,Graphical Methods for Data Analysis", 1983)

"Data that are skewed toward large values occur commonly. Any set of positive measurements is a candidate. Nature just works like that. In fact, if data consisting of positive numbers range over several powers of ten, it is almost a guarantee that they will be skewed. Skewness creates many problems. There are visualization problems. A large fraction of the data are squashed into small regions of graphs, and visual assessment of the data degrades. There are characterization problems. Skewed distributions tend to be more complicated than symmetric ones; for example, there is no unique notion of location and the median and mean measure different aspects of the distribution. There are problems in carrying out probabilistic methods. The distribution of skewed data is not well approximated by the normal, so the many probabilistic methods based on an assumption of a normal distribution cannot be applied." (William S Cleveland,Visualizing Data", 1993)

"Fitting data means finding mathematical descriptions of structure in the data. An additive shift is a structural property of univariate data in which distributions differ only in location and not in spread or shape. […] The process of identifying a structure in data and then fitting the structure to produce residuals that have the same distribution lies at the heart of statistical analysis. Such homogeneous residuals can be pooled, which increases the power of the description of the variation in the data." (William S Cleveland,Visualizing Data", 1993)

"When the distributions of two or more groups of univariate data are skewed, it is common to have the spread increase monotonically with location. This behavior is monotone spread. Strictly speaking, monotone spread includes the case where the spread decreases monotonically with location, but such a decrease is much less common for raw data. Monotone spread, as with skewness, adds to the difficulty of data analysis. For example, it means that we cannot fit just location estimates to produce homogeneous residuals; we must fit spread estimates as well. Furthermore, the distributions cannot be compared by a number of standard methods of probabilistic inference that are based on an assumption of equal spreads; the standard t-test is one example. Fortunately, remedies for skewness can cure monotone spread as well." (William S Cleveland,Visualizing Data", 1993)

"Since the average is a measure of location, it is common to use averages to compare two data sets. The set with the greater average is thought to ‘exceed’ the other set. While such comparisons may be helpful, they must be used with caution. After all, for any given data set, most of the values will not be equal to the average." (Donald J Wheeler,Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"Distinguish among confidence, prediction, and tolerance intervals. Confidence intervals are statements about population means or other parameters. Prediction intervals address future" (single or multiple) observations. Tolerance intervals describe the location of a specific proportion of a population, with specified confidence." (Gerald van Belle,Statistical Rules of Thumb", 2002)

"If the sample is not representative of the population because the sample is small or biased, not selected at random, or its constituents are not independent of one another, then the bootstrap will fail. […] For a given size sample, bootstrap estimates of percentiles in the tails will always be less accurate than estimates of more centrally located percentiles. Similarly, bootstrap interval estimates for the variance of a distribution will always be less accurate than estimates of central location such as the mean or median because the variance depends strongly upon extreme values in the population." (Phillip I Good & James W Hardin,Common Errors in Statistics" (and How to Avoid Them)", 2003)

"The central limit theorem is often used to justify the assumption of normality when using the sample mean and the sample standard deviation. But it is inevitable that real data contain gross errors. Five to ten percent unusual values in a dataset seem to be the rule rather than the exception. The distribution of such data is no longer Normal." (A S Hedayat & Guoqin Su,Robustness of the Simultaneous Estimators of Location and Scale From Approximating a Histogram by a Normal Density Curve", The American Statistician 66, 2012)

09 May 2026

🔭Data Science: Guessing (Just the Quotes)

"Summing up, then, it would seem as if the mind of the great discoverer must combine contradictory attributes. He must be fertile in theories and hypotheses, and yet full of facts and precise results of experience. He must entertain the feeblest analogies, and the merest guesses at truth, and yet he must hold them as worthless till they are verified in experiment. When there are any grounds of probability he must hold tenaciously to an old opinion, and yet he must be prepared at any moment to relinquish it when a clearly contradictory fact is encountered." (William S Jevons,The Principles of Science: A Treatise on Logic and Scientific Method", 1874)

"It would be an error to suppose that the great discoverer seizes at once upon the truth, or has any unerring method of divining it. In all probability the errors of the great mind exceed in number those of the less vigorous one. Fertility of imagination and abundance of guesses at truth are among the first requisites of discovery; but the erroneous guesses must be many times as numerous as those that prove well founded. The weakest analogies, the most whimsical notions, the most apparently absurd theories, may pass through the teeming brain, and no record remain of more than the hundredth part. […] The truest theories involve suppositions which are inconceivable, and no limit can really be placed to the freedom of hypotheses." (W Stanley Jevons,The Principles of Science: A Treatise on Logic and Scientific Method", 1877)

"Heuristic reasoning is reasoning not regarded as final and strict but as provisional and plausible only, whose purpose is to discover the solution of the present problem. We are often obliged to use heuristic reasoning. We shall attain complete certainty when we shall have obtained the complete solution, but before obtaining certainty we must often be satisfied with a more or less plausible guess. We may need the provisional before we attain the final. We need heuristic reasoning when we construct a strict proof as we need scaffolding when we erect a building." (George Pólya,How to Solve It", 1945)

"The scientist who discovers a theory is usually guided to his discovery by guesses; he cannot name a method by means of which he found the theory and can only say that it appeared plausible to him, that he had the right hunch or that he saw intuitively which assumption would fit the facts." (Hans Reichenbach,The Rise of Scientific Philosophy", 1951)

"Extrapolations are useful, particularly in the form of soothsaying called forecasting trends. But in looking at the figures or the charts made from them, it is necessary to remember one thing constantly: The trend to now may be a fact, but the future trend represents no more than an educated guess. Implicit in it is 'everything else being equal' and 'present trends continuing'. And somehow everything else refuses to remain equal." (Darell Huff,How to Lie with Statistics", 1954)

"In plausible reasoning the principal thing is to distinguish... a more reasonable guess from a less reasonable guess." (George Pólya,Mathematics and plausible reasoning" Vol. 1, 1954)

"We know many laws of nature and we hope and expect to discover more. Nobody can foresee the next such law that will be discovered. Nevertheless, there is a structure in laws of nature which we call the laws of invariance. This structure is so far-reaching in some cases that laws of nature were guessed on the basis of the postulate that they fit into the invariance structure." (Eugene P Wigner,The Role of Invariance Principles in Natural Philosophy", 1963)

"Another thing I must point out is that you cannot prove a vague theory wrong. If the guess that you make is poorly expressed and rather vague, and the method that you use for figuring out the consequences is a little vague - you are not sure, and you say, 'I think everything's right because it's all due to so and so, and such and such do this and that more or less, and I can sort of explain how this works' […] then you see that this theory is good, because it cannot be proved wrong! Also if the process of computing the consequences is indefinite, then with a little skill any experimental results can be made to look like the expected consequences." (Richard P Feynman,The Character of Physical Law", 1965)

"The method of guessing the equation seems to be a pretty effective way of guessing new laws. This shows again that mathematics is a deep way of expressing nature, and any attempt to express nature in philosophical principles, or in seat-of-the-pants mechanical feelings, is not an efficient way." (Richard Feynman,The Character of Physical Law", 1965)

"Every discovery, every enlargement of the understanding, begins as an imaginative preconception of what the truth might be. The imaginative preconception - a ‘hypothesis’ - arises by a process as easy or as difficult to understand as any other creative act of mind; it is a brainwave, an inspired guess, a product of a blaze of insight. It comes anyway from within and cannot be achieved by the exercise of any known calculus of discovery." (Sir Peter B Medawar,Advice to a Young Scientist", 1979)

"Scientists reach their  conclusions  for the damnedest of reasons: intuition, guesses, redirections after wild-goose chases, all combing with a dollop of rigorous observation and logical  reasoning to be sure […] This  messy and personal side of science should not be  disparaged, or covered up, by  scientists for two  major reasons. First, scientists should proudly show this  human face to  display their kinship with all other  modes of creative human thought […] Second, while biases and references often impede understanding, these  mental idiosyncrasies  may  also serve as powerful, if  quirky and personal, guides to solutions." (Stephen J Gould,Dinosaur in a  Haystack: Reflections in natural  history", 1995)

"Compound errors can begin with any of the standard sorts of bad statistics - a guess, a poor sample, an inadvertent transformation, perhaps confusion over the meaning of a complex statistic. People inevitably want to put statistics to use, to explore a number's implications. [...] The strengths and weaknesses of those original numbers should affect our confidence in the second-generation statistics." (Joel Best,Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"First, good statistics are based on more than guessing. [...] Second, good statistics are based on clear, reasonable definitions. Remember, every statistic has to define its subject. Those definitions ought to be clear and made public. [...] Third, good statistics are based on clear, reasonable measures. Again, every statistic involves some sort of measurement; while all measures are imperfect, not all flaws are equally serious. [...] Finally, good statistics are based on good samples." (Joel Best,Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"While some social problems statistics are deliberate deceptions, many - probably the great majority - of bad statistics are the result of confusion, incompetence, innumeracy, or selective, self-righteous efforts to produce numbers that reaffirm principles and interests that their advocates consider just and right. The best response to stat wars is not to try and guess who's lying or, worse, simply to assume that the people we disagree with are the ones telling lies. Rather, we need to watch for the standard causes of bad statistics - guessing, questionable definitions or methods, mutant numbers, and inappropriate comparisons." (Joel Best,Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists", 2001)

"The well-known 'No Free Lunch' theorem indicates that there does not exist a pattern classification method that is inherently superior to any other, or even to random guessing without using additional information. It is the type of problem, prior information, and the amount of training samples that determine the form of classifier to apply. In fact, corresponding to different real-world problems, different classes may have different underlying data structures. A classifier should adjust the discriminant boundaries to fit the structures which are vital for classification, especially for the generalization capacity of the classifier." (Hui Xue et al,SVM: Support Vector Machines", 2009)

"Data science isn’t just about the existence of data, or making guesses about what that data might mean; it’s about testing hypotheses and making sure that the conclusions you’re drawing from the data are valid." (Mike Loukides,What Is Data Science?", 2011)

"GIGO is a famous saying coined by early computer scientists: garbage in, garbage out. At the time, people would blindly put their trust into anything a computer output indicated because the output had the illusion of precision and certainty. If a statistic is composed of a series of poorly defined measures, guesses, misunderstandings, oversimplifications, mismeasurements, or flawed estimates, the resulting conclusion will be flawed." (Daniel J Levitin,Weaponized Lies", 2017)

"In statistical inference and machine learning, we often talk about estimates and estimators. Estimates are basically our best guesses regarding some quantities of interest given" (finite) data. Estimators are computational devices or procedures that allow us to map between a given" (finite) data sample and an estimate of interest." (Aleksander Molak,Causal Inference and Discovery in Python", 2023)


08 May 2026

🔭Data Science: Heuristics (Just the Quotes)

"Heuristic reasoning is reasoning not regarded as final and strict but as provisional and plausible only, whose purpose is to discover the solution of the present problem. We are often obliged to use heuristic reasoning. We shall attain complete certainty when we shall have obtained the complete solution, but before obtaining certainty we must often be satisfied with a more or less plausible guess. We may need the provisional before we attain the final. We need heuristic reasoning when we construct a strict proof as we need scaffolding when we erect a building." (George Pólya,How to Solve It", 1945)

"The attempt to characterize exactly models of an empirical theory almost inevitably yields a more precise and clearer understanding of the exact character of a theory. The emptiness and shallowness of many classical theories in the social sciences is well brought out by the attempt to formulate in any exact fashion what constitutes a model of the theory. The kind of theory which mainly consists of insightful remarks and heuristic slogans will not be amenable to this treatment. The effort to make it exact will at the same time reveal the weakness of the theory." (Patrick Suppes," A Comparison of the Meaning and Uses of Models in Mathematics and the Empirical Sciences", Synthese  Vol. 12" (2/3), 1960)

"Design problems - generating or discovering alternatives - are complex largely because they involve two spaces, an action space and a state space, that generally have completely different structures. To find a design requires mapping the former of these on the latter. For many, if not most, design problems in the real world systematic algorithms are not known that guarantee solutions with reasonable amounts of computing effort. Design uses a wide range of heuristic devices - like means-end analysis, satisficing, and the other procedures that have been outlined - that have been found by experience to enhance the efficiency of search. Much remains to be learned about the nature and effectiveness of these devices." (Herbert A Simon,The Logic of Heuristic Decision Making", [inThe Logic of Decision and Action"], 1966)

"Intelligence has two parts, which we shall call the epistemological and the heuristic. The epistemological part is the representation of the world in such a form that the solution of problems follows from the facts expressed in the representation. The heuristic part is the mechanism that on the basis of the information solves the problem and decides what to do." (John McCarthy & Patrick J Hayes,Some Philosophical Problems from the Standpoint of Artificial Intelligence", Machine Intelligence 4, 1969)

"Consider any of the heuristics that people have come up with for supervised learning: avoid overfitting, prefer simpler to more complex models, boost your algorithm, bag it, etc. The no free lunch theorems say that all such heuristics fail as often" (appropriately weighted) as they succeed. This is true despite formal arguments some have offered trying to prove the validity of some of these heuristics." (David H Wolpert,The lack of a priori distinctions between learning algorithms", Neural Computation Vol. 8(7), 1996)

"Heuristic (it is of Greek origin) means discovery. Heuristic methods are based on experience, rational ideas, and rules of thumb. Heuristics are based more on common sense than on mathematics. Heuristics are useful, for example, when the optimal solution needs an exhaustive search that is not realistic in terms of time. In principle, a heuristic does not guarantee the best solution, but a heuristic solution can provide a tremendous shortcut in cost and time." (Nikola K Kasabov,Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"Theories of choice are at best approximate and incomplete. One reason for this pessimistic assessment is that choice is a constructive and contingent process. When faced with a complex problem, people employ a variety of heuristic procedures in order to simplify the representation and the evaluation of prospects. These procedures include computational shortcuts and editing operations, such as eliminating common components and discarding nonessential differences. The heuristics of choice do not readily lend themselves to formal analysis because their application depends on the formulation of the problem, the method of elicitation, and the context of choice." (Amos Tversky & Daniel Kahneman,Advances in Prospect Theory: Cumulative Representation of Uncertainty" [inChoices, Values, and Frames"], 2000)

"Behavioural research shows that we tend to use simplifying heuristics when making judgements about uncertain events. These are prone to biases and systematic errors, such as stereotyping, disregard of sample size, disregard for regression to the mean, deriving estimates based on the ease of retrieving instances of the event, anchoring to the initial frame, the gambler’s fallacy, and wishful thinking, which are all affected by our inability to consider more than a few aspects or dimensions of any phenomenon or situation at the same time." (Hans G Daellenbach & Donald C McNickle,Management Science: Decision making through systems thinking", 2005)

"A decision theory that rests on the assumptions that human cognitive capabilities are limited and that these limitations are adaptive with respect to the decision environments humans frequently encounter. Decision are thought to be made usually without elaborate calculations, but instead by using fast and frugal heuristics. These heuristics certainly have the advantage of speed and simplicity, but if they are well matched to a decision environment, they can even outperform maximizing calculations with respect to accuracy. The reason for this is that many decision environments are characterized by incomplete information and noise. The information we do have is usually structured in a specific way that clever heuristics can exploit." (E Ebenhoh,Agent-Based Modelnig with Boundedly Rational Agents", 2007)

"Optimization systems (or optimizers, as they are often referred to) aim to optimize in a systematic way, oftentimes using a heuristics-based approach. Such an approach enables the AI system to use a macro level concept as part of its low-level calculations, accelerating the whole process and making it more light-weight. After all, most of these systems are designed with scalability in mind, so the heuristic approach is most practical." (Yunus E Bulut & Zacharias Voulgaris,AI for Data Science: Artificial Intelligence Frameworks and Functionality for Deep Learning, Optimization, and Beyond", 2018)

"The social world that humans have made for themselves is so complex that the mind simplifies the world by using heuristics, customs, and habits, and by making models or assumptions about how things generally work (the ‘causal structure of the world’). And because people rely upon" (and are invested in) these mental models, they usually prefer that they remain uncontested." (Dr James Brennan,Psychological  Adjustment to Illness and Injury", West of England Medical Journal Vol. 117 (2), 2018)

"Many AI systems employ heuristic decision making, which uses a strategy to find the most likely correct decision to avoid the high cost" (time) of processing lots of information. We can think of those heuristics as shortcuts or rules of thumb that we would use to make fast decisions." (Jesús Barrasa et al,Knowledge Graphs: Data in Context for Responsive Businesses", 2021)

"Once we know something is fat-tailed, we can use heuristics to see how an exposure there reacts to random events: how much is a given unit harmed by them. It is vastly more effective to focus on being insulated from the harm of random events than try to figure them out in the required details" (as we saw the inferential errors under thick tails are huge). So it is more solid, much wiser, more ethical, and more effective to focus on detection heuristics and policies rather than fabricate statistical properties." (Nassim N Taleb,Statistical Consequences of Fat Tails: Real World Preasymptotics, Epistemology, and Applications" 2nd Ed., 2022)

03 May 2026

🔭Data Science: Tails (Just the Quotes)

"Some distributions [...] are symmetrical about their central value. Other distributions have marked asymmetry and are said to be skew. Skew distributions are divided into two types. If the 'tail' of the distribution reaches out into the larger values of the variate, the distribution is said to show positive skewness; if the tail extends towards the smaller values of the variate, the distribution is called negatively skew." (Michael J Moroney,Facts from Figures", 1951)

"Logging size transforms the original skewed distribution into a more symmetrical one by pulling in the long right tail of the distribution toward the mean. The short left tail is, in addition, stretched. The shift toward symmetrical distribution produced by the log transform is not, of course, merely for convenience. Symmetrical distributions, especially those that resemble the normal distribution, fulfill statistical assumptions that form the basis of statistical significance testing in the regression model." (Edward R Tufte,Data Analysis for Politics and Policy", 1974)

"Equal variability is not always achieved in plots. For instance, if the theoretical distribution for a probability plot has a density that drops off gradually to zero in the tails" (as the normal density does), then the variability of the data in the tails of the probability plot is greater than in the center. Another example is provided by the histogram. Since the height of any one bar has a binomial distribution, the standard deviation of the height is approximately proportional to the square root of the expected height; hence, the variability of the longer bars is greater." (John M Chambers et al,Graphical Methods for Data Analysis", 1983)

"If the sample is not representative of the population because the sample is small or biased, not selected at random, or its constituents are not independent of one another, then the bootstrap will fail. […] For a given size sample, bootstrap estimates of percentiles in the tails will always be less accurate than estimates of more centrally located percentiles. Similarly, bootstrap interval estimates for the variance of a distribution will always be less accurate than estimates of central location such as the mean or median because the variance depends strongly upon extreme values in the population." (Phillip I Good & James W Hardin,Common Errors in Statistics" (and How to Avoid Them)", 2003)

"Bell curves don't differ that much in their bells. They differ in their tails. The tails describe how frequently rare events occur. They describe whether rare events really are so rare. This leads to the saying that the devil is in the tails." (Bart Kosko,Noise", 2006)

"Readability in visualization helps people interpret data and make conclusions about what the data has to say. Embed charts in reports or surround them with text, and you can explain results in detail. However, take a visualization out of a report or disconnect it from text that provides context" (as is common when people share graphics online), and the data might lose its meaning; or worse, others might misinterpret what you tried to show." (Nathan Yau,Data Points: Visualization That Means Something", 2013)

"A very different - and very incorrect - argument is that successes must be balanced by failures (and failures by successes) so that things average out. Every coin flip that lands heads makes tails more likely. Every red at roulette makes black more likely. […] These beliefs are all incorrect. Good luck will certainly not continue indefinitely, but do not assume that good luck makes bad luck more likely, or vice versa." (Gary Smith,Standard Deviations", 2014)

"The more complex the system, the more variable (risky) the outcomes. The profound implications of this essential feature of reality still elude us in all the practical disciplines. Sometimes variance averages out, but more often fat-tail events beget more fat-tail events because of interdependencies. If there are multiple projects running, outlier (fat-tail) events may also be positively correlated - one IT project falling behind will stretch resources and increase the likelihood that others will be compromised." (Paul Gibbons,The Science of Successful Organizational Change",  2015)

"Many statistical procedures perform more effectively on data that are normally distributed, or at least are symmetric and not excessively kurtotic" (fat-tailed), and where the mean and variance are approximately constant. Observed time series frequently require some form of transformation before they exhibit these distributional properties, for in their 'raw' form they are often asymmetric." (Terence C Mills,Applied Time Series Analysis: A practical guide to modeling and forecasting", 2019)

"Mean-averages can be highly misleading when the raw data do not form a symmetric pattern around a central value but instead are skewed towards one side [...], typically with a large group of standard cases but with a tail of a few either very high" (for example, income) or low" (for example, legs) values." (David Spiegelhalter,The Art of Statistics: Learning from Data", 2019)

"[…] it is not merely that events in the tails of the distributions matter, happen, play a large role, etc. The point is that these events play the major role and their probabilities are not" (easily) computable, not reliable for any effective use. The implication is that Black Swans do not necessarily come from fat tails; the problem can result from an incomplete assessment of tail events." (Nassim N Taleb,Statistical Consequences of Fat Tails: Real World Preasymptotics, Epistemology, and Applications" 2nd Ed., 2022)

"[…] whenever people make decisions after being supplied with the standard deviation number, they act as if it were the expected mean deviation." (Nassim N Taleb,Statistical Consequences of Fat Tails: Real World Preasymptotics, Epistemology, and Applications" 2nd Ed., 2022)

"Behavioral finance so far makes conclusions from statics not dynamics, hence misses the picture. It applies trade-offs out of context and develops the consensus that people irrationally overestimate tail risk" (hence need to be 'nudged' into taking more of these exposures). But the catastrophic event is an absorbing barrier. No risky exposure can be analyzed in isolation: risks accumulate. If we ride a motorcycle, smoke, fly our own propeller plane, and join the mafia, these risks add up to a near-certain premature death. Tail risks are not a renewable resource." (Nassim N Taleb,Statistical Consequences of Fat Tails: Real World Preasymptotics, Epistemology, and Applications" 2nd Ed., 2022)

"But note that any heavy tailed process, even a power law, can be described in sample" (that is finite number of observations necessarily discretized) by a simple Gaussian process with changing variance, a regime switching process, or a combination of Gaussian plus a series of variable jumps" (though not one where jumps are of equal size […])." (Nassim N Taleb,Statistical Consequences of Fat Tails: Real World Preasymptotics, Epistemology, and Applications" 2nd Ed., 2022)

"Once we know something is fat-tailed, we can use heuristics to see how an exposure there reacts to random events: how much is a given unit harmed by them. It is vastly more effective to focus on being insulated from the harm of random events than try to figure them out in the required details" (as we saw the inferential errors under thick tails are huge). So it is more solid, much wiser, more ethical, and more effective to focus on detection heuristics and policies rather than fabricate statistical properties." (Nassim N Taleb,Statistical Consequences of Fat Tails: Real World Preasymptotics, Epistemology, and Applications" 2nd Ed., 2022)

"No one sees further into a generalization than his own knowledge of detail extends." (William James)

"Remember that a p-value merely indicates the probability of a particular set of data being generated by the null model–it has little to say about the size of a deviation from that model" (especially in the tails of the distribution, where large changes in effect size cause only small changes in p-values)." (Clay Helberg)


02 May 2026

🔭Data Science: Skewness (Just the Quotes)

"Some distributions [...] are symmetrical about their central value. Other distributions have marked asymmetry and are said to be skew. Skew distributions are divided into two types. If the 'tail' of the distribution reaches out into the larger values of the variate, the distribution is said to show positive skewness; if the tail extends towards the smaller values of the variate, the distribution is called negatively skew." (Michael J Moroney, "Facts from Figures", 1951)

"Logging size transforms the original skewed distribution into a more symmetrical one by pulling in the long right tail of the distribution toward the mean. The short left tail is, in addition, stretched. The shift toward symmetrical distribution produced by the log transform is not, of course, merely for convenience. Symmetrical distributions, especially those that resemble the normal distribution, fulfill statistical assumptions that form the basis of statistical significance testing in the regression model." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"Logging skewed variables also helps to reveal the patterns in the data. […] the rescaling of the variables by taking logarithms reduces the nonlinearity in the relationship and removes much of the clutter resulting from the skewed distributions on both variables; in short, the transformation helps clarify the relationship between the two variables. It also […] leads to a theoretically meaningful regression coefficient." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"The logarithmic transformation serves several purposes: (1) The resulting regression coefficients sometimes have a more useful theoretical interpretation compared to a regression based on unlogged variables. (2) Badly skewed distributions - in which many of the observations are clustered together combined with a few outlying values on the scale of measurement - are transformed by taking the logarithm of the measurements so that the clustered values are spread out and the large values pulled in more toward the middle of the distribution. (3) Some of the assumptions underlying the regression model and the associated significance tests are better met when the logarithm of the measured variables is taken." (Edward R Tufte, "Data Analysis for Politics and Policy", 1974)

"The logarithm is an extremely powerful and useful tool for graphical data presentation. One reason is that logarithms turn ratios into differences, and for many sets of data, it is natural to think in terms of ratios. […] Another reason for the power of logarithms is resolution. Data that are amounts or counts are often very skewed to the right; on graphs of such data, there are a few large values that take up most of the scale and the majority of the points are squashed into a small region of the scale with no resolution." (William S. Cleveland, "Graphical Methods for Data Presentation: Full Scale Breaks, Dot Charts, and Multibased Logging", The American Statistician Vol. 38 (4) 1984)

"It is common for positive data to be skewed to the right: some values bunch together at the low end of the scale and others trail off to the high end with increasing gaps between the values as they get higher. Such data can cause severe resolution problems on graphs, and the common remedy is to take logarithms. Indeed, it is the frequent success of this remedy that partly accounts for the large use of logarithms in graphical data display." (William S Cleveland, "The Elements of Graphing Data", 1985)

"If a distribution were perfectly symmetrical, all symmetry-plot points would be on the diagonal line. Off-line points indicate asymmetry. Points fall above the line when distance above the median is greater than corresponding distance below the median. A consistent run of above-the-line points indicates positive skew; a run of below-the-line points indicates negative skew." (Lawrence C Hamilton, "Regression with Graphics: A second course in applied statistics", 1991)

"Skewness is a measure of symmetry. For example, it's zero for the bell-shaped normal curve, which is perfectly symmetric about its mean. Kurtosis is a measure of the peakedness, or fat-tailedness, of a distribution. Thus, it measures the likelihood of extreme values." (John L Casti, "Reality Rules: Picturing the world in mathematics", 1992)

"Data that are skewed toward large values occur commonly. Any set of positive measurements is a candidate. Nature just works like that. In fact, if data consisting of positive numbers range over several powers of ten, it is almost a guarantee that they will be skewed. Skewness creates many problems. There are visualization problems. A large fraction of the data are squashed into small regions of graphs, and visual assessment of the data degrades. There are characterization problems. Skewed distributions tend to be more complicated than symmetric ones; for example, there is no unique notion of location and the median and mean measure different aspects of the distribution. There are problems in carrying out probabilistic methods. The distribution of skewed data is not well approximated by the normal, so the many probabilistic methods based on an assumption of a normal distribution cannot be applied." (William S Cleveland, "Visualizing Data", 1993)

"The logarithm is one of many transformations that we can apply to univariate measurements. The square root is another. Transformation is a critical tool for visualization or for any other mode of data analysis because it can substantially simplify the structure of a set of data. For example, transformation can remove skewness toward large values, and it can remove monotone increasing spread. And often, it is the logarithm that achieves this removal." (William S Cleveland, "Visualizing Data", 1993)

"When the distributions of two or more groups of univariate data are skewed, it is common to have the spread increase monotonically with location. This behavior is monotone spread. Strictly speaking, monotone spread includes the case where the spread decreases monotonically with location, but such a decrease is much less common for raw data. Monotone spread, as with skewness, adds to the difficulty of data analysis. For example, it means that we cannot fit just location estimates to produce homogeneous residuals; we must fit spread estimates as well. Furthermore, the distributions cannot be compared by a number of standard methods of probabilistic inference that are based on an assumption of equal spreads; the standard t-test is one example. Fortunately, remedies for skewness can cure monotone spread as well." (William S Cleveland, "Visualizing Data", 1993)

"The standard deviation (often SD) is a measure of variability. When we calculate the standard deviation of a sample, we are using it as an estimate of the variability of the population from which the sample was drawn. For data with a normal distribution, about 95% of individu als will have values within 2 standard deviations of the mean, the other 5% being equally scattered above and below these limits. Contrary to popular misconception, the standard deviation is a valid measure of variability regardless of the distribution. About 95% of observa tions of any distribution usually fall within the 2 standard deviation limits, though those outside may all be at one end. We may choose a different summary statistic, how ever, when data have a skewed distribution." (Douglas G Altman & J Martin Bland, "Statistics Notes: Standard Deviations And Standard Errors", British Medical Journal Vol. 331 (7521) 2005)

"Use a logarithmic scale when it is important to understand percent change or multiplicative factors. […] Showing data on a logarithmic scale can cure skewness toward large values." (Naomi B Robbins, "Creating More effective Graphs", 2005)

"Distributional shape is an important attribute of data, regardless of whether scores are analyzed descriptively or inferentially. Because the degree of skewness can be summarized by means of a single number, and because computers have no difficulty providing such measures (or estimates) of skewness, those who prepare research reports should include a numerical index of skewness every time they provide measures of central tendency and variability." (Schuyler W Huck, "Statistical Misconceptions", 2008)

"Given the important role that correlation plays in structural equation modeling, we need to understand the factors that affect establishing relationships among multivariable data points. The key factors are the level of measurement, restriction of range in data values (variability, skewness, kurtosis), missing data, nonlinearity, outliers, correction for attenuation, and issues related to sampling variation, confidence intervals, effect size, significance, sample size, and power." (Randall E Schumacker & Richard G Lomax, "A Beginner’s Guide to Structural Equation Modeling" 3rd Ed., 2010)

"[The normality] assumption is the least important one for the reliability of the statistical procedures under discussion. Violations of the normality assumption can be divided into two general forms: Distributions that have heavier tails than the normal and distributions that are skewed rather than symmetric. If data is skewed, the formulas we are discussing are still valid as long as the sample size is sufficiently large. Although the guidance about 'how skewed' and 'how large a sample' can be quite vague, since the greater the skew, the larger the required sample size. For the data commonly used in time series and for the sample sizes (which are generally quite large) used, skew is not a problem. On the other hand, heavy tails can be very problematic." (DeWayne R Derryberry, "Basic Data Analysis for Time Series with R" 1st Ed, 2014)

"In statistical theory, location and variability are referred to as the first and second moments of a distribution. The third and fourth moments are called skewness and kurtosis. Skewness refers to whether the data is skewed to larger or smaller values and kurtosis indicates the propensity of the data to have extreme values. Generally, metrics are not used to measure skewness and kurtosis; instead, these are discovered through visual displays [...]" (Peter C Bruce & Andrew G Bruce, "Statistics for Data Scientists: 50 Essential Concepts", 2016)

"A histogram represents the frequency distribution of the data. Histograms are similar to bar charts but group numbers into ranges. Also, a histogram lets you show the frequency distribution of continuous data. This helps in analyzing the distribution (for example, normal or Gaussian), any outliers present in the data, and skewness." (Umesh R Hodeghatta & Umesha Nayak, "Business Analytics Using R: A Practical Approach", 2017)

"New information is constantly flowing in, and your brain is constantly integrating it into this statistical distribution that creates your next perception (so in this sense 'reality' is just the product of your brain’s ever-evolving database of consequence). As such, your perception is subject to a statistical phenomenon known in probability theory as kurtosis. Kurtosis in essence means that things tend to become increasingly steep in their distribution [...] that is, skewed in one direction. This applies to ways of seeing everything from current events to ourselves as we lean 'skewedly' toward one interpretation, positive or negative. Things that are highly kurtotic, or skewed, are hard to shift away from. This is another way of saying that seeing differently isn’t just conceptually difficult - it’s statistically difficult." (Beau Lotto, "Deviate: The Science of Seeing Differently", 2017)

"Mean-averages can be highly misleading when the raw data do not form a symmetric pattern around a central value but instead are skewed towards one side [...], typically with a large group of standard cases but with a tail of a few either very high (for example, income) or low (for example, legs) values." (David Spiegelhalter, "The Art of Statistics: Learning from Data", 2019)

"With skewed data, quantiles will reflect the skew, while adding standard deviations assumes symmetry in the distribution and can be misleading." (Robert Grant, "Data Visualization: Charts, Maps and Interactive Graphics", 2019)

"Adjusting scale is an important practice in data visualization. While the log transform is versatile, it doesn’t handle all situations where skew or curvature occurs. For example, at times the values are all roughly the same order of magnitude and the log transformation has little impact. Another transformation to consider is the square root transformation, which is often useful for count data." (Sam Lau et al, "Learning Data Science: Data Wrangling, Exploration, Visualization, and Modeling with Python", 2023)

24 April 2025

🧭Business Intelligence: Perspectives (Part 30: The Data Science Connection)

Business Intelligence Series
Business Intelligence Series

Data Science is a collection of quantitative and qualitative methods, respectively techniques, algorithms, principles, processes and technologies used to analyze, and process amounts of raw and aggregated data to extract information or knowledge it contains. Its theoretical basis is rooted within mathematics, mainly statistics, computer science and domain expertise, though it can include further aspects related to communication, management, sociology, ecology, cybernetics, and probably many other fields, as there’s enough space for experimentation and translation of knowledge from one field to another.  

The aim of Data Science is to extract valuable insights from data to support decision-making, problem-solving, drive innovation and probably it can achieve more in time. Reading in between the lines, Data Science sounds like a superhero that can solve all the problems existing out there, which frankly is too beautiful to be true! In theory everything is possible, when in practice there are many hard limitations! Given any amount of data, the knowledge that can be obtained from it can be limited by many factors - the degree to which the data, processes and models built reflect reality, and there can be many levels of approximation, respectively the degree to which such data can be collected consistently. 

Moreover, even if the theoretical basis seems sound, the data, information or knowledge which is not available can be the important missing link in making any sensible progress toward the goals set in Data Science projects. In some cases, one might be aware of what's missing, though for the data scientist not having the required domain knowledge, this can be a hard limit! This gap can be probably bridged with sensemaking, exploration and experimentation approaches, especially by applying models from other domains, though there are no guarantees ahead!

AI can help in this direction by utilizing its capacity to explore fast ideas or models. However, it's questionable how much the models built with AI can be further used if one can't build mechanistical mental models of the processes reflected in the data. It's like devising an algorithm for winning at lottery small amounts, though investing more money in the algorithm doesn't automatically imply greater wins. Even if occasionally the performance is improved, it's questionable how much it can be leveraged for each utilization. Statistics has its utility when one studies data in aggregation and can predict average behavior. It can’t be used to predict the occurrence of events with a high precision. Think how hard the prediction of earthquakes or extreme weather is by just looking at a pile of data reflecting what’s happening only in a certain zone!

In theory, the more data one has from different geographical areas or organizations, the more robust the models can become. However, no two geographies, respectively no two organizations are alike: business models, the people, the events and other aspects make global models less applicable to local context. Frankly, one has more chances of progress if a model is obtained by having a local scope and then attempting to leverage the respective model for a broader scope. Even then, there can be differences between the behavior or phenomena at micro, respectively at macro level (see the law of physics). 

This doesn’t mean that Data Science or AI related knowledge is useless. The knowledge accumulated by applying various techniques, models and programming languages in problem-solving can be more valuable than the results obtained! Experimentation is a must for organizations to innovate, to extend their knowledge base. It’s also questionable how much of the respective knowledge can be retained and put to good use. In the end, each organization must determine this by itself!

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
Koeln, NRW, Germany
IT Professional with more than 25 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.