04 April 2006

🖍️Max Shron - Collected Quotes

"A mockup shows what we should expect to take away from a project. In contrast, an argument sketch tells us roughly what we need to do to be convincing at all. It is a loose outline of the statements that will make our work relevant and correct. While they are both collections of sentences, mockups and argument sketches serve very different purposes. Mockups give a flavor of the finished product, while argument sketches give us a sense of the logic behind the solution." (Max Shron, "Thinking with Data: How to Turn Information into Insights", 2014)

"A very powerful way to organize our thoughts is by classifying each point of dispute in our argument. A point of dispute is the part of an argument where the audience pushes back, the point where we actually need to make a case to win over the skeptical audience. All but the most trivial arguments make at least one point that an audience will be rightfully skeptical of. Such disputes can be classified, and the classification tells us what to do next. Once we identify the kind of dispute we are dealing with, the issues we need to demonstrate follow naturally." (Max Shron, "Thinking with Data: How to Turn Information into Insights", 2014)

"All stories have a structure, and a project scope is no different. Like any story, our scope will have exposition (the context), some conflict (the need), a resolution (the vision), and hopefully a happily-ever-after (the outcome). Practicing telling stories is excellent practice for scoping data problems." (Max Shron, "Thinking with Data: How to Turn Information into Insights", 2014)

"Building exploratory scatterplots should precede the building of a model, if for no reason other than to check that the intuition gained from making the map makes sense. The relationships may be so obvious, or the confounders so unimportant, that the model is unnecessary. A lack of obvious relationships in pairwise scatterplots does not mean that a model of greater complexity would not be able to find signal, but if that’s what we’re up against, it is important to know it ahead of time. Similarly, building simple models before tackling more complex ones will save us time and energy." (Max Shron, "Thinking with Data: How to Turn Information into Insights", 2014)

"Contexts emerge from understanding who we are working with and why they are doing what they are doing. We learn the context from talking to people, and continuing to talk to them until we understand what their long-term goals are. The context sets the overall tone for the project, and guides the choices we make about what to pursue. It provides the background that makes the rest of the decisions make sense. The work we do should further the mission espoused in the context. At least if it does not, we should be aware of that." (Max Shron, "Thinking with Data: How to Turn Information into Insights", 2014)

"Data science, as a field, is overly concerned with the technical tools for executing problems and not nearly concerned enough with asking the right questions. It is very tempting, given how pleasurable it can be to lose oneself in data science work, to just grab the first or most interesting data set and go to town. Other disciplines have successfully built up techniques for asking good questions and ensuring that, once started, work continues on a productive path. We have much to gain from adapting their techniques to our field." (Max Shron, "Thinking with Data: How to Turn Information into Insights", 2014)

"Data science is already a field of bricolage. Swaths of engineering, statistics, machine learning, and graphic communication are already fundamental parts of the data science canon. They are necessary, but they are not sufficient. If we look further afield and incorporate ideas from the 'softer' intellectual disciplines, we can make data science successful and help it be more than just this decade’s fad." (Max Shron, "Thinking with Data: How to Turn Information into Insights", 2014)

"Data science is the application of math and computers to solve problems that stem from a lack of knowledge, constrained by the small number of people with any interest in the answers." (Max Shron, "Thinking with Data: How to Turn Information into Insights", 2014)

"Keep in mind that a mockup is not the actual answer we expect to arrive at. Instead, a mockup is an example of the kind of result we would expect, an illustration of the form that results might take. Whether we are designing a tool or pulling data together, concrete knowledge of what we are aiming at is incredibly valuable. Without a mockup, it’s easy to get lost in abstraction, or to be unsure what we are actually aiming toward. We risk missing our goals completely while the ground slowly shifts beneath our feet. Mockups also make it much easier to focus in on what is important, because mockups are shareable. We can pass our few sentences, idealized graphs, or user interface sketches off to other people to solicit their opinion in a way that diving straight into source code and spreadsheets can never do." (Max Shron, "Thinking with Data: How to Turn Information into Insights", 2014)

"Models that can be easily fit and interpreted (like a linear or logistic model), or models that have great predictive performance without much work (like random forests), serve as excellent places to start a predictive task. [...] It is important, though, to not get too deep into these exploratory steps and forget about the larger picture. Setting time limits (in hours or, at most, days) for these exploratory projects is a helpful way to avoid wasting time. To avoid losing the big picture, it also helps to write down the intended steps at the beginning. An explicitly written-down scaffolding plan can be a huge help to avoid getting sucked deeply into work that is ultimately of little value. A scaffolding plan lays out what our next few goals are, and what we expect to shift once we achieve them." (Max Shron, "Thinking with Data: How to Turn Information into Insights", 2014)

"Most people start working with data from exactly the wrong end. They begin with a data set, then apply their favorite tools and techniques to it. The result is narrow questions and shallow arguments. Starting with data, without first doing a lot of thinking, without having any structure, is a short road to simple questions and unsurprising results. We don’t want unsurprising - we want knowledge. [...] As professionals working with data, our domain of expertise has to be the full problem, not merely the columns to combine, transformations to apply, and models to fit. Picking the right techniques has to be secondary to asking the right questions. We have to be proficient in both to make a difference." (Max Shron, "Thinking with Data: How to Turn Information into Insights", 2014)

"There are four parts to a project scope. The four parts are the context of the project; the needs that the project is trying to meet; the vision of what success might look like; and finally what the outcome will be, in terms of how the organization will adopt the results and how its effects will be measured down the line. When a problem is well-scoped, we will be able to easily converse about or write out our thoughts on each. Those thoughts will mature as we progress in a project, but they have to start somewhere. Any scope will evolve over time; no battle plan survives contact with opposing forces." (Max Shron, "Thinking with Data: How to Turn Information into Insights", 2014)

"To walk the path of creating things of lasting value, we have to understand elements as diverse as the needs of the people we’re working with, the shape that the work will take, the structure of the arguments we make, and the process of what happens after we 'finish'. To make that possible, we need to give ourselves space to think. When we have space to think, we can attend to the problem of why and so what before we get tripped up in how. Otherwise, we are likely to spend our time doing the wrong things." (Max Shron, "Thinking with Data: How to Turn Information into Insights", 2014)

🖍️Ely Devons - Collected Quotes

"Every economic and social situation or problem is now described in statistical terms, and we feel that it is such statistics which give us the real basis of fact for understanding and analysing problems and difficulties, and for suggesting remedies. In the main we use such statistics or figures without any elaborate theoretical analysis; little beyond totals, simple averages and perhaps index numbers. Figures have become the language in which we describe our economy or particular parts of it, and the language in which we argue about policy." (Ely Devons, "Essays in Economics", 1961)

"Indeed the language of statistics is rarely as objective as we imagine. The way statistics are presented, their arrangement in a particular way in tables, the juxtaposition of sets of figures, in itself reflects the judgment of the author about what is significant and what is trivial in the situation which the statistics portray." (Ely Devons, "Essays in Economics", 1961)

"It might be reasonable to expect that the more we know about any set of statistics, the greater the confidence we would have in using them, since we would know in which directions they were defective; and that the less we know about a set of figures, the more timid and hesitant we would be in using them. But, in fact, it is the exact opposite which is normally the case; in this field, as in many others, knowledge leads to caution and hesitation, it is ignorance that gives confidence and boldness. For knowledge about any set of statistics reveals the possibility of error at every stage of the statistical process; the difficulty of getting complete coverage in the returns, the difficulty of framing answers precisely and unequivocally, doubts about the reliability of the answers, arbitrary decisions about classification, the roughness of some of the estimates that are made before publishing the final results. Knowledge of all this, and much else, in detail, about any set of figures makes one hesitant and cautious, perhaps even timid, in using them." (Ely Devons, "Essays in Economics", 1961)

"The art of using the language of figures correctly is not to be over-impressed by the apparent air of accuracy, and yet to be able to take account of error and inaccuracy in such a way as to know when, and when not, to use the figures. This is a matter of skill, judgment, and experience, and there are no rules and short cuts in acquiring this expertness." (Ely Devons, "Essays in Economics", 1961)

"The knowledge that the economist uses in analysing economic problems and in giving advice on them is of thre First, theories of how the economic system works (and why it sometimes does not work so well); second, commonsense maxims about reasonable economic behaviour; and third, knowledge of the facts describing the main features of the economy, many of these facts being statistical." (Ely Devons, "Essays in Economics", 1961)

"The general models, even of the most elaborate kind, serve the simple purpose of demonstrating the interconnectedness of all economic phenomena, and show how, under certain conditions, price may act as a guiding link between them. Looked at in another way such models show how a complex set of interrelations can hang together consistently without any central administrative direction." (Ely Devons, "Essays in Economics", 1961)

"The most important and frequently stressed prescription for avoiding pitfalls in the use of economic statistics, is that one should find out before using any set of published statistics, how they have been collected, analysed and tabulated. This is especially important, as you know, when the statistics arise not from a special statistical enquiry, but are a by-product of law or administration. Only in this way can one be sure of discovering what exactly it is that the figures measure, avoid comparing the non-comparable, take account of changes in definition and coverage, and as a consequence not be misled into mistaken interpretations and analysis of the events which the statistics portray." (Ely Devons, "Essays in Economics", 1961)

 "The two most important characteristics of the language of statistics are first, that it describes things in quantitative terms, and second, that it gives this description an air of accuracy and precision. The limitations, as well as the advantages, of the statistical approach arise from these two characteristics. For a description of the quantitative aspect of events never gives us the whole story; and even the best statistics are never, and never can be, completely accurate and precise. To avoid misuse of the language we must, therefore, guard against exaggerating the importance of the elements in any situation that can be described quantitatively, and we must know sufficient about the error and inaccuracy of the figures to be able to use them with discretion." (Ely Devons, "Essays in Economics", 1961)

"There are, indeed, plenty of ways in which statistics can help in the process of decision-taking. But exaggerated claims for the role they can play merely serve to confuse rather than clarify issues of public policy, and lead those responsible for action to oscillate between over-confidence and over-scepticism in using them." (Ely Devons, "Essays in Economics", 1961)

"There is a demand for every issue of economic policy to be discussed in terms of statistics, and even those who profess a general distrust of statistics are usually more impressed by an argument in support of a particular policy if it is backed up by figures. There is a passionate desire in our society to see issues of economic policy decided on what we think are rational grounds. We rebel against any admission of the uncertainty of our knowledge of the future as a confession of weakness." (Ely Devons, "Essays in Economics", 1961)

"There seems to be striking similarities between the role of economic statistics in our society and some of the functions which magic and divination play in primitive society." (Ely Devons, "Essays in Economics", 1961)

"This exaggerated influence of statistics resulting from willingness, indeed eagerness, to be impressed by the 'hard facts' provided by the 'figures', may play an important role in decision-making." (Ely Devons, "Essays in Economics", 1961)

"We all know that in economic statistics particularly, true precision, comparability and accuracy is extremely difficult to achieve, and it is for this reason that the language of economic statistics is so difficult to handle." (Ely Devons, "Essays in Economics", 1961)

🖍️Sinan Ozdemir - Collected Quotes

"Attention is a mechanism used in deep learning models (not just Transformers) that assigns different weights to different parts of the input, allowing the model to prioritize and emphasize the most important information while performing tasks like translation or summarization. Essentially, attention allows a model to 'focus' on different parts of the input dynamically, leading to improved performance and more accurate results. Before the popularization of attention, most neural networks processed all inputs equally and the models relied on a fixed representation of the input to make predictions. Modern LLMs that rely on attention can dynamically focus on different parts of input sequences, allowing them to weigh the importance of each part in making predictions." (Sinan Ozdemir, "Quick Start Guide to Large Language Models: Strategies and Best Practices for Using ChatGPT and Other LLMs", 2024)

"[...] building an effective LLM-based application can require more than just plugging in a pre-trained model and retrieving results - what if we want to parse them for a better user experience? We might also want to lean on the learnings of massively large language models to help complete the loop and create a useful end-to-end LLM-based application. This is where prompt engineering comes into the picture." (Sinan Ozdemir, "Quick Start Guide to Large Language Models: Strategies and Best Practices for Using ChatGPT and Other LLMs", 2024)

"Different algorithms may perform better on different types of text data and will have different vector sizes. The choice of algorithm can have a significant impact on the quality of the resulting embeddings. Additionally, open-source alternatives may require more customization and finetuning than closed-source products, but they also provide greater flexibility and control over the embedding process." (Sinan Ozdemir, "Quick Start Guide to Large Language Models: Strategies and Best Practices for Using ChatGPT and Other LLMs", 2024)

"Embeddings are the mathematical representations of words, phrases, or tokens in a largedimensional space. In NLP, embeddings are used to represent the words, phrases, or tokens in a way that captures their semantic meaning and relationships with other words. Several types of embeddings are possible, including position embeddings, which encode the position of a token in a sentence, and token embeddings, which encode the semantic meaning of a token." (Sinan Ozdemir, "Quick Start Guide to Large Language Models: Strategies and Best Practices for Using ChatGPT and Other LLMs", 2024)

"Fine-tuning involves training the LLM on a smaller, task-specific dataset to adjust its parameters for the specific task at hand. This allows the LLM to leverage its pre-trained knowledge of the language to improve its accuracy for the specific task. Fine-tuning has been shown to drastically improve performance on domain-specific and task-specific tasks and lets LLMs adapt quickly to a wide variety of NLP applications." (Sinan Ozdemir, "Quick Start Guide to Large Language Models: Strategies and Best Practices for Using ChatGPT and Other LLMs", 2024)

"Language modeling is a subfield of NLP that involves the creation of statistical/deep learning models for predicting the likelihood of a sequence of tokens in a specified vocabulary (a limited and known set of tokens). There are generally two kinds of language modeling tasks out there: autoencoding tasks and autoregressive tasks." (Sinan Ozdemir, "Quick Start Guide to Large Language Models: Strategies and Best Practices for Using ChatGPT and Other LLMs", 2024)

"Large language models (LLMs) are AI models that are usually (but not necessarily) derived from the Transformer architecture and are designed to understand and generate human language, code, and much more. These models are trained on vast amounts of text data, allowing them to capture the complexities and nuances of human language. LLMs can perform a wide range of language-related tasks, from simple text classification to text generation, with high accuracy, fluency, and style." (Sinan Ozdemir, "Quick Start Guide to Large Language Models: Strategies and Best Practices for Using ChatGPT and Other LLMs", 2024)

"LLMs encode information directly into their parameters via pre-training and fine-tuning, but keeping them up to date with new information is tricky. We either have to further fine-tune the model on new data or run the pre-training steps again from scratch." (Sinan Ozdemir, "Quick Start Guide to Large Language Models: Strategies and Best Practices for Using ChatGPT and Other LLMs", 2024)

"Prompt engineering involves crafting inputs to LLMs (prompts) that effectively communicate the task at hand to the LLM, leading it to return accurate and useful outputs. Prompt engineering is a skill that requires an understanding of the nuances of language, the specific domain being worked on, and the capabilities and limitations of the LLM being used." (Sinan Ozdemir, "Quick Start Guide to Large Language Models: Strategies and Best Practices for Using ChatGPT and Other LLMs", 2024)

"Specific word choices in our prompts can greatly influence the output of the model. Even small changes to the prompt can lead to vastly different results. For example, adding or removing a single word can cause the LLM to shift its focus or change its interpretation of the task. In some cases, this may result in incorrect or irrelevant responses; in other cases, it may produce the exact output desired." (Sinan Ozdemir, "Quick Start Guide to Large Language Models: Strategies and Best Practices for Using ChatGPT and Other LLMs", 2024)

"Text embeddings are a way to represent words or phrases as machine-readable numerical vectors in a multidimensional space, generally based on their contextual meaning. The idea is that if two phrases are similar, then the vectors that represent those phrases should be close together by some measure (like Euclidean distance), and vice versa." (Sinan Ozdemir, "Quick Start Guide to Large Language Models: Strategies and Best Practices for Using ChatGPT and Other LLMs", 2024)

"The idea behind transfer learning is that the pre-trained model has already learned a lot of information about the language and relationships between words, and this information can be used as a starting point to improve performance on a new task. Transfer learning allows LLMs to be fine-tuned for specific tasks with much smaller amounts of task-specific data than would be required if the model were trained from scratch. This greatly reduces the amount of time and resources needed to train LLMs." (Sinan Ozdemir, "Quick Start Guide to Large Language Models: Strategies and Best Practices for Using ChatGPT and Other LLMs", 2024) 

"Transfer learning is a technique used in machine learning to leverage the knowledge gained from one task to improve performance on another related task. Transfer learning for LLMs involves taking an LLM that has been pre-trained on one corpus of text data and then fine-tuning it for a specific 'downstream' task, such as text classification or text generation, by updating themodel’s parameters with task-specific data." (Sinan Ozdemir, "Quick Start Guide to Large Language Models: Strategies and Best Practices for Using ChatGPT and Other LLMs", 2024)

"Transfer learning is a technique that leverages pre-trained models to build upon existing knowledge for new tasks or domains. In the case of LLMs, this involves utilizing the pre-training to transfer general language understanding, including grammar and general knowledge, to particular domain-specific tasks. However, the pre-training may not be sufficient to understand the nuances of certain closed or specialized topics [...]" (Sinan Ozdemir, "Quick Start Guide to Large Language Models: Strategies and Best Practices for Using ChatGPT and Other LLMs", 2024)

03 April 2006

⛩️Jeremy C Morgan - Collected Quotes

"Another problem that can be confusing is that LLMs seldom put out the same thing twice. [...] Traditional databases are straightforward - you ask for something specific, and you get back exactly what was stored. Search engines work similarly, finding existing information. LLMs work differently. They analyze massive amounts of text data to understand statistical patterns in language. The model processes information through multiple layers, each capturing different aspects - from simple word patterns to complex relationships between ideas." (Jeremy C Morgan, "Coding with AI: Examples in Python", 2025)

"As the old saying goes, 'Garbage in, garbage out.' Generative AI tools are only as good as the data they’re trained on. They need high-quality, diverse, and extensive datasets to create great code as output. Unfortunately, you have no control over this input. You must trust the creators behind the product are using the best code possible for the corpus, or data used for training. Researching the tools lets you learn how each tool gathers data and decide based on that." (Jeremy C Morgan, "Coding with AI: Examples in Python", 2025)

"Context is crucial for how language models understand and generate code. The model processes your input by analyzing relationships between different parts of the code and documentation to determine meaning and intent. [...] The model evaluates context by calculating mathematical relationships between elements in your input. However, it may miss important domain knowledge, coding standards, or architectural patterns that experienced developers understand implicitly." (Jeremy C Morgan, "Coding with AI: Examples in Python", 2025)

"Context manipulation involves setting up an optimal environment within the prompt to help a model generate accurate and relevant responses. By controlling the context in which the model operates, users can influence the output’s quality, consistency, and specificity, especially in tasks requiring clarity and precision. Context manipulation involves priming the model with relevant information, presenting examples within the prompt, and utilizing system messages to maintain the desired behavior." (Jeremy C Morgan, "Coding with AI: Examples in Python", 2025)

"Creating software is like building a house. The foundation is the first step; you can’t start without it. Building the rest of the house will be a struggle if the foundation doesn’t meet the requirements. If you don’t have the time to be thoughtful and do it right, you won’t have the time to fix it later." (Jeremy C Morgan, "Coding with AI: Examples in Python", 2025)

"Design is key in software development, yet programmers often rush it. I’ve done this, too. Taking time to plan an app’s architecture leads to happy users and lower maintenance costs." (Jeremy C Morgan, "Coding with AI: Examples in Python", 2025)

"First, training data is created by taking existing source code in many languages and feeding it into a model. This model is evaluated and has layers that look for specific things. One layer checks the type of syntax. Another checks for keywords and how they’re used. The final layer determines whether :this is most likely to be correct and functional source code'. There is a vast array of machine learning algorithms that use the model to run through these layers and draw conclusions. Then, the AI produces output that is a prediction of what the new software should look like. The tool says, 'based on what I know, this is the most statistically likely code you’re looking for'. Then you, the programmer, reach the evaluation point. If you give it a thumbs up, the feedback returns to the model (in many cases, not always) as a correct prediction. If you give it a thumbs  down and reject it, that is also tracked. With this continuous feedback, the tool learns what good code should look like." (Jeremy C Morgan, "Coding with AI: Examples in Python", 2025)

"Generative AI is a kind of statistical mimicry of the real world, where algorithms learn patterns and try to create things." (Jeremy C Morgan, "Coding with AI: Examples in Python", 2025) 

"Generative AI for coding and language tools is based on the LLM concept. A large language model is a type of neural network that processes and generates text in a humanlike way. It does this by being trained on a massive dataset of text, which allows it to learn human language patterns, as described previously. It lets LLMs translate, write, and answer questions with text. LLMs can contain natural language, source code, and  more." (Jeremy C Morgan, "Coding with AI: Examples in Python", 2025)

"Generative AI tools for coding are sometimes inaccurate. They can produce results that look good but are wrong. This is common with LLMs. They can write code or chat like a person. And sometimes, they share information that’s just plain wrong. Not just a bit off, but totally backwards or nonsense. And they say it so confidently! We call this 'hallucinating', which is a funny term, but it makes sense." (Jeremy C Morgan, "Coding with AI: Examples in Python", 2025)

"Great planning and initial setup are crucial for a successful project. Having an idea and immediately cracking open an IDE is rarely a good approach. Many developers find the planning process boring and tiresome. Generative AI tools make these tasks more efficient, accurate, and enjoyable. If you don’t like planning and setup, they can make the process smoother and faster. If you enjoy planning, you may find these tools make it even more fun." (Jeremy C Morgan, "Coding with AI: Examples in Python", 2025)

"In machine learning, 'training' is when we teach models to understand language and code by analyzing massive amounts of data. During training, the model learns statistical patterns - how often certain words appear together, what code structures are common, andhow different parts of text relate to each other. The quality of training data directly affects how well the model performs." (Jeremy C Morgan, "Coding with AI: Examples in Python", 2025)

"It’s a pattern-matching predictor, not a knowledge retriever. It’s great at what it does, but since it works by prediction, it can predict nonsense just as confidently as it predicts facts. So, when you use these tools, be curious and skeptical! Don’t just accept what it gives you. Ask, 'Is this just a likely sounding pattern, or is it actually right?' Understanding how generative AI works helps you know when to trust it and when to double-check. Keeping this skepticism in mind is crucial when working with these tools to produce code." (Jeremy C Morgan, "Coding with AI: Examples in Python", 2025)

"It’s essentially a sophisticated prediction system. Instead of looking up stored answers, an LLM calculates probabilities to determine what text should come next. While these predictions are often accurate, they’re still predictions - which is why it’s crucial to verify any code or factual claims the model generates. This probabilistic nature makes LLMs powerful tools for generating text and code but also means they can make mistakes, even when seeming very confident. Understanding this helps set realistic expectations about what these tools can and cannot do reliably." (Jeremy C Morgan, "Coding with AI: Examples in Python", 2025)

"Professional software developers must know how to use AI tools strategically.  This involves mastering advanced prompting techniques and working with AI across various files and modules. We must also learn how to manage context wisely. This is a new concept for most, and it is vitally important with code generation. AI-generated code requires the same scrutiny and quality checks as any code written by humans." (Jeremy C Morgan, "Coding with AI: Examples in Python", 2025)

"Recursive prompting is a systematic approach to achieving higher-quality outputs through iterative refinement. Rather than accepting the first response, it uses a step-by-step process of evaluation and improvement, making it particularly valuable for complex tasks such as code development, writing, and problem-solving. Our example demonstrated how a basic factorial function evolved from a simple implementation to a robust, optimized solution through multiple iterations of targeted refinements." (Jeremy C Morgan, "Coding with AI: Examples in Python", 2025)

"Stubbing is a fundamental technique in software development where simplified placeholder versions of code components are created before implementing the full functionality. It is like building the frame of a house before adding the walls, plumbing, and electrical systems. The stubs provide a way to test the overall structure and flow of an application early on, without getting bogged down in the details of individual components." (Jeremy C Morgan, "Coding with AI: Examples in Python", 2025)

"Testing is like an investment. You spend time building tests now to strengthen your product. This approach saves time and frustration by catching problems early. As your software evolves, each passing test reaffirms that your product still works properly. However, in today’s fast-paced development world, testing often falls behind. This is where generative AI can aid developers as a valuable resource." (Jeremy C Morgan, "Coding with AI: Examples in Python", 2025)

"Unlike traditional code completion, which operates on predefined rules, generative AI creates a continuous improvement cycle, which includes the following five basic steps: (1) Developer input: You provide source code, comments, or natural language requirements. (2) Context analysis: The model analyzes patterns in your existingcode and requirements. (3) Prediction: Based on training data and your specific context, the model generates probable code. (4) Developer feedback: You accept, modify, or reject suggestions. (5) Model adaptation: The system incorporates your feedback to improve future suggestions." (Jeremy C Morgan, "Coding with AI: Examples in Python", 2025)

"This ability to zero in on important code is why modern AI coding assistants can offer meaningful suggestions for your specific needs. It’s similar to how skilled developers know which code sections affect a new implementation the most. Each transformer layer learns about various code patterns, ranging from syntax validation to understanding the relationships among functions, classes, and modules." (Jeremy C Morgan, "Coding with AI: Examples in Python", 2025)

"When building new software, the clarity and precision of project requirements are pivotal. Getting the requirements right is critical as they often determine whether a software project meets its deadlines or faces significant delays. Requirements always change. Also, they’re frequently misinterpreted because we tend to grab the requirements and get to work. There is a lot of room for error here, so if we rush, we can get in trouble. Because generative AI tools make the requirements gathering process easier and faster, we can spend more time working on those requirements and getting them right." (Jeremy C Morgan, "Coding with AI: Examples in Python", 2025)

🖍️Kristin H Jarman - Collected Quotes

"A study is any data collection exercise. The purpose of any study is to answer a question. [...] Once the question has been clearly articulated, it’s time to design a study to answer it. At one end of the spectrum, a study can be a controlled experiment, deliberate and structured, where researchers act like the ultimate control freaks, manipulating everything from the gender of their test subjects to the humidity in the room. Scientific studies, the kind run by men in white lab coats and safety goggles, are often controlled experiments. At the other end of the spectrum, an observational study is simply the process of watching something unfold without trying to impact the outcome in any way." (Kristin H Jarman, "The Art of Data Analysis: How to answer almost any question using basic statistics", 2013)

"According to the central limit theorem, it doesn’t matter what the raw data look like, the sample variance should be proportional to the number of observations and if I have enough of them, the sample mean should be normal." (Kristin H Jarman, "The Art of Data Analysis: How to answer almost any question using basic statistics", 2013)

"Although it’s a little more complicated than [replication and random sampling], blocking is a powerful way to eliminate confounding factors. Blocking is the process of dividing a sample into one or more similar groups, or blocks, so that samples in each block have certain factors in common. This technique is a great way to gain a little control over an experiment with lots of uncontrollable factors." (Kristin H Jarman, "The Art of Data Analysis: How to answer almost any question using basic statistics", 2013)

"Any factor you don’t account for can become a confounding factor. A confounding factor is any variable that confuses the conclusions of your study, or makes them ambiguous. [...] Confounding factors can really screw up an otherwise perfectly good statistical analysis." (Kristin H Jarman, "The Art of Data Analysis: How to answer almost any question using basic statistics", 2013)

"Any time you collect data, you have uncertainty to deal with. This uncertainty comes from two places: (1) inherent variation in the values a random variable can take on and (2) the fact that for most studies, you can’t capture the entire population and so you must rely on a sample to make your conclusions." (Kristin H Jarman, "The Art of Data Analysis: How to answer almost any question using basic statistics", 2013)

"Choosing and organizing a sample is a crucial part of the experimental design process. Statistically speaking, the best type of sample is called a random sample. A random sample is a subset of the entire population, chosen so each member is equally likely to be picked. [...] Random sampling is the best way to guarantee you’ve chosen objectively, without personal preference or bias." (Kristin H Jarman, "The Art of Data Analysis: How to answer almost any question using basic statistics", 2013)

"Probability, the mathematical language of uncertainty, describes what are called random experiments, bets, campaigns, trials, games, brawls, and anything other situation where the outcome isn’t known beforehand. A probability is a fraction, a value between zero and one that measures the likelihood a given outcome will occur. A probability of zero means the outcome is virtually impossible. A probability of one means it will almost certainly happen. A probability of one-half means the outcome is just as likely to occur as not." (Kristin H Jarman, "The Art of Data Analysis: How to answer almost any question using basic statistics", 2013)

"Replication is the process of taking more than one observation or measurement. [...] Replication helps eliminate negative effects of uncontrollable factors, because it keeps us from getting fooled by a single, unusual outcome." (Kristin H Jarman, "The Art of Data Analysis: How to answer almost any question using basic statistics", 2013)

"The random experiment, or trial, is the situation whose outcome is uncertain, the one you’re watching. A coin toss is a random experiment, because you don’t know beforehand whether it will turn up heads or tails. The sample space is the list of all possible separate and distinct outcomes in your random experiment. The sample space in a coin toss contains the two outcomes heads and tails. The outcome you're interested in calculating a probability for is the event. On a coin toss, that might be the case where the coin lands on heads." (Kristin H Jarman, "The Art of Data Analysis: How to answer almost any question using basic statistics", 2013)

"The scientific method is the foundation of modern research. It’s how we prove a theory. It’s how we demonstrate cause and effect. It’s how we discover, innovate, and invent. There are five basic steps to the scientific method: (1) Ask a question. (2) Conduct background research. (3) Come up with a hypothesis. (4) Test the hypothesis with data. (5) Revise and retest the hypothesis until a conclusion can be made." (Kristin H Jarman, "The Art of Data Analysis: How to answer almost any question using basic statistics", 2013)

"There are three important requirements for the probability distribution. First, it should be defined for every possible value the random variable can take on. In other words, it should completely describe the sample space of a random experiment. Second, the probability distribution values should always be nonnegative. They’re meant to measure probabilities, after all, and probabilities are never less than zero. Finally, when all the probability distribution values are summed together, they must add to one." (Kristin H Jarman, "The Art of Data Analysis: How to answer almost any question using basic statistics", 2013)

♯OOP: Attribute (Definitions)

"Additional characteristics or information defined for an entity." (Owen Williams, "MCSE TestPrep: SQL Server 6.5 Design and Implementation", 1998)

"A named characteristic or property of a class." (Craig Larman, "Applying UML and Patterns", 2004)

"A characteristic, quality, or property of an entity class. For example, the properties 'First Name' and 'Last Name' are attributes of entity class 'Person'." (Danette McGilvray, "Executing Data Quality Projects", 2008)

"Another name for a field, used by convention in many object-oriented programming languages. Scala follows Java’s convention of preferring the term field over attribute." (Dean Wampler & Alex Payne, "Programming Scala", 2009)

"1. (UML diagram) A descriptor of a kind of information captured about an object class. 2. (Relational theory) The definition of a descriptor of a relation." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"A fact type element (specifically a characteristic assignment) that is a descriptor of an entity class." (David C Hay, "Data Model Patterns: A Metadata Map", 2010)

"A characteristic of an object." (Requirements Engineering Qualifications Board, "Standard glossary of terms used in Requirements Engineering", 2011)

"An inherent characteristic, an accidental quality, an object closely associated with or belonging to a specific person, place, or office; a word ascribing a quality." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

02 April 2006

🖍️Prashant Natarajan - Collected Quotes

"Data quality in warehousing and BI is typically defined in terms of the 4 C’s—is the data clean, correct, consistent, and complete? When it comes to big data, there are two schools of thought that have different views and expectations of data quality. The first school believes that the gold standard of the 4 C’s must apply to all data (big and little) used for clinical care and performance metrics. The second school believes that in big data environments, a stringent data quality standard is impossible, too costly, or not required. While diametrically opposite opinions may play well in panel discussions, they do little to reconcile the realities of healthcare data quality." (Prashant Natarajan et al, "Demystifying Big Data and Machine Learning for Healthcare", 2017) 

"Data warehousing has always been difficult, because leaders within an organization want to approach warehousing and analytics as just another technology or application buy. Viewed in this light, they fail to understand the complexity and interdependent nature of building an enterprise reporting environment." (Prashant Natarajan et al, "Demystifying Big Data and Machine Learning for Healthcare", 2017)

"Generalization is a core concept in machine learning; to be useful, machine-learning algorithms can’t just memorize the past, they must learn from the past. Generalization is the ability to respond properly to new situations based on experience from past situations." (Prashant Natarajan et al, "Demystifying Big Data and Machine Learning for Healthcare", 2017)

"The field of big-data analytics is still littered with a few myths and evidence-free lore. The reasons for these myths are simple: the emerging nature of technologies, the lack of common definitions, and the non-availability of validated best practices. Whatever the reasons, these myths must be debunked, as allowing them to persist usually has a negative impact on success factors and Return on Investment (RoI). On a positive note, debunking the myths allows us to set the right expectations, allocate appropriate resources, redefine business processes, and achieve individual/organizational buy-in." (Prashant Natarajan et al, "Demystifying Big Data and Machine Learning for Healthcare", 2017) 

"The first myth is that prediction is always based on time-series extrapolation into the future (also known as forecasting). This is not the case: predictive analytics can be applied to generate any type of unknown data, including past and present. In addition, prediction can be applied to non-temporal (time-based) use cases such as disease progression modeling, human relationship modeling, and sentiment analysis for medication adherence, etc. The second myth is that predictive analytics is a guarantor of what will happen in the future. This also is not the case: predictive analytics, due to the nature of the insights they create, are probabilistic and not deterministic. As a result, predictive analytics will not be able to ensure certainty of outcomes." (Prashant Natarajan et al, "Demystifying Big Data and Machine Learning for Healthcare", 2017)

"Your machine-learning algorithm should answer a very specific question that tells you something you need to know and that can be answered appropriately by the data you have access to. The best first question is something you already know the answer to, so that you have a reference and some intuition to compare your results with. Remember: you are solving a business problem, not a math problem."(Prashant Natarajan et al, "Demystifying Big Data and Machine Learning for Healthcare", 2017)

🖍️Andrew Ng - Collected Quotes

"AI is not a panacea. It cannot solve all problems. And like every technological disruption before it (the steam engine, internal combustion, electricity), it will bring about disruption good and bad." (Andrew Ng, [blog post] 2018)

"Carrying out error analysis on a learning algorithm is like using data science to analyze an ML system’s mistakes in order to derive insights about what to do next. At its most basic, error analysis by parts tells us what component(s) performance is (are) worth the greatest effort to improve." (Andrew Ng, "Machine Learning Yearning", 2018)

"In practice, increasing the size of your model will eventually cause you to run into computational problems because training very large models is slow. You might also exhaust your ability to acquire more training data. [...] Increasing the model size generally reduces bias, but it might also increase variance and the risk of overfitting. However, this overfitting problem usually arises only when you are not using regularization. If you include a well-designed regularization method, then you can usually safely increase the size of the model without increasing overfitting." (Andrew Ng, "Machine Learning Yearning", 2018)

"It is very difficult to know in advance what approach will work best for a new problem. Even experienced machine learning researchers will usually try out many dozens of ideas before they discover something satisfactory." (Andrew Ng, "Machine Learning Yearning", 2018)

"Keep in mind that artificial data synthesis has its challenges: it is sometimes easier to create synthetic data that appears realistic to a person than it is to create data that appears realistic to a computer." (Andrew Ng, "Machine Learning Yearning", 2018)

"AI is the new electricity: even with its current limitations, it is already transforming multiple industries. (Andrew Ng, [blog post] 2018)

"Artificial Intelligence can't solve all the world's problems, but it can help us with some of the biggest ones." (Andrew Ng) [attributed]

"Artificial Intelligence is a tool to help us be better humans, to help us get through the world more easily and richer and be more productive and engaging." (Andrew Ng) [attributed]

"If you can collect really large datasets, the algorithms often don't matter." (Andrew Ng) [attributed]

"Missing data is an opportunity, not a limitation." (Andrew Ng) [attributed]

"No one knows what the right algorithm is, but it gives us hope that if we can discover some crude approximation of whatever this algorithm is and implement it on a computer, that can help us make a lot of progress." (Andrew Ng) [attributed]

"Real-world problems are messy, and they rarely fit exactly into one category or another." (Andrew Ng) [attributed]

"The ability to innovate and to be creative are teachable processes. There are ways by which people can systematically innovate or systematically become creative." (Andrew Ng) [attributed]

"The key to AI success is not just having the right algorithms, but also having the right data to train those algorithms." (Andrew Ng) [attributed]

"The more data we can feed into the algorithms, the better models we can build." (Andrew Ng) [attributed]

🖍️John D Barrow - Collected Quotes

"Each of the most basic physical laws that we know corresponds to some invariance, which in turn is equivalent to a collection of changes which form a symmetry group. […] whilst leaving some underlying theme unchanged. […] for example, the conservation of energy is equivalent to the invariance of the laws of motion with respect to translations backwards or forwards in time […] the conservation of linear momentum is equivalent to the invariance of the laws of motion with respect to the position of your laboratory in space, and the conservation of angular momentum to an invariance with respect to directional orientation [...] discovery of conservation laws indicated that Nature possessed built-in sustaining principles which prevented the world from just ceasing to be."  (John D Barrow, "New Theories of Everything: The Quest for Ultimate Explanation", 1991)

"Everywhere […] in the Universe, we discern that closed physical systems evolve in the same sense from ordered states towards a state of complete disorder called thermal equilibrium. This cannot be a consequence of known laws of change, since […] these laws are time symmetric- they permit […] time-reverse. […] The initial conditions play a decisive role in endowing the world with its sense of temporal direction. […] some prescription for initial conditions is crucial if we are to understand […]" (John D Barrow, "New Theories of Everything: The Quest for Ultimate Explanation", 1991)

"In practice, the intelligibility of the world amounts to the fact that we find it to be algorithmically compressible. We can replace sequences of facts and observational data by abbreviated statements which contain the same information content. These abbreviations we often call 'laws of Nature.' If the world were not algorithmically compressible, then there would exist no simple laws of nature. Instead of using the law of gravitation to compute the orbits of the planets at whatever time in history we want to know them, we would have to keep precise records of the positions of the planets at all past times; yet this would still not help us one iota in predicting where they would be at any time in the future. This world is potentially and actually intelligible because at some level it is extensively algorithmically compressible. At root, this is why mathematics can work as a description of the physical world. It is the most expedient language that we have found in which to express those algorithmic compressions."  (John D Barrow, "New Theories of Everything: The Quest for Ultimate Explanation", 1991)

"On this view, we recognize science to be the search for algorithmic compressions. We list sequences of observed data. We try to formulate algorithms that compactly represent the information content of those sequences. Then we test the correctness of our hypothetical abbreviations by using them to predict the next terms in the string. These predictions can then be compared with the future direction of the data sequence. Without the development of algorithmic compressions of data all science would be replaced by mindless stamp collecting - the indiscriminate accumulation of every available fact. Science is predicated upon the belief that the Universe is algorithmically compressible and the modern search for a Theory of Everything is the ultimate expression of that belief, a belief that there is an abbreviated representation of the logic behind the Universe's properties that can be written down in finite form by human beings."  (John D Barrow, "New Theories of Everything: The Quest for Ultimate Explanation", 1991)

"The goal of science is to make sense of the diversity of Nature."  (John D Barrow, "New Theories of Everything: The Quest for Ultimate Explanation", 1991)

"There is one qualitative aspect of reality that sticks out from all others in both profundity and mystery. It is the consistent success of mathematics as a description of the workings of reality and the ability of the human mind to discover and invent mathematical truths."  (John D Barrow, "New Theories of Everything: The Quest for Ultimate Explanation", 1991)

"Highly correlated brown and black noise patterns do not seem to have seem to have attractive counterparts in the visual arts. There, over-correlation is the order of the day, because it creates the same dramatic associations that we find in attractive natural scenery, or in the juxtaposition of symbols. Somehow, it is tediously predictable when cast in a one-dimensional medium, like sound." (John D Barrow, "The Artful Universe", 1995)

"Where there is life there is a pattern, and where there is a pattern there is mathematics." (John D Barrow, "The Artful Universe", 1995)

"The advent of small, inexpensive computers with superb graphics has changed the way many sciences are practiced, and the way that all sciences present the results of experiments and calculations." (John D Barrow, "Cosmic Imagery: Key Images in the History of Science", 2008)

🖍️Herbert F Spirer - Collected Quotes

"Clearly, the mean is greatly influenced by extreme values, but it can be appropriate for many situations where extreme values do not arise. To avoid misuse, it is essential to know which summary measure best reflects the data and to use it carefully. Understanding the situation is necessary for making the right choice. Know the subject!" (Herbert F Spirer et al, "Misused Statistics" 2nd Ed, 1998)

"'Garbage in, garbage out' is a sound warning for those in the computer field; it is every bit as sound in the use of statistics. Even if the “garbage” which comes out leads to a correct conclusion, this conclusion is still tainted, as it cannot be supported by logical reasoning. Therefore, it is a misuse of statistics. But obtaining a correct conclusion from faulty data is the exception, not the rule. Bad basic data (the 'garbage in') almost always leads to incorrect conclusions (the 'garbage out'). Unfortunately, incorrect conclusions can lead to bad policy or harmful actions." (Herbert F Spirer et al, "Misused Statistics" 2nd Ed, 1998)

"Graphic misrepresentation is a frequent misuse in presentations to the nonprofessional. The granddaddy of all graphical offenses is to omit the zero on the vertical axis. As a consequence, the chart is often interpreted as if its bottom axis were zero, even though it may be far removed. This can lead to attention-getting headlines about 'a soar' or 'a dramatic rise (or fall)'. A modest, and possibly insignificant, change is amplified into a disastrous or inspirational trend." (Herbert F Spirer et al, "Misused Statistics" 2nd Ed, 1998)

"If you want to show the growth of numbers which tend to grow by percentages, plot them on a logarithmic vertical scale. When plotted against a logarithmic vertical axis, equal percentage changes take up equal distances on the vertical axis. Thus, a constant annual percentage rate of change will plot as a straight line. The vertical scale on a logarithmic chart does not start at zero, as it shows the ratio of values (in this case, land values), and dividing by zero is impossible." (Herbert F Spirer et al, "Misused Statistics" 2nd Ed, 1998)

"In analyzing data, more is not necessarily better. Unfortunately, it is not always possible to have one uniquely correct procedure for analyzing a given data set. An investigator may use several different methods of statistical analysis on a data set. Furthermore, different outcomes may result from the use of different analytical methods. If more than one conclusion results, then an investigator is committing a misuse of statistics unless the investigator shows and reconciles all the results. If the investigator shows only one conclusion or interpretation, ignoring the alternative procedure(s), the work is a misuse of statistics." (Herbert F Spirer et al, "Misused Statistics" 2nd Ed, 1998)

"It is a consequence of the definition of the arithmetic mean that the mean will lie somewhere between the lowest and highest values. In the unrealistic and meaningless case that all values which make up the mean are the same, all values will be equal to the average. In an unlikely and impractical case, it is possible for only one of many values to be above or below the average. By the very definition of the average, it is impossible for all values to be above average in any case." (Herbert F Spirer et al, "Misused Statistics" 2nd Ed, 1998)

"It is a major statistical sin to show a graph displaying a variable as a function of time with the vertical (left-hand) scale cut short so that it does not go down to zero, without drawing attention to this fact. This sin can create a seriously misleading impression, and, as they do with most sins, sinners commit it again and again." (Herbert F Spirer et al, "Misused Statistics" 2nd Ed, 1998)

"It is a misuse of statistics to use whichever set of statistics suits the purpose at hand and ignore the conflicting sets and the implications of the conflicts." (Herbert F Spirer et al, "Misused Statistics" 2nd Ed, 1998)

"Jargon and complex methodology have their place. But true professional jargon is merely a shorthand way of speaking. Distrust any jargon that cannot be translated into plain English. Sophisticated methods can bring unique insights, but they can also be used to cover inadequate data and thinking. Good analysts can explain their methods in simple, direct terms. Distrust anyone who can't make clear how they have treated the data." (Herbert F Spirer et al, "Misused Statistics" 2nd Ed, 1998)

"Know the subject matter, learn it fast, or get a trustworthy expert. To identify the unknown, you must know the known. But don't be afraid to challenge experts on the basis of your logical reasoning. Sometimes a knowledge of the subject matter can blind the expert to the novel or unexpected." (Herbert F Spirer et al, "Misused Statistics" 2nd Ed, 1998)

"Percentages seem to invite misuse, perhaps because they require such careful thinking." (Herbert F Spirer et al, "Misused Statistics" 2nd Ed, 1998)

"There is no shortage of statistical methods. Elementary statistics textbooks list dozens, and statisticians constantly develop and report new ones. But if a researcher uses the wrong method, a clear misuse, to analyze a specific set of data, then the results may be incorrect." (Herbert F Spirer et al, "Misused Statistics" 2nd Ed, 1998)

"When an analyst selects the wrong tool, this is a misuse which usually leads to invalid conclusions. Incorrect use of even a tool as simple as the mean can lead to serious misuses. […] But all statisticians know that more complex tools do not guarantee an analysis free of misuses. Vigilance is required on every statistical level."  (Herbert F Spirer et al, "Misused Statistics" 2nd Ed, 1998)

01 April 2006

🖍️Alfred R Ilersic - Collected Quotes

"Diagrams are sometimes used, not merely to convey several pieces of information such as several time series on one chart, but also to provide visual evidence of relationships between the series." (Alfred R Ilersic, "Statistics", 1959)

"Everybody has some idea of the meaning of the term 'probability' but there is no agreement among scientists on a precise definition of the term for the purpose of scientific methodology. It is sufficient for our purpose, however, if the concept is interpreted in terms of relative frequency, or more simply, how many times a particular event is likely to occur in a large population." (Alfred R Ilersic, "Statistics", 1959)

"However informative and well designed a statistical table may be, as a medium for conveying to the reader an immediate and clear impression of its content, it is inferior to a good chart or graph. Many people are incapable of comprehending large masses of information presented in tabular form; the figures merely confuse them. Furthermore, many such people are unwilling to make the effort to grasp the meaning of such data. Graphs and charts come into their own as a means of conveying information in easily comprehensible form." (Alfred R Ilersic, "Statistics", 1959)

"In brief, the greatest care must be exercised in using any statistical data, especially when it has been collected by another agency. At all times, the statistician who uses published data must ask himself, by whom were the data collected, how and for what purpose?" (Alfred R Ilersic, "Statistics", 1959)

"It is a good rule to remember that the first step in analyzing any statistical data, whether it be culled from an official publication or a report prepared by someone else, is to check the definitions used for classification." (Alfred R Ilersic, "Statistics", 1959)

"It is helpful to remember when dealing with index numbers that they are specialized tools and as such are most efficient and useful when properly used. A screwdriver is a poor substitute for a chisel, although it may be used as such. All index numbers are designed to measure particular groups of related changes." (Alfred R Ilersic, "Statistics", 1959)

"Most people tend to think of values and quantities expressed in numerical terms as being exact figures; much the same as the figures which appear in the trading account of a company. It therefore comes as a considerable surprise to many to learn that few published statistics, particularly economic and sociological data, are exact. Many published figures are only approximations to the real value, while others are estimates of aggregates which are far too large to be measured with precision." (Alfred R Ilersic, "Statistics", 1959)

"Numerical data, which have been recorded at intervals of time, form what is generally described as a time series. [...] The purpose of analyzing time series is not always the determination of the trend by itself. Interest may be centered on the seasonal movement displayed by the series and, in such a case, the determination of the trend is merely a stage in the process of measuring and analyzing the seasonal variation. If a regular basic or under- lying seasonal movement can be clearly established, forecasting of future movements becomes rather less a matter of guesswork and more a matter of intelligent forecasting." (Alfred R Ilersic, "Statistics", 1959)

"Often, in order to simplify statistical tables, the practice of rounding large figures and totals is resorted to. Where the constituent figures in a table together with their aggregate have been so treated, a discrepancy between the rounded total and the true sum of the rounded constituent figures frequently arises. Under no circumstances should the total be adjusted to what appears to be the right answer. A note to the table to the effect that the figures have been rounded, e.g. to the nearest 1,000, is all that is necessary. The same remark applies to percentage equivalents of the constituent parts of a total; it they do not add to exactly 100 per cent, leave them." (Alfred R Ilersic, "Statistics", 1959)

"Poor statistics may be attributed to a number of causes. There are the mistakes which arise in the course of collecting the data, and there are those which occur when those data are being converted into manageable form for publication. Still later, mistakes arise because the conclusions drawn from the published data are wrong. The real trouble with errors which arise during the course of collecting the data is that they are the hardest to detect." (Alfred R Ilersic, "Statistics", 1959)

"Statistical method consists of two main operations; counting and analysis. [...] The statistician has no use for information that cannot be expressed numerically, nor generally speaking, is he interested in isolated events or examples. The term 'data  is itself plural and the statistician is concerned with the analysis of aggregates. " (Alfred R Ilersic, "Statistics", 1959)

"The averaging of percentages themselves requires care, where the percentages are each computed on different bases, i.e. different quantities. The average is not derived by aggregating the percentages and dividing them. Instead of this, each percentage must first be multiplied by its base to bring out its relative significance to the other percentages and to the total. The sum of the resultant products is then divided by the sum of the base values [...], not merely the number of items." (Alfred R Ilersic, "Statistics", 1959)

"The rounding of individual values comprising an aggregate can give rise to what are known as unbiased or biased errors. [...]The biased error arises because all the individual figures are reduced to the lower 1,000 [...] The unbiased error is so described since by rounding each item to the nearest 1,000 some of the approximations are greater and some smaller than the original figures. Given a large number of such approximations, the final total may therefore correspond very closely to the true or original total, since the approximations tend to offset each other. [...] With biased approximations, however, the errors are cumulative and their aggregate increases with the number of items in the series." (Alfred R Ilersic, "Statistics", 1959)

"The simplest way of indicating that figures are not given precisely to the last unit is to express them to the nearest 100 or 1,000; or in some cases to the nearest 100,000 or million. [...] The widespread desire for precision is reflected in many reports on economic trends which quote figures in great detail, rather than emphasizing the trends and movements reflected in the figures." (Alfred R Ilersic, "Statistics", 1959)

"The statistician has no use for information that cannot be expressed numerically, nor generally speaking, is he interested in isolated events or examples. The term ' data ' is itself plural and the statistician is concerned with the analysis of aggregates." (Alfred R Ilersic, "Statistics", 1959)

"The statistics themselves prove nothing; nor are they at any time a substitute for logical thinking. There are […] many simple but not always obvious snags in the data to contend with. Variations in even the simplest of figures may conceal a compound of influences which have to be taken into account before any conclusions are drawn from the data." (Alfred R Ilersic, "Statistics", 1959)

"There are good statistics and bad statistics; it may be doubted if there are many perfect data which are of any practical value. It is the statistician's function to discriminate between good and bad data; to decide when an informed estimate is justified and when it is not; to extract the maximum reliable information from limited and possibly biased data." (Alfred R Ilersic, "Statistics", 1959)

"This is the essential characteristic of a logarithmic scale. Any given increase, regardless of its absolute size, is related to a given base quantity. Thus, a perfectly straight line on such a graph denotes a constant percentage rate of increase, and not a constant absolute increase. It is the slope of the line or curve which is significant in such a graph. The steeper the slope, whether it be downwards or upwards, the more marked is the rate of change." (Alfred R Ilersic, "Statistics", 1959)

"This type of graph possesses a number of advantages. It is possible to graph a number of series of widely differing magnitudes on a single chart and bring out any relationship between their movements. How- ever wide the amplitude of the fluctuations in the series, a logarithmic scale reduces them to manageable size on a single sheet of graph paper, whereas, on a normal scale, it might prove impossible to get the larger fluctuations on to a single chart, except by so reducing the scale that all the other smaller movements in the series are almost obliterated." (Alfred R Ilersic, "Statistics", 1959)

"Time series analysis often requires more knowledge of the data and relevant information about their background than it does of statistical techniques. Whereas the data in some other fields may be controlled so as to increase their representativeness, economic data are so changeable in their nature that it is usually impossible to sort out the separate effects of the various influences. Attempts to isolate cyclical, seasonal and irregular, or random movements, are made primarily in the hope that some underlying pattern of change over time may be revealed."  (Alfred R Ilersic, "Statistics", 1959)

"When using estimated figures, i.e. figures subject to error, for further calculation make allowance for the absolute and relative errors. Above all, avoid what is known to statisticians as 'spurious' accuracy. For example, if the arithmetic Mean has to be derived from a distribution of ages given to the nearest year, do not give the answer to several places of decimals. Such an answer would imply a degree of accuracy in the results of your calculations which are quite un- justified by the data. The same holds true when calculating percentages." (Alfred R Ilersic, "Statistics", 1959)

"While it is true to assert that much statistical work involves arithmetic and mathematics, it would be quite untrue to suggest that the main source of errors in statistics and their use is due to inaccurate calculations." (Alfred R Ilersic, "Statistics", 1959)

Book available on Archive.org.

🖍️Charles Livingston - Collected Quotes

"Cautions about combining groups: apples and oranges. In computing an average, be careful about combining groups in which the average for each group is of more interest than the overall average. […] Avoid combining distinct quantities in a single average." (Charles Livingston & Paul Voakes, "Working with Numbers and Statistics: A handbook for journalists", 2005)

"Central tendency is the formal expression for the notion of where data is centered, best understood by most readers as 'average'. There is no one way of measuring where data are centered, and different measures provide different insights." (Charles Livingston & Paul Voakes, "Working with Numbers and Statistics: A handbook for journalists", 2005)

"Concluding that the population is becoming more centralized by observing behavior at the extremes is called the 'Regression to the Mean' Fallacy. […] When looking for a change in a population, do not look only at the extremes; there you will always find a motion to the mean. Look at the entire population." (Charles Livingston & Paul Voakes, "Working with Numbers and Statistics: A handbook for journalists", 2005)

"Data often arrive in raw form, as long lists of numbers. In this case your job is to summarize the data in a way that captures its essence and conveys its meaning. This can be done numerically, with measures such as the average and standard deviation, or graphically. At other times you find data already in summarized form; in this case you must understand what the summary is telling, and what it is not telling, and then interpret the information for your readers or viewers." (Charles Livingston & Paul Voakes, "Working with Numbers and Statistics: A handbook for journalists", 2005)

"If a hypothesis test points to rejection of the alternative hypothesis, it might not indicate that the null hypothesis is correct or that the alternative hypothesis is false." (Charles Livingston & Paul Voakes, "Working with Numbers and Statistics: A handbook for journalists", 2005)

"Limit a sentence to no more than three numerical values. If you've got more important quantities to report, break those up into other sentences. More importantly, however, make sure that each number is an important piece of information. Which are the important numbers that truly advance the story?" (Charles Livingston & Paul Voakes, "Working with Numbers and Statistics: A handbook for journalists", 2005)

"Numbers are often useful in stories because they record a recent change in some amount, or because they are being compared with other numbers. Percentages, ratios and proportions are often better than raw numbers in establishing a context." (Charles Livingston & Paul Voakes, "Working with Numbers and Statistics: A handbook for journalists", 2005)

"Probability is sometimes called the language of statistics. […] The probability of an event occurring might be described as the likelihood of it happening. […] In a formal sense the word "probability" is used only when an event or experiment is repeatable and the long term likelihood of a certain outcome can be determined." (Charles Livingston & Paul Voakes, "Working with Numbers and Statistics: A handbook for journalists", 2005)

"Roughly stated, the standard deviation gives the average of the differences between the numbers on the list and the mean of that list. If data are very spread out, the standard deviation will be large. If the data are concentrated near the mean, the standard deviation will be small." (Charles Livingston & Paul Voakes, "Working with Numbers and Statistics: A handbook for journalists", 2005)

"The basic idea of going from an estimate to an inference is simple. Drawing the conclusion with confidence, and measuring the level of confidence, is where the hard work of professional statistics comes in." (Charles Livingston & Paul Voakes, "Working with Numbers and Statistics: A handbook for journalists", 2005)

"The central limit theorem […] states that regardless of the shape of the curve of the original population, if you repeatedly randomly sample a large segment of your group of interest and take the average result, the set of averages will follow a normal curve." (Charles Livingston & Paul Voakes, "Working with Numbers and Statistics: A handbook for journalists", 2005)

"The dual meaning of the word significant brings into focus the distinction between drawing a mathematical inference and practical inference from statistical results." (Charles Livingston & Paul Voakes, "Working with Numbers and Statistics: A handbook for journalists", 2005)

"The percentage is one of the best (mathematical) friends a journalist can have, because it quickly puts numbers into context. And it's a context that the vast majority of readers and viewers can comprehend immediately." (Charles Livingston & Paul Voakes, "Working with Numbers and Statistics: A handbook for journalists", 2005)

24 March 2006

🧿Glenn Greenwald - Collected Quotes

"A population, a country that venerates physical safety above all other values will ultimately give up its liberty and sanction any power seized by authority in exchange for the promise, no matter how illusory, of total security. However, absolute safety is itself chimeric, pursued by never obtained. The pursuit degrades those who engage in it as well as any nation that comes to be defined by it."  (Glenn Greenwald, "No Place to Hide", 2014)

"A prime justification for surveillance - that it’s for the benefit of the population - relies on projecting a view of the world that divides citizens into categories of good people and bad people. In that view, the authorities use their surveillance powers only against bad people, those who are “doing something wrong,” and only they have anything to fear from the invasion of their privacy."  (Glenn Greenwald, "No Place to Hide", 2014)

"Converting the Internet into a system of surveillance thus guts it of its core potential. Worse, it turns the Internet into a tool of repression, threatening to produce the most extreme and oppressive weapon of state intrusion human history has ever seen."  (Glenn Greenwald, "No Place to Hide", 2014)

"Democracy requires accountability and consent of the governed, which is only possible if citizens know what is being done in their name."  (Glenn Greenwald, "No Place to Hide", 2014)

"Far from hyperbole, that is the literal, explicitly stated aim of the surveillance state: to collect, store, monitor, and analyze all electronic communication by all people around the globe."  (Glenn Greenwald, "No Place to Hide", 2014)

"For many kids, the Internet is a means of self-actualization. It allows them to explore who they are and who they want to be, but that works only if we’re able to be private and anonymous, to make mistakes without them following us."  (Glenn Greenwald, "No Place to Hide", 2014)

"Technology has now enabled a type of ubiquitous surveillance that had previously been the province of only the most imaginative science fiction writers." (Glenn Greenwald, "No Place to Hide", 2014)

"The ability to eavesdrop on people’s communications vests immense power in those who do it. And unless such power is held in check by rigorous oversight and accountability, it is almost certain to be abused."  (Glenn Greenwald, "No Place to Hide", 2014)

"The principle which protects personal writings and all other personal productions, not against theft and physical appropriation, but against publication in any form, is in reality not the principle of private property, but that of an inviolate personality."  (Glenn Greenwald, "No Place to Hide", 2014)

"To permit surveillance to take root on the Internet would mean subjecting virtually all forms of human interaction, planning, and even thought itself to comprehensive state examination."  (Glenn Greenwald, "No Place to Hide", 2014)

"We all instinctively understand that the private realm is where we can act, think, speak, write, experiment, and choose how to be, away from the judgmental eyes of others. Privacy is a core condition of being a free person."  (Glenn Greenwald, "No Place to Hide", 2014)

"We shouldn't have to be faithful loyalists of the powerful to feel safe from state surveillance. Nor should the price of immunity be refraining from controversial or provocative dissent. We shouldn't want a society where the message is conveyed that you will be left alone only if you mimic the accommodating behavior and conventional wisdom of an establishment columnist."  (Glenn Greenwald, "No Place to Hide", 2014)

"What made the Internet so appealing was precisely that it afforded the ability to speak and act anonymously, which is so vital to individual exploration."  (Glenn Greenwald, "No Place to Hide", 2014)

16 March 2006

OOP: Generalization (Definitions)

"The activity of identifying commonality among concepts and defining a superclass (general concept) and subclass (specialized concept) relationships. It is a way to construct taxonomic classifications among concepts which are then illustrated in class hierarchies." (Craig Larman, "Applying UML and Patterns: An Introduction to Object-Oriented Analysis and Design and the Unified Process", 1997)

"The activity of identifying commonality among concepts and defining a superclass (general concept) and subclass (specialized concept) relationships. It is a way to construct taxonomic classifications among concepts, which are then illustrated in class hierarchies. Conceptual subclasses conform to conceptual superclasses in terms of intension and extension." (Craig Larman, "Applying UML and Patterns", 2004)

"The process of forming a more comprehensive or less restrictive class (a superclass) from one or more entities (or classes, in Unified Modeling Language [UML])." (Sharon Allen & Evan Terry, "Beginning Relational Data Modeling 2nd Ed.", 2005)

"In extended ER model (EER model), generalization is a structure in which one object generally describes more specialized objects." (S. Sumathi & S. Esakkirajan, "Fundamentals of Relational Database Management Systems", 2007)

"A special type of abstraction relationship that specifies that several types of entities with certain common attributes can be generalized (or abstractly defined) with a higher-level entity type, a supertype entity; an 'is-a' type relationship. For example, employee is a generalization of engineer, manager, and administrative assistant, based on the common attribute job-title. A tool often used to make view integration possible." (Toby J Teorey, ", Database Modeling and Design" 4th Ed., 2010)

"In a specialization hierarchy, the grouping together of common attributes into a supertype entity. See specialization hierarchy." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management 9th Ed", 2011)

"The process of evaluating multiple relationships between entities in a set into fewer relationships. Usually necessary after other generalization activities have taken place, which carry the relationships of the specialized entities into the generalized entities. For example, two 1:M relationships between two entities, each having a different parent, can be generalized into a M:N relationship." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The process of recognizing commonalities, and combining similar types of entities or objects into a less specialized type based on common attributes and behaviors, creating a supertype for two or more specialized subtypes. Contrast with specialization." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"The abstraction, reduction, and simplification of features and feature classes for deriving a simpler model of reality or decreasing stored." (GRC Data Intelligence)

04 March 2006

♯OOP: Method (Definitions)

"A function that performs an action by using a component object model (COM) object, as in SQL-DMO, OLE DB, and ADO." (Microsoft Corporation, "SQL Server 7.0 System Administration Training Kit", 1999)

"A programmatic operation such as a procedure or function defined on an object type or class." (Bill Pribyl & Steven Feuerstein, "Learning Oracle PL/SQL", 2001)

"A callable set of execution instructions. Methods specify a contract; that is, they have a name, a number of parameters, and a return type. Clients that need to call a method must satisfy the contract when calling the method. Several kinds of methods are possible, such as instance and static." (Damien Watkins et al, "Programming in the .NET Environment", 2002)

"A procedure associated with a Java class or interface." (Peter Gulutzan & Trudy Pelzer, "SQL Performance Tuning", 2002)

"A procedure that belongs to a class and can be executed by sending a message to a class object or to instances from the class." (Stephen G Kochan, "Programming in Objective-C", 2003)

"Java code is organized into methods that are named and declared to have specific input parameters and return types. All methods are members of a class." (Marcus Green & Bill Brogden, "Java 2™ Programmer Exam Cram™ 2 (Exam CX-310-035)", 2003)

"In the UML, the specific implementation or algorithm of an operation for a class. Informally, the software procedure that can be executed in response to a message." (Craig Larman, "Applying UML and Patterns", 2004)

"Operations on an object that are exposed for use by other objects or applications." (Bob Bryla, "Oracle Database Foundations", 2004)

"A named collection of statements, with or without arguments, and a return value. A member of a class." (Michael Fitzgerald, "Learning Ruby", 2007)

"A function that is associated exclusively with an instance, either defined in a class, trait, or object definition. Methods can only be invoked using the object.method syntax." (Dean Wampler & Alex Payne, "Programming Scala", 2009)

"A program module that acts on objects created from a class in an object-oriented program." (Jan L Harrington, "SQL Clearly Explained" 3rd Ed., 2010)

"(1) A piece of code provided by an object, such as a control, that a program can call to make the object do something. (2) A routine (that may or may not return a value) provided by a class." (Rod Stephens, "Start Here! Fundamentals of Microsoft® .NET Programming", 2011)

"A function that is defined by a class and can only be invoked in the context of the class or one of its instances." (Dean Wampler, "Functional Programming for Java Developers", 2011)

"A procedure implemented by a class." (Rod Stephens, "Stephens' Visual Basic® Programming 24-Hour Trainer", 2011)

"A procedure that belongs to a class and can be executed by sending a message to a class object or to instances from the class." (Stephen G Kochan, "Programming in Objective-C, 4th Ed.", 2011)

"In the object-oriented data model, a named set of instructions to perform an action. Methods represent realworld actions. Methods are invoked through messages." (Carlos Coronel et al, "Database Systems: Design, Implementation, and Management" 9th Ed, 2011)

"In object-oriented design and programming, a function bound to a class as part of its overall behavior, executed in response to a message." (DAMA International, "The DAMA Dictionary of Data Management", 2011)

"A kind of action that an object can take if you tell it to." (Jon Orwant et al, "Programming Perl, 4th Ed.", 2012)

"In object-oriented programming, a named code block that performs a task when called." (SQL Server 2012 Glossary, "Microsoft", 2012)

"Defined and repetitive approach used to broach particular types of problems." (Gilbert Raymond & Philippe Desfray, "Modeling Enterprise Architecture with TOGAF", 2014)

"A named algorithm that defines one aspect of the behavior of a class" (Nell Dale & John Lewis, "Computer Science Illuminated, 6th Ed.", 2015)

"In object-oriented programming, a piece of code that makes an object do something." (Rod Stephens, "Beginning Software Engineering", 2015)

"The object-oriented programming term for a function or procedure." (Daniel Leuck et al, "Learning Java" 5th Ed., 2020)
