SQL Troubles: 🔭Data Science: Time Series (Just the Quotes)

21 November 2018

🔭Data Science: Time Series (Just the Quotes)

"No observations are absolutely trustworthy. In no field of observation can we entirely rule out the possibility that an observation is vitiated by a large measurement or execution error. If a reading is found to lie a very long way from its fellows in a series of replicate observations, there must be a suspicion that the deviation is caused by a blunder or gross error of some kind. [...] One sufficiently erroneous reading can wreck the whole of a statistical analysis, however many observations there are." (Francis J Anscombe, "Rejection of Outliers", Technometrics Vol. 2 (2), 1960)

"It is almost impossible to define 'time-sequence chart' in a clear and unambiguous manner because of the many forms and adaptations open to this type of chart. However. it might be said that, in essence, time-sequence chart portrays a chain of activities through time, indicates the type of activity in each link of the chain, shows clearly the position of the link in the total sequence chain, and indicates the duration of each activity. The time sequence chart may also contain verbal elements explaining when to begin an activity, how long to continue the activity, and a description of the activity. The chart may also indicate when to blend a given activity with another and the point at which a given activity is completed. The basic time-sequence chart may also be accompanied by verbal explanations and by secondary or contributory charts." (Cecil H Meyers, "Handbook of Basic Graphs: A modern approach", 1970)

"A time series is a sequence of observations, usually ordered in time, although in some cases the ordering may be according to another dimension. The feature of time series analysis which distinguishes it from other statistical analysis is the explicit recognition of the importance of the order in which the observations are made. While in many problems the observations are statistically independent, in time series successive observations may be dependent, and the dependence may depend on the positions in the sequence. The nature of a series and the structure of its generating process also may involve in other ways the sequence in which the observations are taken." (Theodore W Anderson, "The Statistical Analysis of Time Series", 1971)

"Entropy theory, on the other hand, is not concerned with the probability of succession in a series of items but with the overall distribution of kinds of items in a given arrangement." (Rudolf Arnheim, "Entropy and Art: An Essay on Disorder and Order", 1974)

"When the statistician looks at the outside world, he cannot, for example, rely on finding errors that are independently and identically distributed in approximately normal distributions. In particular, most economic and business data are collected serially and can be expected, therefore, to be heavily serially dependent. So is much of the data collected from the automatic instruments which are becoming so common in laboratories these days. Analysis of such data, using procedures such as standard regression analysis which assume independence, can lead to gross error. Furthermore, the possibility of contamination of the error distribution by outliers is always present and has recently received much attention. More generally, real data sets, especially if they are long, usually show inhomogeneity in the mean, the variance, or both, and it is not always possible to randomize." (George E P Box, "Some Problems of Statistics and Everyday Life", Journal of the American Statistical Association, Vol. 74 (365), 1979)

"An especially effective device for enhancing the explanatory power of time-series displays is to add spatial dimensions to the design of the graphic, so that the data are moving over space (in two or three dimensions) as well as over time. […] Occasionally graphics are belligerently multivariate, advertising the technique rather than the data." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"The bar graph and the column graph are popular because they are simple and easy to read. These are the most versatile of the graph forms. They can be used to display time series, to display the relationship between two items, to make a comparison among several items, and to make a comparison between parts and the whole (total). They do not appear to be as 'statistical', which is an advantage to those people who have negative attitudes toward statistics. The column graph shows values over time, and the bar graph shows values at a point in time. bar graph compares different items as of a specific time (not over time)." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"The problem with time-series is that the simple passage of time is not a good explanatory variable: descriptive chronology is not causal explanation. There are occasional exceptions, especially when there is a clear mechanism that drives the Y-variable." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"The time-series plot is the most frequently used form of graphic design. With one dimension marching along to the regular rhythm of seconds, minutes, hours, days, weeks, months, years, centuries, or millennia, the natural ordering of the time scale gives this design a strength and efficiency of interpretation found in no other graphic arrangement." (Edward R Tufte, "The Visual Display of Quantitative Information", 1983)

"There are several uses for which the line graph is particularly relevant. One is for a series of data covering a long period of time. Another is for comparing several series on the same graph. A third is for emphasizing the movement of data rather than the amount of the data. It also can be used with two scales on the vertical axis, one on the right and another on the left, allowing different series to use different scales, and it can be used to present trends and forecasts." (Anker V Andersen, "Graphing Financial Information: How accountants can use graphs to communicate", 1983)

"A connected graph is appropriate when the time series is smooth, so that perceiving individual values is not important. A vertical line graph is appropriate when it is important to see individual values, when we need to see short-term fluctuations, and when the time series has a large number of values; the use of vertical lines allows us to pack the series tightly along the horizontal axis. The vertical line graph, however, usually works best when the vertical lines emanate from a horizontal line through the center of the data and when there are no long-term trends in the data." (William S Cleveland, "The Elements of Graphing Data", 1985)

"A time series is a special case of the broader dependent-independent variable category. Time is the independent variable. One important property of most time series is that for each time point of the data there is only a single value of the dependent variable; there are no repeat measurements. Furthermore, most time series are measured at equally-spaced or nearly equally-spaced points in time." (William S Cleveland, "The Elements of Graphing Data", 1985)

"This transition from uncertainty to near certainty when we observe long series of events, or large systems, is an essential theme in the study of chance." (David Ruelle, "Chance and Chaos", 1991)

"System dynamics models are not derived statistically from time-series data. Instead, they are statements about system structure and the policies that guide decisions. Models contain the assumptions being made about a system. A model is only as good as the expertise which lies behind its formulation. A good computer model is distinguished from a poor one by the degree to which it captures the essence of a system that it represents. Many other kinds of mathematical models are limited because they will not accept the multiple-feedback-loop and nonlinear nature of real systems." (Jay W Forrester, "Counterintuitive Behavior of Social Systems", 1995)

"Like modeling, which involves making a static one-time prediction based on current information, time-series prediction involves looking at current information and predicting what is going to happen. However, with time-series predictions, we typically are looking at what has happened for some period back through time and predicting for some point in the future. The temporal or time element makes time-series prediction both more difficult and more rewarding. Someone who can predict the future based on what has occurred in the past can clearly have tremendous advantages over someone who cannot." (Joseph P Bigus,"Data Mining with Neural Networks: Solving business problems from application development to decision support", 1996)

"Many of the basic functions performed by neural networks are mirrored by human abilities. These include making distinctions between items (classification), dividing similar things into groups (clustering), associating two or more things (associative memory), learning to predict outcomes based on examples (modeling), being able to predict into the future (time-series forecasting), and finally juggling multiple goals and coming up with a good-enough solution (constraint satisfaction)." (Joseph P Bigus,"Data Mining with Neural Networks: Solving business problems from application development to decision support", 1996)

"Averages, ranges, and histograms all obscure the time-order for the data. If the time-order for the data shows some sort of definite pattern, then the obscuring of this pattern by the use of averages, ranges, or histograms can mislead the user. Since all data occur in time, virtually all data will have a time-order. In some cases this time-order is the essential context which must be preserved in the presentation." (Donald J Wheeler," Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"No comparison between two values can be global. A simple comparison between the current figure and some previous value and convey the behavior of any time series. […] While it is simple and easy to compare one number with another number, such comparisons are limited and weak. They are limited because of the amount of data used, and they are weak because both of the numbers are subject to the variation that is inevitably present in weak world data. Since both the current value and the earlier value are subject to this variation, it will always be difficult to determine just how much of the difference between the values is due to variation in the numbers, and how much, if any, of the difference is due to real changes in the process." (Donald J Wheeler, "Understanding Variation: The Key to Managing Chaos" 2nd Ed., 2000)

"Time-series forecasting is essentially a form of extrapolation in that it involves fitting a model to a set of data and then using that model outside the range of data to which it has been fitted. Extrapolation is rightly regarded with disfavour in other statistical areas, such as regression analysis. However, when forecasting the future of a time series, extrapolation is unavoidable." (Chris Chatfield, "Time-Series Forecasting" 2nd Ed, 2000)

"Comparing series visually can be misleading […]. Local variation is hidden when scaling the trends. We first need to make the series stationary (removing trend and/or seasonal components and/or differences in variability) and then compare changes over time. To do this, we log the series (to equalize variability) and difference each of them by subtracting last year’s value from this year’s value." (Leland Wilkinson, "The Grammar of Graphics" 2nd Ed., 2005)

"Prior to the discovery of the butterfly effect it was generally believed that small differences averaged out and were of no real significance. The butterfly effect showed that small things do matter. This has major implications for our notions of predictability, as over time these small differences can lead to quite unpredictable outcomes. For example, first of all, can we be sure that we are aware of all the small things that affect any given system or situation? Second, how do we know how these will affect the long-term outcome of the system or situation under study? The butterfly effect demonstrates the near impossibility of determining with any real degree of accuracy the long term outcomes of a series of events." (Elizabeth McMillan, Complexity, "Management and the Dynamics of Change: Challenges for practice", 2008)

"Regression toward the mean. That is, in any series of random events an extraordinary event is most likely to be followed, due purely to chance, by a more ordinary one." (Leonard Mlodinow, "The Drunkard’s Walk: How Randomness Rules Our Lives", 2008)

"A time-series plot (sometimes also called a time plot) is a simple graph of data collected over time that can be invaluable in identifying trends or patterns that might be of interest.A time-series plot can be constructed by thinking of the data set as a bivariate data set, where y is the variable observed and x is the time at which the observation was made. These (x, y) pairs are plotted as in a scatterplot. Consecutive observations are then connected by a line segment; this aids in spotting trends over time." (Roxy Peck et al, "Introduction to Statistics and Data Analysis" 4th Ed., 2012)

"Using random processes in our models allows economists to capture the variability of time series data, but it also poses challenges to model builders. As model builders, we must understand the uncertainty from two different perspectives. Consider first that of the econometrician, standing outside an economic model, who must assess its congruence with reality, inclusive of its random perturbations. An econometrician’s role is to choose among different parameters that together describe a family of possible models to best mimic measured real world time series and to test the implications of these models. I refer to this as outside uncertainty. Second, agents inside our model, be it consumers, entrepreneurs, or policy makers, must also confront uncertainty as they make decisions. I refer to this as inside uncertainty, as it pertains to the decision-makers within the model. What do these agents know? From what information can they learn? With how much confidence do they forecast the future? The modeler’s choice regarding insiders’ perspectives on an uncertain future can have significant consequences for each model’s equilibrium outcomes." (Lars P Hansen, "Uncertainty Outside and Inside Economic Models", [Nobel lecture] 2013)

"A key difference between a traditional statistical problems and a time series problem is that often, in time series, the errors are not independent." (DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"Either a logarithmic or a square-root transformation of the data would produce a new series more amenable to fit a simple trigonometric model. It is often the case that periodic time series have rounded minima and sharp-peaked maxima. In these cases, the square root or logarithmic transformation seems to work well most of the time." (DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"The random element in most data analysis is assumed to be white noise - normal errors independent of each other. In a time series, the errors are often linked so that independence cannot be assumed (the last examples). Modeling the nature of this dependence is the key to time series." (DeWayne R Derryberry, "Basic data analysis for time series with R", 2014)

"With time series though, there is absolutely no substitute for plotting. The pertinent pattern might end up being a sharp spike followed by a gentle taper down. Or, maybe there are weird plateaus. There could be noisy spikes that have to be filtered out. A good way to look at it is this: means and standard deviations are based on the naïve assumption that data follows pretty bell curves, but there is no corresponding 'default' assumption for time series data (at least, not one that works well with any frequency), so you always have to look at the data to get a sense of what’s normal. [...] Along the lines of figuring out what patterns to expect, when you are exploring time series data, it is immensely useful to be able to zoom in and out." (Field Cady, "The Data Science Handbook", 2017)

"[Making reasoned macro calls] starts with having the best and longest-time-series data you can find. You may have to take some risks in terms of the quality of data sources, but it amazes me how people are often more willing to act based on little or no data than to use data that is a challenge to assemble." (Robert J Shiller)

SQL Troubles

Pages

21 November 2018

🔭Data Science: Time Series (Just the Quotes)

No comments:

About Me