"A histogram consists of the outline of bars of equal width and appropriate length next to each other. By connecting the frequency values at the position of the nominal values (the midpoints of the intervals) with straight lines, a frequency polygon is obtained. Attaching classes with frequency zero at either end makes the area (the integral) under the frequency polygon equal to that under the histogram.
"A valid digit is not necessarily a significant digit. The significance of numbers is a result of its scientific context.
"[myth:] Accuracy is more important than precision. For single best estimates, be it a mean value or a single data value, this question does not arise because in that case there is no difference between accuracy and precision. (Think of a single shot aimed at a target.) Generally, it is good practice to balance precision and accuracy. The actual requirements will differ from case to case.
"Any scientific data without (a stated) uncertainty is of no avail. Therefore the analysis and description of uncertainty are almost as important as those of the data value itself . It should be clear that the uncertainty itself also has an uncertainty – due to its nature as a scientific quantity – and so on. The uncertainty of an uncertainty is generally not determined.
"As uncertainties of scientific data values are nearly as important as the data values themselves, it is usually not acceptable that a best estimate is only accompanied by an estimated uncertainty. Therefore, only the size of nondominant uncertainties should be estimated. For estimating the size of a nondominant uncertainty we need to find its upper limit, i.e., we want to be as sure as possible that the uncertainty does not exceed a certain value.
"Before best estimates are extracted from data sets by way of a regression analysis, the uncertainties of the individual data values must be determined.In this case care must be taken to recognize which uncertainty components are common to all the values, i.e., those that are correlated (systematic).
"Before discarding a data point one should investigate the possible reasons for this faulty data value.
"Correlation analysis can help us find the size of the formal relation between two properties. An equidirectional variation is present if we observe high values of one variable together with high values of the other variable (or low ones combined with low ones). In this case there is a positive correlation. If high values are combined with low values and low values with high values, the variation is counterdirectional, and the correlation is negative.
"[myth:] Counting can be done without error. Usually, the counted number is an integer and therefore without (rounding) error. However, the best estimate of a scientifically relevant value obtained by counting will always have an error. These errors can be very small in cases of consecutive counting, in particular of regular events, e.g., when measuring frequencies.
"Due to the theory that underlies uncertainties an infinite
number of data values would be necessary to determine the true value of any
quantity. In reality the number of available data values will be relatively
small and thus this requirement can never be fully met; all one can get is the
best estimate of the true value."
"For linear dependences the main information usually lies in the slope. It is obvious that those points that lie far apart have the strongest influence on the slope if all points have the same uncertainty. In this context we speak of the strong leverage of distant points; when determining the parameter “slope” these distant points carry more effective weight. Naturally, this weight is distinct from the “statistical” weight usually used in regression analysis.
"For some scientific data the true value cannot be given by a constant or some straightforward mathematical function but by a probability distribution or an expectation value. Such data are called probabilistic. Even so, their true value does not change with time or place, making them distinctly different from most statistical data of everyday life.
"If there is an outlier there are two possibilities: The model is wrong– after all, a theory is the basis on which we decide whether a data point is an outlier (an unexpected value) or not. The value of the data point is wrong because of a failure of the apparatus or a human mistake. There is a third possibility, though: The data point might not be an actual outlier, but part of a (legitimate) statistical fluctuation.
"In error analysis the so-called 'chi-squared' is a measure of the agreement between the uncorrelated internal and the external uncertainties of a measured functional relation. The simplest such relation would be time independence. Theory of the chi-squared requires that the uncertainties be normally distributed. Nevertheless, it was found that the test can be applied to most probability distributions encountered in practice.
"In many cases systematic errors are interpreted as the systematic difference between nature (which is being questioned by the experimenter in his experiment) and the model (which is used to describe nature). If the model used is not good enough, but the measurement result is interpreted using this model, the final result (the interpretation) will be wrong because it is biased, i.e., it has a systematic deviation (not uncertainty). If we do not use the best model (the best theory) available for the description of a certain phenomenon this procedure is just wrong. It has nothing to do with an uncertainty.
"In science we try to explain reality by using models (theories). This is necessary because reality itself is too complex. So we need to come up with a model for that aspect of reality we want to understand – usually with the help of mathematics. Of course, these models or theories can only be simplifications of that part of reality we are looking at. A model can never be a perfect description of reality, and there can never be a part of reality perfectly mirroring a model."
"It is also inevitable for any model or theory to have an
uncertainty (a difference between model and reality). Such uncertainties apply
both to the numerical parameters of the model and to the inadequacy of the
model as well. Because it is much harder to get a grip on these types of
uncertainties, they are disregarded, usually.
"It is important that uncertainty components that are independent of each other are added quadratically. This is also true for correlated uncertainty components, provided they are independent of each other, i.e., as long as there is no correlation between the components.
"It is important to pay heed to the following detail: a disadvantage of logarithmic diagrams is that a graphical integration is not possible, i.e., the area under the curve (the integral) is of no relevance.
"It is the aim of all data analysis that a result is given in form of the best estimate of the true value. Only in simple cases is it possible to use the data value itself as result and thus as best estimate.
"It is the nature of an uncertainty that it is not known and
can never be known, whether the best estimate is greater or less than the true
value.
"Outliers or flyers are those data points in a set that do
not quite fit within the rest of the data, that agree with the model in use.
The uncertainty of such an outlier is seemingly too small. The discrepancy
between outliers and the model should be subject to thorough examination and
should be given much thought. Isolated data points, i.e., data points that are
at some distance from the bulk of the data are not outliers if their values are
in agreement with the model in use.
"[myth:] Random errors can always be determined by repeating measurements under identical conditions. […] this statement is true only for time-related random errors .
"[myth:] Systematic errors can be determined inductively. It should be quite obvious that it is not possible to determine the scale error from the pattern of data values.
"The fact that the same uncertainty (e.g., scale uncertainty) is uncorrelated if we are dealing with only one measurement, but correlated (i.e., systematic) if we look at more than one measurement using the same instrument shows that both types of uncertainties are of the same nature. Of course, an uncertainty keeps its characteristics (e.g., Poisson distributed), independent of the fact whether it occurs only once or more often.
"To fulfill the requirements of the theory underlying
uncertainties, variables with random uncertainties must be independent of each
other and identically distributed. In the limiting case of an infinite number
of such variables, these are called normally distributed. However, one usually
speaks of normally distributed variables even if their number is finite.
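The limiting statement above is, in essence, the central limit theorem. A minimal numeric sketch (NumPy assumed, data simulated): means of many independent, identically distributed uniform values are very nearly normally distributed, even though the parent distribution is not.

```python
# Central-limit behaviour: means of n iid uniform values approach a normal
# distribution with mean 0.5 and standard deviation sqrt(1/(12 n)).
import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 100_000

means = rng.random((trials, n)).mean(axis=1)   # parent: uniform on [0, 1)

print(f"sample mean: {means.mean():.4f}  (expected 0.5000)")
print(f"sample std : {means.std():.4f}  (expected {np.sqrt(1/12/n):.4f})")
```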