"A feature shared by both the range and the interquartile range is that they are each calculated on the basis of just two values - the range uses the maximum and the minimum values, while the IQR uses the two quartiles. The standard deviation, on the other hand, has the distinction of using, directly, every value in the set as part of its calculation. In terms of representativeness, this is a great strength. But the chief drawback of the standard deviation is that, conceptually, it is harder to grasp than other more intuitive measures of spread."
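To make the contrast concrete, here is a small Python sketch. The two data sets and the `spread_summary` helper are hypothetical, chosen only for illustration. Both lists share the same minimum, maximum and quartiles, so the range and the IQR cannot tell them apart, while the standard deviation, which draws on every value, can. Quartile conventions differ between textbooks and software; this assumes the default ('exclusive') method of Python's `statistics.quantiles`.

```python
import statistics

def spread_summary(data):
    """Return (range, IQR, sample standard deviation) for a list of numbers."""
    values = sorted(data)
    data_range = values[-1] - values[0]             # uses only the max and the min
    q1, _, q3 = statistics.quantiles(values, n=4)   # lower and upper quartiles
    iqr = q3 - q1                                   # uses only the two quartiles
    sd = statistics.stdev(values)                   # every value contributes
    return data_range, iqr, sd

# Hypothetical data: same extremes and quartiles, different middle values.
clustered = [2, 4, 6, 6, 6, 8, 10]    # middle values bunched at the centre
spread_out = [2, 4, 4, 6, 8, 8, 10]   # middle values pushed towards the quartiles

for name, data in [("clustered", clustered), ("spread_out", spread_out)]:
    r, iqr, sd = spread_summary(data)
    print(f"{name:>10}: range = {r}, IQR = {iqr}, SD = {sd:.3f}")
```

Both lists report a range of 8 and an IQR of 4, but their standard deviations (about 2.58 and 2.83) differ, because only the standard deviation "sees" where the middle values lie.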
"[…] an outlier is an observation that lies an 'abnormal' distance from other values in a batch of data. There are two possible explanations for the occurrence of an outlier. One is that this happens to be a rare but valid data item that is either extremely large or extremely small. The other is that it is a mistake – maybe due to a measuring or recording error."
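The quote leaves "abnormal distance" open. One widely used rule of thumb, which is not taken from the quoted text, is Tukey's fences: flag any value more than 1.5 times the IQR below the lower quartile or above the upper quartile. A minimal sketch, assuming Python 3.8+; the `tukey_outliers` helper and the readings are hypothetical.

```python
import statistics

def tukey_outliers(data, k=1.5):
    """Return values lying outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quantiles() sorts internally
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

# Hypothetical measurements; 121.0 may be a slipped decimal point.
readings = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2, 121.0, 12.3]
print(tukey_outliers(readings))
```

The rule only flags 121.0; it cannot say which of the two explanations in the quote applies. Deciding whether it is a rare but valid value or a recording error still needs knowledge of how the data were collected.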
"Cleverly drawn pictures can sometimes disguise or render invisible what is there. At other times, they can make you see things that are not really there. It is helpful to be aware of how these illusions are achieved, as some of the illusionist’s 'tricks of the trade' can also be found in distortions used in graphs and diagrams."
"People sometimes appeal to the 'law of averages' to justify their faith in the gambler’s fallacy. They may reason that, since all outcomes are equally likely, in the long run they will come out roughly equal in frequency. However, the next throw is very much in the short run and the coin, die or roulette wheel has no memory of what went before."
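The "no memory" point can be checked by simulation. This sketch (the `next_after_streak` helper, seed and flip count are arbitrary, hypothetical choices) flips a simulated fair coin many times and looks only at flips that immediately follow a run of tails. If the gambler's fallacy were right, heads would be "due" and come up more than half the time.

```python
import random

def next_after_streak(n_flips=200_000, streak=3, seed=1):
    """Estimate P(heads on the next flip | the previous `streak` flips were tails)."""
    rng = random.Random(seed)                             # fixed seed for repeatability
    flips = [rng.random() < 0.5 for _ in range(n_flips)]  # True = heads
    hits = total = 0
    for i in range(streak, n_flips):
        if not any(flips[i - streak:i]):                  # last `streak` flips all tails
            total += 1
            hits += flips[i]                              # True counts as 1
    return hits / total

print(f"P(heads after 3 tails) is about {next_after_streak():.3f}")
```

The estimate stays close to 0.5: a run of tails does nothing to make the next toss more likely to be heads.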
"People tend to give greater weight to the data that they have just been exposed to than other relevant data. […] This phenomenon, where people give greater attention to recent or easily available data, is often referred to as an availability error."
"Probability is about making decisions under uncertainty - indeed, where there is no uncertainty, no decision is required, as you would simply choose the outcome that you know will occur. A 'good' or 'rational' decision favours the Cartesian principle that ‘when it is not in our power to follow what is true, we ought to follow what is most probable’. Of course, rational decisions sometimes turn out to be wrong. That does not mean that the decisions were bad - they may have been the best choices, given the information available at the time. […] In the long run, the vagaries of chance tend to even out, but in particular cases it can happen that the long shot comes in first. This is the corollary of a 'good' decision that has bad consequences - a 'bad' or 'irrational' decision that turns out to be right."
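A hypothetical illustration, not from the quoted text: suppose you must predict the total of two fair dice. Following "what is most probable" means choosing 7, yet that rational choice is still wrong most of the time. The sketch enumerates all 36 equally likely outcomes and works with exact fractions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Distribution of the total of two fair dice, from all 36 equally likely outcomes.
totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))
dist = {t: Fraction(c, 36) for t, c in totals.items()}

best = max(dist, key=dist.get)   # the 'most probable' prediction
p_right = dist[best]             # chance the good decision turns out right
p_wrong = 1 - p_right            # chance it turns out wrong anyway

print(f"best guess = {best}, P(right) = {p_right}, P(wrong) = {p_wrong}")
```

Predicting 7 is the best available choice (probability 1/6, higher than any other total), but it fails five times in six, which is exactly the "good decision, bad consequence" situation the quote describes.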
"Random number generators do not always need to be symmetrical. This misconception of assuming equal likelihood for each outcome is fostered in a restricted learning environment, where learners see only such situations (that is, dice, coins and spinners). It is therefore very important for learners to be aware of situations where the different outcomes are not equally likely (as with the drawing-pins example)."
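A non-symmetrical generator is easy to model with weighted sampling. In this sketch the probability 0.6 for a drawing pin landing point up is purely an assumption for illustration; for a real pin it would have to be estimated by experiment. The relative frequency over many simulated throws settles near the assumed value, not near the 0.5 that an "equally likely" assumption would predict.

```python
import random

# Hypothetical: assume a drawing pin lands point up with probability 0.6.
P_POINT_UP = 0.6

def throw_pins(n, seed=7):
    """Simulate n throws and return the relative frequency of 'point up'."""
    rng = random.Random(seed)
    outcomes = rng.choices(["point up", "point down"],
                           weights=[P_POINT_UP, 1 - P_POINT_UP], k=n)
    return outcomes.count("point up") / n

for n in (10, 100, 100_000):
    print(f"{n:>7} throws: relative frequency of point up = {throw_pins(n):.3f}")
```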
"'Regression to the mean' describes a natural phenomenon whereby, after a short period of success, things tend to return to normal immediately afterwards. This notion applies particularly to random events." (Alan Graham, "Developing Thinking in Statistics", 2006)
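Regression to the mean can be reproduced with a simple model in which each score is stable skill plus random luck. The model, its parameters and the `regression_to_mean` helper are hypothetical. The people who did best in round 1 were, on average, partly lucky; their luck is not repeated in round 2, so their average falls back towards the overall mean, although it stays above it because their skill is real.

```python
import random
import statistics

def regression_to_mean(n=10_000, top_frac=0.1, seed=3):
    """Score = skill + luck. Compare the round-1 top group across two rounds."""
    rng = random.Random(seed)
    skill = [rng.gauss(50, 5) for _ in range(n)]       # stable ability
    round1 = [s + rng.gauss(0, 10) for s in skill]     # fresh luck each round
    round2 = [s + rng.gauss(0, 10) for s in skill]
    k = int(n * top_frac)
    top = sorted(range(n), key=lambda i: round1[i], reverse=True)[:k]
    mean1 = statistics.fmean(round1[i] for i in top)   # top group, round 1
    mean2 = statistics.fmean(round2[i] for i in top)   # same people, round 2
    return mean1, mean2, statistics.fmean(round2)

m1, m2, overall = regression_to_mean()
print(f"top group: round 1 = {m1:.1f}, round 2 = {m2:.1f}; everyone in round 2 = {overall:.1f}")
```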
"The notion of outcomes covering a space is a very useful mental image, as it ties in strongly with the use of Venn diagrams and tables for clarifying the nature of possible events resulting from a trial. There are two important aspects to this. First, when enumerating the various outcomes that comprise an event, the number of (equally likely) outcomes should correspond, visually, with the area of that part of the diagram represented by the event in question - the greater the probability, the larger the area. Secondly, where events overlap (for example, when rolling a die, consider the two events 'getting an even score' and 'getting a score greater than 2'), the various regions in the Venn diagram help to clarify the various combinations of events that might occur."
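The die example in the quote maps directly onto Python sets. This sketch lists the four regions of the two-circle Venn diagram for 'even score' and 'score greater than 2' and gives each its probability as favourable outcomes over six equally likely outcomes. The `prob` helper is a hypothetical name.

```python
from fractions import Fraction

# Sample space for one roll of a fair die: six equally likely outcomes.
space = set(range(1, 7))
even = {x for x in space if x % 2 == 0}   # {2, 4, 6}
gt2 = {x for x in space if x > 2}         # {3, 4, 5, 6}

def prob(event):
    """Classical probability: favourable outcomes / all outcomes."""
    return Fraction(len(event), len(space))

# The four regions of the Venn diagram partition the sample space.
regions = {
    "even and > 2": even & gt2,        # {4, 6}
    "even only": even - gt2,           # {2}
    "> 2 only": gt2 - even,            # {3, 5}
    "neither": space - (even | gt2),   # {1}
}

for name, outcomes in regions.items():
    print(f"{name:>12}: {sorted(outcomes)}  P = {prob(outcomes)}")
```

Because the regions do not overlap and together cover the space, their probabilities add up to 1, mirroring the idea that areas in the diagram should match probabilities.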
"Unlike in mathematics, where relationships tend to be clearly defined and unambiguous, statistical relationships tend to reflect the general messiness of the real world from which the data were drawn."
"Use of a histogram should be strictly reserved for continuous numerical data or for data that can be effectively modelled as continuous […]. Unlike bar charts, therefore, the bars of a histogram corresponding to adjacent intervals should not have gaps between them, for obvious reasons."
"What sets statistics apart from the rest of mathematics is that in statistics events occur under conditions of uncertainty. Whereas in pure mathematics all even numbers possess the property of evenness, a statistical variable may take a range of different values that are usually unpredictable in advance."
"When it comes to drawing a picture of continuous data, you need to think through carefully where one interval ends and the next one begins. Failing to do this can result in overlaps or gaps between adjacent intervals, which can cause confusion."
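One common way to settle where one interval ends and the next begins is to use half-open intervals [a, b): each boundary value belongs to the interval it starts. Adjacent intervals then share an edge (no gaps) and no value is counted twice (no overlaps). This sketch assumes that convention, with the last interval closed on the right so the top edge is included (the same choice NumPy's `histogram` makes). The heights and the `bin_counts` helper are hypothetical.

```python
import bisect

def bin_counts(data, edges):
    """Count values in intervals [edges[i], edges[i+1]); the last one is closed."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        if x < edges[0] or x > edges[-1]:
            raise ValueError(f"{x} lies outside the bins")
        i = bisect.bisect_right(edges, x) - 1   # index of the interval x starts in
        i = min(i, len(counts) - 1)             # x == edges[-1] goes in the last bin
        counts[i] += 1
    return counts

heights_cm = [150.0, 154.9, 155.0, 159.5, 160.0, 162.3, 170.0]  # hypothetical
edges = [150, 155, 160, 165, 170]   # 150-155, 155-160, 160-165, 165-170
print(bin_counts(heights_cm, edges))
```

Here 155.0 is counted in the 155-160 interval, not in 150-155, so a value that falls exactly on a boundary is never ambiguous.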
"Where correlation exists, it is tempting to assume that one of the factors has caused the changes in the other (that is, that there is a cause-and-effect relationship between them). Although this may be true, often it is not. When an unwarranted or incorrect assumption is made about cause and effect, this is referred to as spurious correlation […]"
"Whereas regression is about attempting to specify the underlying relationship that summarises a set of paired data, correlation is about assessing the strength of that relationship. Where there is a very close match between the scatter of points and the regression line, correlation is said to be 'strong' or 'high'. Where the points are widely scattered, the correlation is said to be 'weak' or 'low'."