procedures and we sometimes use samples that are smaller than 30. Therefore, as of now, we
are not guaranteed to be safe. Without doing more, or assuming something more, our procedures
might not be warranted when samples are small.
This is where the second version of the Assumption of Normality (caps again) comes in. By the
First Known Property of the Normal, if the population is normal to start with, then the means
from samples of any size will be normally distributed. In fact, when the population is normal,
even an N of 1 will produce a normal distribution (since you’re just reproducing the original
distribution). So, if we assume that our populations are normal, then we’re always safe when
making the parametric assumptions about the sampling distribution, regardless of sample size.
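If you want to see this property for yourself rather than take it on faith, here is a minimal simulation sketch (in Python with numpy and scipy, which are my choice of tools here; the population mean of 100 and SD of 15 are arbitrary, IQ-like numbers). It draws many samples of various sizes from a normal population and checks that the resulting sample means still look normal, even when N is as small as 1 or 2:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Population assumed normal: mean 100, SD 15 (arbitrary, IQ-like values).
pop_mean, pop_sd = 100.0, 15.0

for n in (1, 2, 5, 30):
    # Draw 10,000 samples of size n and take the mean of each.
    sample_means = rng.normal(pop_mean, pop_sd, size=(10_000, n)).mean(axis=1)

    # The means should themselves look normal, with SD = pop_sd / sqrt(n).
    stat, p = stats.normaltest(sample_means)  # D'Agostino-Pearson normality test
    print(f"n={n:2d}  mean={sample_means.mean():6.2f}  "
          f"sd={sample_means.std(ddof=1):5.2f}  normality p={p:.2f}")
```

The normality test should come back non-significant at every sample size, and the spread of the means should shrink by the square root of N, which is exactly the point: when the population is normal, the sampling distribution of the mean is normal regardless of N.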
To prevent us from having to use one set of statistical procedures for large (30+) samples and
another set of procedures for smaller samples, the above is exactly what we do: we assume that
the population is normal. (This removes any reliance on the Monte Carlo simulations [which is
good, because simulations annoy people who always want proofs].) The one thing about this
that (rightfully) bothers some people is that we know -- from experience -- that many
characteristics of interest to psychologists are not normal. This leaves us with three options:

1. Carry on regardless, banking on the idea that minor violations of the Assumption of Normality (at the sample-means level) will not cause too much grief -- the fancy way of saying this is "we capitalize on the robustness of the underlying statistical model," but it really boils down to looking away and whistling.

2. Remember that we only need a sample size as big as 30 to guarantee normality if we started with the worst-case population distribution -- viz., an exponential -- and psychological variables are rarely this bad, so a sample size of only 10 or so will probably be enough to "fix" the non-normalness of any psych data; in other words, with a little background knowledge concerning the shape of your raw data, you can make a good guess as to how big your samples need to be to be safe (and it never seems to be bigger than 10 and is usually as small as 2, 3, or 4, so we're probably always safe, since nobody I know collects samples this small).

3. Always test to see if you are notably violating the Assumption of Normality (at the level of raw data) and do something to make the data normal (if they aren't) before running any inferential stats (a rough sketch of this appears just below).

The third approach is the one that I'll show you (after one brief digression).
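Just to make the third option concrete for now, here is a rough sketch in Python of what "check, then fix" might look like. The Shapiro-Wilk test and the log transform are only one reasonable pairing among several, and this is not necessarily the exact procedure detailed later; the skewed "reaction time" data are made up for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Pretend raw data: positively skewed scores (e.g., reaction times in ms).
raw = rng.exponential(scale=300.0, size=40)

# Step 1: test the raw data for a notable violation of normality.
stat, p = stats.shapiro(raw)               # Shapiro-Wilk test
print(f"Shapiro-Wilk on raw data: W={stat:.3f}, p={p:.4f}")

# Step 2: if the violation is notable, transform before any inferential stats.
if p < 0.05:
    transformed = np.log(raw)              # log transform for positive skew
    stat, p = stats.shapiro(transformed)
    print(f"After log transform:      W={stat:.3f}, p={p:.4f}")
```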
Another Reason to Assume that the Population is Normal
Although this issue is seldom mentioned, there is another reason to expand the Assumption of
Normality such that it applies down at the level of the individual values in the population (as
opposed to only up at the level of the sample means). As hinted at in the previous chapter, the
mean and the standard deviation of the sample are used in very different ways. In point
estimation, the sample mean is used as a “best guess” for the population mean, while the sample
standard deviation (together with a few other things) is used to estimate how wrong you might
be. Only in the final step (when one calculates a confidence interval or a probability value) do
these two things come back into contact. Until this last step, the two are kept apart.
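As a schematic illustration of that separation, here is a short Python sketch (the scores are made up). The sample mean is computed as the point estimate, the sample standard deviation is used separately to build the standard error, and the two only meet at the very end, when the confidence interval is assembled:

```python
import numpy as np
from scipy import stats

# Made-up sample of scores.
sample = np.array([12.1, 9.8, 11.4, 10.7, 13.2, 8.9, 11.0, 10.3])
n = len(sample)

# Point estimation: the sample mean is the "best guess" for the population mean.
best_guess = sample.mean()

# Separately, the sample SD (together with n) estimates how wrong that guess might be.
standard_error = sample.std(ddof=1) / np.sqrt(n)

# Only now, at the last step, do the two come back into contact:
t_crit = stats.t.ppf(0.975, df=n - 1)      # 95% two-tailed critical value of t
ci_low = best_guess - t_crit * standard_error
ci_high = best_guess + t_crit * standard_error
print(f"mean = {best_guess:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```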
In order to see why this gives us another reason to assume that populations are normal, note the
following two points. First, it is assumed that any error in estimating the population mean is
independent of any error in estimating how wrong we might be. (If this assumption is not