It's arguable that we have here a bizarre situation, namely: many statistics texts have recommended
the Clopper-Pearson method for decades and all along at least one better method was already
available.
Roberto G. Gutierrez of StataCorp then followed up with this comment:
The exact interval used by -ci, binomial- is the Clopper-Pearson interval, but you must realize that
“exact” is a bit of a misnomer. It is exact in the sense that it uses the binomial distribution as the
basis of the calculation. However, the binomial distribution is a discrete distribution and as such its
cumulative probabilities will have discrete jumps, and thus you'll be hard pressed to get (say)
exactly 95% coverage.
What Clopper-Pearson does do is guarantee that the coverage is AT LEAST 95% (or whatever
level you specify) and so it is desirable in that sense. It is able to accomplish this goal by using the
exact binomial distribution in its calculations.
However, by guaranteeing 95% coverage, Clopper-Pearson can be a bit conservative (wide) for
some tastes, since for some n and p the true coverage can even get quite close to 100%. The other
intervals (Jeffreys, Agresti, Wilson) offered by -ci- are an attempt to not be so conservative, but
yet still get the right coverage without the constraint of having to be at least the stated coverage
level. These new intervals were added (by popular demand) after the release of Stata 8, and so you
won't find them in the manual.
The definitive article covering all this, including definitions for Jeffreys, Agresti, and Wilson, is
Brown, Cai, & DasGupta. Interval Estimation for a Binomial Proportion.
Statistical Science, 2001, 16, pp. 101-133.
Great article if you are into this sort of thing.
I don’t want to spend hours going over the pros and cons of these different formulas! For
our purposes, it probably won’t matter too much which formula you use. But if your life
depended on doing things as accurately as possible, it would be a good idea to check out
the Brown article. The help files for the above-mentioned cij and ciw routines (which
you can install by using the findit command in Stata) also contain brief discussions
and extensive sets of references.
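To make the "at least 95%" point concrete, here is a rough Python sketch (not Stata's implementation) that computes the true coverage of the Clopper-Pearson and Wilson intervals for a given N and p. The function names, the bisection solver, and the choice of N = 20 are assumptions made purely for illustration. The Wilson formula follows the standard definition given in Brown, Cai, & DasGupta.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve_increasing(f, target, iters=60):
    """Bisection on [0, 1] for an increasing function f with f(p) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    # Lower limit: the p at which P(X >= k) equals alpha/2 (0 if k == 0).
    lower = 0.0 if k == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper limit: the p at which P(X <= k) equals alpha/2 (1 if k == n).
    upper = 1.0 if k == n else solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper

def wilson(k, n, alpha=0.05):
    z = NormalDist().inv_cdf(1 - alpha / 2)
    phat = k / n
    denom = 1 + z * z / n
    center = (phat + z * z / (2 * n)) / denom
    half = z * sqrt(phat * (1 - phat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def coverage(method, n, p, alpha=0.05):
    """True coverage: total probability of the outcomes k whose interval contains p."""
    total = 0.0
    for k in range(n + 1):
        lo, hi = method(k, n, alpha)
        if lo <= p <= hi:
            total += comb(n, k) * p**k * (1 - p)**(n - k)
    return total

# Compare the two methods at a few illustrative values of p, with N = 20.
for p in (0.05, 0.3, 0.5):
    print(p, round(coverage(clopper_pearson, 20, p), 4),
          round(coverage(wilson, 20, p), 4))
```

The Clopper-Pearson column should never fall below 0.95, while the Wilson column can land on either side of it. That is the trade-off Gutierrez describes.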
3. Case III. Population normal, σ unknown:
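$$\bar{x} \pm \left(t_{\alpha/2,\,v} \cdot \frac{s}{\sqrt{N}}\right), \quad \text{i.e.} \quad \bar{x} - \left(t_{\alpha/2,\,v} \cdot \frac{s}{\sqrt{N}}\right) \le \mu \le \bar{x} + \left(t_{\alpha/2,\,v} \cdot \frac{s}{\sqrt{N}}\right)$$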
Case III is probably the most common. Even when Case II technically holds, treating it as
though it were Case III often won’t matter too much so long as N is large.
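As a rough illustration of the Case III calculation, the Python sketch below computes x̄ ± t(α/2, v) * s/√N from raw data, taking v = N - 1 degrees of freedom (the usual choice for a single sample). Python's standard library has no t quantile function, so the critical value is found here by numerically integrating the t density and bisecting. That approach, the function names, and the sample data are all assumptions made for the example. In practice you would use a t table or your statistics package.

```python
from math import lgamma, exp, log, pi, sqrt
from statistics import mean, stdev

def t_pdf(t, v):
    """Density of Student's t with v degrees of freedom."""
    log_c = lgamma((v + 1) / 2) - lgamma(v / 2) - 0.5 * log(v * pi)
    return exp(log_c - (v + 1) / 2 * log(1 + t * t / v))

def t_cdf(t, v, steps=2000):
    """P(T <= t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0.0, v) + t_pdf(t, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, v)
    return 0.5 + total * h / 3

def t_crit(alpha, v, iters=60):
    """Critical value t(alpha/2, v): the point with upper-tail area alpha/2."""
    lo, hi = 0.0, 1000.0  # assumed search range, wide enough for typical alpha and v
    target = 1 - alpha / 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        if t_cdf(mid, v) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(data, alpha=0.05):
    """Case III interval: xbar +/- t(alpha/2, v) * s / sqrt(N), with v = N - 1."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by N - 1)
    half = t_crit(alpha, n - 1) * s / sqrt(n)
    return xbar - half, xbar + half

# Hypothetical sample, for illustration only.
sample = [1, 2, 3, 4, 5]
print(t_interval(sample))
```

As N grows, t(α/2, v) approaches the corresponding z value, which is why the choice between Case II and Case III matters little in large samples.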