American Journal of Undergraduate Research

www.ajuronline.org

Volume 20 | Issue 4 | March 2024

On Sample Size Needed for Block Bootstrap Conﬁdence Intervals to

Have Desired Coverage Rates

Mathew Chandy*, Elizabeth D. Schifano, & Jun Yan

Department of Statistics, University of Connecticut, Storrs, CT

https://doi.org/10.33697/ajur.2024.101

Student: mathew[email protected]*

Mentors: elizabeth.schif[email protected], [email protected]

ABSTRACT

Block bootstrap is widely used in constructing confidence intervals for parameters estimated from stationary time series. Theoretically, the method should provide valid confidence intervals as the length of the time series goes to infinity. In practice, however, it is necessary to know how large a finite sample is required for block bootstrap confidence intervals to work well. This study aims to answer this question in a simple simulation setting where the data are generated from a first-order autoregressive process. The empirical coverage rates of several commonly used bootstrap confidence intervals for the mean, standard deviation, and lag-1 autocorrelation coefficient are compared. A quite large sample is found to be necessary for the intervals to have the right coverage rates, even when estimating a simple parameter like the mean. Some block bootstrap methods can fail when estimating the lag-1 autocorrelation. Surprisingly, with some commonly used block bootstrap confidence intervals, including the percentile and bias-corrected intervals, the coverage even deteriorates as the sample size increases.

KEYWORDS

Autocorrelation; Bias-Correction; Centering; Dependent Data; Percentile; Resampling; Simulation; Time Series

INTRODUCTION

Block bootstrap is a tool for constructing confidence intervals (CIs) to make inferences from dependent data. Essentially, it relies on correctly estimating the uncertainty of an estimator, as the standard bootstrap does, but for serially dependent data. Early ideas of block bootstrap were developed not long after the standard bootstrap.[2–4] It has since been applied in various fields, for instance, econometrics and meteorology.[5, 6] Block bootstrap is especially useful for serially dependent data when the serial dependence is not specified or not of primary interest. The method is expected to produce CIs with coverage rates matching their nominal levels as the sample size grows. However, when dealing with finite sample sizes, an important question is how large the sample size must be for block bootstrap CIs to have the desired coverage rates.

Lahiri finds that moving block bootstrap has better performance than non-overlapping block bootstrap. Additionally, moving block bootstrap with nonrandom block sizes results in lower mean-squared errors than moving block bootstrap with random block sizes. Bühlmann and Künsch note that a drawback of block bootstrap is that it depends heavily on the block size, which has to be chosen by the user of the method. Even with appropriate settings, Bühlmann observes some general drawbacks of block bootstrap with respect to how reasonably it imitates the data-generating process. In addition, although block bootstrap is primarily used for stationary time series, it can be outperformed by other bootstrap schemes for linear time series and categorical processes. Still, Bühlmann emphasizes that a significant advantage of block bootstrap is its simplicity. To be more specific, the resampling step of block bootstrap is not computationally more difficult than the resampling step of basic bootstrap. Furthermore, block bootstrap performs better than local bootstrap in terms of mimicking dependence structures.

For independent data, extensive research has explored the effectiveness of bootstrap standard errors in providing accurate uncertainty measures. For example, Hesterberg observes that while percentile-based CIs for the mean parameter are more accurate than t-intervals for larger sample sizes, their accuracy diminishes for smaller sample sizes. According to Chernick and Labudde, the optimal estimation of a parameter of a distribution depends on the sample size, the number of bootstrap replicates, and the confidence level. In structural equation modeling, Nevitt and Hancock find that a sample size of 200–1000 is sufficient for interval estimation using standard nonparametric bootstrap. In estimating variance components, Burch reports that as the sample size increases under a normal distribution, nonparametric bootstrap methods approach the coverage of a pivotal quantity, but for other distributions, the coverage can deteriorate. In estimating the correlation coefficient of bivariate normal data, Puth et al. note that even for a sample size of 100 with true correlation coefficient 0, bootstrap methods are less accurate than Fisher’s transformation. The prevailing consensus is that a substantial sample size is necessary for bootstrap CIs to attain the desired coverage.

Limited research has offered practical guidance on the sample size required for block bootstrap inference with dependent data. In the context of linear regression with dependent data, where the regression errors follow a homoscedastic autoregressive process of order 1, Goncalves and White find that, for small sample sizes, standard error estimates from the moving block bootstrap may be more accurate than those based on closed-form asymptotic estimates. Nonetheless, even with a sample size as large as 1024, confidence intervals from the moving block bootstrap still fall short of adequate coverage of the target parameter. The scarcity of literature on the sample sizes needed for block bootstrap to work well motivated the present study.

The goal of this paper is to provide recommendations on the necessary sample size for block bootstrap with dependent data, similar to what was done for basic bootstrap in Hesterberg. We consider a simple situation of a stationary time series, where the parameters of interest are the mean, standard deviation, and first-order autocorrelation coefficient. We compare six block bootstrap CIs, five from the literature [17, 18] and one proposed in this article: a standard normal CI, a Student’s t CI, a percentile CI, a bias-corrected CI, a bias-corrected and accelerated CI, and a recentered percentile CI. Their empirical coverage rates at different sample sizes and dependence levels are compared in a simulation study. The results of this study suggest that recovery of temporal dependence parameters is reliant on the type of interval used.

The remainder of the paper is organized as follows. The ﬁrst section reviews block bootstrap procedures and how to

use block bootstrap estimates to construct CIs; a simple CI obtained by recentering at the original point estimate is

proposed for comparison. The second section reports a simulation study comparing the coverage rates of six block

bootstrap CIs. A discussion concludes in the ﬁnal section.

BLOCK BOOTSTRAP CIS

Consider a stationary time series $\{X_t : t = 1, \ldots, n\}$ of length $n$. Our goal is to construct a CI for a parameter $\theta$ in the data-generating model of the series. Suppose that $\hat{\theta}_n$ is a point estimator of $\theta$ based on the observed series. Bootstrap is a powerful approach to constructing CIs. If the observations in the series were independent, a standard nonparametric bootstrap procedure would draw a large number $B$ of bootstrap copies of the observed data and calculate a bootstrap point estimate $\hat{\theta}_n^{(b)}$ for each copy $b = 1, \ldots, B$. The uncertainty of $\hat{\theta}_n$ is then estimated by the empirical variability of the bootstrap point estimates. When serial dependence is present, the bootstrap procedure needs to preserve it. Block bootstrap was motivated by this situation.


Block Bootstrap

Block bootstrap preserves the serial dependence in the observed data by partitioning the data into blocks and performing bootstrap on the blocks. In particular, consider block size $l$ and, for convenience, suppose that $n$ is a multiple of $l$ such that there are $k = n/l$ blocks. Block $j$ is $Y_j = \{X_{(j-1)l+1}, \ldots, X_{(j-1)l+l}\}$, $j = 1, \ldots, k$. Then, we sample $k$ blocks from the set $\{Y_1, \ldots, Y_k\}$ with replacement and concatenate the $k$ sampled blocks in the order they are picked to form a bootstrap sample of the data. This construction ensures that the between-block dependence is weak and that the within-block serial dependence is preserved. Because the blocks here are non-overlapping, this bootstrap approach is known as non-overlapping block bootstrap, or simple block bootstrap.
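As an illustration of the resampling step just described, the following Python sketch draws one non-overlapping block bootstrap sample. It is not the code used in this study; the function name `nonoverlapping_block_bootstrap` and its arguments are hypothetical, and the series length is assumed to be a multiple of the block size.

```python
import random


def nonoverlapping_block_bootstrap(x, l, rng=random):
    """Draw one bootstrap sample by resampling non-overlapping blocks.

    Assumes len(x) is a multiple of the block size l, as in the text.
    """
    n = len(x)
    if n % l != 0:
        raise ValueError("this sketch assumes n is a multiple of l")
    k = n // l
    # Blocks Y_1, ..., Y_k, each holding l consecutive observations.
    blocks = [x[j * l:(j + 1) * l] for j in range(k)]
    sample = []
    for _ in range(k):
        # Draw a block with replacement and append it in the order drawn.
        sample.extend(rng.choice(blocks))
    return sample
```

Passing a seeded `random.Random` instance as `rng` makes the draw reproducible.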

Alternatively, block bootstrap can be done with overlapping or moving blocks. Define moving blocks

$$Z_j = \{X_j, \ldots, X_{j+l-1}\}, \quad j = 1, \ldots, n - l + 1.$$

Now we draw $k$ blocks from the $(n - l + 1)$ blocks $Z_j$ with replacement and then align them in the order they were picked to form a block bootstrap sample. If $n$ is not a multiple of $l$, the last block selected is reduced in size so that the final size of the block bootstrap sample is $n$. It is also possible to implement moving block bootstrap while allowing blocks to wrap around the end of the series. In other words, define moving blocks (assuming $l > 1$) as

$$Z_j = \begin{cases} \{X_j, \ldots, X_{j+l-1}\}, & \text{if } j = 1, \ldots, n - l + 1, \\ \{X_j, \ldots, X_n, X_1, \ldots, X_{j-n+l-1}\}, & \text{if } j = n - l + 2, \ldots, n. \end{cases}$$

This version does not require that $n/l$ be an integer.
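The moving block scheme, with and without wrap-around, can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation used in the study: the function name `moving_block_bootstrap` and the `wrap` flag are hypothetical, and the number of blocks is taken to be the smallest integer that reaches length $n$, with the surplus trimmed from the last block.

```python
import math
import random


def moving_block_bootstrap(x, l, wrap=True, rng=random):
    """Draw one moving block bootstrap sample of the same length as x."""
    n = len(x)
    k = math.ceil(n / l)  # number of blocks needed to reach length n
    # With wrap-around every index may start a block; otherwise only
    # the first n - l + 1 indices give a complete block.
    n_starts = n if wrap else n - l + 1
    sample = []
    for _ in range(k):
        j = rng.randrange(n_starts)
        # Indices j, ..., j + l - 1, taken modulo n so that a block
        # starting near the end continues from the beginning.
        sample.extend(x[(j + i) % n] for i in range(l))
    return sample[:n]  # trim the last block if n is not a multiple of l
```

Setting `wrap=False` restricts the draw to the $n - l + 1$ complete blocks of the first definition.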

The block size $l$ needs to be chosen with care. It should be large enough for each bootstrap sample to preserve the serial dependence, yet small enough for there to be a large number of blocks to give sufficient variability between bootstrap samples. As $n$ increases, both $l$ and $n/l$ should also increase. To achieve this, $l$ is often specified as a function of $n$. An order commonly considered optimal for $l$ is $n^{1/3}$, which was adopted in this study.

Block Bootstrap CIs

Suppose that we have repeated the steps in the last subsection $B$ times, and that for $b \in \{1, \ldots, B\}$, we have obtained a bootstrap point estimate $\hat{\theta}_n^{(b)}$ based on the $b$th bootstrap sample using the same method that was applied to $\{X_t : t = 1, \ldots, n\}$ to obtain $\hat{\theta}_n$. Now the question is how to construct a CI for $\theta$ using the $B$ bootstrap point estimates $\{\hat{\theta}_n^{(1)}, \ldots, \hat{\theta}_n^{(B)}\}$. We consider six kinds of block bootstrap CIs adapted from standard bootstrap CIs.

Standard Normal CI. Assuming that $\hat{\theta}_n$ is asymptotically normally distributed with $\theta$ as the mean, we just need an estimate of the standard error to construct an approximate CI. Let $\widehat{\mathrm{SE}}$ be the empirical standard error, that is, the sample standard deviation of the bootstrap point estimates $\hat{\theta}_n^{(b)}$ for $b \in \{1, \ldots, B\}$. Let $z^{(\alpha)}$ be the quantile function $F^{-1}(\alpha)$ of the standard normal distribution. A $(1 - \alpha)100\%$ standard normal CI is

$$\left(\hat{\theta}_n - z^{(1-\alpha/2)}\,\widehat{\mathrm{SE}},\ \hat{\theta}_n - z^{(\alpha/2)}\,\widehat{\mathrm{SE}}\right).$$

This CI is centered at the original point estimate $\hat{\theta}_n$ and is symmetric. The standard CI is classified by Efron and Tibshirani as a confidence interval based on bootstrap “tables”, which essentially means it is based on an asymptotic distribution with an estimated asymptotic variance (standard error). Its validity relies on whether the distribution of $\hat{\theta}_n$ is reasonably well approximated by its asymptotic normal distribution and whether the bootstrap $\widehat{\mathrm{SE}}$ approximates the true standard error.
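A minimal Python sketch of the standard normal CI is given below, assuming the bootstrap point estimates are already available in a list. The function name `normal_ci` is hypothetical, and the sample standard deviation with divisor $B - 1$ is used as the bootstrap standard error, which is one common convention rather than something the text specifies.

```python
from statistics import NormalDist, stdev


def normal_ci(theta_hat, boot_estimates, alpha=0.05):
    """Standard normal CI centered at the original point estimate."""
    se = stdev(boot_estimates)  # bootstrap standard error (divisor B - 1)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper standard normal quantile
    # -z^(alpha/2) equals +z^(1 - alpha/2), so the interval is symmetric.
    return theta_hat - z * se, theta_hat + z * se
```

The Student’s t version would replace `z` by a t quantile, which the Python standard library does not provide.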


Student’s t CI. The procedure for constructing a Student’s t CI based on standard bootstrap is described in Efron and Tibshirani. Let $t^{(\alpha, k)}$ be the quantile function $F^{-1}(\alpha, k)$ of a $t$ distribution with $k$ degrees of freedom. With block bootstrapping, a $(1 - \alpha)100\%$ Student’s t CI is

$$\left(\hat{\theta}_n - t^{(1-\alpha/2,\,k-1)}\,\widehat{\mathrm{SE}},\ \hat{\theta}_n - t^{(\alpha/2,\,k-1)}\,\widehat{\mathrm{SE}}\right),$$

where $k$ is the number of blocks. This CI is centered at the original point estimate $\hat{\theta}_n$ and is symmetric. Like the standard normal interval, the Student’s t CI is classified by Efron and Tibshirani as a confidence interval based on bootstrap “tables”. In this case, its validity relies on whether the distribution of $\hat{\theta}_n$ is reasonably well approximated by the $t_{k-1}$ distribution with an expected value of $\theta$ and whether the bootstrap $\widehat{\mathrm{SE}}$ approximates the true standard error.

Percentile CI. The percentile CI was first suggested in Efron. Let $\hat{\theta}_{n,\alpha}$ be the empirical $100\alpha$th percentile of $\{\hat{\theta}_n^{(1)}, \ldots, \hat{\theta}_n^{(B)}\}$. A $(1 - \alpha)100\%$ empirical percentile CI is

$$\left(\hat{\theta}_{n,\alpha/2},\ \hat{\theta}_{n,1-\alpha/2}\right).$$

This CI is not necessarily centered at the original point estimate $\hat{\theta}_n$. As will be shown in our simulation study, this approach works well for the marginal mean and standard deviation of a serially dependent process, but its coverage of the temporal dependence parameter deteriorates as $n$ increases, which is contrary to what one would expect.

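Bias-Corrected (BC) CI. The procedure for constructing a bias-corrected bootstrap CI based on standard bootstrap is described in Carpenter and Bithell. Let $\hat{z}_0 = \Phi^{-1}\{\#\{\hat{\theta}_n^{(b)} < \hat{\theta}_n\}/B\}$ for $b \in \{1, \ldots, B\}$, where $\Phi$ is the standard normal distribution function. Define $\alpha_1 = \Phi(2\hat{z}_0 - z_{1-\alpha/2})$ and $\alpha_2 = \Phi(2\hat{z}_0 - z_{\alpha/2})$, where $z_p = \Phi^{-1}(p)$ is the standard normal quantile. A $(1 - \alpha)100\%$ BC CI is

$$\left(\hat{\theta}_{n,\alpha_1},\ \hat{\theta}_{n,\alpha_2}\right),$$

where $\hat{\theta}_{n,p}$ denotes the empirical $100p$th percentile of the bootstrap point estimates.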
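The BC construction can be sketched in Python as below. The names `empirical_quantile` and `bc_ci` are hypothetical. Two choices are assumptions of this sketch rather than part of the method as described: a simple order-statistic rule is used for the empirical percentile, and the proportion inside $\Phi^{-1}$ is kept strictly between 0 and 1 so that the normal quantile is defined.

```python
import math
from statistics import NormalDist


def empirical_quantile(sorted_vals, p):
    """Order-statistic quantile; other interpolation rules exist."""
    idx = math.ceil(p * len(sorted_vals)) - 1
    return sorted_vals[min(len(sorted_vals) - 1, max(0, idx))]


def bc_ci(theta_hat, boot_estimates, alpha=0.05):
    """Bias-corrected percentile CI from bootstrap point estimates."""
    nd = NormalDist()
    B = len(boot_estimates)
    # Proportion of bootstrap estimates below the original estimate.
    prop = sum(t < theta_hat for t in boot_estimates) / B
    # Assumption of this sketch: keep prop away from 0 and 1.
    prop = min(max(prop, 0.5 / B), 1 - 0.5 / B)
    z0 = nd.inv_cdf(prop)
    a1 = nd.cdf(2 * z0 - nd.inv_cdf(1 - alpha / 2))
    a2 = nd.cdf(2 * z0 - nd.inv_cdf(alpha / 2))
    s = sorted(boot_estimates)
    return empirical_quantile(s, a1), empirical_quantile(s, a2)
```

When exactly half of the bootstrap estimates fall below the original estimate, $\hat{z}_0 = 0$ and the BC CI reduces to the percentile CI.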

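Bias-Corrected and Accelerated (BCA) CI. The BCA CI was first suggested in Efron. Let $Z_{(i)}$ be the original sample without the $i$th block $z_i$ for $i \in \{1, \ldots, k\}$, let $\hat{\theta}_{(i)}$ be the statistic of $Z_{(i)}$, and let $\hat{\theta}_{(\cdot)} = k^{-1} \sum_{i=1}^{k} \hat{\theta}_{(i)}$. Let

$$\hat{a} = \frac{\sum_{i=1}^{k} \left(\hat{\theta}_{(\cdot)} - \hat{\theta}_{(i)}\right)^{3}}{6 \left\{\sum_{i=1}^{k} \left(\hat{\theta}_{(\cdot)} - \hat{\theta}_{(i)}\right)^{2}\right\}^{3/2}}.$$

With $\hat{z}_0 = \Phi^{-1}\{\#\{\hat{\theta}_n^{(b)} < \hat{\theta}_n\}/B\}$ and $z_p = \Phi^{-1}(p)$ the standard normal quantile, define

$$\alpha_1 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + z_{\alpha/2}}{1 - \hat{a}\,(\hat{z}_0 + z_{\alpha/2})}\right) \quad \text{and} \quad \alpha_2 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + z_{1-\alpha/2}}{1 - \hat{a}\,(\hat{z}_0 + z_{1-\alpha/2})}\right).$$

A $(1 - \alpha)100\%$ BCA CI is

$$\left(\hat{\theta}_{n,\alpha_1},\ \hat{\theta}_{n,\alpha_2}\right),$$

where $\hat{\theta}_{n,p}$ denotes the empirical $100p$th percentile of the bootstrap point estimates. This CI is not necessarily centered at $\hat{\theta}_n$. The BCA method corrects for bias and skewness of the $B$ bootstrap point estimates $\{\hat{\theta}_n^{(1)}, \ldots, \hat{\theta}_n^{(B)}\}$ by including bias-correction and acceleration factors. The acceleration factor refers to the rate of change of the standard error of $\hat{\theta}_n$ with respect to $\theta$. When $\hat{a} = 0$, the BCA CI reduces to the BC CI.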
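The acceleration factor can be computed with a delete-one-block jackknife, sketched below in Python. The function name `block_jackknife_acceleration` is hypothetical, the blocks are taken to be the non-overlapping blocks of length $l$, and the series length is assumed to be a multiple of $l$; none of these details is confirmed as the implementation used in the study.

```python
def block_jackknife_acceleration(x, l, statistic):
    """Acceleration factor from a delete-one-block jackknife.

    Assumes non-overlapping blocks of length l and len(x) divisible by l.
    """
    k = len(x) // l
    # Statistic recomputed on the sample with the i-th block removed.
    jack = [statistic(x[:i * l] + x[(i + 1) * l:]) for i in range(k)]
    center = sum(jack) / k
    num = sum((center - t) ** 3 for t in jack)
    den = 6.0 * sum((center - t) ** 2 for t in jack) ** 1.5
    return num / den
```

For the sample mean of a series whose leave-one-block-out estimates are symmetric about their average, the numerator vanishes and the acceleration is zero.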


Recentered Percentile CI. We propose a CI that is centered at the original point estimate and uses the variation in the bootstrap estimates to construct the error bound. The motivation for this interval comes from the simulation performance of the BC and BCA intervals, which will be discussed further in the Results section. This interval requires the computation of $\bar{\theta}_n = B^{-1} \sum_{b=1}^{B} \hat{\theta}_n^{(b)}$, the mean of all bootstrap point estimates. A $(1 - \alpha)100\%$ CI is centered around $\hat{\theta}_n$ and can be written as

$$\left(\hat{\theta}_{n,\alpha/2} - \bar{\theta}_n + \hat{\theta}_n,\ \hat{\theta}_{n,1-\alpha/2} - \bar{\theta}_n + \hat{\theta}_n\right),$$

where $\hat{\theta}_{n,p}$ denotes the empirical $100p$th percentile of the bootstrap point estimates. It is not necessarily symmetric, as different critical values are used to compute the lower and upper bounds. It has the same width as the percentile CI.
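The percentile CI and its recentered version can be contrasted in a short Python sketch. The function names are hypothetical and a simple order-statistic rule is assumed for the empirical percentile; the recentering is implemented as a shift of the percentile CI by the difference between the original point estimate and the bootstrap mean.

```python
import math
from statistics import fmean


def empirical_quantile(sorted_vals, p):
    """Order-statistic quantile; other interpolation rules exist."""
    idx = math.ceil(p * len(sorted_vals)) - 1
    return sorted_vals[min(len(sorted_vals) - 1, max(0, idx))]


def percentile_ci(boot_estimates, alpha=0.05):
    """Percentile CI from the bootstrap point estimates."""
    s = sorted(boot_estimates)
    return (empirical_quantile(s, alpha / 2),
            empirical_quantile(s, 1 - alpha / 2))


def recentered_percentile_ci(theta_hat, boot_estimates, alpha=0.05):
    """Percentile CI shifted so that the bootstrap mean maps to theta_hat."""
    lo, hi = percentile_ci(boot_estimates, alpha)
    shift = theta_hat - fmean(boot_estimates)
    return lo + shift, hi + shift
```

Because both bounds move by the same amount, the width is unchanged; only the location differs when the bootstrap mean is away from the original estimate.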

SIMULATION DESIGN

We compared the performance of the diﬀerent block bootstrap CI methods under two marginal distributions: standard

normal and unit exponential.

Marginal Standard Normal Distribution

We generated time series $X_t$ from a first-order autoregressive (AR(1)) process:

$$X_t = \phi X_{t-1} + \epsilon_t,$$

where $\phi$ is an autoregressive coefficient and $\epsilon_t$ is a series of independent errors from a normal distribution with mean zero and variance $\sigma_\epsilon^2$. The strength of the serial dependence is controlled by $\phi$, which was set to five levels: $\{-0.4, -0.2, 0.0, 0.2, 0.4\}$. We only used serial dependence as strong as 0.4 in absolute value because we only seek to establish the general trend as the strength of the autocorrelation increases, and how it varies depending on the sign of the autocorrelation and the parameter of interest. The series $X_t$ has mean zero and variance $\sigma^2 = \sigma_\epsilon^2/(1 - \phi^2)$, so for each value of $\phi$, we set $\sigma_\epsilon^2 = 1 - \phi^2$ such that $\sigma^2 = 1$.
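The data-generating step can be sketched in Python as follows. The function names `simulate_ar1` and `lag1_autocorr` are hypothetical, and starting the recursion from a draw of the stationary standard normal distribution is a choice made for this sketch so that no burn-in is needed.

```python
import math
import random


def simulate_ar1(n, phi, rng=random):
    """Gaussian AR(1) series of length n with unit marginal variance."""
    sd_eps = math.sqrt(1.0 - phi ** 2)  # error variance 1 - phi^2
    x = rng.gauss(0.0, 1.0)  # initial value from the stationary N(0, 1)
    series = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sd_eps)
        series.append(x)
    return series


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation coefficient."""
    m = sum(x) / len(x)
    num = sum((a - m) * (b - m) for a, b in zip(x, x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den
```

For a long simulated series, the sample mean, sample variance, and sample lag-1 autocorrelation should be close to 0, 1, and $\phi$, respectively.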

Three target parameters of $X_t$ were considered: 1) $\mu = 0$, the mean of $X_t$; 2) $\sigma = 1$, the standard deviation of $X_t$; and 3) $\phi$, the lag-1 autocorrelation coefficient. To investigate the effect of sample size $n$, we considered an array of values $n \in \{100, 200, 400, 800, 1600, 3200\}$. In each configuration, we generated 10,000 replicates. The block bootstrap sampling step was done with the function tsboot from the R package boot, with block size $l$. By default, this function implements the moving block bootstrap described in the previous section, with blocks allowed to wrap around. We tried both $l = n^{1/3}$ and $l = 2n^{1/3}$, keeping the order of the block size constant but varying the coefficient. For each replicate, we constructed six 95% block bootstrap CIs for each parameter as described in the last section with $B = 1000$. We estimated $\mu$, $\sigma$, and $\phi$ by computing the sample mean, sample standard deviation, and lag-1 sample autocorrelation, respectively, of each bootstrap sample, and then constructed intervals for each parameter using the appropriate procedures described in Block Bootstrap CIs. Finally, we estimated their actual coverage rates along with their 95% confidence intervals from the 10,000 replicates.

The coverage rates of the CIs were used to compare their performance. Let $\hat{\theta}_{L,r}$ and $\hat{\theta}_{U,r}$ be the lower and upper bound, respectively, of the confidence interval constructed for replicate $r \in \{1, \ldots, R\}$, where $R$ is the number of replicates. Then the empirical coverage rate is $\sum_{r=1}^{R} I\{\hat{\theta}_{L,r} < \theta < \hat{\theta}_{U,r}\}/R$, where $I(\cdot)$ is the indicator function.

If a CI method is valid, then the coverage rate is expected to match the nominal level. Because the empirical coverage is unlikely to match the nominal level exactly, we constructed a 95% Clopper-Pearson exact CI for the true coverage rate, treating the empirical coverage as an estimated proportion with $R = 10{,}000$. We used the R PropCIs package to achieve this. The choice of Clopper-Pearson was motivated by the Wald interval’s poor coverage as the proportion approaches 0 or 1, although when we tried Wald intervals, the resulting intervals did not differ much. If the proportion 0.95 is included in the interval, the block bootstrap method is likely performing well. If all values in the interval are below 0.95, the results suggest that the method is providing inaccurate estimation, underestimating the process’s variability, or both. If all values in the interval are above 0.95, the results suggest that the method is overestimating the process’s variability. Figure 1 summarizes the empirical coverage rates and the 95% confidence intervals of the real coverage for a marginal standard normal distribution using block bootstrap with $l = n^{1/3}$, generated using the R ggplot2 package.
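The coverage assessment can be sketched in Python using only the standard library. This is an illustration, not the PropCIs implementation used in the study: the function names are hypothetical, and the Clopper-Pearson bounds are found by bisection on the binomial distribution function rather than through beta quantiles.

```python
import math


def coverage_count(intervals, theta):
    """Number of intervals (lo, hi) that strictly contain theta."""
    return sum(lo < theta < hi for lo, hi in intervals)


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), with terms computed in log space."""
    if x < 0:
        return 0.0
    if x >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(x + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def _solve_decreasing(f, target, iters=60):
    """Bisection on (0, 1) for f(p) = target, with f decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact CI for a proportion given x successes out of n trials."""
    lower = 0.0 if x == 0 else _solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if x == n else _solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper
```

For example, `clopper_pearson(coverage_count(intervals, theta), len(intervals))` gives the interval that is then checked against the nominal level 0.95.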

Marginal Unit Exponential Distribution

Additionally, to investigate whether the results are robust to non-normal marginal distributions, we evaluated the performance of block bootstrap for a time series with a non-normal marginal distribution. Specifically, we estimated the mean, standard deviation, and lag-1 autocorrelation coefficient of a stationary series with marginal unit exponential distribution. Note that we expect the CIs that are based on bootstrap “tables” to depend on the asymptotic distribution of the estimator. This asymptotic distribution depends more on the sample size than on the marginal distribution of the time series, so we expect such CIs to have similar performance for different marginal distributions when the sample size is large. The percentile-based CIs (percentile, BC, BCA, recentered percentile) are not necessarily expected to perform better under non-normal marginal distributions. The Student’s t and normal-based CIs are only noticeably different when the number of blocks is smaller than 20.

The series were generated by marginally transforming the AR(1) series $X_t$ in the first simulation study by

$$W_t = F^{-1}[\Phi(X_t)],$$

where $F^{-1}(p)$ is the quantile function of the unit exponential distribution and $\Phi$ is the standard normal distribution function. The true mean ($\mu$) and standard deviation ($\sigma$) of $W_t$ are both 1. The lag-1 autocorrelation coefficient ($\rho$) is not invariant to the transformation, but its value can be obtained by

$$\rho = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} F^{-1}[\Phi(x)]\, F^{-1}[\Phi(y)]\, g(x, y; \phi)\, dx\, dy - 1,$$

where $g(x, y; \phi)$ is the density of a standard bivariate normal distribution with correlation parameter $\phi$. We kept the configuration of $\phi \in \{-0.4, -0.2, 0.0, 0.2, 0.4\}$, and the corresponding lag-1 autocorrelation coefficients are $\rho \in \{-0.298, -0.156, 0, 0.170, 0.355\}$.
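The marginal transformation can be sketched in Python as below. The function name `to_unit_exponential` is hypothetical. The unit exponential quantile function is $F^{-1}(p) = -\log(1 - p)$, and the sketch uses the identity $1 - \Phi(x) = \Phi(-x)$, a numerical choice made here to avoid taking the logarithm of zero for large $x$.

```python
import math
from statistics import NormalDist


def to_unit_exponential(x_series):
    """Map standard normal values to unit exponential values."""
    nd = NormalDist()
    # F^{-1}(Phi(x)) = -log(1 - Phi(x)) = -log(Phi(-x)).
    return [-math.log(nd.cdf(-x)) for x in x_series]
```

The transformation is monotone increasing, so the ordering of the observations in the series is preserved while the marginal distribution changes.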

SIMULATION RESULTS

Marginal Standard Normal Distribution

For estimating the mean parameter $\mu$, the top panel of Figure 1 suggests that all methods eventually approach correct coverage of $\mu$ as sample size increases. Student’s t CIs appear to need the smallest sample size to achieve correct coverage, except for samples with strong negative dependence, in which case they actually over-cover $\mu$ for smaller sample sizes. For instance, for a sample with $n = 100$ and $\phi = -0.4$, the lower bound of the interval for a Student’s t CI’s coverage of $\mu$ is greater than 95%, whereas the coverage intervals for other methods contain 95%. The standard normal, percentile, BC, BCA, and recentered percentile CIs require similar sample sizes to recover $\mu$ at the nominal level across the values of $\phi$ considered. All methods seem to require a smaller sample to recover $\mu$ at the nominal rate under negative dependence than under positive dependence. For example, BC CIs recover $\mu$ for $n \ge 100$ when $\phi = -0.2$, but they only recover $\mu$ for $n \ge 800$ when $\phi = 0.2$. In addition, as negative dependence gets stronger, holding everything else equal, coverage increases, which leads to the Student’s t CI’s aforementioned over-coverage. As positive dependence gets stronger, holding everything else equal, coverage decreases, and a larger sample is necessary to recover $\mu$. A possible explanation is that positive autocorrelation in a stationary series decreases the effective sample size, whereas negative autocorrelation increases it. Additionally, this seems to have a greater effect on the estimation of the location parameter than on that of the scale parameter or temporal dependence parameter.

For estimating the standard deviation parameter $\sigma$, Figure 1 suggests that every method can reach nominal coverage of $\sigma$ if the sample is large enough, but for a given $n$ and $\phi$, coverage of $\sigma$ is generally lower than coverage of $\mu$. As with $\mu$, Student’s t CIs can cover $\sigma$ at smaller sample sizes than the other methods. Unlike


Figure 1. Empirical coverage rates of different 95% block bootstrap CIs for the marginal mean $\mu$, the marginal standard deviation $\sigma$, and the first-order autocorrelation coefficient $\phi$ of an AR(1) process with a marginal standard normal distribution, with AR coefficient $\phi \in \{-0.4, -0.2, 0, 0.2, 0.4\}$ and series length $n \in \{100, 200, 400, 800, 1600, 3200\}$, based on 10,000 replicates of block bootstrap. The error bars represent 95% CIs of the real coverage rates. Top: $l = n^{1/3}$. Bottom: $l = 2n^{1/3}$.


Figure 2. 50 replicate percentile, BC, and recentered percentile CIs for samples of size $n \in \{800, 1600\}$ ($l = n^{1/3}$) for the lag-1 autocorrelation of an

AR(1) process with φ =0.4. For each replicate, the lower and upper bounds of the CIs are displayed, as well as

(blue circle) and

(red cross).

$\mu$, there is no over-coverage issue for $\sigma$ when $\phi = -0.4$. Standard normal, percentile, BC, BCA, and recentered percentile CIs again have similar performance. All methods seem to have slightly higher coverage of $\sigma$ when $\phi$ is negative than when $\phi$ is positive. Regardless of the sign, coverage of $\sigma$ gets worse as the strength of the temporal dependence increases.

For estimating the autocorrelation parameter $\phi$, Figure 1 suggests that while standard normal, Student’s t, and recentered percentile CIs do approach correct coverage as sample size increases, the coverage of percentile, BC, and BCA CIs deteriorates as sample size increases, especially as the strength of the temporal dependence increases. Because of this, only standard normal, Student’s t, and recentered percentile CIs should be considered effective block bootstrap methods for estimating $\phi$. Student’s t CIs once again achieve correct coverage with smaller sample sizes than standard normal and recentered percentile CIs, which perform similarly. Student’s t CIs can recover $\phi$ at the nominal level for $n \ge 100$ when the sample’s temporal dependence is as strong as 0.4. Coverage appears to be higher for all methods when the dependence is negative rather than positive. Whether the dependence is negative or positive, coverage of $\phi$ seems to increase slightly as the absolute value increases for standard normal, Student’s t, and recentered percentile CIs. For the values of $\phi$ observed, there are no examples of over-coverage for standard normal, Student’s t, BC, BCA, or recentered percentile CIs. However, percentile CIs appear to over-cover $\phi$ for smaller sample sizes when $\phi = -0.2$ and when $\phi = 0$, indicating again that they should not be used.

The results for $\phi$ raise a natural question about why certain methods perform poorly. To investigate, a set of 50 CIs was generated for each of the percentile, BC, and recentered percentile approaches for samples of size $n \in \{800, 1600\}$. As illustrated in Figure 2, the percentile-based CIs exhibit a notable bias, predominantly a substantial underestimation of $\phi$ by the point estimator $\bar{\theta}_n$, that is, the average of the $B$ bootstrap point estimates. As the sample size increases from 800 to 1600, this bias does not vanish while the uncertainty shrinks, which explains why the coverage rates deteriorate. The bias in $\bar{\theta}_n$ also appears to invalidate the bias-correction in the BC bootstrap, leading to the poor performance of the BC intervals. The BCA intervals have the same problem as the BC intervals in the bias-correction step. The root of the issue appears to be that the autocorrelation in the block bootstrap samples is smaller than that in the original sample. A plausible reason is that adjacent observations at the joins between resampled blocks come from unrelated parts of the series, which attenuates the lag-1 autocorrelation. On the other hand, the original point estimator $\hat{\theta}_n$ is asymptotically unbiased. Since the width is based on the uncertainty in the bootstrap point estimates $\hat{\theta}_n^{(b)}$, $b = 1, \ldots, B$, the percentile CIs recentered at the original point estimate $\hat{\theta}_n$ provide the desired coverage.

To summarize, the performance of the CIs depends on the target parameter. When estimating $\mu$ and $\sigma$, any CI will do, although Student’s t CIs perform noticeably better than the others. However, when estimating $\phi$, the choice of method is of utmost importance to avoid coverage deterioration. Coverage rates are acceptable at smaller sample sizes when $\phi$ is negative than when $\phi$ is positive. In other words, a larger sample size is generally required to estimate a parameter for a sample with a positive $\phi$ than with a negative $\phi$ of the same magnitude. To know whether coverage will increase as the strength of the temporal dependence increases, one needs to know what the parameter of interest is and, in the case of $\mu$, the direction of the serial dependence. The BC approach does not seem to be correcting bias appropriately when estimating $\phi$. Like the percentile method, the recentered percentile method uses the spread from the bootstrap to construct the width of the CI. However, the recentered approach does not move the center of the interval away from the original point estimate $\hat{\theta}_n$.

The results for $l = 2n^{1/3}$ are reported in the bottom panel of Figure 1. The performance generally seems to be inferior to that with $l = n^{1/3}$, but importantly, the patterns in performance when varying other parameters appear to be robust to the different block size. For negative autocorrelations, the coverage rates of $\mu$ appear to be lower when using $l = 2n^{1/3}$. For example, whereas $n = 100$ or $200$ would seem sufficient for most CIs when using $l = n^{1/3}$, $n = 800$ or $1600$ is necessary under negative autocorrelation for $l = 2n^{1/3}$. Student’s t CIs do not seem to be as affected by this change in $l$: for $\phi = -0.4$ and $-0.2$, they still over-cover $\mu$ for smaller values of $n$. The results for $\sigma$ with $l = 2n^{1/3}$ look very similar to those with $l = n^{1/3}$, but coverage rates of $\sigma$ do look slightly lower, especially for negative values of $\phi$, although Student’s t CIs are again not as influenced by this change in $l$. A larger sample size seems necessary when using other CIs to estimate $\sigma$ for $l = 2n^{1/3}$. Recentered percentile and standard normal CIs have slightly lower coverage rates when estimating negative values of $\phi$ with $l = 2n^{1/3}$. Although it is still a problem, the coverage deterioration appears to be less dramatic for BCA, BC, and percentile CIs. Aside from these differences, the overall changes in performance when other experimental factors are changed are the same as when $l = n^{1/3}$.

Marginal Unit Exponential Distribution

For the scenario of marginal exponential distribution, the empirical coverage rates for $\mu$, $\sigma$, and the lag-1 autocorrelation coefficient $\rho$ using block bootstrap with $l \in \{n^{1/3}, 2n^{1/3}\}$, as well as 95% confidence intervals of the real coverage, are displayed in Figure 3. Additionally, a set of 50 CIs is displayed in Figure 4 for each of the percentile, BC, and recentered percentile approaches for exponentially distributed samples of size $n \in \{800, 1600\}$ with lag-1 autocorrelation coefficient 0.355 ($\phi = 0.4$).

It appears that a greater sample size is generally required for the bootstrap CIs to cover the mean and standard deviation parameters in the exponential margin case than in the normal margin case. However, the other trends and patterns discussed regarding the performance of the various methods for the different parameters remain unchanged. For example, Student’s t confidence intervals still exhibit higher coverage rates than the alternative methods. Performance continues to be more favorable when temporal dependence is negative rather than positive. Again, altering the block size results in the same changes in performance of the different CIs as in the scenario of marginal normal distribution.

Of particular significance, the percentile, BC, and BCA confidence intervals still display a decline in coverage for the lag-1 autocorrelation coefficient as sample size increases, as demonstrated in Figure 4. Both the percentile and BC intervals continue to show the same bias issue. On the other hand, the recentered percentile confidence interval


Figure 3. Empirical coverage rates of different 95% block bootstrap CIs for the marginal mean $\mu$, the marginal standard deviation $\sigma$, and the first-order autocorrelation coefficient $\rho$ of a stationary series with marginal unit exponential distribution obtained by transforming an AR(1) process with $\phi \in \{-0.4, -0.2, 0, 0.2, 0.4\}$, with series length $n \in \{100, 200, 400, 800, 1600, 3200\}$, based on 10,000 replicates of block bootstrap. The error bars represent 95% CIs of the real coverage rates. Top: $l = n^{1/3}$. Bottom: $l = 2n^{1/3}$.


Figure 4. 50 replicate percentile, BC, and recentered percentile CIs for the lag-1 autocorrelation coeﬃcient ρ =0.355 (φ =0.4) of a stationary series

with marginal exponential distribution obtained by transforming an AR(1) process with φ =0.4 with sample size n ∈{800, 1600} (l = n

1/3

). For

each replicate, the lower and upper bounds of the CIs are displayed, as well as

(blue circle) and

(red cross).

continues to be eﬀective in estimating the temporal dependence due to the inherent unbiasedness of the original point

estimator. In sum, for the most part, the ﬁndings for series that are marginally exponentially distributed closely mirror

those attained for series that are marginally normally distributed.
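The marginally exponential setting can be illustrated with a short simulation. The sketch below is not the authors' code: it assumes that "transforming an AR(1) process" means passing a unit-variance Gaussian AR(1) series through the standard normal CDF and then through the unit exponential quantile function, and the function names `ar1_exponential` and `lag1_autocorr` are hypothetical. Under that assumption, the sample lag-1 autocorrelation of a long series with φ = 0.4 should land close to the value ρ = 0.355 quoted for Figure 4.

```python
import math
import random


def ar1_exponential(n, phi, rng):
    """Gaussian AR(1) with N(0, 1) margins, mapped to Exp(1) margins (assumed transform)."""
    innov_sd = math.sqrt(1.0 - phi * phi)  # keeps the stationary variance at 1
    z = rng.gauss(0.0, 1.0)                # start from the stationary distribution
    out = []
    for _ in range(n):
        z = phi * z + rng.gauss(0.0, innov_sd)
        # Upper-tail probability 1 - Phi(z), written with erfc for accuracy.
        upper_tail = 0.5 * math.erfc(z / math.sqrt(2.0))
        # Exp(1) quantile function applied to Phi(z): -log(1 - Phi(z)).
        out.append(-math.log(upper_tail))
    return out


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation coefficient."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


rng = random.Random(2024)
series = ar1_exponential(200_000, 0.4, rng)
series_mean = sum(series) / len(series)
rho_hat = lag1_autocorr(series)
print(f"mean = {series_mean:.3f}, lag-1 autocorrelation = {rho_hat:.3f}")
```

The transformation preserves the ordering of the observations but not the linear correlation, which is why the lag-1 autocorrelation of the exponential series is smaller than φ.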

DISCUSSION

Block bootstrap is a useful method for constructing confidence intervals for parameters of a time series, from simple parameters like the mean to more complicated measures of temporal dependence. We know theoretically that the block bootstrap procedure will cover a parameter of a time series at the nominal level given an infinitely large sample, so the goal of this study was to find the smallest finite sample length n of a time series for which the block bootstrap procedure covers its associated parameters at an acceptable rate. Our analysis relies on the assumption that there is a size n large enough for the method to work: that is, the method’s performance improves as n increases. Out of the six types of intervals used in this study, this assumption was found to hold true with respect to estimating φ only for standard normal, Student’s t, and recentered percentile CIs, whereas percentile, BC, and BCA intervals exhibited coverage deterioration as n increased. The percentile CI’s coverage deterioration can be attributed to bias that is not corrected as n increases. Specifically, as n increases, the width of the CI decreases, but because the percentile CI underestimates φ, the interval narrows around a biased center and the coverage decreases. The BC CI seems to correct the bias, but the CI seems to be too narrow. The acceleration factor of the BCA CI does not seem to remedy this, as the CI still seems to be too narrow.
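The bias mechanism described above can be seen in a single simulated series. The following is a minimal sketch under stated assumptions, not the study's implementation: it uses a moving block bootstrap with block length ⌈n^(1/3)⌉, 500 bootstrap resamples, and a "recentered" interval obtained by shifting the percentile interval by the estimated bootstrap bias. The study's exact definition of the recentered percentile CI is given in its methods section and may differ, and all function names here are hypothetical.

```python
import math
import random
import statistics


def simulate_ar1(n, phi, rng):
    """Gaussian AR(1) series with mean 0 and unit marginal variance."""
    innov_sd = math.sqrt(1.0 - phi * phi)
    z = rng.gauss(0.0, 1.0)
    out = []
    for _ in range(n):
        z = phi * z + rng.gauss(0.0, innov_sd)
        out.append(z)
    return out


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation coefficient."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


def moving_block_resample(x, block_len, rng):
    """Concatenate randomly chosen overlapping blocks and trim to length n."""
    n = len(x)
    out = []
    for _ in range(math.ceil(n / block_len)):
        start = rng.randrange(n - block_len + 1)
        out.extend(x[start:start + block_len])
    return out[:n]


rng = random.Random(1)
n, phi, n_boot = 800, 0.4, 500           # n_boot is an illustrative choice
block_len = math.ceil(n ** (1 / 3))      # assumed block length rule
x = simulate_ar1(n, phi, rng)
rho_hat = lag1_autocorr(x)
boot = [lag1_autocorr(moving_block_resample(x, block_len, rng))
        for _ in range(n_boot)]
boot_mean = statistics.fmean(boot)

# 2.5% and 97.5% sample quantiles of the bootstrap estimates.
cuts = statistics.quantiles(boot, n=40)
lower, upper = cuts[0], cuts[-1]

# Assumed recentering: shift the interval by the estimated bootstrap bias.
shift = rho_hat - boot_mean
recentered = (lower + shift, upper + shift)

print(f"estimate = {rho_hat:.3f}, bootstrap mean = {boot_mean:.3f}")
print(f"percentile CI = ({lower:.3f}, {upper:.3f})")
print(f"recentered CI = ({recentered[0]:.3f}, {recentered[1]:.3f})")
```

Because adjacent observations at block joins are nearly independent, the bootstrap estimates of a positive lag-1 autocorrelation are pulled toward zero. The percentile interval inherits this offset, and the offset does not shrink with n as fast as the interval width does.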

One of the goals of this study was to provide some practical recommendations for necessary sample sizes when using block bootstrap to estimate the parameters of serially dependent data. When using Student’s t intervals and the marginal distribution and temporal dependence are unknown, the results of this study suggest that n ≥ 1600 may be necessary in common practice to estimate μ, whereas n > 3200 may be necessary to estimate the standard deviation. In our simulations, Student’s t CIs were always preferable to standard normal CIs, as they performed better for smaller sample sizes and performed as well or better for larger sample sizes. Lastly, to estimate the lag-1 autocorrelation, n ≥ 100 using the Student’s t method may be sufficient under a marginal standard normal distribution, whereas n ≥ 1600 may be required under a marginal exponential distribution. Further investigation may be necessary to see if there are other percentile-based interval corrections that fix the coverage deterioration problem for φ.
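Readers who want to check such recommendations for their own setting can estimate the empirical coverage by simulation. The sketch below is only an illustration under stated assumptions: it uses the standard normal interval (point estimate ± z × bootstrap standard error) rather than Student’s t, because the degrees-of-freedom convention is not restated in this section; it targets the mean of a Gaussian AR(1) series; and it uses far fewer replicates (200) and resamples (200) than the 10,000 replicates of the study so that it runs quickly. All function names are hypothetical.

```python
import math
import random
import statistics


def simulate_ar1(n, phi, rng):
    """Gaussian AR(1) series with true mean 0 and unit marginal variance."""
    innov_sd = math.sqrt(1.0 - phi * phi)
    z = rng.gauss(0.0, 1.0)
    out = []
    for _ in range(n):
        z = phi * z + rng.gauss(0.0, innov_sd)
        out.append(z)
    return out


def moving_block_resample(x, block_len, rng):
    """Concatenate randomly chosen overlapping blocks and trim to length n."""
    n = len(x)
    out = []
    for _ in range(math.ceil(n / block_len)):
        start = rng.randrange(n - block_len + 1)
        out.extend(x[start:start + block_len])
    return out[:n]


def normal_bootstrap_ci(x, block_len, n_boot, rng, level=0.95):
    """Standard normal CI for the mean using the block bootstrap standard error."""
    estimate = statistics.fmean(x)
    boot = [statistics.fmean(moving_block_resample(x, block_len, rng))
            for _ in range(n_boot)]
    se = statistics.stdev(boot)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * se, estimate + z * se


def empirical_coverage(n, phi, n_rep, n_boot, rng, true_mean=0.0):
    """Fraction of replicate CIs that contain the true mean."""
    block_len = math.ceil(n ** (1 / 3))  # assumed block length rule
    hits = 0
    for _ in range(n_rep):
        x = simulate_ar1(n, phi, rng)
        lo, hi = normal_bootstrap_ci(x, block_len, n_boot, rng)
        hits += lo <= true_mean <= hi
    return hits / n_rep


rng = random.Random(7)
n_rep = 200
coverage = empirical_coverage(n=200, phi=0.4, n_rep=n_rep, n_boot=200, rng=rng)
mc_se = math.sqrt(coverage * (1 - coverage) / n_rep)
print(f"empirical coverage = {coverage:.3f} (Monte Carlo SE about {mc_se:.3f})")
```

With so few replicates the Monte Carlo error is a few percentage points, so a run like this can only flag gross under-coverage. Distinguishing, say, 93% from 95% requires replicate counts closer to those used in the study.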

Although we have only used serial dependence as strong as |φ| = 0.4, the results suggest the trends to expect as |φ| gets larger. When estimating μ, we expect coverage rates to decrease as φ approaches 1; as φ approaches −1, we may observe increased over-coverage. When estimating the standard deviation, we expect a larger sample size to be necessary as |φ| gets closer to 1. Lastly, when estimating φ for a marginal normal distribution, we expect a larger sample size to be necessary as |φ| approaches 0, assuming standard normal, Student’s t, or recentered percentile CIs are used. However, when estimating the first-order autocorrelation of a series with a marginal exponential distribution using the same methods, we expect coverage rates to respond to stronger dependence in a trend similar to that of the coverage rates for μ. We expect the other percentile-based CIs, which are already inadequate for relatively weak dependence structures, to perform even worse as |φ| approaches 1.

This study could be used as a guide in applied statistics courses to help students understand how large a sample is sufficient for block bootstrap to be used instead of other inference methods. Block bootstrap is not typically part of the undergraduate or graduate curriculum, but the results of this study can easily be used to demonstrate when it is practical to use this method. This information could also prove useful for research using block bootstrap estimation of time series in domains such as econometrics. Future studies could investigate the n needed to make inferences about other forms of serially dependent data, such as a moving average process. One could also investigate whether other types of block bootstrap interval construction, such as ABC or bootstrap-t intervals, could more appropriately recover the parameters of a time series. We discussed some drawbacks of block bootstrap in the introduction, which could motivate a similar simulation study for alternatives to block bootstrap, such as the AR-sieve bootstrap, which Bühlmann finds to be the best for linear time series. Finally, there is a need for a more in-depth exploration of the reasons behind the subpar performance of existing percentile-based CIs when estimating the autocorrelation parameter. It is also crucial to investigate thoroughly the specific scenarios in which the proposed CI demonstrates superior performance and the conditions under which it should be recommended.

REFERENCES

1. Efron, B. (1979) Bootstrap methods: Another look at the jackknife. The Annals of Statistics 7(1), 1–26. https://doi.org/10.1007/978-1-4612-4380-9_41

2. Hall, P. (1985) Resampling a coverage pattern. Stochastic Processes and their Applications 20(2), 231–246. https://doi.org/10.1016/0304-4149(85)90212-1

3. Carlstein, E. (1986) The use of subseries values for estimating the variance of a general statistic from a stationary sequence. The Annals of Statistics 14(3), 1171–1179. https://doi.org/10.1214/aos/1176350057

4. Künsch, H. R. (1989) The jackknife and the bootstrap for general stationary observations. The Annals of Statistics 17(3), 1217–1241. https://doi.org/10.1214/aos/1176347265

5. MacKinnon, J. G. (2006) Bootstrap methods in econometrics. The Economic Record 82(1), 2–18. https://doi.org/10.1111/j.1475-4932.2006.00328.x

6. Varga, L., and Zempléni, A. (2017) Generalised block bootstrap and its use in meteorology. Advances in Statistical Climatology, Meteorology and Oceanography 3(1), 55–66. https://doi.org/10.5194/ascmo-3-55-2017

7. Calhoun, G. (2018) Block bootstrap consistency under weak assumptions. Econometric Theory 34(6), 1383–1406. https://doi.org/10.1017/S0266466617000500

8. Lahiri, S. N. (1999) Theoretical comparisons of block bootstrap methods. The Annals of Statistics 27(1), 386–404. https://doi.org/10.1214/aos/1018031117

9. Bühlmann, P., and Künsch, H. R. (1999) Block length selection in the bootstrap for time series. Computational Statistics & Data Analysis 31(3), 295–310. https://doi.org/10.1016/S0167-9473(99)00014-6

10. Bühlmann, P. (2002) Bootstraps for time series. Statistical Science 52–72. https://doi.org/10.1214/ss/1023798998

11. Hesterberg, T. C. (2015) What teachers should know about the bootstrap: Resampling in the undergraduate statistics curriculum. The American Statistician 69(4), 371–386. https://doi.org/10.1080/00031305.2015.1089789

12. Chernick, M. R., and Labudde, R. A. (2009) Revisiting qualms about bootstrap confidence intervals. American Journal of Mathematical and Management Sciences 29(3-4), 437–456. https://doi.org/10.1080/01966324.2009.10737767

13. Nevitt, J., and Hancock, G. R. (2001) Performance of bootstrapping approaches to model test statistics and parameter standard error estimation in structural equation modeling. Structural Equation Modeling 8(3), 353–377. https://doi.org/10.1207/S15328007SEM0803_2

14. Burch, B. D. (2012) Nonparametric bootstrap confidence intervals for variance components applied to interlaboratory comparisons. Journal of Agricultural, Biological, and Environmental Statistics 17(2), 228–245. https://doi.org/10.1007/s13253-012-0087-9

15. Puth, M.-T., Neuhäuser, M., and Ruxton, G. D. (2015) On the variety of methods for calculating confidence intervals by bootstrapping. Journal of Animal Ecology 84(4), 892–897. https://doi.org/10.1111/1365-2656.12382

16. Gonçalves, S., and White, H. (2005) Bootstrap standard error estimates for linear regression. Journal of the American Statistical Association 100(471), 970–979. https://doi.org/10.1198/016214504000002087

17. DiCiccio, T. J., and Efron, B. (1996) Bootstrap confidence intervals. Statistical Science 11(3), 189–228. https://doi.org/10.1214/ss/1032280214

18. Rice, J. A. (2006) Mathematical Statistics and Data Analysis. Cengage Learning, Boston.

19. Efron, B., and Tibshirani, R. (1993) An Introduction to the Bootstrap. Chapman & Hall/CRC, Boca Raton. https://doi.org/10.1201/9780429246593

20. Carpenter, J., and Bithell, J. (2000) Bootstrap confidence intervals: When, which, what? A practical guide for medical statisticians. Statistics in Medicine 19(9), 1141–1164. https://doi.org/10.1002/(SICI)1097-0258(20000515)19:9<1141::AID-SIM479>3.0.CO;2-F

21. Efron, B. (1987) Better bootstrap confidence intervals. Journal of the American Statistical Association 82(397), 171–185. https://doi.org/10.1080/01621459.1987.10478410

22. R Core Team. (2022) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

23. Canty, A. (2022) boot: Bootstrap R (S-Plus) Functions. R package version 1.3-28.1.

24. Clopper, C. J., and Pearson, E. S. (1934) The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26(4), 404–413. https://doi.org/10.2307/2331986

25. Scherer, R. (2018) PropCIs: Various Confidence Interval Methods for Proportions. R package version 0.3-0.

26. Brown, L. D., Cai, T. T., and DasGupta, A. (2001) Interval estimation for a binomial proportion. Statistical Science 16(2), 101–133. https://doi.org/10.1214/ss/1009213286

27. Wickham, H. (2016) ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag, New York. https://doi.org/10.1007/978-0-387-98141-3

28. Hofert, M., Kojadinovic, I., Mächler, M., and Yan, J. (2018) Elements of Copula Modeling with R. Springer, New York. https://doi.org/10.1007/978-3-319-89635-9

29. Geyer, C. J. (2011) Introduction to Markov chain Monte Carlo. In S. Brooks, A. Gelman, G. L. Jones, and X.-L. Meng, editors, Handbook of Markov chain Monte Carlo, 3–48. CRC Press, Boca Raton. https://doi.org/10.1201/b10905

30. Kreiss, J.-P. (1992) Bootstrap procedures for AR(∞)-processes. In Bootstrapping and Related Techniques: Proceedings of an International Conference, Held in Trier, FRG, June 4–8, 1990, 107–113. Springer, New York. https://doi.org/10.1007/978-3-642-48850-4_14

ABOUT STUDENT AUTHOR

Mathew Chandy is a senior majoring in both Statistics and Statistical Data Science, and he plans to graduate in the

Spring of 2024.

PRESS SUMMARY

This simulation study evaluates the sample size necessary to estimate the mean, standard deviation, and lag-1 autocorrelation of a stationary time series using different block bootstrap confidence interval types. The results showed that percentile-based confidence intervals for the lag-1 autocorrelation may suffer from coverage deterioration as the sample size increases, motivating the authors to propose a new recentered percentile confidence interval whose performance does not deteriorate for greater sample sizes. The results also suggest that when using Student’s t bootstrap confidence intervals, a sample size of at least 1600 may be sufficient to estimate the mean, whereas a sample size larger than 3200 may be necessary to estimate the standard deviation. The results additionally indicate that estimating the lag-1 autocorrelation with Student’s t bootstrap confidence intervals demands a sample size of at least 100 when the marginal distribution is standard normal and at least 1600 when the marginal distribution is unit exponential.