A Higher-Order Correct Fast Moving-Average Bootstrap for Dependent Data

We develop and implement a novel fast bootstrap for dependent data. Our scheme is based on the i.i.d. resampling of the smoothed moment indicators. We characterize the class of parametric and semi-parametric estimation problems for which the method is valid. We show the asymptotic refinements of the proposed procedure, proving that it is higher-order correct under mild assumptions on the time series, the estimating functions, and the smoothing kernel. We illustrate the applicability and the advantages of our procedure for Generalized Empirical Likelihood estimation. As a by-product, our fast bootstrap provides higher-order correct asymptotic confidence distributions. Monte Carlo simulations on an autoregressive conditional duration model provide numerical evidence that the novel bootstrap yields higher-order accurate confidence intervals. A real-data application to the dynamics of trading volume of stocks illustrates the advantage of our method over the routinely applied first-order asymptotic theory, when the underlying distribution of the test statistic is skewed or fat-tailed.


Introduction
Inference based on first-order correct asymptotics can be misleading, with confidence intervals having erratic probability coverage. This is especially true in the presence of serial dependence, where first-order asymptotics often requires larger sample sizes than for i.i.d. data to apply. Resampling methods for time series help to obtain confidence intervals with better finite-sample properties. Bootstrap methods for moment condition models have been extensively discussed under various dependence structures by, for example, Hall and Horowitz (1996), Brown and Newey (2002), Inoue and Shintani (2006), and Davidson and MacKinnon (2006). While bootstrap methods for m-dependent and strongly mixing data can achieve higher-order correctness (Hall and Horowitz (1996), Inoue and Shintani (2006)), they are computationally too intensive once applied to heavy numerical estimation procedures. For a book-length review, see e.g. Lahiri (2010).
In this paper, we propose a novel fast bootstrap scheme, which we call the Fast Moving-average Bootstrap (FMB). The resampling method is computationally attractive while maintaining higher-order correctness of the inferential procedure for strongly mixing data. Our idea for building confidence regions for the parameter of interest is to realize that smoothing the moment indicators, as in the Generalized Empirical Likelihood (GEL) literature, allows us to bootstrap them as if they were i.i.d. Parente and Smith (2018a) study the first-order validity of GEL test statistics based on a similar bootstrapping scheme, the Kernel Block Bootstrap (henceforth KBB); see Parente and Smith (2018b) and Parente and Smith (2020). Our approach differs from KBB in two significant aspects. First, our methodology does not require solving the estimation problem on each bootstrap sample, drastically lessening the computational burden. Indeed, FMB is at least one thousand times faster (except for a simple low-dimensional linear model), according to standard rules on bootstrap simulation errors (Efron (1987), Section 9; Davison and Hinkley (1997), Section 2.5.2). Second, we exploit an inversion technique to benefit from the kernel smoothing used in the studentization of our test statistic. The inversion is related to the standard percentile-t bootstrap approach in the linear univariate case (see Example 1 below).
The studentization relies on a simple sample variance of the smoothed moment indicators, which turns out to be asymptotically equivalent to a HAC estimator for the original moment indicators, as shown by Smith (2005). Together, this inversion and studentization make it possible to show that FMB inference is higher-order correct. Our proof strategy is not directly applicable to KBB; its higher-order correctness remains a conjecture.
Existing fast resampling methods usually hinge on the first-order von Mises expansion of the estimating function (Shao and Tu (1995), Davidson and MacKinnon (1999), Andrews (2002), Salibian-Barrera and Zamar (2002), Gonçalves and White (2004), Hong and Scaillet (2006), Salibian-Barrera et al. (2006), Salibian-Barrera et al. (2008), Camponovo et al. (2012), Camponovo et al. (2013), Armstrong et al. (2014), and Gonçalves et al. (2019)). This expansion yields a fast approximation, but its inherent construction does not ensure higher-order correctness. Instead, our fast method relies on inversion: we identify the level sets of test statistics under the null hypothesis to obtain confidence regions for the parameter of interest (see Parzen et al. (1994) and Hu and Kalbfleisch (2000) for i.i.d. data). Furthermore, the FMB confidence regions are invariant to monotonic reparameterization, due to the studentization of the moment indicators. This ensures stability of our method across varying parameter scales (DiCiccio and Efron (1996)).
We design FMB for the GEL estimator both to exploit its intrinsic smoothing and because GEL provides a considerably wide theoretical framework for semiparametric estimation (Smith (2011)). As a consequence, the higher-order refinements achieved by our method carry over to the Empirical Likelihood (see Qin and Lawless (1994), Imbens (1996), Kitamura (1997)), the Exponential Tilting (Kitamura and Stutzer (1997), Imbens et al. (1998)), and the Continuously Updating Estimator (Hansen et al. (1996)). In addition to the KBB, other bootstrap methods already exist in the GEL literature. For instance, Bravo (2004) shows the higher-order correctness of the bootstrap for inference based on empirical likelihood with i.i.d. data, while Bravo (2005) shows consistency of the block bootstrap for empirical entropy tests in time series regressions with strongly mixing data. However, to our knowledge, there is no proof of higher-order correctness of the bootstrap for GEL in the literature yet.
Clearly, we can also apply FMB in the setting of M-estimation (Huber (1964)) and the Generalized Method of Moments (Hansen (1982)), obtaining a fast version of the bootstrap methods derived in Hall and Horowitz (1996) for m-dependent data and in Inoue and Shintani (2006) for strongly mixing data.
The structure of the paper is as follows. Section 2.1 is a simple introduction to the FMB algorithm in the univariate case. There, we also discuss connections between FMB and already existing resampling schemes. In Section 2.2, we briefly present the GMM and GEL estimators for strongly mixing time series, using these frameworks as a tool to extend FMB to the multivariate setting. We itemize our assumptions and present the main theoretical results in Section 3. In Section 4, we give details on the implementation aspects of FMB, emphasizing the relation between the choice of the kernel and the properties of the long-run variance estimator. We also discuss connections with the recent literature on confidence distributions, which we use in our empirical application. We present our Monte Carlo experiments in Section 5, and a real-data example in Section 6. Finally, we prove our theorems in the Appendix. For some technical lemmas, we give the proofs in the Supplementary Material (available online).

An introduction in the univariate case
Let {X_t}_{t∈Z} be a stationary strongly mixing process in R^d, observed at t = 1, ..., T. We assume that the time series of interest satisfies Assumptions 1-7 in Section 3, which are standard in the bootstrap literature. Let B ⊂ R be the compact space of the parameter β and X_t := {X_{t_1}, ..., X_{t_v}} be a collection of vectors from the process {X_t}_{t∈Z}. Consider the function g : R^{dv} × B → R such that

E[g(X_t, β_0)] = 0,    (1)

where the expectation E is taken w.r.t. the true underlying distribution, unknown and depending on β_0. In the following, we use the shorthand notation g_t(β) := g(X_t, β).
The function g in (1) can be the (conditional) likelihood in full parametric models, or it can be obtained using the (conditional) moments and/or may depend on instrumental variables in semiparametric models. The collection of vectors X_t typically contains information on the relation between the observations and the parameter characterizing the q-dimensional stationary distribution of a time series. More generally, we can exploit closed-form knowledge of the (conditional) moments to obtain (martingale) estimating functions for non-linear conditional autoregressive and heteroscedastic models or discretely observed diffusions. We refer to Godambe and Heyde (1987), Taniguchi and Kakizawa (2000), and Kessler et al. (2012) for book-length presentations.
Each function of the sequence {g_t(β)}_{t=1}^T is often defined using the innovations, which can be i.i.d. random variables or, more generally, martingale differences. Thus, {g_t(β)}_{t=1}^T exhibits less dependence than the original process {X_t}_{t=1}^T. Nevertheless, neglecting this temporal dependence has a serious impact on the performance of several inferential procedures; in particular, it can affect the consistency of the bootstrap variance estimator and the higher-order accuracy of bootstrap confidence intervals.
To take this aspect into account automatically, we follow Kitamura and Stutzer (1997), Otsu (2006), Guggenberger and Smith (2008a), and Smith (2011), and we perform a convolution of the moment indicator g with the kernel k : R → R, obtaining

g_{T,t}(β) := (1/B_T) Σ_{s=t-T}^{t-1} k(s/B_T) g_{t-s}(β),    (2)

where B_T is a bandwidth parameter, increasing in T and such that B_T/T → 0. The convolution in (2) induces a HAC-type modification, ensuring consistency of the long-run variance estimation of the time average ḡ_T(β) := T^{-1} Σ_{t=1}^T g_{T,t}(β); see Newey and West (1987), Andrews (1991), and Smith (2005). Solving ḡ_T(β) = 0 gives the just-identified univariate estimator β̂. Hence, the estimator β̂ relies on a smoothed moment condition. Below, we explain how we can further exploit the convolution in (2) to derive our bootstrap.
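The convolution in (2) is a simple weighted average of the raw indicators. A minimal sketch in Python (an illustration, not the paper's code): we use a Bartlett kernel for readability, whereas the paper favors the QS-induced kernel of Section 4.1, and white noise stands in for actual moment indicators.

```python
import numpy as np

def smooth_moments(g, B_T, k):
    """Convolution (2): smoothed indicators g_{T,t} = (1/B_T) sum_s k((t-s)/B_T) g_s."""
    T = len(g)
    t = np.arange(1, T + 1)
    W = k((t[:, None] - t[None, :]) / B_T) / B_T   # weight matrix W[t, s]
    return W @ g

# Bartlett kernel as a readable stand-in; its weights integrate to one,
# so an all-ones input maps to approximately all ones in the interior
bartlett = lambda u: np.clip(1.0 - np.abs(u), 0.0, None)
rng = np.random.default_rng(0)
g = rng.standard_normal(200)          # stand-in raw moment indicators g_t(beta)
g_smooth = smooth_moments(g, 200 ** (1 / 3), bartlett)
```

Note that the weights k((t − s)/B_T)/B_T form a Riemann sum of ∫k(u)du = κ_1, which is why the smoothing preserves the time average up to edge effects.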
Let us first give the intuition of our methodology for the construction of a confidence interval (CI) for β_0; more technical aspects are available in Sections 2.2 and 4.1. For ease of notation, we drop the subscript T from any estimator whenever its dependence on the sample size is clear from the context.
To keep the exposition as simple as possible, we temporarily assume a one-to-one relationship between the parameter and the estimating function ḡ_T(β) in a suitable subset of B. Even though the probabilistic validity of the FMB CI does not depend on this condition (see e.g. Lehmann (1959), Shao (1999), Hansen et al. (1996), and Guggenberger and Smith (2008b)), this assumption allows us to explain easily why our resampling scheme does not need to solve the estimating equation for each bootstrap sample.
Intuitively, the construction of the FMB CI goes as follows. First, our bootstrap scheme provides a higher-order correct approximation of the distribution of a statistic Ŝ(β_0). This statistic is an asymptotically pivotal version of the estimating function T^{1/2} ḡ_T(β), evaluated at the true parameter β_0. Second, the one-to-one relationship allows us to map the quantile estimates of Ŝ to CI limits in B. This mapping is crucial to gain computational efficiency. Indeed, we devote the computationally intensive part of the FMB algorithm to approximating the distribution of the simple mean-type statistic Ŝ(β_0), which is much faster to compute than roots of T^{1/2} ḡ_T(β) or numerical solutions to the estimating optimization problem (see Section 2.2). From a hypothesis-testing point of view, FMB yields an approximation to the distribution of Ŝ(β) under H_0 : β = β_0. Then, each β in the CI is in the non-rejection region of H_0.
The next example is a widely-applied model where the one-to-one condition is satisfied, since the considered function is monotonic. Below, we explain in Remark 2 how to adapt FMB to deal with general estimating functions, which do not necessarily satisfy the monotonicity condition.
For a given kernel k, smoothing these moment indicators leads to Equation (3), which is linear in the parameter of interest β. Thus, neither taking the mean T^{1/2} ḡ_T(β) nor rescaling T^{1/2} ḡ_T(β) by a constant affects this linearity. As we build our statistic of interest by rescaling T^{1/2} ḡ_T(β), the one-to-one condition is verified.
Now that the main principles of FMB are settled, we present the detailed algorithm underlying its numerical implementation. The statistic serving as a basis for inference is the asymptotically pivotal quantity

Ŝ(β_0) := T^{1/2} ḡ_T(β_0) / σ̂,    (4)

where, for instance, σ̂^2 := κ_1^2 B_T (T κ_2)^{-1} Σ_{t=1}^T g_{T,t}^2(β̂) and κ_j := ∫ k(u)^j du for j = 1, 2. This statistic is a particular value of the function Ŝ : R^{dv} × B → R, Ŝ(β) := T^{1/2} ḡ_T(β) / σ̂, which we suppose strictly increasing on B. The studentization in Ŝ is crucial for FMB to be higher-order correct. In principle, we can apply other estimators of the long-run variance, and we flag that each estimator σ̂ has its own bias, which affects the properties (e.g. the accuracy) of FMB. We refer to Section 4.1 for further discussion.
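The studentization can be sketched as follows, again with the Bartlett kernel (κ_1 = 1, κ_2 = 2/3) as a stand-in for the QS-induced kernel; white noise replaces actual moment indicators, and all names are illustrative.

```python
import numpy as np

def fmb_statistic(g_smooth, B_T, kappa1, kappa2):
    """S_hat = sqrt(T) * gbar_T / sigma_hat, with the simple sample-variance
    studentization sigma_hat^2 = kappa1^2 * B_T * (T * kappa2)^{-1} * sum_t g_{T,t}^2."""
    T = len(g_smooth)
    sigma2 = kappa1 ** 2 * B_T * (g_smooth ** 2).sum() / (T * kappa2)
    return np.sqrt(T) * g_smooth.mean() / np.sqrt(sigma2)

# stand-in smoothed indicators: Bartlett smoothing of white noise
rng = np.random.default_rng(0)
T, B_T = 400, 7
g = rng.standard_normal(T)
t = np.arange(T)
k = lambda u: np.clip(1.0 - np.abs(u), 0.0, None)
g_smooth = (k((t[:, None] - t[None, :]) / B_T) / B_T) @ g
S = fmb_statistic(g_smooth, B_T, 1.0, 2.0 / 3.0)   # Bartlett: kappa1 = 1, kappa2 = 2/3
```

With these constants, S is approximately standard normal, which is the asymptotically pivotal property the bootstrap exploits.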
Considering an i.i.d. bootstrap sample drawn from {g_{T,t}(β̂)}_{t=1}^T, say {g*_{T,t}}_{t=1}^T, the bootstrap version of Ŝ in (4) is

S* := T^{1/2} ḡ*_T / σ̂*,    (5)

with ḡ*_T := T^{-1} Σ_{t=1}^T g*_{T,t} and σ̂*^2 := T^{-1} Σ_{t=1}^T g*_{T,t}^2. In (5), both the numerator and the denominator rely on g*_{T,t}, avoiding re-estimation of the parameter on each bootstrap sample; this is what makes the procedure fast.
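A sketch of the bootstrap step: i.i.d. draws from the smoothed indicators, with numerator and denominator recomputed on each draw. The data are a white-noise stand-in, centered to mimic ḡ_T(β̂) = 0 in the just-identified case.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 300
g_smooth = rng.standard_normal(T)
g_smooth -= g_smooth.mean()        # mimic gbar_T(beta_hat) = 0 (just-identified case)

def bootstrap_stat(g_smooth, rng):
    """One draw of S*: i.i.d. resampling of the smoothed moment indicators;
    both gbar* and sigma* are recomputed on the resample, as in (5)."""
    g_star = rng.choice(g_smooth, size=len(g_smooth), replace=True)
    sigma2_star = (g_star ** 2).mean()
    return np.sqrt(len(g_star)) * g_star.mean() / np.sqrt(sigma2_star)

S_star = np.array([bootstrap_stat(g_smooth, rng) for _ in range(999)])
q_hi = np.quantile(S_star, 0.95)   # one-sided bootstrap critical value
```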
Indeed, to derive the CI, the bootstrap procedure first provides the (1 − α)-quantile estimate of the distribution of Ŝ(β_0), say q*_{1−α}. Then, a numerical method (e.g. Newton-Raphson or the secant method) defines a one-sided (1 − α)-CI for β_0 as [β_min, q̂_{1−α}], where β_min := min B and the upper limit q̂_{1−α} solves Ŝ(q̂_{1−α}) = q*_{1−α}. For a two-sided equal-tailed CI, we follow the same principles. We consider two real numbers s_1 and s_2 satisfying P[s_1 < Ŝ(β_0) ≤ s_2] = P*[s_1 < S* ≤ s_2] + R_T, where R_T is an asymptotically negligible remainder term. Hence, we can compute s_1 and s_2 such that P*[s_1 < S* ≤ s_2] = 1 − α. Then, the CI for β_0 is C_{1−α} := {β ∈ B : s_1 < Ŝ(β) ≤ s_2}. Under Assumptions 1-7 (Section 3), we can get R_T = o_p(T^{-1/2}), given a suitable choice of kernel k and bandwidth B_T (see Theorem 4 and the discussion in Section 3). This implies that C_{1−α} is correct up to a higher order.
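The full inversion logic can be sketched end-to-end on a toy location model with g_t(β) = X_t − β, so that Ŝ(β) is monotone in β. The bisection-based `invert` helper, the Bartlett kernel, and all constants are our illustrative choices, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
T, B_T = 400, 7
X = rng.standard_normal(T) + 1.0                    # toy location model, beta_0 = 1

k = lambda u: np.clip(1.0 - np.abs(u), 0.0, None)   # Bartlett kernel
t = np.arange(T)
W = k((t[:, None] - t[None, :]) / B_T) / B_T        # smoothing weights
m, w = W @ X, W @ np.ones(T)                        # g_{T,t}(beta) = m_t - w_t * beta
kap1, kap2 = 1.0, 2.0 / 3.0

def S_hat(beta):
    """Studentized statistic; monotone decreasing in beta for this linear model."""
    g = m - w * beta
    sigma2 = kap1 ** 2 * B_T * (g ** 2).mean() / kap2
    return np.sqrt(T) * g.mean() / np.sqrt(sigma2)

beta_hat = m.mean() / w.mean()                      # root of gbar_T(beta) = 0
g_hat = m - w * beta_hat                            # smoothed indicators at beta_hat

def S_star():
    g = rng.choice(g_hat, size=T, replace=True)
    return np.sqrt(T) * g.mean() / np.sqrt((g ** 2).mean())

draws = np.array([S_star() for _ in range(999)])
s1, s2 = np.quantile(draws, [0.025, 0.975])

def invert(target, lo=-10.0, hi=10.0):
    """Bisection for the beta solving S_hat(beta) = target (S_hat decreasing)."""
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if S_hat(mid) > target else (lo, mid)
    return 0.5 * (lo + hi)

ci = (invert(s2), invert(s1))                       # equal-tailed 95% CI for beta_0
```

The expensive loop only touches resampled means of g_hat; the estimating equation is never re-solved inside the bootstrap, which is the source of the speed-up.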
From the studentization in Ŝ(β), the CI limits c_1 and c_2 derived in Step 5 remain invariant to monotonic transformations of the parameter. This property is crucial for the bootstrap CI (DiCiccio and Efron (1996)), ensuring stability of FMB across varying parameter scales.
Example 1 [cont'd]. Let us see how the steps in Algorithm 1 specialize for the AR(1). In Step 2, the estimator is available in closed form.

For Step 3 and Step 4, we define S* using (5), and we use i.i.d. resampling of {g_{T,t}(β̂)}_{t=1}^T. Since Ŝ(β) is strictly decreasing, we switch the sign of S* and Ŝ(β) and proceed as in Step 5 of Algorithm 1 to build a one-sided CI for β_0. In this particular case of linear models, we can rewrite Ŝ(β_0) as T^{1/2}(β̂ − β_0)/ς̂, where ς̂ = Ŵ^{-1} σ̂ and Ŵ := T^{-1} Σ_{t=1}^T ∂g_{T,t}(β̂)/∂β. Similarly, we can rewrite the bootstrap counterpart of the CI in terms of the quantiles q* of S*. In this representation, the FMB CI is similar to a percentile-t bootstrap CI, up to our use of β̂ instead of β* in the definition of ς*, which avoids computing β* (and, in the multivariate case, inverting a matrix) for each bootstrap sample. This modification does not impact higher-order correctness, as shown in Section 3.
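A sketch of Example 1 under the assumption that the AR(1) moment indicator is g_t(β) = X_{t−1}(X_t − βX_{t−1}), verifying numerically that the direct form T^{1/2}ḡ_T(β_0)/σ̂ and the percentile-t form T^{1/2}(β̂ − β_0)/ς̂ coincide. The Bartlett kernel and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
T, B_T, beta0 = 500, 8, 0.5
e = rng.standard_normal(T + 1)
X = np.zeros(T + 1)
for s in range(1, T + 1):                 # simulate a stationary AR(1)
    X[s] = beta0 * X[s - 1] + e[s]

# raw indicators g_t(beta) = X_{t-1}(X_t - beta X_{t-1}), split as a_t - beta b_t
a, b = X[:-1] * X[1:], X[:-1] ** 2

k = lambda u: np.clip(1.0 - np.abs(u), 0.0, None)
t = np.arange(T)
W = k((t[:, None] - t[None, :]) / B_T) / B_T
A, Bv = W @ a, W @ b                      # smoothed pieces: g_{T,t}(beta) = A_t - beta Bv_t

beta_hat = A.mean() / Bv.mean()           # closed-form root of the smoothed equation

kap1, kap2 = 1.0, 2.0 / 3.0
g0 = A - beta0 * Bv
sigma = np.sqrt(kap1 ** 2 * B_T * (g0 ** 2).sum() / (T * kap2))
S_direct = np.sqrt(T) * g0.mean() / sigma

# percentile-t representation: W_hat = -(d gbar / d beta), varsigma = sigma / W_hat
W_hat = Bv.mean()
varsigma = sigma / W_hat
S_pct = np.sqrt(T) * (beta_hat - beta0) / varsigma
```

The two representations agree exactly because ḡ_T(β_0) = B̄v(β̂ − β_0) in this linear case.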
This modified FMB CI is simply connected with high probability when T is large enough. Indeed, we can show that q̄ = 4T is the highest value such that the sublevel set {β ∈ B : Q̃(β) ≤ q̄} is still simply connected. It corresponds to the local maximum of a cubic polynomial (we provide a graphical illustration in Figure 2, Supplementary Material SM.9). Thus, the range of confidence levels for which we can draw a simply connected set increases proportionally to the sample size T. In practice, we recommend first trying the Q̃ approximation, which guarantees the second-order correctness of the CI and a connected CI with high probability. If this approach yields a disconnected CI because of a too small sample size T, the user can truncate the Taylor approximation at the quadratic term, which gives a simply connected CI w.p. 1. This quadratic approximation does not guarantee higher-order correctness, but typically works better than the Gaussian approximation in practice. To summarize, we face three possibilities. If we use Q̂, we always get higher-order correctness, but not necessarily a connected CI when monotonicity is not satisfied. If we use the cubic approximation Q̃, we get higher-order correctness and a connected CI with high probability. If we use a quadratic truncation, while being asymptotically correct, higher-order correctness might be lost, but we ensure a connected CI.
Therefore, we conclude that, even if the monotonicity condition is violated (or is not easy to check), we can choose a convenient statistic based on a cubic approximation that preserves the asymptotic properties of FMB. We point out that the proposed derivation of the CI only involves the (potentially numerical) computation of the derivatives of Q̃, whose evaluation is required only at the single point β̂. As the shape of the CI is fully determined by these derivatives, this simplifies and speeds up the implementation of Step 5.
Some further remarks on the other steps of Algorithm 1 are in order. First, Step 1 - Step 3 (Lines 2-13) hinge on bootstrapping the moment indicator evaluated at β̂. This justifies the adjective "fast" in the name of our resampling scheme, and bears some similarities to the already existing fast bootstrap (henceforth FB) methods; see Shao and Tu (1995) and Davidson and MacKinnon (1999), as well as the inversion approach of Parzen et al. (1994) and Hu and Kalbfleisch (2000). However, FB methods typically rely on a first-order von Mises expansion, whose approximation error prevents FB from being higher-order accurate. Second, Step 3 (Line 12) computes the bootstrap statistic S*, where the kernel k creates a block of moment indicators evaluated at β̂. The block of g_t induced by the kernel is similar to a moving average, as we emphasize in the name of our resampling scheme. The Moving Block Bootstrap (henceforth MBB) is the state-of-the-art higher-order correct alternative to FMB (Götze and Künsch (1996), Lahiri (1996)). For MBB, the blocks are defined at the level of the observations, whereas in our case the convolution is applied to the moment indicators.
In the same family of groupwise resampling schemes, FMB is even more reminiscent of the Tapered Block Bootstrap of Paparoditis and Politis (2001) (hereafter TBB), in the sense that we can view their tapered block as our moving-average kernel. The main difference is that our kernel has unbounded support, in contradistinction with their block tapering window. This gives FMB an advantage in the studentization: it allows us to use the Quadratic Spectral (QS) kernel, which is optimal in terms of asymptotic mean squared error according to Andrews (1991). Parente and Smith (2018b) have already pointed out such an advantage for a KBB variance estimator. Yet, the TBB and the KBB approach of Parente and Smith (2018a) both require R bootstrap estimations, where R is the number of bootstrap replications. Thus, neither the TBB nor the KBB is fast, and there is no result on their potential higher-order correctness. The higher-order correctness of the FMB approximation to the distribution of Ŝ comes from jointly considering two ingredients: the smoothing of the moment indicators and the studentization. Taken in isolation, neither ingredient allows us to show higher-order correctness of the FMB CI.
To summarize, we itemize in Table 1 the main features of the discussed bootstrap schemes. We only list methodologies that are specifically designed for dependent data. Our theoretical results (Section 3) also hold for the standard GMM case and are not tied to the use of GEL.

Generalized Method of Moments
Assume we want to conduct inference on the multivariate parameter β ∈ B ⊂ R^p, where B is compact. We are given a random sample of X_t = {X_{t_1}, ..., X_{t_v}} observed at t = 1, ..., T, and we define a set of moment conditions g : R^{dv} × B → R^r, with r ≥ p, such that E[g(X_t, β_0)] = 0. To handle the serial dependence, we define the smoothed moment indicator g_{T,t}(β) as in the univariate case and ḡ_T(β) := T^{-1} Σ_{t=1}^T g_{T,t}(β). Estimating β_0 via the Generalized Method of Moments (Hansen (1982)) is the most popular approach in econometrics. In the next subsection, we discuss alternative estimators.
Step 1 - Step 3 of the FMB Algorithm 1 remain conceptually unchanged. As far as the bootstrap statistic is concerned, the principles of Step 4 and Step 5 stay the same as in Algorithm 1, the main change being that the asymptotically pivotal statistic becomes

Q̂(β_0) := T ḡ_T(β_0)' Ω̂^{-1} ḡ_T(β_0),    (6)

where Ω̂ is a consistent estimator of the long-run covariance matrix of T^{1/2} ḡ_T(β_0), of rank ν = r; see Section 4.1. Standard results guarantee that Q̂ is asymptotically χ²_ν. As in the univariate case, the statistic of interest is a particular value of a function, Q̂ : B → R_+. We define the GMM estimator as β̂ := argmin_{β∈B} Q̂(β). Similarly to the argument of Remark 2, we also define the cubic approximation Q̃ of Q̂ centered on the root-T consistent estimator β̂, whose quadratic term involves the matrix Ĥ := ∂²Q̂(β̂)/∂β ∂β'. This allows us to build simply connected level sets, yielding a higher-order correct Confidence Region (henceforth CR), as shown in Corollary 5 (Section 3).
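A minimal sketch of the quadratic statistic, assuming Ω̂ takes the sample-second-moment form used for σ̂² in the univariate case; the stand-in indicators are white noise and all constants are illustrative.

```python
import numpy as np

def Q_hat(G_smooth, B_T, kap1=1.0, kap2=2.0 / 3.0):
    """Q(beta) = T * gbar' Omega^{-1} gbar from the (T, r) matrix of smoothed
    moment indicators evaluated at beta; asymptotically chi-squared with r dof."""
    T, r = G_smooth.shape
    gbar = G_smooth.mean(axis=0)
    Omega = kap1 ** 2 * B_T * (G_smooth.T @ G_smooth) / (T * kap2)
    return T * gbar @ np.linalg.solve(Omega, gbar)

rng = np.random.default_rng(4)
G = rng.standard_normal((400, 3))               # i.i.d. stand-in indicators, r = 3
q = Q_hat(G, 1.0, 1.0, 1.0)                     # trivial constants for the i.i.d. stand-in
q_shift = Q_hat(G + 0.5, 1.0, 1.0, 1.0)         # a violated moment condition inflates Q
```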
The bootstrap version of Q̂, denoted Q* in (9) below, relies on the same i.i.d. resampling scheme as in Algorithm 1. We are now ready to state the algorithm of our FMB in the over-identified case, made of five steps: compute β̂ and the smoothed moment indicators, recenter them, draw the bootstrap samples, compute Q* and its quantiles, and return the CR C.

A few remarks are in order.
Step 3 uses (9), where we recenter the bootstrap statistic. Indeed, the bootstrap expectation E*[g*_{T,t}] = ḡ_T(β̂) ≠ 0 in the over-identified case. Thus, we subtract its expectation from g*_{T,t} to recenter the bootstrap variable. This operation is crucial to achieve a higher-order accurate CR in Step 5 of Algorithm 2 (see e.g. Hall and Horowitz (1996)).
Moreover, in contradistinction with the already existing FB methods, Steps 3-4 mimic the variability of the covariance estimator Ω̂ (in (6)) to achieve higher-order refinements. To this end, we use Ω* instead of Ω̂ in the definition of Q* (in (9)), so that the bootstrap covariance estimator is randomized across the different bootstrap samples rather than kept fixed at Ω̂. A similar comment applies to Algorithm 1.
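Recentering and the resample-specific Ω* can be sketched together. The over-identified feature is mimicked by giving the stand-in indicators a nonzero mean, and the simple Ω* = T^{-1} Σ_t g*_{T,t} g*_{T,t}' form mirrors σ̂*² in (5); all names and constants are ours.

```python
import numpy as np

rng = np.random.default_rng(5)
T, r = 300, 2
G = rng.standard_normal((T, r)) + np.array([0.3, -0.1])  # gbar(beta_hat) != 0

Gc = G - G.mean(axis=0)            # recentering: E*[g*_{T,t}] = 0 by construction

def Q_star(rng):
    """One bootstrap draw: BOTH gbar* and Omega* are recomputed on the resample,
    so the variability of the covariance estimator is mimicked, not frozen."""
    Gs = Gc[rng.integers(0, T, size=T)]
    gbar_s = Gs.mean(axis=0)
    Omega_s = (Gs.T @ Gs) / T
    return T * gbar_s @ np.linalg.solve(Omega_s, gbar_s)

draws = np.array([Q_star(rng) for _ in range(499)])
q95 = np.quantile(draws, 0.95)     # bootstrap critical value for the CR
```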
Finally, to define the CR for β_0, we proceed similarly to Step 5 of Algorithm 1. We set q*_{1−α} such that P*[Q* ≤ q*_{1−α}] = 1 − α, and define C_{1−α} := {β ∈ B : Q̂(β) ≤ q*_{1−α}}. Under Assumptions 1-7 (Section 3), we show in Theorem 4 that the remainder R_T is at most of order o_p(T^{-1/2}), given a suitable choice of kernel k and bandwidth B_T, which implies that C_{1−α} is correct up to a higher order.
Remark 3 (Step 5, Lines 16-17). While the higher-order correctness of FMB is guaranteed independently of the CR shape, the CRs are generally not elliptical and might fail to be simply connected (they can come in several pieces or contain holes). Although this irregularity is customary for small to moderate sample sizes, it can still make the results difficult to interpret. The monotonicity condition mentioned above is one possible way out. As a more general solution, we propose to use the cubic approximation Q̃ (as in (8)). Similarly to Remark 2, Q̃(β) approximates Q̂(β) up to a higher-order error term. Thus, defining the modified FMB CR as {β ∈ B : Q̃(β) ≤ q*_{1−α}} preserves higher-order correctness, as shown in Corollary 5. The resulting CR is not simply connected for all sample sizes, but we can show that the range of confidence levels (1 − α) leading to a simply connected CR is proportional to the sample size T. This is not merely an asymptotic property, and the desired CR can already be simply connected for a small sample size, depending on the second derivative of the moment indicator g. If the sample size is too small for this modified CR to be simply connected and the user strictly needs this property, we recommend truncating the Q̃ approximation at the second term. The resulting CR is always simply connected and elliptical, but, while being asymptotically correct, we cannot guarantee its higher-order properties.
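The quadratic truncation can be sketched with a hypothetical value Q̂(β̂) and Hessian (both placeholders, not estimates from real data): when Ĥ is positive definite, the sublevel sets are ellipsoids, hence convex and simply connected.

```python
import numpy as np

# hypothetical ingredients at beta_hat: Q0 = Q(beta_hat) and the Hessian H of Q there
Q0 = 0.8
H = np.array([[4.0, 1.0], [1.0, 2.0]])    # positive definite
beta_hat = np.array([0.5, -0.2])
q_star = 6.0                              # placeholder bootstrap critical value

def Q_quad(beta):
    """Quadratic truncation: Q0 + 0.5 (beta - beta_hat)' H (beta - beta_hat).
    Its sublevel set {Q_quad <= q_star} is the ellipse
    (beta - beta_hat)' H (beta - beta_hat) <= 2 (q_star - Q0)."""
    d = np.asarray(beta) - beta_hat
    return Q0 + 0.5 * d @ H @ d

inside = Q_quad(beta_hat + np.array([0.1, 0.1])) <= q_star
```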

Generalized Empirical Likelihood Estimation
FMB is not tied to a particular estimation method. In order to show its wide applicability, we consider a general estimation method, which includes (among others) a version of GMM. The Continuously Updating Estimator (CUE) (Hansen et al. (1996)), the Empirical Likelihood (EL) (Qin and Lawless (1994), Imbens (1996), Kitamura (1997)), and the Exponential Tilting (ET) (Kitamura and Stutzer (1997), Imbens et al. (1998)) are all asymptotically equivalent to the (2S)GMM, but they tend to be less biased for small to moderate sample sizes (see Altonji and Segal (1996) for a Monte Carlo exploration, and Newey and Smith (2004) and Anatolyev (2005) for theoretical insights). Putting EL, ET, and CUE under the same umbrella, Smith (2011) introduces the Generalized Empirical Likelihood (GEL) criterion for time series data. We briefly describe it before extending our FMB to this general setting. Smith (2011) defines the GEL criterion in (10) as an inner-outer optimization: the estimator minimizes over β ∈ B the supremum over the auxiliary vector λ of a sample average of a concave function ρ applied to λ'g_{T,t}(β).
Similarly to Kundhi and Rilstone (2012) and Lee (2016), we use the first-order conditions of the GEL criterion as a just-identified representation of the estimation problem to explain how to build the FMB statistic. Differentiating (10) w.r.t. λ and β, we obtain the first-order conditions (11) and (12). We can see from (11) that ρ_1(κ B_T^{-1/2} λ̂(β)' g_{T,t}(β)) gives weights to the observations such that the moment conditions in g are always enforced in a given sample. GEL estimators are equivalent to some minimum discrepancy estimators based on the power-divergence family (see Cressie and Read (1984)), where the auxiliary vector parameter λ corresponds to the Lagrange multiplier enforcing this empirical moment condition. Thus, if the original moment conditions in g are correctly specified, the true (long-run) Lagrange multiplier is zero (λ_0 = 0). Applying our FMB to this setting only requires defining a quadratic statistic from the GEL first-order conditions ((11) and (12)) evaluated at the true value of the parameters. Yet, setting λ = λ_0 = 0 and β = β_0 in (11) and (12) boils down to the original ḡ_T(β_0). Thus, the natural extension of FMB for GEL estimators requires taking the asymptotically pivotal statistic Q̂(β_0) as in the GMM case (6). As a consequence, FMB in the GEL setting is exactly the same as in the GMM setting (see Algorithm 2), up to the initial estimator β̂. This is quite intuitive since, in the absence of misspecification, the first-order conditions of the GEL criterion convey the same information on β_0 as the moment condition g.
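The inner GEL step can be sketched for the ET member (ρ(v) = −exp(v)) in the scalar case, ignoring the κB_T^{-1/2} rescaling of λ: Newton iterations find the multiplier whose exponential tilt enforces the sample moment condition. Data and names are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(6)
g = rng.standard_normal(200) + 0.1   # stand-in smoothed indicators at some beta

def et_inner(g, iters=50):
    """Inner GEL step for Exponential Tilting, scalar case: Newton iterations
    for the multiplier lam solving sum_t exp(lam * g_t) * g_t = 0."""
    lam = 0.0
    for _ in range(iters):
        w = np.exp(lam * g)
        grad = (w * g).sum()
        hess = (w * g ** 2).sum()    # strictly positive, so the root is unique
        lam -= grad / hess
    return lam

lam = et_inner(g)
w = np.exp(lam * g)
pi = w / w.sum()                     # implied probabilities enforcing the moment
```

The implied probabilities pi are the ET reweighting of the sample; under correct specification they converge to the uniform weights 1/T, mirroring λ_0 = 0.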

Theory
In the next theorems, we state that the FMB CI and CR are higher-order correct. By construction, the higher-order correctness of the FMB CI and CR entirely hinges on our bootstrap approximation of the distribution of the test statistics Ŝ(β_0) (as in (4)) and Q̂(β_0) (as in (6)).
We start by itemizing the assumptions and regularity conditions. In the following, we keep using the shorthand notations g_t = g_t(β_0) and g_{T,t} = g_{T,t}(β_0). For any vector V ∈ R^n, ||V|| denotes its norm. We make use of generic constants C, δ, and ε, whose values can differ from one expression to another. We define {g_t}_{t∈Z} on the probability space (Ω, A, P), together with a sequence of sub-sigma-fields {D_t} of A. A natural example is to take D_t := σ(g_t), but it is not always the most efficient choice to check the assumptions below (see Götze and Hipp (1983) and Götze and Hipp (1994) for practical examples). The higher-order correctness of FMB is subject to the following conditions (Götze and Künsch (1996), Lahiri (2010)), which we assume to hold for {g_t}_{t∈Z}: Assumption 1, an identification condition; Assumption 2, moment bounds; Assumption 3, the approximation, for some δ > 0 and all t, m = 1, 2, ... with m > δ^{-1}, of g_t by processes measurable w.r.t. the sub-sigma-fields generated by {D_t}; Assumption 4, a strong mixing condition on {D_t}; Assumption 5, a conditional Cramér condition holding for some δ > 0 and all t, m = 1, 2, ... with δ^{-1} < m < t; Assumption 6, a conditional approximation condition on events A ∈ D_{t-p}^{t+p} for some δ > 0 and all t, m, p = 1, 2, ...; and Assumption 7, regularity of the characteristic function of the smoothed indicators: denoting f(β) a bound on |E exp(iτ g_{T,t}(β))|, there exist constants b and δ > 0 such that f(β_0) < 1 and f is continuous at β_0 a.s.
Assumption 1 is an identification condition. It is also necessary because we evaluate the moment conditions at β̂ in the bootstrap samples. Since we are going to prove the validity and higher-order correctness of FMB using the (s − 2)-th order Edgeworth expansion for the mean of Götze and Hipp (1994), we require the moments in Assumption 2 to be defined. Assumption 3 ensures that the process {g_t} is close enough to another process {g‡_{t,m}}, measurable w.r.t. sub-sigma-fields belonging to the sequence {D_t}, whose dependence structure is controlled by the mixing condition in Assumption 4.
Assumption 5 is the conditional Cramér condition of Götze and Hipp (1994) for weakly dependent processes. Assumption 6 ensures that we can approximate the probability of A ∈ D_{t-p}^{t+p} given {D_k : k ≠ t} with an exponentially increasing accuracy, as the information in {D_k : 0 < |t − k| ≤ m + p} increases with m. We need Assumption 7 on the regularity of the bootstrap characteristic function in order for the appropriate Cramér condition to hold for S* and Q*. We refer to Götze and Hipp (1983) for a general overview of the processes in agreement with our assumptions. As an example, the OLS moment indicators for the autoregressive parameters of a stationary AR(p) process with moving-average representation X_t = Σ_{j≥0} w_j e_{t−j} satisfy Assumptions 1-7, when the e_t are i.i.d.
We verify the assumptions for this example in the Supplementary Material (SM.10). We can check along the same lines that an ACD(v,0) model, and in particular the ACD(1,0) used in our Monte Carlo simulations, satisfies Assumptions 1-7.
Under the stated assumptions, we prove (see the Appendix) the following two theorems, in which we denote by P* the bootstrap probability measure given the data, and by I_A{V} the indicator of a set A, equal to 1 if V ∈ A and 0 otherwise.
we have (i) for S* as in (5) and Ŝ as in (4):

sup_x | P*[S* ≤ x] − P[Ŝ(β_0) ≤ x] | = o_p(T^{-1/2}),

and (ii) for Q* as in (9) and Q̂ as in (6):

sup_x | P*[Q* ≤ x] − P[Q̂(β_0) ≤ x] | = o_p(T^{-1/2}).

It is now apparent that the kernel k impacts the bootstrap accuracy through the variance estimators in Ŝ and Q̂. As discussed by Parzen (1957) and Andrews (1991), the bias of this estimator depends on the smoothness at zero of the induced kernel k*(a) := ∫ k(u) k(u − a) du / κ_2, where q is the Parzen exponent of k*, namely the maximal natural number such that lim_{a→0} (1 − k*(a))/|a|^q is finite. The bias is minimal for the rectangular kernel. However, as discussed in Section 4.1, the resulting estimate Ω̂ is not necessarily positive semi-definite. In contrast, the QS kernel has the optimal Parzen exponent q = 2 over all the kernels giving positive semi-definite estimators. Thus, the covariance matrix estimator with the QS kernel has an asymptotic MSE of order O(B_T/T) + O(B_T^{-2}). Consequently, B_T must grow faster than T^{1/4} and slower than T^{1/2} for the bootstrap error to be o_p(T^{-1/2}). In Section 2, we define the FMB CR through the bootstrap quantile q*_{1−α}; Theorem 4 then gives the order of correctness of this CR by construction, as the (first-order) Gaussian approximation is at best of order O(T^{-1/2}). If B_T = CT^γ, C > 0, we get the conditions q > 1 and (2q)^{-1} < γ < 1/2. Similar arguments apply for Algorithm 1. Part (ii) extends the higher-order correctness of the i.i.d. bootstrap of Hu and Kalbfleisch (2000) in the just-identified multivariate parameter case (see their Remark 10) to the over-identified case with time-dependent data.
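The rate conditions above can be encoded as a small checker; the threshold values follow the discussion in the text, with q the Parzen exponent of the induced kernel.

```python
def bandwidth_ok(gamma, q):
    """Check the rate condition for bootstrap error o_p(T^{-1/2}) when
    B_T = C * T**gamma: need 1/(2q) < gamma < 1/2, with q the Parzen
    exponent of the induced kernel k*."""
    return 1.0 / (2.0 * q) < gamma < 0.5

ok_third = bandwidth_ok(1.0 / 3.0, 2)   # gamma = 1/3 works for the QS kernel (q = 2)
ok_fifth = bandwidth_ok(1.0 / 5.0, 2)   # the MSE-optimal T^{1/5} rate fails
```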
When the monotonicity condition discussed in Section 2 (Example 1) is violated (or is not easy to check), we may use the alternative FMB CI or CR defined through the cubic approximation Q̃ (as in (8)), which are simply connected when T is large enough and centered at β̂. The following corollary (see the Supplementary Material for its proof) states the higher-order correctness of this alternative version of the FMB CI and CR, based on the Q̃ statistic.
we have, for Q̃ as in (8) and Q* as in (9), an analogous bound on the bootstrap approximation error, where q is the Parzen exponent of k*.
As a consequence, similarly to the CI and CR based on Ŝ and Q̂, the CI and CR based on Q̃ are higher-order correct if we select a kernel k and a bandwidth B_T satisfying the rate conditions above.

Consistent covariance matrix estimation
In this section, we give the necessary details on the appropriate way to estimate the long-run variance in the statistics Ŝ (as in (4)) and Q̂ (as in (6)). We treat the general case of Ω̂ (as in (6)), from which the univariate case can be easily deduced. Following Andrews (1991) and Smith (2011), we can obtain several estimators Ω̂ using different types of kernels. Let us give some examples of kernels useful in the implementation of FMB, together with their respective properties.
Example 7 (Quadratic Spectral kernel). Among the available kernels ensuring that the long-run variance estimator is positive semi-definite, Andrews (1991) points out the optimal QS kernel k*_QS, as well as the respective optimal bandwidth B_T^opt = O(T^{1/5}). From the relationship K*(λ) = (2π/κ₂)|K(λ)|² and the inverse Fourier transform, Smith (2011) identifies the induced kernel k_J(x). Thereby, we preferably use k_J(x) of Example 7 in (2) to get the estimator Ω̂. Indeed, both in theory and in simulations, the QS kernel is the optimal induced kernel in terms of asymptotic mean squared error (Andrews (1991)), among all kernels giving positive semi-definite long-run variance estimators.
The optimal bandwidth B_T = O(T^{1/5}) minimizing the mean-squared error of the variance estimator with k_J(x) (see Example 5) does not satisfy the condition (2q)^{-1} < γ < 1/2 (see Section 3), since q = 2 for the QS kernel. Then, we can take γ = 1/3, so that we match the condition for q = 2. Wilhelm (2015) has already exhibited a discrepancy between the optimal bandwidth choice for the HAC variance estimator and the optimal choice for a GMM point estimator. In his case, the optimal bandwidth for the point estimate is of the same order as the one minimizing the mean-squared error of the nonparametric plug-in estimate, while the constants of proportionality are significantly different. In our case, even the order differs if we want to achieve higher-order correctness for FMB.
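As a concrete illustration of this discussion, the following minimal Python sketch computes the QS kernel of Andrews (1991) and a kernel long-run variance estimate on simulated AR(1) moment indicators. The function names `qs_kernel` and `hac_long_run_variance`, the AR(1) design, and the bandwidth choice B_T = T^{1/3} are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def qs_kernel(x):
    """Quadratic Spectral kernel k*_QS (Andrews, 1991); k*_QS(0) = 1."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.ones_like(x)
    nz = x != 0.0
    a = 6.0 * np.pi * x[nz] / 5.0
    out[nz] = 25.0 / (12.0 * np.pi**2 * x[nz]**2) * (np.sin(a) / a - np.cos(a))
    return out if out.size > 1 else float(out[0])

def hac_long_run_variance(g, bandwidth):
    """Kernel long-run variance of a (T x r) array of moment indicators:
    sum over s of k*(s/B_T) * Gamma_hat(s), with Gamma_hat(s) the sample
    autocovariance at lag s."""
    g = g - g.mean(axis=0)
    T, r = g.shape
    omega = g.T @ g / T                       # lag-0 term
    for s in range(1, T):
        w = qs_kernel(s / bandwidth)
        gamma = g[s:].T @ g[:-s] / T          # Gamma_hat(s)
        omega += w * (gamma + gamma.T)
    return omega

rng = np.random.default_rng(0)
T = 500
e = rng.standard_normal((T, 2))
g = np.empty_like(e)                          # AR(1) moment indicators
g[0] = e[0]
for t in range(1, T):
    g[t] = 0.5 * g[t - 1] + e[t]
omega = hac_long_run_variance(g, bandwidth=T ** (1 / 3))  # B_T = T^{1/3}
```

Because the QS kernel has a non-negative Fourier transform, the resulting estimate stays positive semi-definite in finite samples, in contrast to the rectangular kernel mentioned above.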
Alternatively, we may use the flat-top kernel version of the QS kernel, see Politis (2011), to get an even faster rate of convergence for the estimated long-run variance. Unfortunately, the self-convolution of a kernel k cannot induce a flat-top kernel k*. Indeed, we know that it cannot be the case that U = X + Y, where the random variable U is uniformly distributed on [0, 1] (the flat-top part) and the random variables X and Y are independent and identically distributed (see Exercise 4.14.20 and its proof by contradiction in Grimmett and Stirzaker (2001)). As a consequence, if we want to benefit from the smaller bias of the flat-top kernel, we should use a different kernel for the original statistic and the bootstrap one. A potential modification of FMB is to decouple the kernel k* used in the variance estimator of the original statistic, say a flat-top kernel, from the kernel k used for the smoothed moment indicators. This version of FMB also achieves higher-order correctness, since we maintain the asymptotically pivotal nature of the test statistics.

From confidence regions to confidence intervals
The FMB user can manipulate the higher-order correct CR C_{1−α} to obtain a CR for a subset of parameters, or a CI for a single parameter. In this section, we consider separately the (possibly multivariate) parameter of interest β₀^(1) and the (possibly multivariate) nuisance parameter β₀^(2). Without loss of generality, we write the partition in the order β = (β^(1), β^(2)).
If the inequality Q̂(β₀^(1), β̂^(2)) ≤ Q̂(β₀) fails to hold almost surely, we cannot guarantee the higher-order correctness of the resulting region: it covers β₀^(1) at least with probability 1 − α asymptotically, but not always with higher-order refinements. The lack of guaranteed higher-order refinements is inherent to the available information on a multivariate parameter. The difficult task of reducing a CR for β to a CR for β^(1) is not directly entangled with the FMB, but rather with the nature of the dependence between β^(1) and β^(2) induced by the moment conditions themselves. However, there exists a general way to build a CR for β₀^(1) while preserving higher-order refinements. Indeed, defining the profile statistic Q̃(β^(1)) := inf_{β^(2)} Q̂(β^(1), β^(2)), the region C̃_{1−α} := {β^(1) : Q̃(β^(1)) ≤ q*_{1−α}} preserves the higher-order refinements. This property comes at a cost, as C̃_{1−α} is generally more conservative than C_{1−α} and it might be heavy to compute in high dimension.
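The profiling step can be sketched numerically. In the toy example below, `q_hat` is a stand-in quadratic statistic (not the paper's GEL-based Q̂), `q_star` stands in for the bootstrap quantile q*_{1−α}, and the infimum over the nuisance parameter is taken by a plain grid search; all names are illustrative.

```python
import numpy as np

def q_hat(b1, b2):
    # hypothetical correlated quadratic form in (b1, b2), a stand-in
    # for the joint test statistic Q_hat(beta1, beta2)
    return 2.0 * b1**2 + b2**2 + 1.2 * b1 * b2

def q_profile(b1, b2_grid):
    # profile statistic Q_tilde(b1) = inf over b2 of Q_hat(b1, b2)
    return min(q_hat(b1, b2) for b2 in b2_grid)

b2_grid = np.linspace(-3, 3, 601)
b1_grid = np.linspace(-3, 3, 601)
q_star = 3.84                      # stand-in bootstrap quantile q*_{1-alpha}

# confidence region for b1 alone: {b1 : Q_tilde(b1) <= q_star}
cr = [b1 for b1 in b1_grid if q_profile(b1, b2_grid) <= q_star]
```

By construction Q̃(β^(1)) ≤ Q̂(β^(1), β^(2)) for every β^(2), which is why the profiled region is conservative; the grid search also illustrates why the computation becomes heavy in higher dimensions.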

From confidence intervals to a confidence curve
Let us now make connections to the concept of Confidence Distributions (CD) that we use in our empirical application. It aims at answering the following question: can we also use a distribution function, or a "distribution estimator", to estimate or test for a parameter of interest in frequentist inference, in the style of a Bayesian posterior? (see the review paper by Xie and Singh (2013)). Natural point estimators include the median, the mean, and the maximum of the CD density (Singh et al. (2005)). That "distribution estimator" is named CD in agreement with the terminology coined by Efron (1998), and traces back to the fiducial distribution of Fisher (1930), albeit being a purely frequentist concept. It was introduced by Schweder and Hjort (2002) and extended asymptotically by Singh et al. (2005) (see also Xie et al. (2011), Veronese and Melilli (2015), and the book-length presentation of Schweder and Hjort (2016)). In the asymptotic confidence distribution H*_S(β), the randomness does not come from the (non-random) parameter β, but from Ŝ and S*.
These p-values also benefit from higher-order correctness. Collecting them for different values of β yields the so-called confidence curve CV*(β) := 2 min{H*_S(β), 1 − H*_S(β)}, introduced by Birnbaum (1961) (see Xie and Singh (2013) and Hannig et al. (2016) for illustrations). We can view this graphical tool as a piled-up form of two-sided equal-tailed CI at all confidence levels. We provide an example of such a plot in Figure 1 for our empirical application in Section 6, where we compare the CI given by our FMB and by first-order Gaussian asymptotics. Finally, Coudin and Dufour (2020) show how to design a Hodges-Lehmann-type point estimator (Hodges and Lehmann (1963)) when a CD is constructed from a hypothesis test.
There exist analogous multivariate CD, for instance Definitions 5.1 and 5.2 in Singh et al. (2007).
Similarly to the univariate case, we can apply FMB to achieve higher-order accuracy, as long as these multivariate confidence distributions are based on the test statistic Q̂(β₀).
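To fix ideas, the following toy sketch builds a confidence curve CV*(β) = 2 min{H*(β), 1 − H*(β)} for a scalar mean, with a plain i.i.d. bootstrap standing in for the FMB draws S*; the data-generating process and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.0, size=200)   # toy data, mean is the target

def s_hat(b):
    # studentized statistic for the mean, evaluated at candidate value b
    return np.sqrt(x.size) * (x.mean() - b) / x.std(ddof=1)

def s_star(rng):
    # one i.i.d. bootstrap draw of the studentized statistic (toy
    # stand-in for an FMB draw S*)
    xb = rng.choice(x, size=x.size, replace=True)
    return np.sqrt(x.size) * (xb.mean() - x.mean()) / xb.std(ddof=1)

draws = np.array([s_star(rng) for _ in range(999)])

def confidence_curve(b):
    h = (draws <= s_hat(b)).mean()             # H*(b), empirical c.d.f.
    return 2.0 * min(h, 1.0 - h)               # CV*(b)

# level sets {b : CV*(b) >= alpha} give equal-tailed (1 - alpha) CIs
```

Scanning `confidence_curve` over a grid of candidate values of β reproduces the "piled-up CI" reading of the curve: it peaks near the point estimate and decays toward zero away from it.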

Monte Carlo experiments
To illustrate the applicability of FMB, we consider a simulation exercise on constructing CR for the parameters of an Autoregressive Conditional Duration (ACD) model (Engle and Russell (1998)). It is a model typically applied to the analysis of high-frequency data in finance, and more generally to model positive variables (e.g. volatility or volume) via a multiplicative error model; see e.g. Hautsch (2012) for a recent book-length presentation.
The duration is defined as the time lag between two consecutive event occurrences, namely x_t := τ_t − τ_{t−1}. The ACD model specifies x_t = ψ_t ε_t, with ε_t ~ i.i.d. E(1) for any t, where E(1) is an exponential random variable with mean one. Specifically, for the ACD(1,1) specification, we have ψ_t = ω + β₁ x_{t−1} + β₂ ψ_{t−1}, for ω > 0, β₁, β₂ ∈ R₊ and β₁ + β₂ < 1. When we take β₂ = 0, the ACD(1,0) model is in agreement with Assumptions 1-7 (in Section 3). Thus, we start our numerical experiment with this specification, conducting inference on β := (ω, β₁). We apply the optimal estimating functions of Li and Turtle (2000) and a moment condition which does not assume any specific functional form for the underlying innovation density, but relies on the unconditional expectation of x_t.
In a second step, to get numerical insight on the applicability of our FMB, we extend our Monte Carlo experiment to a general ACD(1,1) model. The latter is non-Markovian in the observations and does not fit our setting stricto sensu, since we assume the vector of observations in (1) to be finite to ease notation and proofs. However, taking a large number v of lagged durations in an ACD(v,0) model comes close to an ACD(1,1) model, and it meets our current theoretical framework.
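Under the multiplicative recursion above, the ACD(1,1) data-generating process can be simulated as follows. The parameter values ω = 1.5 and β₁ = β₂ = 0.25 match the values reported in the Supplementary Material, while the function name and the burn-in length are our own illustrative choices.

```python
import numpy as np

def simulate_acd(T, omega, beta1, beta2, rng, burn=500):
    """Simulate x_t = psi_t * eps_t with eps_t ~ i.i.d. Exp(1) and
    psi_t = omega + beta1 * x_{t-1} + beta2 * psi_{t-1}."""
    psi = omega / (1.0 - beta1 - beta2)        # start at unconditional mean
    x = np.empty(T + burn)
    for t in range(T + burn):
        if t > 0:
            psi = omega + beta1 * x[t - 1] + beta2 * psi
        x[t] = psi * rng.exponential(1.0)
    return x[burn:]                            # drop burn-in draws

rng = np.random.default_rng(42)
x = simulate_acd(T=5000, omega=1.5, beta1=0.25, beta2=0.25, rng=rng)
# unconditional mean: E[x_t] = omega / (1 - beta1 - beta2) = 3
```

The unconditional-mean identity in the last comment is the same relation exploited later for the volume data, where ω̂ is recovered from the sample mean and (β̂₁, β̂₂).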
In the following, we compute by Monte Carlo simulation the coverage of the FMB CR. We label the results Q̂_FMB for the FMB using Q̂, Q̄_FMB for the FMB using the cubic approximation of Remark 3, and Q̄_FMB,2 for the FMB using the quadratic approximation of Remark 3. To validate our theoretical results numerically, we compare the coverage of these FMB CR to some standard first-order correct alternatives. The first one, labeled Q̂_χ²ᵣ, defines CR as contours of the same statistic as FMB, but makes use of the (first-order correct) χ²_r asymptotic distribution to compute the rejection probabilities.
The second one is the standard elliptical contour of an asymptotically χ²_p distributed Wald statistic (labeled W_χ²ₚ), whose covariance matrix is a HAC estimator with bandwidth B_T. To compare with a state-of-the-art competitor of FMB in terms of higher-order correctness, we also show the coverage of the CR yielded by MBB. We adapt the MBB of Inoue and Shintani (2006) to a Wald statistic, which is, in turn, an adaptation of Götze and Künsch (1996). As the statistic defining the CR has to be asymptotically pivotal to obtain higher-order correctness, we choose to apply MBB to the Wald statistic, which we label W_MBB. We show a detailed CPU time comparison between FMB and MBB in the Supplementary Material (Table 6). In general, FMB appears to be at least 10³ times faster than MBB, as expected.
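For intuition on why FMB is so much cheaper than MBB, its core resampling step, kernel smoothing of the moment indicators followed by i.i.d. resampling of the smoothed indicators, can be sketched as below. This is a schematic with a Bartlett-type truncated kernel and ad hoc normalization, not the paper's exact algorithm; `smooth_indicators` and `fmb_draw` are illustrative names.

```python
import numpy as np

def smooth_indicators(g, bandwidth):
    """Smoothed indicators g_{T,t} proportional to sum over s of
    k(s/B_T) g_{t-s}, truncated at the sample boundaries."""
    T = g.shape[0]
    smoothed = np.zeros_like(g, dtype=float)
    B = int(bandwidth)
    for t in range(T):
        lo, hi = max(0, t - B), min(T, t + B + 1)
        s = np.arange(lo, hi) - t
        w = 1.0 - np.abs(s) / (B + 1)          # Bartlett-type weights
        smoothed[t] = (w[:, None] * g[lo:hi]).sum(axis=0) / bandwidth
    return smoothed

def fmb_draw(smoothed, rng):
    """One fast bootstrap draw: i.i.d. resampling of the smoothed
    indicators, recentred at their sample mean."""
    T = smoothed.shape[0]
    idx = rng.integers(0, T, size=T)
    g_star = smoothed[idx] - smoothed.mean(axis=0)   # recentring
    return np.sqrt(T) * g_star.mean(axis=0)

rng = np.random.default_rng(0)
g = rng.standard_normal((200, 2))              # toy moment indicators
smoothed = smooth_indicators(g, bandwidth=4.0)
draws = np.array([fmb_draw(smoothed, rng) for _ in range(500)])
```

The speed advantage comes from the fact that each draw only resamples precomputed smoothed indicators; no re-estimation of the model is needed, in contrast to block bootstraps that re-solve the estimation problem on every resample.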
In Table 2, we display the empirical coverages for the ACD(1,0) model, for B_T = 3 and B_T = 5. In Table 3, we show the same outputs for the ACD(1,1) specification. In line with our theoretical results, we observe that FMB performs well compared to its first-order correct competitors: the coverages are typically closer to their nominal level. This generally remains true for the Q̄_FMB version of the FMB CR, whose coverage is often very close to that of the original FMB CR based on Q̂_FMB. The coverage of the quadratic Q̄_FMB,2 version of the FMB CR is generally further away from the original FMB.
The Wald statistic seems to yield very erratic CR (which is additionally confirmed by unreported plots). The MBB version of the Wald statistic improves slightly on the asymptotic χ²_p distribution, without being convincing though.
Finally, the FMB presented here does not take advantage of all the potential fine-tuning, and this should leave room for practical improvement. For instance, we might improve FMB if the long-run variance Ω(β₀) is estimated with a less biased variance estimator, for instance by carrying out a prewhitening step (Andrews and Monahan (1992)) or by using a flat-top kernel (Politis (2011)).

Real data application
In this section, we illustrate how FMB performs on real data. We look at daily volumes of stock transactions (in millions), modeled with the same exponential ACD as in Section 5 (see (15)). We focus on data available online (Yahoo! Finance), for five stocks in three different sectors, namely banking, technology, and food. We compute the CR (for the parameters ω, β₁ and β₂) before, during (2008), and after the subprime crisis. Before diving into a deeper analysis, we briefly describe the data at hand in Table 4 below. Table 4 illustrates the larger variability of the volumes of transactions during 2008, as measured by the standard deviation (SD) and the interquartile range (IQR). The high skewness (SKN) and excess kurtosis (KURT) typically indicate that a higher-order correct inferential procedure might be required in finite samples.
To investigate further the impact that asymmetry and fat tails may have on the conducted inference, we compute the FMB and the asymptotic normal (Asy) CR of nominal coverage (1 − α) = 95%, for the ACD(1,1) parameters ω, β₁ and β₂ at each period. As FMB yields higher-order accurate CR by inverting probabilities of the test statistic Q̂(β₀) = Q̂(ω₀, β₁,₀, β₂,₀), we represent these trivariate CR by slicing them at the estimates ω̂, β̂₁ and β̂₂ (Table 5). Namely, we cut the CR by fixing the parameters that are not of interest at their estimated values. To keep Table 5 concise, we do not report the estimate ω̂ and the intervals for the parameter ω; we can deduce the former from Table 4. We consider the companies Bank of America (BA), JP Morgan (JPM), Microsoft (MSF), Coca-Cola (KO) and Unilever (UL). We use the Exponential Tilting estimator (Est), a particular case of GEL, and the benchmark intervals are the first-order asymptotic Gaussian (Asy). All the intervals are equal-tailed and have conditional nominal coverage of 95% when we fix the other parameters at their estimated values.
A few comments are in order. First of all, the different sectors exhibit very diverse reactions to the events of 2008. For instance, the food sector seems to be the most stable, while the financial sector undergoes a huge variability, as we could expect. We can observe this either by comparing non-critical periods to the crisis, or by comparing the estimates and their CI before and after the crisis. (Using Section 5 with volumes of transactions {x_t}, t = 1, ..., T, instead of durations, we have ω̂ ≈ (1 − β̂₁ − β̂₂) T⁻¹ Σ_{t=1}^T x_t, from the moment condition based on g_{3,t}; we report the sample mean T⁻¹ Σ_{t=1}^T x_t in Table 4.) For instance, the estimates for Unilever are almost the same before and after the crisis, as if the company had recovered the same volume behaviour. Coca-Cola looks equally stable with respect to the parameter β₂, which is almost unchanged after the crisis. Second, the estimate β̂₁, respectively β̂₂, seems to be larger, respectively smaller, during the crisis period. This is expected, since β₁ reflects the sudden trading reactions due to changes in the expectations of the market participants during the crisis period. Thus, this feature of 2008 corresponds to an increase of the impact of news (shocks) on the volumes of transactions (via the parameter β₁), relative to persistence (via the parameter β₂). Finally, we see that the FMB CI are longer than the first-order correct Gaussian CI. This is in line with our Monte Carlo experiments in Section 5. Indeed, as the CR are defined by level sets of Q̂, a longer CI corresponds to an adaptation of FMB to a skewed or fat-tailed distribution. Since the CI obtained by Gaussian approximation are typically shorter, we conclude that the routinely applied first-order asymptotic theory tends to underestimate the rejection probability, whereas FMB stays conservative.
Our experience, underpinned by several Monte Carlo simulations, leads us to expect that the distribution of Q̂(β₀) is more skewed or fat-tailed than the chi-squared; see the comparison between Q̂_χ²ᵣ and FMB in Table 3.
Following our discussion of CD (Subsection 4.3), we illustrate here the link between our FMB CR and our previous definition of the asymptotic confidence distribution H*_S(β), via the confidence curve CV*(β). Among the alternative ways to represent the former CR, marginalization allows us to build unconditional CI. Stacking the CR at different coverages 1 − α leads to a center-outward confidence curve for the multidimensional parameter, CV(ω, β₁, β₂). For each CI, we integrate out the two parameters that are not of interest in CV(ω, β₁, β₂). This yields a different confidence curve CV*(β) for each β ∈ {ω, β₁, β₂}, whose level sets give the equal-tailed CI. As an illustration of the graphical use of these confidence curves (defined in Section 3), Figure 1 reports a comparison between the FMB and Gaussian CI based on the FMB and Gaussian confidence curves. Again, we observe that FMB is much more conservative.

Appendix: Proofs
In this appendix, we prove the asymptotic refinements of FMB. By construction, the higher-order correctness of the FMB CI and CR (for β₀) entirely hinges on FMB applied to the test statistics Ŝ(β₀) (see (4)) and Q̂(β₀) (see (6)). Therefore, it follows from Theorem 4 as in Section 3.
The outline of the proof goes as follows. In Appendix A.1, we derive an Edgeworth expansion for Ŝ(β₀) and Q̂(β₀). In Appendix A.2, we derive a similar Edgeworth expansion for the bootstrap counterparts S* and Q*. In Appendix A.3, we show that the difference between the two Edgeworth expansions vanishes at the claimed rate. The first term in this difference is smaller than the usual O_p(T^{-1/2}) of standard first-order asymptotics (central limit theorem). The improvement is essentially due to FMB being able to approximate accurately the third moment of the statistics, whilst the CLT approximates only the first two moments. This yields higher-order correctness.
The second and third terms of the difference have the same order as the bias and variance of the variance estimator, which scales Ŝ(β₀) and Q̂(β₀). For the sake of exposition, technical lemmas and lengthy derivations are available in the online Supplementary Material.

A.1. Edgeworth expansion of the original sample statistics
In this section, we derive the Edgeworth expansions of Ŝ(β₀) and Q̂(β₀). We state the following: Theorem 9. Under Assumptions 1-6, with s ≥ 8 and log T = o(B_T), we get the Edgeworth expansions (16) and (17), with uniform error bounds. In the expansions, p₁ is an even polynomial in x, and p₂ and p_Q are odd polynomials in x. These polynomials depend on vectors K¹_S, K²_S, and K_Q, respectively containing the first three cumulants of Ŝ and Q̂. The uppercase Φ and F_χ²ᵣ are respectively the c.d.f. of a N(0,1) and a χ²_r random variable; the lowercase φ and f_χ²ᵣ stand for their densities.
To prove the statements, we remark that Ŝ(β₀) bears some similarity to the studentized smooth function of means of Götze and Künsch (1996). Therefore, our proof mainly relies on the same strategy as in their paper. However, we have to discuss two important distinctions.
First, Götze and Künsch (1996) derive an Edgeworth expansion considering a class of studentizing factors of the form ς̂² = κ₂⁻¹ Σ_{s=1−T}^{T−1} k*(s/B_T) Γ̂_s(β̂), with Γ̂_s(β̂) := T⁻¹ Σ_{t=1}^{T−s} g_t(β̂) g_{t+s}(β̂)′, as introduced by Andrews (1991). Our definition of the studentizing factor is different, but asymptotically equivalent, since σ̂² approximates this form through a Riemann-sum approximation of k*(s/B_T) (Smith (2005)). In particular, the bias and variance of both studentizing factors have the same orders, respectively O(B_T^{-q}) and O(B_T/T), where q is the Parzen exponent (see Section 3). As a consequence, both studentizing factors act equivalently on the error bound of the Edgeworth expansion for Ŝ(β₀) (see (25)).
Second, from the convolution step in Equation (2), we are interested in T^{1/2} ḡ_T instead of T^{1/2} ḡ = T^{-1/2} Σ_{t=1}^T g_t, as in Götze and Künsch (1996). Thus, we have to derive a valid Edgeworth expansion under this modification. To this end, we rewrite the smoothed sum so that the kernel smoothing induces a tapering window w(t) on the summand time series, in such a way that T^{1/2} ḡ_T = T^{-1/2} Σ_{t=1}^T w(t) g_t. Hence, we need to check that the regularity conditions given by Götze and Hipp (1994) hold for the tapered moment indicators, when they are assumed to be true for the original ones (Assumptions 1-6).
Proof of Theorem 9.
To check Assumptions 1-6 for {w(t)g_t}, we label as "RC i" the regularity condition defined in Assumption i; for instance, "RC 1" represents the regularity condition of Assumption 1.
When we assume that RC 1 and RC 2 hold for the process {g_t}, we can easily verify that they also hold for {w(t)g_t}, given that w(t) < ∞ for all t. To check RC 3, consider that the approximation error is controlled by Assumption 3 on {g_t}. Thus, the exponential rate of decay of the approximation error is not affected by tapering, and we can always take the process {w(t)g‡_{t,m}} to approximate {w(t)g_t}. Without loss of generality, let us take D_t := σ(g‡_{t,0}), and note that both w(t)g‡_{t,0} and g‡_{t,0} are D_t-measurable. Then, RC 4 and RC 6 follow immediately, by means of Assumptions 4 and 6 on the same sigma-fields D_t.
The verification of RC 5 is more technical and deferred to the online Supplementary Material. Here, we summarize the result in Lemma 10: Lemma 10. If RC 5 holds for {g_t}, t = 1, ..., T, then it holds for {w(t)g_t}, t = 1, ..., T. Therefore, the validity of Assumptions 1-6 and Lemma 10 implies that we can suitably approximate the probability distribution of T^{1/2} ḡ_T by an Edgeworth expansion of the same kind as the one for T^{1/2} ḡ, when the latter exists. To derive the Edgeworth expansion for Ŝ, we have to adapt the expansion for T^{1/2} ḡ_T, as defined in Götze and Hipp (1994), to accommodate our studentization as in (4). To this end, we modify the proof of Götze and Künsch (1996) to consider the tapering related to {w(t)}, t = 1, ..., T. To define our Edgeworth expansion, we need the quantities (20)-(24); collecting the expansion of E_T and the error terms discussed above, and invoking Esseen's Lemma, we get the approximation error of Υ†_{S,T} as in Theorem 9, Equation (16).
We also provide a sketch of the derivation of the Edgeworth expansion for Q̂. The detailed calculation is available in the Supplementary Material. To get an Edgeworth expansion for Q̂, namely (17) in Theorem 9, we make use of the univariate Edgeworth expansion Υ†_{S,T}, explicitly defined in (16).
Second, as Ω̂ is symmetric positive semi-definite by construction, it has a unique symmetric positive semi-definite square root Ω̂^{1/2}, which admits an inverse. Thus, we have a vector Q̂^{1/2} := Ω̂^{-1/2} T^{1/2} ḡ_T such that Q̂^{1/2′} Q̂^{1/2} = Q̂. Projecting the vector T^{1/2} ḡ_T onto the orthonormal eigenvectors of Ω̂, we get Q̂^{1/2} = Λ^{-1/2} P′ T^{1/2} ḡ_T, where {λ₁, ..., λ_r} are the eigenvalues of Ω̂ corresponding to its normalized eigenvectors {v₁, ..., v_r}, Λ := diag(λ₁, ..., λ_r), and P := (v₁, ..., v_r). As T^{1/2} v_j′ ḡ_T / λ_j^{1/2} = T^{1/2} υ′ ḡ_T / (υ′ Ω̂ υ)^{1/2} when we choose υ = v_j for each j = 1, ..., r, we directly see that there exist expansions of the same form as in Theorem 9, Equation (16), for each element of Q̂^{1/2}. Furthermore, taking any vector c such that ‖c‖ = 1, there exists a univariate expansion of the same form as in Theorem 9, Equation (16), for c′Q̂^{1/2}/(c′ I_r c)^{1/2} = c′Q̂^{1/2}, as the variance estimator of Q̂^{1/2} is the r-dimensional identity matrix I_r by definition. By the Cramér-Wold device, the characteristic function of Q̂^{1/2} is E_r(τ, c) := E[exp(iτ c′Q̂^{1/2})], where τ is a scalar. As a consequence, there exists an expansion of the same form as in (27) for E_r(τ, c). Taking the inverse Fourier transform of this approximation, we approximate the probability distribution of Q̂^{1/2} by a multivariate Edgeworth expansion Υ_{r,T}(z) := Φ_r(z) + T^{-1/2} p₁(z) φ_r(z) + (B_T/T) p₂(z) φ_r(z), where φ_r(z) is the normal density on R^r with mean zero and variance I_r, and Φ_r(z) is the corresponding c.d.f. In this expansion, p₁ is an even polynomial in z and p₂ is an odd polynomial in z. The error of this approximation is by construction the same as the univariate one, as it is built with the same type of variance estimator.
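The projection argument can be checked numerically: with Ω̂ symmetric positive definite, the vector Λ^{-1/2} P′ g has squared norm g′Ω̂^{-1}g. The sketch below verifies this identity on a randomly generated positive definite matrix; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
r = 4
A = rng.standard_normal((r, r))
omega = A @ A.T + r * np.eye(r)     # symmetric positive definite Omega_hat
g = rng.standard_normal(r)          # stand-in for T^{1/2} * gbar_T

lam, P = np.linalg.eigh(omega)      # Omega = P Lambda P'
q_half = P.T @ g / np.sqrt(lam)     # Lambda^{-1/2} P' g, i.e. Q_half
q_direct = g @ np.linalg.inv(omega) @ g   # g' Omega^{-1} g, i.e. Q
```

The equality `q_half @ q_half == q_direct` (up to floating-point error) is exactly the decomposition Q̂ = Q̂^{1/2′} Q̂^{1/2} used in the proof.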
Finally, we consider the statistic of interest Q̂ = Q̂^{1/2′} Q̂^{1/2} and we work on a change of variable. Along the same lines as Chandra and Ghosh (1979) (see the Supplementary Material), we identify an expansion Υ†_{Q,T}. In this expansion, the elementary probability measure is χ²_r instead of the Gaussian Φ_r, and the term of order T^{-1/2} disappears because p₁ is even in z. In principle, we can find p₁ explicitly by inverting the Fourier transform in (27). However, we do not need p₁ in this proof, since we work directly with the Fourier transform. Likewise, we only give the polynomials p₂ and p_Q formally (implicitly), as we do not need their explicit form in the sequel of the proof. What matters is the order of the error to which they correspond, namely O(B_T/T).

A.2. Edgeworth expansion of the bootstrap sample statistics
In the sequel, E* denotes the expectation under the bootstrap probability measure as in (14). We state the following: Theorem 11. Under Assumptions 1, 2, 7 and if E‖g_t‖^{q̄s+δ} < ∞, for δ > 0, s ≥ 8, and q̄ ≥ 3, the bootstrap Edgeworth expansions hold with uniform error bounds, where K¹*_S, K²*_S and K*_Q are the bootstrap counterparts of the vectors K¹_S, K²_S and K_Q.
Proof of Theorem 11. To prove the theorem, we need to define the Edgeworth expansion for the bootstrap statistic. To this end, we use the same arguments as for the original statistic. Namely, we need to check RC 1-6 for the bootstrap sample {g*_{T,t}}, t = 1, ..., T, conditionally on {g_t(β₀)}, t = 1, ..., T, uniformly on a set whose probability tends to one. When p = r (exactly-identified moment conditions), RC 1 holds by the nature of the estimating equations, as E*[g*_{T,t}] = ḡ_T(β̂) = 0. In the case of over-identification, we have to recenter the bootstrap statistic so that RC 1 holds. For RC 2, we control the bootstrap moments E*‖g*_{T,t}‖^s. For the unconditional moments, we get E[E*‖g*_{T,t}‖^s] < ∞, from Assumption 2 on g_t and the Minkowski inequality. Then, assuming E‖g_t‖^{q̄s} < ∞, E*‖g*_{T,t}‖^s is bounded with probability tending to one.
RC 3, 4, and 6 trivially hold by independence of the g*_{T,t} under the bootstrap probability measure as in (14), when D_t := σ(g*_{T,t}) for t = 1, ..., T. Finally, as the bootstrap random variables are i.i.d., we can show that the standard Cramér condition holds asymptotically, instead of checking RC 5. Thus, the Cramér condition in Assumption 7 is sufficient to verify RC 5 for the bootstrap process, with the same sub-sigma-fields D_t. We have to make sure that the same limit holds when g_{T,t}(β) is evaluated at β̂ instead of β₀, since we resample the {g_{T,t}(β̂)}, t = 1, ..., T, in FMB. From Assumption 7, there exists a constant c > 0 such that the smoothed moment indicators evaluated at β̂ verify the necessary Cramér condition.
We construct the Edgeworth expansion Υ*_{S,T} of the bootstrap statistic S* in the same way as the one for the original sample statistic Ŝ in (27) above. Thus, we define the bootstrap counterparts of the quantities (20)-(24), and then the bootstrap Edgeworth expansion via its Fourier transform. We have the same error bound as in Theorem 9, up to the following aspects. First, Υ*_{S,T} is a random measure because it is conditional on the original sample. Second, by independence of the bootstrap variables, we have σ*²_T = τ*²_{1T}, so the contribution of the |σ*²_T − τ*²_{1T}| term to the error disappears. Third, contrary to (27), the polynomials involve the bootstrap cumulant vectors K¹*_S and K²*_S. Altogether, this leads to the error bound of Theorem 11. To derive an Edgeworth expansion for Q*, we proceed along the same lines as for the original sample statistic Q̂, working with univariate Edgeworth expansions for studentized linear combinations of the form T^{1/2} υ′ ḡ*_T.

A.3. Higher-order correctness of FMB
To prove part (i) of Theorem 4, we essentially need the convergence of the terms of order T^{-1/2} in Υ†_{S,T} (as in (16)) and Υ*_{S,T} (as in (28)) to the same quantities. Indeed, in view of Theorems 9 and 11, the condition plim_{T→∞} sup_{x∈R} |p₁(x, K¹*_S) − p₁(x, K¹_S)| = 0, or equivalently K¹*_S = K¹_S + o_p(1), is sufficient to show the higher-order refinements of FMB. We give the details of this convergence in the proof of Lemma 12 in the online Supplementary Material.
Lemma 12. Under Assumptions 1-7 and if E‖g_t‖^{q̄s+δ} < ∞, for δ > 0, s ≥ 8, and q̄ ≥ 3, we have K¹*_S = K¹_S + o_p(1). Collecting the error bounds of Theorem 9, Theorem 11, and Lemma 12, we conclude by the triangle inequality. The proof of part (ii) of Theorem 4 is more direct, as the even polynomial p₁ in Theorems 9 and 11 deletes the T^{-1/2} term from the Edgeworth expansion of Q̂. Hence, only the higher-order terms remain. As a consequence, the supremum over x ∈ R of the difference between the two expansions vanishes at the claimed rate. Then, we prove part (ii) making use of Theorems 9, 11, and the triangle inequality.

Online Supplementary Material
Davide La Vecchia, Alban Moor and Olivier Scaillet. This Supplementary Material contains the proofs of Lemma 10 and Lemma 12; the latter requires two additional lemmas. We also provide the key points of the derivation of the Edgeworth expansions for the quadratic statistics Q̂ and Q*, as well as the proof of Corollary 5. In the sequel of this Supplementary Material, we provide a detailed comparison of the CPU time between FMB and MBB (Table 6), a complement to Tables 2 and 3 with a smaller sample size T = 150 in Table 7, and a graphical illustration of the use of Q̄ as in Remark 2.

SM.1. Proof of Lemma 10
Without loss of generality, take D_t := σ(g‡_{t,0}). Then, let us define F_m as the set of indices k ∈ {−m, ..., m} such that g_{t+k} is {D_j : j ≠ t}-measurable. We can now rewrite RC 5 for the tapered process. First, the expectation is taken with respect to the same conditional probability measure, say F(g_{t+ℓ} : ℓ ∈ F_m) for convenience of notation. Then, it is clear that the weights w(t+ℓ) rescale horizontally the same characteristic function. As w(t) ≠ 0 for all t, the complex modulus of the characteristic function never reaches unity (corresponding to the infimum δ = 0). Thus, the range of E[exp(iτ Σ_{ℓ∈F_m} w(t+ℓ) g_{t+ℓ}) | D_j : j ≠ t] is the same as the range of E[exp(iτ Σ_{ℓ∈F_m} g_{t+ℓ}) | D_j : j ≠ t]. As a consequence, the upper bound on their complex modulus is the same for all τ > 0.
For the second part of the condition, it is immediate that lim inf_{T→∞} T^{-1} Var(Σ_{t=1}^T w(t) g_t) > 0 if the same condition holds for {g_t}, t = 1, ..., T, since w(t) ≠ 0 for all t.

SM.2. Edgeworth expansion for the quadratic statisticsQ and Q *
For completeness of the presentation, we supplement here the arguments of the paragraph before Theorem 9, following closely the lines of Chandra and Ghosh (1979) with their notation. The changes concern the order of the approximations in the Edgeworth expansion and the specification of the statistic of interest.
For an explicit method to determine the polynomials, see Remark 2.6 of Chandra and Ghosh (1979).
The arguments are the same for the bootstrap quadratic statistic Q * , except that we have to replace the orders o(B T /T ) by o(1/T ).
SM.6. Proof of Corollary 5

From the Taylor expansion defining Q̄ in (8), we have Q̄(β₀) = Q̂(β₀) + R_T, where P[|R_T| > δ_T] ≤ δ_T for some positive sequence δ_T = o(T⁻¹). It follows that P[Q̄(β₀) ≤ x] is bounded above and below by P[Q̂(β₀) ≤ x ± δ_T], up to a δ_T term. Following the same argument for the bootstrap statistic, and as both probabilities are bounded above and below up to a C δ_T = o(T⁻¹) term, the two distributions agree up to o(T⁻¹), by continuity of the probability distribution. Then, the result follows from an application of Theorem 4.

SM.7. CPU time

The number of bootstrap samples is R = 2500 for both methods.

SM.8. Coverages for CR of ACD(1,0) and ACD(1,1) with T = 150

The results we show in Table 7 must be interpreted with caution, since more than 10% of the estimators did not converge for this sample size in the Monte Carlo simulations. The true values of the unknown parameters of the ACD(1,0) and ACD(1,1) models are ω = 1.5, β₁ = 0.25 and β₂ = 0.25.

SM.9. Graphical illustration of the use of Q̄ as in Remark 2

To illustrate the behavior of the approximation Q̄, we consider the problem of estimating the location parameter β₀ of a Cauchy distribution, from an i.i.d. sample of size T = 30. This example does not correspond to our inferential setting of interest because of its lack of finite moments, but it has the advantage of providing a clear picture of the worst-case scenario, where the studentized score function Ŝ(β₀) fails to be one-to-one in β₀, making the direct inversion technique impossible. Figure 2 shows the plot of Ŝ, Q̂, and its approximation Q̄. The local maximum of Q̄ (16.91 for this sample) is the maximum level allowing one to define a simply connected CI as a level set. As discussed in Remark 2, this maximum increases proportionally to the sample size T, by definition of Q̄ as a cubic polynomial. In this example, we draw a line at 6.63, the 99% percentile of a chi-square distribution with one degree of freedom, to illustrate the applicability of Q̄ already with T = 30. Figure 2 displays the studentized score Ŝ(β₀) of the Cauchy location parameter β₀, its square Q̂(β₀), and the approximation Q̄(β₀). In this example, the bounds of the CI are c₁ = −0.81 and c₂ = 0.65.
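The Cauchy location example can be reproduced schematically as follows. The score indicators g_t(β) = 2(x_t − β)/(1 + (x_t − β)²) follow from the Cauchy log-likelihood, while the seed, grid, and function names are arbitrary illustrative choices; the point is that the studentized score is not monotone in β, so direct inversion fails.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 30
x = rng.standard_cauchy(T)                 # true location beta_0 = 0

def s_hat(b):
    # studentized score S_hat(b) = sqrt(T) * gbar(b) / sigma_hat(b),
    # with redescending Cauchy score indicators g_t(b)
    g = 2.0 * (x - b) / (1.0 + (x - b) ** 2)
    return np.sqrt(T) * g.mean() / g.std(ddof=1)

grid = np.linspace(-10, 10, 2001)
s_vals = np.array([s_hat(b) for b in grid])
# the redescending scores create local wiggles, so s_vals is not monotone
```

Plotting `s_vals` against `grid` reproduces the qualitative shape of Figure 2: the studentized score crosses zero near the data's center but changes direction elsewhere, which is exactly why the level-set construction via Q̄ is needed.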
As an example of a process where Assumptions 1-7 are satisfied, we take the OLS moment indicators for the autoregressive parameters of an AR(p) process Y_t = Σ_{k=1}^p θ_k Y_{t−k} + e_t = Σ_{j=0}^∞ w_j e_{t−j}, with e_t i.i.d.