Saddlepoint approximations for spatial panel data models

We develop new higher-order asymptotic techniques for the Gaussian maximum likelihood estimator in a spatial panel data model with fixed effects, time-varying covariates, and spatially correlated errors. Our saddlepoint density and tail-area approximations feature a relative error of order $O(1/(n(T-1)))$, with $n$ being the cross-sectional dimension and $T$ the time-series dimension. The main theoretical tool is the tilted-Edgeworth technique in a non-identically distributed setting. The density approximation is always non-negative, does not need resampling, and is accurate in the tails. Monte Carlo experiments on density approximation and on testing in the presence of nuisance parameters illustrate the good performance of our approximation over first-order asymptotics and Edgeworth expansions. An empirical application to the investment-saving relationship in OECD (Organisation for Economic Co-operation and Development) countries shows disagreement between testing results based on first-order asymptotics and on saddlepoint techniques.


Introduction
Accounting for spatial dependence is of interest from both an applied and a theoretical point of view. Indeed, panel data with spatial cross-sectional interaction enable empirical researchers to take into account the temporal dimension and, at the same time, to control for spatial dependence. From a theoretical point of view, the special features of panel data with spatial effects pose the challenge of developing new methodological tools.
Much of the machinery for conducting statistical inference on panel data models has been established under the simplifying assumption of cross-sectional independence. This assumption may be inadequate in many cases. For instance, correlation across spatial data typically comes from competition, spillovers, or aggregation. The presence of such correlation might be anticipated in observable variables and/or in the unobserved disturbances of a statistical model, and ignoring it can have adverse effects on routinely applied inferential procedures. See, e.g., Gaetan and Guyon (2010), Rosenblatt (2012), Cressie (2015), Cressie and Wikle (2015), and, recently, Wikle et al. (2019) for book-length discussions in the statistical literature. In the econometric literature, see, e.g., Kapoor et al. (2007), Robinson and Rossi (2014b), and, for book-length presentations, Baltagi (2008, Ch. 13), Anselin (1988), and Kelejian and Piras (2017). Different nonparametric, semiparametric, and parametric approaches have been proposed to incorporate cross-sectional dependence in panel data models. The choice of modeling approach depends on the time-series ($T$) and cross-sectional ($n$) dimensions. A nonparametric approach is only feasible when $T$ is large relative to $n$. In other situations, typically when $T$ is very small (e.g., $T = 2$) and $n$ is large, semiparametric models have been employed, including time-varying regressors (namely, factor models) and a spatial autoregressive component, when information on spatial distances is available. Least squares and the quasi maximum-likelihood estimator represent the most popular tools for estimation within this setting. When both $T$ and $n$ are small, the parametric approach is the sensible choice, and (Gaussian) likelihood-based procedures are applied to define the maximum likelihood estimator (MLE).
There is a vast literature on the MLE for spatial autoregressive models, an early reference being Ord (1975). The derivation of the first-order asymptotics is available in the econometric literature; we refer to the seminal paper by Lee (2004). For the class of spatial autoregressive processes, with fixed effects, time-varying covariates, and spatially correlated errors that we consider in this paper, the first-order asymptotic results for the Gaussian MLE are available in , where the authors derive asymptotic approximations (the exact finite-sample distribution being intractable), when the cross-sectional dimension n is large and T is finite or large.
The main issue related to first-order asymptotic approximations is that, when $n$ is not very large, such approximations may be unreliable: alternatives are highly recommended. Bao and Ullah (2007) provide analytic formulae for the second-order bias and mean squared error of the MLE of the spatial parameter λ in a Gaussian model. Bao (2013) and Yang (2015) extend these approximations to include exogenous explanatory variables; the results remain valid also when the process is not Gaussian. Robinson and Rossi (2014a,b) develop Edgeworth-improved tests for no spatial correlation in spatial autoregressive models for pure cross-sectional data, based on least squares estimation and Lagrange multiplier tests.
Moreover,  work on the concentrated likelihood and derive an Edgeworth expansion for the MLE of λ in the setting of a first-order spatial autoregressive panel data model, with fixed effects and without covariates.  (see their §6) and  (see their §3.5) propose saddlepoint approximations for the profile likelihood estimator of λ.

Motivating example
We motivate our research by a Monte Carlo (MC) exercise illustrating the low accuracy of the routinely applied first-order asymptotics in the setting of a spatial panel data model. We consider the model:
$$Y_{nt} = \lambda_0 W_n Y_{nt} + X_{nt}\beta_0 + c_{n0} + E_{nt}, \qquad E_{nt} = \rho_0 M_n E_{nt} + V_{nt}, \qquad t = 1, ..., T, \quad (2.1)$$
where $Y_{nt} = (y_{1t}, y_{2t}, ..., y_{nt})'$, $X_{nt}$ is an $n \times k$ matrix of nonstochastic time-varying regressors, $c_{n0}$ is an $n \times 1$ vector of fixed effects, and $V_{nt} = (v_{1t}, v_{2t}, ..., v_{nt})'$ is an $n \times 1$ vector with $v_{it} \sim N(0, \sigma_0^2)$, i.i.d. across $i$ and $t$. The matrices $W_n$ and $M_n$ are weighting matrices describing the spatial dynamics. Following the literature, we label this model SARAR(1,1) to emphasize the spatial dependence in both the response variable $Y_{nt}$ and the error $E_{nt}$.
As in the MC example in , p. 172, we generate samples from (2.1) using $\theta_0 = (\beta_0, \lambda_0, \rho_0, \sigma_0^2)' = (1.0, 0.2, 0.5, 1)'$, $T = 5$, and $k = 4$ covariates. The quantities $X_{nt}$, $c_{n0}$, and $V_{nt}$ are generated from independent standard normal distributions and, as is customary in the econometric literature, we set $W_n = M_n$, where the off-diagonal elements are different from zero, while the diagonal elements are all zero. We consider two sample sizes: $n = 24$ (small sample) and $n = 100$ (moderate/large sample). The choice of $n = 24$ is related to the empirical data analysis that we consider in §7, where we apply the model in (2.1) to conduct inference on the investment-saving relation for the 24 OECD countries. Similar sample sizes arise in many real data analyses, where panel datasets contain a limited number of cross-sectional units, e.g., because sampling can be expensive and/or time consuming, as is typically the case in field studies.
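To make the data-generating process concrete, the following sketch (illustrative Python/NumPy, ours; the paper's own experiments use R, and the function name is an assumption) draws one panel from the SARAR(1,1) reduced form, obtained by inverting $I_n - \lambda_0 W_n$ and $I_n - \rho_0 M_n$:

```python
import numpy as np

def simulate_sarar(W, M, beta, lam, rho, sigma2, T, seed=0):
    """Draw one sample from the SARAR(1,1) model
    Y_nt = lam*W*Y_nt + X_nt*beta + c_n0 + E_nt,  E_nt = rho*M*E_nt + V_nt,
    via its reduced form. W and M are n x n spatial weight matrices."""
    rng = np.random.default_rng(seed)
    n, k = W.shape[0], len(beta)
    S_inv = np.linalg.inv(np.eye(n) - lam * W)   # S_n(lam)^{-1}
    R_inv = np.linalg.inv(np.eye(n) - rho * M)   # R_n(rho)^{-1}
    c = rng.standard_normal(n)                   # fixed effects c_n0
    Y, X = [], []
    for _ in range(T):
        Xt = rng.standard_normal((n, k))         # regressors for period t
        Vt = np.sqrt(sigma2) * rng.standard_normal(n)
        Yt = S_inv @ (Xt @ beta + c + R_inv @ Vt)
        Y.append(Yt); X.append(Xt)
    return np.array(Y), np.array(X)              # shapes (T, n) and (T, n, k)
```

With row-normalized weights and $|\lambda_0| < 1$, $I_n - \lambda_0 W_n$ is invertible, so the reduced form is well defined.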
As is customary in statistical/econometric software, we illustrate the inference issues related to the use of first-order asymptotics by means of three different spatial weight matrices: the Rook matrix, the Queen matrix, and the Queen matrix with torus. In Figure 1, we display the geometry of $Y_{nt}$ as implied by each considered spatial matrix: the plots highlight that different matrices imply different spatial relations. For instance, we see that the Rook matrix implies fewer links than the Queen matrix. Indeed, the Rook criterion defines neighbours by the existence of a common edge between two spatial units, whilst the Queen criterion is less rigid and defines neighbours as spatial units sharing an edge or a vertex. Moreover, we may interpret $\{Y_{nt}\}$ as an $n$-dimensional random field on the network graph which describes the known underlying spatial structure. Then, $W_n$ represents the weighted adjacency matrix (in the spatial econometrics literature, $W_n$ is called the contiguity matrix). In Figure 1, we display the geometry of a random field on a regular lattice (undirected graph). In the real data example of §7, we consider a random field over a manifold (a sphere), providing two additional examples for $W_n$.

Figure 1: Different types of neighboring structure for $Y_{nt}$, as implied by different types of $W_n$ matrix, for $n = 24$ (panels: Rook, Queen, Queen with torus).
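The neighboring structures of Figure 1 are easy to generate mechanically. The sketch below (illustrative Python, ours; the function name and the rectangular-lattice parametrization are assumptions) builds a row-normalized Rook or Queen contiguity matrix on an nrow × ncol lattice, with an optional torus wrap so that border units get the same number of neighbors as interior ones:

```python
import numpy as np

def contiguity_matrix(nrow, ncol, criterion="rook", torus=False):
    """Row-normalized contiguity matrix on an nrow x ncol lattice.
    'rook': neighbors share an edge; 'queen': an edge or a vertex.
    torus=True wraps the lattice around, removing border effects."""
    n = nrow * ncol
    W = np.zeros((n, n))
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]          # edge neighbors
    if criterion == "queen":
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # add vertex neighbors
    for r in range(nrow):
        for c in range(ncol):
            for dr, dc in steps:
                rr, cc = r + dr, c + dc
                if torus:
                    rr, cc = rr % nrow, cc % ncol
                if 0 <= rr < nrow and 0 <= cc < ncol:
                    W[r * ncol + c, rr * ncol + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)             # row-normalization
```

For $n = 24$ one can take a 4 × 6 lattice; under the Rook criterion, corner, border, and interior units then have 2, 3, and 4 neighbors, respectively.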
To illustrate the inferential issues entailed by the use of first-order asymptotics, for each type of $W_n$, we generate a sample of $n$ observations. Since $c_{n0}$ creates an incidental parameter issue, we eliminate it by the standard differentiation procedure, and for each MC run we estimate the model parameter $\theta$ by maximum likelihood, using the transformation approach of ; we refer to the R package splm for implementation details. We set the MC size to 5000.
We illustrate graphically the behavior of the first-order asymptotic theory in finite samples by comparing the distribution of $\hat\lambda$ to the Gaussian asymptotic distribution (see §4.1 for details). Via QQ-plot analysis, Figure 2 shows that the Gaussian approximation can be either too thin or too thick in the tails with respect to the "exact" distribution.
For instance, when $n = 24$ and $W_n$ is Rook, the Gaussian quantiles are larger than the "exact" ones in the left tail, while we observe the opposite phenomenon in the right tail.
Similar considerations hold for the other types of $W_n$. The more complex the geometry of $W_n$ (e.g., $W_n$ has Queen structure), the more pronounced the departures from the Gaussian. For $n = 100$ and $W_n$ Rook, the MLE displays a distribution which is in line with the Gaussian one (see bottom left panel). However, when $W_n$ becomes more complex (e.g., Queen with torus), larger departures in the tails are still evident. In Appendix D.1, we illustrate that similar conclusions hold also for the simpler SAR(1) model $Y_{nt} = \lambda_0 W_n Y_{nt} + c_{n0} + V_{nt}$, where $\theta_0 = (\lambda_0, \sigma_0^2)'$. More generally, unreported results suggest that, in the considered SARAR setting, the "exact" and the asymptotic distribution, as well as the saddlepoint approximation, agree for the considered types of $W_n$ when $n \geq 250$.

Model setting and estimation method
Let us consider a random field described by the SARAR(1,1) model in (2.1). We denote by $P_{\theta_0} \in \mathcal{P}$, with $\theta_0 \in \Theta \subset R^d$, the actual underlying distribution, which is characterized by the true parameter value $\theta_0 = (\beta_0', \lambda_0, \rho_0, \sigma_0^2)'$. The matrix $W_n$ is an $n \times n$ nonstochastic spatial weight matrix that generates the spatial dependence of $y_{it}$ among cross-sectional units. The matrix $X_{nt}$ is an $n \times k$ matrix of nonstochastic time-varying regressors, and $c_{n0}$ is an $n \times 1$ vector of fixed effects. Similarly, $M_n$ is an $n \times n$ spatial weight matrix for the disturbances (quite often $W_n = M_n$). Moreover, we define $S_n(\lambda) = I_n - \lambda W_n$ and, analogously, $R_n(\rho) = I_n - \rho M_n$.
The vector $c_{n0}$ introduces an incidental parameter problem; see  and . To cope with this issue, we follow the standard approach and transform the model in order to derive a consistent estimator for the model parameter $\theta = (\beta', \lambda, \rho, \sigma^2)'$, with $\theta \in \Theta \subset R^d$. To achieve this goal, we first eliminate the individual effects by the deviation-from-time-mean operator $J_T = I_T - \frac{1}{T} l_T l_T'$, where $I_T$ is the $T \times T$ identity matrix and $l_T = (1, ..., 1)'$ is the $T \times 1$ vector of ones. To avoid creating linear dependence in the resulting disturbances, we adopt the transformation introduced by .
The matrix $[F_{T,T-1}, l_T/\sqrt{T}]$ is the orthonormal matrix of eigenvectors of $J_T$, where $[\cdot\,, \cdot]$ represents a matrix horizontal concatenation and $F_{T,T-1}$ is the $T \times (T-1)$ submatrix corresponding to the unit eigenvalues. Then, for any $n \times T$ matrix $[Z_{n1}, ..., Z_{nT}]$, we define the transformed $n \times (T-1)$ matrix $[Z_{n1}^*, ..., Z_{n,T-1}^*] = [Z_{n1}, ..., Z_{nT}] F_{T,T-1}$. Thus, we transform the model in (2.1) and we obtain
$$Y_{nt}^* = \lambda_0 W_n Y_{nt}^* + X_{nt}^* \beta_0 + E_{nt}^*, \qquad E_{nt}^* = \rho_0 M_n E_{nt}^* + V_{nt}^*, \qquad t = 1, ..., T-1.$$
Throughout, E[·] represents the expectation taken w.r.t. $P_{\theta_0}$. Now, the Gaussian assumption on the innovation terms implies that the $v_{it}^*$ are independent for all $i$ and $t$; without this assumption, they would be simply uncorrelated; see , p. 167. Thus, defining $\zeta = (\beta', \lambda, \rho)'$, the log-likelihood is
$$\ell_{n,T}(\theta) = -\frac{n(T-1)}{2} \ln(2\pi\sigma^2) + (T-1)\big[\ln |S_n(\lambda)| + \ln |R_n(\rho)|\big] - \frac{1}{2\sigma^2} \sum_{t=1}^{T-1} \tilde V_{nt}'(\zeta) \tilde V_{nt}(\zeta),$$
with $\tilde V_{nt}(\zeta) = R_n(\rho)[S_n(\lambda) Y_{nt}^* - X_{nt}^* \beta]$. As remarked in , the function $L_{n,T}$ has a conditional likelihood interpretation: it is the likelihood conditional on the time average $\sum_{t=1}^{T} Y_{nt}/T$, which is a sufficient statistic for $c_{n0}$ under normality. The last term shows that $\ell_{n,T}(\theta)$ is a quadratic form in $\tilde V_{nt}(\zeta)$. The MLE $\hat\theta_{n,T}$ for $\theta$ is an M-estimator obtained by solving $\hat\theta_{n,T} = \arg\max_{\theta \in \Theta} \ell_{n,T}(\theta)$.
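In practice, $F_{T,T-1}$ can be read off the eigendecomposition of $J_T$. A minimal sketch (illustrative Python, ours; the function name is an assumption) that maps an $n \times T$ panel into the transformed $n \times (T-1)$ panel:

```python
import numpy as np

def forward_transform(Z):
    """Map an n x T panel Z to Z* = Z F (n x (T-1)), where
    [F, l_T/sqrt(T)] is the orthonormal eigenvector matrix of
    J_T = I_T - (1/T) l_T l_T' and F collects the columns with
    unit eigenvalue. Additive individual fixed effects are wiped out."""
    T = Z.shape[1]
    J = np.eye(T) - np.ones((T, T)) / T      # deviation-from-time-mean operator
    eigval, eigvec = np.linalg.eigh(J)       # J is symmetric idempotent
    F = eigvec[:, np.isclose(eigval, 1.0)]   # T x (T-1) unit-eigenvalue block
    return Z @ F
```

Because the columns of $F$ are orthonormal and orthogonal to $l_T$, a constant-in-time fixed effect is annihilated, while i.i.d. Gaussian errors remain i.i.d. after the transformation.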
It implies the system of estimating equations $\sum_{t=1}^{T-1} \psi_{nt}(\hat\theta_{n,T}) = 0$, where $\psi_{nt}$ is the likelihood score function.

Methodology
We assume $n \gg T$, so we deal with so-called micro panels: in the econometric literature, this type of data typically involves annual records covering a short time span for each individual. Within this setting, for $T$ fixed, the standard asymptotic arguments rely crucially on the number $n$ of individuals tending to infinity; see . In contrast, in our development, we consider small sample cross-sectional asymptotics (Field and Ronchetti (1990)), and we still leave $T$ fixed (possibly small). However, we keep $T$ in the notation of normalizing factors to demonstrate the improved rate of convergence that would result if $T \to \infty$ or $T$ is large. The derivation of our higher-order techniques relies on three steps: (i) defining a second-order asymptotic (von Mises) expansion for the MLE, see §4.1; (ii) identifying the corresponding U-statistic, see §4.2; (iii) deriving the Edgeworth expansion for the U-statistic as in  and deriving the saddlepoint density by means of the tilted-Edgeworth technique, see §4.3 and §4.4. Similar approaches are available in the standard setting of i.i.d. random variables in , , and .

4.1 The M-functional related to the MLE and its first-order asymptotics

Let us first define the M-functional related to the MLE. To this end, we remark that the likelihood score function in (3.5) is a vector in $R^d$, and each $l$-th element of this vector, for $l = 1, ..., d$, is a sum of $n$ terms. In what follows, for $i = 1, ..., n$, we denote by $\psi_{i,t,l}(\theta)$ the $i$-th term, at time $t$, of this sum for the $l$-th component of the score.
To specify $\psi_{i,t,l}(\theta)$, we set $R_n(\rho) = [r_1'(\rho), r_2'(\rho), \cdots, r_n'(\rho)]'$ and $X_{nt} = [X_{nt,1}, X_{nt,2}, \cdots, X_{nt,k}]$; $r_i'(\rho)$ and $h_i(\rho)$ are the $i$-th rows of $R_n(\rho)$ and $H_n(\rho)$, while $g_{ii}$ and $h_{ii}$ are the $i$-th diagonal elements of $G_n(\lambda)$ and $H_n(\rho)$, respectively. Then, the expression of $\psi_{i,t,l}(\theta)$ in (4.1) follows from (3.5). Thus, for every $t$, we obtain $\psi_{nt}$, and, from (3.4), it follows that the MLE is the solution to (4.2). The M-functional $\vartheta$ related to the MLE is implicitly defined as the unique functional root of the equation in (4.3), or equivalently via the asymptotic maximization $\theta_0 = \arg\max_{\theta \in \Theta} E[\ell_{n,T}(\theta)]$; see, e.g., Lee (2004). In what follows, we write $\theta_0 = \vartheta(P_{\theta_0})$ to emphasize the dependence of the functional on the measure $P_{\theta_0}$. The finite-sample version of the M-functional in (4.3) is the M-estimator defined in (4.2), or equivalently via the finite-sample maximization $\hat\theta_{n,T} = \arg\max_{\theta \in \Theta} \ell_{n,T}(\theta)$. In what follows, we write $\hat\theta_{n,T} = \vartheta(P_{n,T})$, where $P_{n,T}$ is the measure associated with the $n$-dimensional sample. We can check the uniqueness of the M-estimator on a case-by-case basis, using Assumption A (see below) and working on the Gaussian log-likelihood. For instance, in the case of the SAR model, we can compute the second derivative of $\ell_{n,T}$ w.r.t. $\lambda$ and check that $\ell_{n,T}$ is a concave function, admitting a unique maximizer. Alternatively, we can solve the estimating equations implied by the first-order conditions related to $\ell_{n,T}$, resorting to a one-step procedure and using, for instance, the GMM estimator (see  and references therein) as a preliminary estimator; for a book-length description of one-step procedures, see, e.g., Van der Vaart (1998), Ch. 5.
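To fix ideas on the one-step device mentioned above, here is a minimal sketch (illustrative Python, ours; the Gaussian toy model is not the SARAR likelihood): one Newton-Raphson step from a consistent preliminary estimator, which for an exactly quadratic log-likelihood lands on the MLE itself.

```python
import numpy as np

def one_step(theta0, score, hessian):
    """One Newton-Raphson step: theta1 = theta0 - H(theta0)^{-1} s(theta0).
    Started from a sqrt(m)-consistent preliminary estimator, theta1 is
    first-order equivalent to the MLE (van der Vaart, 1998, Ch. 5)."""
    return theta0 - score(theta0) / hessian(theta0)

# Toy: Gaussian log-likelihood in the mean with known unit variance,
# using the sample median as preliminary estimator.
rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, size=500)
score = lambda mu: np.sum(x - mu)      # first derivative of the log-likelihood
hessian = lambda mu: -float(len(x))    # second derivative of the log-likelihood
mu_one_step = one_step(np.median(x), score, hessian)
```

Here the log-likelihood is exactly quadratic in the mean, so the one-step estimator coincides with the sample mean (the MLE); in the spatial setting the step only achieves first-order equivalence.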
In what follows, we set m := n(T − 1), with m → ∞, as n → ∞. Then, we introduce Assumption A.
(i) The elements $\omega_{n,ij}$ of $W_n$ and the elements $m_{n,ij}$ of $M_n$ in (2.1) are at most of order $\bar h_n^{-1}$, denoted by $O(1/\bar h_n)$, uniformly in all $i, j$, where the rate sequence $\{\bar h_n\}$ is bounded, and $\bar h_n$ is bounded away from zero for all $n$. As a normalization, we have $\omega_{n,ii} = m_{n,ii} = 0$ for all $i$.
(ii) $n$ diverges, while $T \geq 2$ is finite.
(iv) Denote $C_n = \tilde G_n - n^{-1}\mathrm{tr}(\tilde G_n) I_n$ and $D_n = H_n - n^{-1}\mathrm{tr}(H_n) I_n$, where $\tilde G_n = R_n G_n R_n^{-1}$ and $H_n = M_n R_n^{-1}$. Then $C_n^s = C_n + C_n'$ and $D_n^s = D_n + D_n'$. The limit of $n^{-2}[\mathrm{tr}(C_n^s C_n^s)\,\mathrm{tr}(D_n^s D_n^s) - \mathrm{tr}^2(C_n^s D_n^s)]$ is strictly positive as $n \to \infty$.
Assumption A(i) characterizes the behavior of $W_n$ and $M_n$ in terms of $n$, and $W_n$ and $M_n$ are row-normalized. It means that $\omega_{n,ij} = d_{ij} / \sum_{j=1}^{n} d_{ij}$, where $d_{ij}$ is the spatial distance between the $i$-th and the $j$-th units in some (characteristic) space. For each $i$, the weight $\omega_{n,ij}$ defines an average of neighboring values. In what follows, we consider spatial weight matrices (like, e.g., Rook and Queen) such that $\sum_{j=1}^{n} d_{ij} = O(\bar h_n)$ uniformly in $i$ and the row-normalized weight matrix satisfies Assumption A(i); see, e.g., Lee (2004). For instance, $W_n$ as Rook creates a square tessellation with $\bar h_n = 4$ for the inner fields on the chessboard, and $\bar h_n = 2$ and $\bar h_n = 3$ for the corner and border fields, respectively.
Assumption A(ii) defines the asymptotic scheme of our theoretical development, in which we consider $n$ cross-sectional units and leave $T$ fixed. Assumption A(iii) refers to , who develop the first-order asymptotic theory: all of $W_n$, $M_n$, $S_n^{-1}(\lambda)$, and $R_n^{-1}(\rho)$ are uniformly bounded, which guarantees the convergence of the asymptotic variance; see below. Assumption A(iv) states the identification conditions of the model and the conditions for the nonsingularity of the limit of the information matrix. In particular, it implies that the $(d \times d)$ information matrix is nonsingular. Under Assumptions A(i)-A(iv), Theorem 1, part (ii), in  shows that $\lim_{n \to \infty} \hat\theta_{n,T} = \theta_0$. Furthermore, Theorem 2, point (ii), in  implies, as $n \to \infty$, that $\hat\theta_{n,T}$ satisfies $\sqrt{m}(\hat\theta_{n,T} - \theta_0) \overset{D}{\to} N(0, \Sigma_{0,T}^{-1})$, where $\Sigma_{0,T} = \operatorname{plim}_{n \to \infty} \Sigma_{0,n,T}$. The operator plim stands for the limit in probability, and the expression of $\Sigma_{0,n,T}$ is available in the online Supplementary Material (see Appendix B). The first-order asymptotics is obtained letting $n \to \infty$; there is no need for $T \to \infty$ to obtain a consistent and asymptotically normal M-estimator.

Second-order von Mises expansion
To define a higher-order density approximation to the finite-sample density of the MLE, we need to derive its higher-order asymptotic expansion, making use of Assumption B.
Then, we state the following result.

Lemma 1. Let the MLE be defined as in (3.4). Under Assumptions A-B, the expansion in (4.5) holds, where $M_{i,T}(\psi, P_{\theta_0})$ is defined by (4.4).
In (4.5), we interpret the quantities $IF_{i,T}(\psi, P_{\theta_0})$, the first-order von Mises kernel, and $\varphi_{i,j,T}(\psi, P_{\theta_0})$, the second-order von Mises kernel, as functional derivatives of the M-functional related to the MLE; see Fernholz (2001). Specifically, the first term, of order $m^{-1} \propto n^{-1}$, is the Influence Function (IF) and represents the standard tool applied to derive the first-order (Gaussian) asymptotic theory of the MLE; see, e.g.,  and Baltagi (2008) for a book-length introduction. The second term in (4.5), of order $m^{-2} \propto n^{-2}$, plays a pivotal role in our derivation of the higher-order approximation.

Approximation via U-statistic
The result of Lemma 1, together with the chain rule, defines a second-order asymptotic expansion for a real-valued function of the MLE, such as a component of $\vartheta(P_{n,T})$ or a linear contrast. In Lemma 2, we show that we can write the asymptotic expansion in terms of a U-statistic of order two. To this end, we introduce the following assumption.

Assumption C.
Let $q$ be a function from $R^d$ to $R$ with a continuous and nonzero gradient at $\theta = \theta_0$ and a continuous second derivative at $\theta = \theta_0$.

Then, we have
Lemma 2. Under Assumptions A-C, the expansion in (4.11) holds. The function $q$ may select, e.g., a single component of the vector $\theta_0$. In many empirical applications, the most interesting parameter is the spatial correlation coefficient $\lambda_0$, and the null hypothesis is zero correlation versus the alternative hypothesis of positive spatial correlation, the aim being to check whether there is a contagion effect.

Higher-order asymptotics
Making use of Lemma 1 and Lemma 2, we derive the Edgeworth and the saddlepoint approximation to the distribution of a real-valued function q of the MLE.
Let $f_{n,T}(z)$ be the true density of $q[\vartheta(P_{n,T})] - q[\vartheta(P_{\theta_0})]$ at the point $z \in A$, where $A$ is a compact subset of $R^d$. Our derivation of the saddlepoint density approximation to $f_{n,T}(z)$ is based on the tilted-Edgeworth expansion for U-statistics of order two. In this regard, a remark is in order. From (4.1), we see that the terms in the random vector $\psi_{nt}(\theta_0)$ depend on the rows of the weight matrices $W_n$ and $M_n$. As a consequence, these terms are independent but not identically distributed random variables, and we need to derive the Edgeworth expansion for our U-statistic taking this aspect into account. To this end, we approximate the cumulant generating function (c.g.f.) of our U-statistic by summing (in $i$ and $j$) the (approximate) c.g.f. of each $h_{i,j,T}$ kernel. This is an extension of the derivation by  for i.i.d. random variables. To elaborate further, we introduce Assumption D.
Suppose that there exist positive numbers $\delta$, $\delta_1$, $C$, positive and continuous functions $\chi_j : (0, \infty) \to (0, \infty)$, $j = 1, 2$, satisfying $\lim_{z \to \infty} \chi_1(z) = 0$ and $\lim_{z \to \infty} \chi_2(z) \geq \delta_1 > 0$, and a real number $\alpha$ such that $\alpha \geq 2 + \delta > 2$. A few comments are in order. Assumptions D(i)-(iii) are similar to the technical assumptions in , pp. 1465 and 1477. However, there are some differences between our assumptions and theirs. Indeed, to take into account the non-identical distribution of $\psi_{i,t}$ and $\psi_{j,t}$, for $i \neq j$, we consider the first- and second-order von Mises kernels for each $i$ (as in D(i)-(iii)). This is different from : compare, e.g., our D(ii) to their Eq. (1.17). D(iv) is not considered in : it is a peculiar assumption needed for our higher-order asymptotics (the technical aspects are available in Lemma A.1 and its proof in Appendix A). In Appendix D.2, we illustrate that, in the case of the SAR(1) model, the validity of D(iv) is related to more primitive expressions involving the entries of (some powers of) $W_n$. For other models, one should derive such primitive expressions on a case-by-case basis. For the sake of generality, here we provide an intuition for D(iv). Let us consider two different locations $i$ and $j$. From (4.4), we see that D(iv) imposes a structure on the information available at different locations. Indeed, $M_{i,T}(\psi, P_{\theta_0})$ and $M_{j,T}(\psi, P_{\theta_0})$ contribute to the asymptotic variance of the MLE. Since $M_{i,T}(\psi, P_{\theta_0})$ is related to the information available at the $i$-th location, D(iv) essentially assumes that there exists an informative content which is common to locations $i$ and $j$, whilst the (Frobenius norm of the) information content specific to each location is of order $O(n^{-1})$. In addition, we can get the saddlepoint density approximation by exponentially tilting the Edgeworth expansion.
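To convey the idea of exponential tilting in the simplest possible case, the following sketch (illustrative Python, ours; this is a plain density on a grid, not the paper's U-statistic construction) tilts a density $f$ into $f_\nu(x) \propto e^{\nu x} f(x)$. Tilting a standard normal by $\nu$ recenters it at $\nu$, which is the mechanism that lets the saddlepoint method retain accuracy in the tails:

```python
import numpy as np

def tilt(density, nu, grid):
    """Exponentially tilt a density: f_nu(x) = exp(nu*x) f(x) / M(nu),
    where the normalizing constant M(nu) is computed by a Riemann sum
    on a uniform grid."""
    dx = grid[1] - grid[0]
    w = np.exp(nu * grid) * density(grid)
    return w / (w.sum() * dx)
```

Tilting the standard normal density with $\nu = 1$ yields the $N(1, 1)$ density, so the tilted mean moves from 0 to 1.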

Proposition 4. Under Assumptions A-D, the saddlepoint density approximation to the density $f_{n,T}(z)$ is given by $p_{n,T}(z)$ in (4.13), with relative error of order $O(m^{-1})$, where $\nu := \nu(z)$ is the saddlepoint defined by
$$\tilde K'_{n,T}(\nu) = z, \quad (4.14)$$
the function $\tilde K_{n,T}$ is the approximate c.g.f. of $\sqrt{n}(q[\vartheta(P_{n,T})] - q[\vartheta(P_{\theta_0})])$, as defined in (A.42), while $\tilde K'_{n,T}$ and $\tilde K''_{n,T}$ represent the first and second derivatives of $\tilde K_{n,T}$, respectively. Moreover, the corresponding tail-area approximation is given in (4.15). The proofs of these propositions are available in Appendix A. They rely on an argument similar to the one applied in the proof of  for the derivation of a saddlepoint density approximation of multivariate M-estimators, and in  for general statistics. Following Durbin (1980), we can further normalize $p_{n,T}$ to obtain a proper density by dividing the right-hand side of (4.13) by its integral with respect to $z$.
This normalization typically improves even further the accuracy of the approximation.
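To make the recipe concrete outside the panel setting, the following sketch (illustrative Python, ours) applies the classical saddlepoint density for the mean of $n$ i.i.d. Exponential(1) variables: solve $K'(\nu) = z$ numerically, evaluate $\sqrt{n/(2\pi K''(\nu))}\exp\{n[K(\nu) - \nu z]\}$, and then renormalize by numerical integration in the spirit of Durbin (1980). In this particular example the relative error is constant in $z$, so the renormalization removes essentially all of it:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.integrate import quad

# c.g.f. of an Exponential(1) variable: K(v) = -log(1 - v), for v < 1.
K  = lambda v: -np.log(1.0 - v)
K1 = lambda v: 1.0 / (1.0 - v)         # K'
K2 = lambda v: 1.0 / (1.0 - v) ** 2    # K''

def sp_density(z, n):
    """Saddlepoint density of the mean of n i.i.d. Exponential(1)
    variables at z: solve K'(v) = z, then evaluate the usual formula."""
    v = brentq(lambda v: K1(v) - z, -1e3, 1.0 - 1e-12)   # saddlepoint
    return np.sqrt(n / (2.0 * np.pi * K2(v))) * np.exp(n * (K(v) - v * z))

def sp_density_norm(z, n):
    """Durbin-type renormalization: divide by the numerical integral."""
    const, _ = quad(lambda u: sp_density(u, n), 0.2, 6.0)
    return sp_density(z, n) / const
```

The exact density here is a rescaled Gamma(n), and the unnormalized approximation already has a small relative error, uniformly in $z$; note also that the approximation is positive over the whole support.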

Links with the econometric literature
The expansions in Proposition 3 and Proposition 4 are connected with the results on higherorder expansions available in the spatial econometric literature, as cited in §1. However, some key differences are worth a mention.
(i) The Edgeworth expansion in  is for the concentrated MLE of λ and is based on a higher-order Taylor expansion of the concentrated likelihood score; see also Robinson and Rossi (2014a,b, 2015). In contrast, our method is based on a von Mises expansion of the MLE functional of the whole model parameter, and we resort to a marginalization procedure to obtain the saddlepoint density approximation of the parameter(s) of interest.
Therefore, Lemma 1 and Lemma 2 give generality and flexibility to our approach: we may focus not only on λ, but also, e.g., on ρ (which contains information on the spatial dependence of the innovation terms) and/or on β (which conveys information on the significance of the time-varying covariates).
(ii) Our saddlepoint approximation is more general than the Edgeworth-based approximations available in the econometric literature, since we work with a larger class of models, which includes the model in  as a special case.
(iii) Although inference (e.g., testing) derived using the Edgeworth expansion improves on the standard first-order asymptotics, it is well-known (see, e.g., Field and Ronchetti (1990)) that, in general, this technique provides a good approximation in the center of the distribution, but can be inaccurate in the tails, where the Edgeworth expansion can even become negative, leading to invalid density approximations. Our saddlepoint approximation is a density-like object and is always nonnegative.
(iv) Our saddlepoint approximation yields a tail-area approximation via a Lugannani-Rice type formula. A similar result is not available for the Edgeworth expansion of the concentrated MLE derived in . Recently,  studied the adjusted profile likelihood estimation method and obtained a result similar to our tail-area approximation. Their formula is derived for the spatial autoregressive model with covariates. However, they do not prove the higher-order properties of their approximation. In Proposition 4, we prove that our saddlepoint density approximation features a relative error of order $O(1/(n(T-1)))$. This has to be contrasted with the extant Edgeworth expansion, which entails an absolute error of lower order: more precisely, the error order is $o((nT)^{-1/2})$ when the entries of the spatial matrix are $O(1)$; see Eq. (2.15) in . Achieving a small relative error is appealing in tail areas, where the probabilities are small.
(v) In comparison with the bootstrap, our methodology does not need resampling. Moreover, it requires neither bias correction nor studentization.

Computational aspects
Most of the quantities related to the saddlepoint density approximation $p_{n,T}$ and the tail area in (4.15) are available in closed form. In Appendix C.1, we provide an algorithm (see Algorithm 1) in which we itemize the main computational steps needed to implement the saddlepoint tail-area approximation, for a given transformation $q$ and a given reference parameter $\theta_0$ (e.g., the parameter characterizing the null hypothesis in a simple hypothesis test, in which case the tail-area probability is an approximate p-value).

Comparisons with other approximations and testing in the presence of nuisance parameters
We compare the performance of our saddlepoint approximations to other routinely-applied asymptotic techniques. To start with, we consider the SAR(1) model, where λ is the only unknown parameter. Then, we move to the SARAR(1,1) model, where we illustrate how to take care of nuisance parameters. We use the same setting as in §2; we refer to the online Supplementary Material (Appendix D) for more details and for additional results.

Comparisons with other asymptotic techniques
Saddlepoint vs first-order asymptotics. For the SAR(1) model, we analyse the behaviour of the MLE of $\lambda_0$, whose PP-plots are available in Figure 3. For each type of $W_n$, for $n = 24$ and $n = 100$, the plots show that the saddlepoint approximation is closer to the "exact" probability than the first-order asymptotic approximation. For $W_n$ Rook, the saddlepoint approximation improves on the routinely applied first-order asymptotics. In Figure 3, the accuracy gains are evident also for $W_n$ Queen and Queen with torus, where the first-order asymptotic theory displays large errors essentially over the whole support (especially in the tails). By contrast, the saddlepoint approximation is close to the 45-degree line.

Saddlepoint vs Edgeworth expansion (testing simple hypotheses).
The Edgeworth expansion derived in Proposition 3 represents the natural alternative to the saddlepoint approximation since it is fully analytic. To gain insights into the different behavior of the saddlepoint and Edgeworth approximations, we investigate the size of a hypothesis test based on the approximations. We set $n = 24$ and we assume that $\sigma^2$ is known and equal to one. We consider the simple null hypothesis $H_0: \lambda_0 = 0$ for a one-sided test of zero against positive values of spatial correlation. We use 25,000 replications of $\hat\lambda_{n,T}$ to get the empirical estimate $\hat F_0$ of the c.d.f. $F_0$ of the estimator under the null hypothesis. We use the generic notation $G$ for the c.d.f. of one of the Edgeworth or saddlepoint approximations under the null hypothesis. For the sake of completeness, we also display the results for the Gaussian (first-order) approximation. The empirical rejection probabilities $\hat\alpha = 1 - \hat F_0(G^{-1}(1 - \alpha))$ are shown in Figure 4 for nominal sizes $\alpha$ ranging from 1% to 10%, and correspond to an estimated size. We have overrejection when we are above the 45-degree line. We observe strong size distortions for the asymptotic and Edgeworth approximations, as expected from the previous results. The saddlepoint approximation exhibits only mild size distortions. For example, for a nominal size of 5%, we get an estimated size $\hat\alpha$ of 11.72%, 7.36%, and 5.70% for the Normal, Edgeworth, and saddlepoint approximations, respectively.

Saddlepoint vs bootstrap. To illustrate the variability entailed by the bootstrap, we display the first and third quartile curves (twodash lines) and the median functional curve (dotted line with crosses); for details about functional boxplots, we refer to  and to the R routine fbplot. We notice that, while the bootstrap median functional curve (representing a typical bootstrap density approximation) is close to the actual density (as represented by the histogram), the range between the quartile curves illustrates that the bootstrap approximation has non-negligible variability.
Clearly, the variability depends on $B$: the larger $B$, the smaller the variability. However, larger values of $B$ entail higher computational costs: when $B = 499$, the bootstrap is almost as fast as the saddlepoint density approximation (computation time about 7 minutes, on a 2.3 GHz Intel Core i5 processor), but for $B = 999$, it is three times slower. We refer to Appendix D.5 for additional numerical results.
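The size computation $\hat\alpha = 1 - \hat F_0(G^{-1}(1-\alpha))$ used in the comparison above is easy to reproduce whenever draws of the estimator under the null are available. A minimal sketch (illustrative Python, ours; the standard normal stands in both for the null distribution and for the approximation $G$):

```python
import numpy as np
from scipy.stats import norm

def empirical_size(draws_under_null, approx_ppf, alpha):
    """Estimated size alpha_hat = 1 - F0_hat(G^{-1}(1 - alpha)):
    the fraction of null draws of the estimator that exceed the
    one-sided critical value implied by the approximation G
    (passed in through its quantile function)."""
    return float(np.mean(draws_under_null > approx_ppf(1.0 - alpha)))

# Sanity check: when G is the true null distribution, the estimated
# size matches the nominal level up to Monte Carlo error.
rng = np.random.default_rng(0)
draws = rng.standard_normal(25_000)
size_5pct = empirical_size(draws, norm.ppf, 0.05)
```

Replacing `norm.ppf` with the quantile function of an Edgeworth or saddlepoint c.d.f. gives the corresponding estimated sizes.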
6.2 Testing in the presence of nuisance parameters

6.2.1 Saddlepoint test for composite hypotheses

Our saddlepoint density and/or tail-area approximations are helpful for testing simple hypotheses about $\theta_0$; see §6.1. Another interesting case, suggested by the Associate Editor and an anonymous referee, which has strong practical relevance, is testing a composite null hypothesis. This problem is different from the one considered so far in the paper, because it raises the issue of dealing with nuisance parameters.
To tackle this problem, several possibilities are available. For instance, we may fix the nuisance parameters at their MLE values. Alternatively, we may consider using the (re-centered) profile estimators, as suggested, e.g., in  and . Combined with the saddlepoint density in (4.13), these techniques yield a ready solution to the nuisance parameter problem. In our numerical experience (see Appendix D.6 for an experiment on the SAR(1) model), these solutions may preserve reasonable accuracy in some cases. Nevertheless, the main theoretical drawback related to the use of MLE values for the nuisance parameter(s) is that it does not guarantee that the second-order properties derived in the previous sections still hold. To cope with this issue, we propose to build on Robinson et al. (2003), who derive a saddlepoint test statistic which explicitly takes into account the nuisance parameters, while preserving the relative error in the normal region. We feel this test statistic represents the natural candidate within our setting: it shares the same spirit as our saddlepoint density approximation and is derived through steps similar to ours. The paper by Robinson et al. (2003) defines the test statistic in the i.i.d. setting, while Lô and Ronchetti (2009) and Czellar and Ronchetti (2010) extend it to the non-i.i.d. data setting.
Let us consider a SARAR model whose parameter is θ = (θ 10 , θ 2 ) ′ , where θ 10 is specified by the composite null hypothesis: typically, the null concerns λ only, while θ 2 contains all the nuisance parameters. More specifically, the parameter is θ = (λ, β, ρ, σ 2 ) ′ and the general function q(θ) used in the previous sections is simply q(θ) = λ. Thus, writing θ = (λ, θ 2 ) ′ , with θ 10 = λ 0 and θ 2 = (β, ρ, σ 2 ) ′ , we have the composite hypothesis in (6.1). Then, we define the test statistic SAD n in (6.2). The function K ψ (ν, (λ, θ 2 )) appearing there is the c.g.f. of the estimating function in (6.3), with ψ i,t as in (4.1). The c.g.f. K ψ plays a role analogous to the one of the c.g.f. of the U-statistic that we derived in §4. We highlight that the expected value in (6.3) is taken w.r.t. the probability P (λ 0 ,θ 2 ) , where λ 0 is specified by the null, while the nuisance parameters are not fixed: the infimum over θ 2 takes care of the nuisance parameters. In our inference procedure, θ̂ n,T = (λ̂, θ̂ 2 ) ′ is the unrestricted MLE. Under the null hypothesis, the test statistic SAD n (λ̂) is asymptotically χ 2 1 distributed with a relative error of order O(m −1 ) in the normal region.
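The sup-inf structure of such a statistic can be illustrated numerically. The sketch below is a hypothetical toy computation of our own, not the paper's implementation: it uses a Gaussian stand-in c.g.f., K(ν) = νμ + ν²s²/2, with μ = λ̂ − λ 0 and s² playing the role of a nuisance-dependent variance, and replaces the sup over ν and the inf over θ 2 by grid searches. In this Gaussian case sup over ν of {−K(ν)} equals μ²/(2s²), so the statistic reduces to a Wald-type quantity at each fixed s².

```python
# Hypothetical sketch of a sup-inf statistic of the Robinson et al. (2003)
# form, SAD = 2 n inf_{theta2} sup_{nu} { -K_psi(nu, (lambda_hat, theta2)) }.
# The c.g.f. below is a Gaussian stand-in, NOT the paper's K_psi.
def sad_statistic(n, lam_hat, lam0, s2_grid, nu_grid):
    mu = lam_hat - lam0                      # mean of the estimating function
    inf_val = float("inf")
    for s2 in s2_grid:                       # inf over the nuisance parameter
        sup_val = max(-(nu * mu + 0.5 * nu * nu * s2) for nu in nu_grid)
        inf_val = min(inf_val, sup_val)
    return 2.0 * n * inf_val
```

For a Gaussian c.g.f. the inner sup has the closed form μ²/(2s²), which provides a direct check of the grid computation.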

Implementation aspects
To implement the test (6.2) for the problem (6.1), we propose an algorithm in Appendix C.2 (see Algorithm 2) and itemize the main steps needed to compute the test statistic.

Numerical results
Let us work with a SARAR(1,1) model with no covariates, known variance σ 2 = 1, and n = 24. This implies θ = (λ, ρ) ′ , and we consider the problem in (6.1), with ρ as the nuisance parameter. We set three different values, ρ = 0.25, 0.5, 0.75, to analyze numerically the impact of the spatial dependence in the innovation term on SAD n . We study the behaviour of the Wald test, as obtained from the first-order asymptotic theory using the expression of the asymptotic variance available in Appendix B. We compare the Wald test to SAD n ; to implement (6.2), we make use of the R routine nlm. We consider two types of spatial matrix W n , the Rook and the Queen, and we set W n ≡ M n . Both test statistics are asymptotically χ 2 1 distributed under the null hypothesis. To compare them in small samples, we first obtain the 95th and 97.5th quantiles of each test statistic; then we compute the corresponding probabilities implied by the χ 2 1 distribution. We display the results in Table 1. We see that the Wald test has severe size distortion. For instance, for ρ = 0.25 and W n Rook, we observe a relative error of about 30% at the 95% quantile, while the saddlepoint test entails a relative error of about 1.8%. Looking at the performance of SAD n , we see that it is uniformly more accurate than the Wald test: across all cases, we observe a maximal relative error of about 2%, at the 95% quantile, when ρ = 0.75 and W n is Queen; in the same setting, the Wald test entails a relative error of about 24%.
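The quantile-based size comparison just described can be sketched as follows. This is a hypothetical illustration of our own (with χ 2 1 draws standing in for the simulated test statistics), not the paper's Monte Carlo code: given simulated statistics, we take an empirical quantile, map it through the χ 2 1 c.d.f., P(χ 2 1 ≤ x) = erf(√(x/2)), and compare with the nominal level.

```python
import math, random

def chi2_1_cdf(x):
    """P(chi2_1 <= x) = erf(sqrt(x/2))."""
    return math.erf(math.sqrt(x / 2.0))

def relative_size_error(stats, nominal=0.95):
    """Empirical quantile of the simulated statistics, mapped back to a
    probability through the chi2_1 c.d.f.; returns |prob/nominal - 1|."""
    q = sorted(stats)[int(nominal * len(stats)) - 1]
    return abs(chi2_1_cdf(q) / nominal - 1.0)
```

With draws that are exactly χ 2 1 distributed, the relative error should be close to zero; a size-distorted test would produce a large value.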
Moreover, the size is fairly constant across the different values of ρ: it illustrates that the test statistic correctly takes care of the nuisance parameter.

7 Empirical application

Feldstein and Horioka (1980) document empirically that the domestic saving rate in a country is positively correlated with the domestic investment rate. This contrasts with the understanding that, if capital is perfectly mobile between countries, most of any incremental saving is invested wherever it earns the highest return, regardless of location, so that such a correlation should actually vanish. Debarsy and Ertur (2010) suggest the use of spatial modeling, since several papers challenge these findings but under the strong assumption that investment rates are independent across countries. Such an assumption might influence the conclusions of applied spatial economics.
In this empirical exercise, we investigate the presence of spatial autocorrelation in the investment-saving relationship. We consider investment and saving rates for 24 OECD countries between 1960 and 2000 (41 years). For macroeconomic reasons (the deregulation of financial markets), we divide the whole period into shorter sub-periods, 1960-1970, 1971-1985 and 1986-2000, as advocated by Debarsy and Ertur (2010). Since the cross-sectional size is only n = 24, the first-order asymptotics may suffer from size distortion, as documented in §6. Therefore, we resort to a saddlepoint test to investigate whether or not there are inferential issues (coming from finite-sample distortions and nuisance parameters) in the use of the first-order asymptotic theory. In line with the econometric literature, we specify the following SARAR(1,1) model for the three sub-periods:

Inv nt = λ 0 W n Inv nt + Sav nt β 0 + c n0 + E nt ,  E nt = ρ 0 M n E nt + V nt ,  (7.1)

where Inv nt is the n × 1 vector of investment rates for all countries and Sav nt is the n × 1 vector of saving rates. Each element v it in V nt is i.i.d. across i and t, having Gaussian distribution with zero mean and variance σ 2 0 , and c n0 is the vector of fixed effects. We assume W n = M n and adopt two different weight matrices, as in Debarsy and Ertur (2010). The first one is based on the inverse distance: each element ω ij in W n is d −1 ij , where d ij is the arc distance between the capitals of countries i and j. The second is the binary 7-nearest-neighbour matrix: ω ij = 1 if d ij ≤ d i , and ω ij = 0 otherwise, where d i is the 7th smallest arc distance from country i, so that each country has exactly 7 neighbours. Both weight matrices are row-normalized.
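The construction of the two weight matrices can be sketched as follows. This is a hypothetical illustration in which the matrix `d` of pairwise arc distances is assumed given, and the names `inverse_distance_W`, `knn_W`, and `row_normalize` are introduced here for exposition only.

```python
def inverse_distance_W(d):
    """Row-normalized inverse-distance weights: w_ij = 1/d_ij, w_ii = 0."""
    n = len(d)
    W = [[0.0 if i == j else 1.0 / d[i][j] for j in range(n)] for i in range(n)]
    return row_normalize(W)

def knn_W(d, k=7):
    """Row-normalized binary k-nearest-neighbour weights."""
    n = len(d)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # the k smallest distances from unit i define its neighbours
        nbrs = sorted(range(n), key=lambda j: d[i][j])
        for j in nbrs[1:k + 1]:              # skip i itself (distance 0)
            W[i][j] = 1.0
    return row_normalize(W)

def row_normalize(W):
    out = []
    for row in W:
        s = sum(row)
        out.append([w / s for w in row] if s > 0 else row[:])
    return out
```

After row-normalization each row sums to one, which is the convention assumed throughout the paper for W n and M n .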
We estimate the parameters using the MLE described in §3. Table 2 gathers the point estimates (and their standard errors), which agree with the magnitudes found by Debarsy and Ertur (2010). To investigate the validity of the model (7.1), we test for spatial dependence, focusing on λ = 0 and/or ρ = 0. Specifically, our aim is to detect if and in which period(s) the inference yielded by the first-order asymptotic theory differs from the inference obtained using our saddlepoint test. With this goal, in Table 3 we provide the p-values for testing (at the 5% level) three different composite hypotheses: in the first row, we consider the problem of testing λ = 0; in the second row, we test ρ = 0; in the third row, we test λ = ρ = 0. To perform the tests, we consider the routinely-applied Wald test (as obtained using the first-order asymptotic approximation, ASY) and the saddlepoint test (SAD n ). In each testing procedure, we treat the parameters not specified by the null hypothesis as nuisance parameters. In the SAD n test, we take care of the nuisance parameters as indicated in (6.2), while in the ASY test we simply plug in the MLE values for the nuisance parameters, as is customary in econometric software based on the first-order asymptotic theory.
In the period 60-70, both ASY and SAD n yield the same inference at conventional significance levels, for both considered types of weight matrix. The other sub-periods display some discrepancies between the inference obtained via ASY and via SAD n . We do not discuss all discrepancies but only briefly comment on some key differences; we highlight the corresponding values in Table 3. In the sub-period 71-85 under the 7NN W n , the saddlepoint test finds no evidence against the absence of spatial dependence in the investment rates across countries, and vice-versa for the asymptotic approximation. Moreover, the ASY test does not find evidence against ρ = 0, while the SAD n test rejects this composite hypothesis. Thus, the SAD n test indicates a spillover through the contemporaneous shocks between countries. This spillover goes through the innovations, i.e., through the unexpected part of the model dynamics, a finding that cannot be documented when one relies on the first-order asymptotic theory. This result suggests that a test statistic designed to perform well in small samples and in the presence of nuisance parameters is able to document spatial dependence in the disturbances E nt . Some differences are detectable also in the sub-period 86-00, under the inverse distance matrix.

SUPPLEMENTARY MATERIAL
The online supplementary material includes proofs, lengthy analytical derivations, and additional numerical results for the SAR(1) model. All the codes and data are available in our GitHub repository.
The expressions of IF i,T (ψ, P θ 0 ) and of ϕ i,j,T (ψ, P θ 0 ) for general M-estimators are available in . Specifically, for the i-th observation and for the whole time span, IF i,T (ψ, P θ 0 ) is the Influence Function (IF) of the MLE, having likelihood score ψ. Following Withers (1983), the second term of the von Mises expansion in (A.1) is given by (4.7) and (4.8).
Lemma A.1. Under Assumptions A-D, for all i, j with i ≠ j, Proof. From the definition of M i,T (ψ, P θ 0 ) in (4.4), we have (A.7). By a Taylor expansion in (A.7), we get (A.8). From Assumption A, we know that M −1 j,T (ψ, P θ 0 ) = O(1). Under Assumption D(iv), we get (A.9). By (A.7), (A.8) and (A.9), we finally find the result. The mean, the variance, and the standardized third and fourth cumulants of U n,T (defined by (A.4)) are given by the following expressions.

Now we can prove Proposition 3.
where ι 2 = −1, making use of κ̂ as the approximate c.f. To prove (4.12) in Proposition 3, we use the Esseen smoothing lemma as in Feller (1971) and show that there exist sequences {Z n } and {ε ′ n } such that n −1 Z n → ∞, ε ′ n → 0, and the smoothing bound holds over the interval [−Z n , Z n ]. We proceed along the same lines as in . We work on a compact subset of R and we consider the c.f. for small |z|. Then, we prove Lemma A.3, which essentially shows the validity of the Edgeworth expansion by means of the Esseen lemma. In this regard, we flag that we need to prove the Esseen lemma within our setting (we are dealing with independent but not identically distributed random variables) and we cannot directly invoke the results in . To this end, we prove a new result similar to Lemma 2.1 of that paper, adapting its proof to our context; see Lemma A.3 below. Finally, the application of the derived results concludes the proof.
Proof. For the sake of readability, we split the proof in five steps. At the beginning of each step, we explain the goal of the derivation.
Step 1. We approximate the characteristic function (c.f.) ofθ n,T via the c.f. of U n,T , up to the suitable order. This yields (A.33).
Step 3. We derive the expansions (up to a suitable order) of the products of Ψ g,i,T (z), represented by the four curly brackets in (A.35), so as to obtain expressions similar to the terms in Ψ * n,T (z); see (A.27). To achieve this, we introduce the approximate c.f. of σ −1 g Σ n i=1 g i,T (ψ, P θ 0 ) in (A.37) and find a connection to the c.f. of σ −1 n,T Σ n i=1 g i,T (ψ, P θ 0 ), so that we can obtain the expressions of the four curly brackets in (A.35).

A.4 Proof of Proposition 4
Proof. We derive (4.13) by the tilted-Edgeworth technique; see, e.g.,  for a book-length presentation. Our proof follows from standard arguments, like, e.g., those in , , and . The main difference between the available proofs and ours is that we need to use our approximate c.g.f., as obtained via the approximate cumulants; see also , p. 677. Finally, a straightforward application of the Lugannani-Rice formula yields (4.15); see Lugannani and Rice (1980) and .  have already provided a few calculations for the first-order asymptotics.

B The first and second derivatives of the log-likelihood
To go further, our online materials give additional and more explicit mathematical expressions for the higher-order terms needed for the saddlepoint approximation.
We recall the following notations, which are frequently used:

B.1 The first derivative of the log-likelihood

B.1.1 Common terms
First, consider the following elements, which are common to many of the partial derivatives that we are going to compute. To this end, we set ξ = (β ′ , λ, ρ) ′ and we compute them explicitly; making use of (B.2), we obtain the remaining common terms.

B.1.2 Component-wise calculation of the log-likelihood

where ψ((Y nt , X nt ), θ n,T ) represents the likelihood score function and its expression is

B.2 The second derivative of the log-likelihood
• The first row of ∂ 2 ℓ n,T (θ 0 )/∂θ∂θ ′ . • The second row of ∂ 2 ℓ n,T (θ 0 )/∂θ∂θ ′ . The matrix ∂ 2 ℓ n,T (θ 0 )/∂θ∂θ ′ is symmetric. The first element is the transpose of the second one in the first row, and we can get the first two elements from the transposes of the third ones in the first two rows. So we only need to calculate the following two derivatives.
We only need to calculate the derivative with respect to σ 2 .
We report the matrix
So we prove the first column of Σ 0,n,T , that is: The second column of Σ 0,n,T is: The third column of Σ 0,n,T is: The fourth column of Σ 0,n,T is: Thus, we have:

C.1 Algorithm 1

Most of the quantities related to the saddlepoint density approximation p n,T and the tail area in (4.15) are available in closed form. Here, we provide an algorithm (see Algorithm 1) in which we itemize the main computational steps needed to implement the saddlepoint tail area approximation, for a given transformation q and for a given reference parameter θ 0 ; the latter is, e.g., the parameter characterizing the null hypothesis in a simple hypothesis test, where the tail area probability is an approximate p-value.

Algorithm: Density and tail-area saddlepoint approximation
Input: a sample of Y nt and X nt ; the reference parameter θ 0 .
Output: density and tail area saddlepoint approximation.
1: Given a sample of Y nt and X nt , first compute the transformed values Ỹ nt and X̃ nt , as in (3.3)
2: Compute μ n,T , σ 2 g , κ n,T and κ (4) n,T , using the formulae available in Appendices A.3 and A.4
3: Combine the expressions of the approximate cumulants into the analytical expression of the approximate c.g.f. K̂ n,T (ν) given by (A.42) in Appendix A.4
4: Define a grid Z of points {z j , j = 1, ..., j max }. Select the min (i.e., z 1 ) and max (i.e., z jmax ) values of this grid according to the min and max values taken by the parameter of interest
5: for j ← 1 to j max do: compute the saddlepoint density approximation p n,T (z j ) as in (4.13); end
6: Compute the tail area using formula (4.15) or by numerical integration of the density p n,T

Some additional comments are in order.
Step 2 requires the approximation of some expected values, like, e.g., E[g 2 i,T (ψ, P θ 0 )], for g i,T (ψ, P θ 0 ) as in (4.10), which numerical methods can provide. For instance, we can rely on numerical integration with respect to the underlying Gaussian distribution P θ 0 , or on the Laplace method, or on the approximation of the integrals by Riemann sums, using simulated data. In our experience, the latter approximation represents a good compromise between accuracy, simplicity to code, and computational burden. Moreover, for Step 5, one needs to solve Eq. (4.14) for each grid point. Some well-known methods are available. For instance, Kolassa (2006), p. 84, suggests the use of Newton-Raphson derivative-based methods. Specifically, for a given starting value ν 0 (which is an approximate solution to the saddlepoint equation), a first-order Taylor expansion of the saddlepoint equation yields K̂ ′ n,T (ν 0 ) + K̂ ′′ n,T (ν 0 )(ν − ν 0 ) ≈ z, whose solution is ν = ν 0 + (z − K̂ ′ n,T (ν 0 ))/K̂ ′′ n,T (ν 0 ). We apply this solution to update the approximate solution ν 0 , yielding a new approximation ν to the saddlepoint. We iterate the procedure until the approximate solution is accurate enough; e.g., we can set a tolerance value (say, tol) and iterate until |z − K̂ ′ n,T (ν)| < tol. An alternative option is the secant method; see , p. 86. As noticed by , due to the approximate nature of the c.g.f. in (A.42), the saddlepoint equation can admit multiple solutions in some areas of the density. To solve this problem, one may use the modified c.g.f. proposed by Wang (1992).
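The Newton-Raphson solution of the saddlepoint equation and the tail-area computation (Steps 5-6) can be sketched as below. The c.g.f. is a hypothetical stand-in, K(ν) = −ln(1 − ν), the c.g.f. of an Exp(1) variable, chosen because the exact tail area P(X ≥ z) = e^{−z} is then available for checking; it is not the paper's approximate c.g.f. K̂ n,T , and the tail formula is the standard Lugannani-Rice combination.

```python
import math

# Stand-in c.g.f. of an Exp(1) variable and its first two derivatives
def K(nu):   return -math.log1p(-nu)
def K1(nu):  return 1.0 / (1.0 - nu)            # K'
def K2(nu):  return 1.0 / (1.0 - nu) ** 2       # K''

def saddlepoint(z, tol=1e-12, max_iter=200):
    """Newton-Raphson for K'(nu) = z, cf. Kolassa (2006, p. 84)."""
    nu = 0.0
    for _ in range(max_iter):
        step = (z - K1(nu)) / K2(nu)
        while nu + step >= 1.0:     # keep nu inside the domain of K (nu < 1)
            step *= 0.5
        nu += step
        if abs(z - K1(nu)) < tol:
            break
    return nu

def lugannani_rice(z):
    """Saddlepoint tail-area approximation of P(X >= z), for z != E[X]."""
    nu = saddlepoint(z)
    w = math.copysign(math.sqrt(2.0 * (nu * z - K(nu))), nu)
    u = nu * math.sqrt(K2(nu))
    Phi = 0.5 * math.erfc(-w / math.sqrt(2.0))                # N(0,1) c.d.f.
    phi = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)   # N(0,1) p.d.f.
    return 1.0 - Phi + phi * (1.0 / u - 1.0 / w)
```

The domain guard in the Newton update plays the role of a safeguard against overshooting the singularity of K; in the paper's setting, a good starting value ν 0 serves the same purpose.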

C.2 Algorithm 2
To implement the test (6.2) for the problem (6.1), we propose the following algorithm.

Algorithm: Saddlepoint test in the presence of nuisance parameters
Input: a sample of Y nt and X nt ; the testing problem in (6.1).

As far as Step 3 is concerned, we suggest applying standard MC integration to approximate numerically the c.g.f. K ψ ; this is in line with our suggestion for Algorithm 1. Specifically, for each location i, we may approximate K (T) ψ i (ν, λ, θ 2 ) = ln E P (λ 0 ,θ 2 ) [exp{ν T ψ (T) i (λ, θ 2 )}] by simulating from the corresponding SARAR model under the null. To this end, for a user-specified MC size (MC.size), in each j-th MC run we simulate the variables {Y (j) nt } t=1,··· ,T under the probability P (λ 0 ,θ 2 ) . We remark that P (λ 0 ,θ 2 ) is given by the composite null hypothesis, where the nuisance parameters are not specified. Thus, θ 2 is not fixed in the MC runs: the numerical optimization needed to compute the infimum over θ 2 takes care of that aspect. Then, we set K̂ (T) ψ i (ν, λ̂, θ 2 ) = ln(MC.size −1 Σ j exp{ν T ψ (T) i,j (λ̂, θ 2 )}), where ψ (T) i,j is the estimating function at location i as evaluated in the j-th MC run. Finally, we build the test using the MC-approximated c.g.f. K̂ ψ (ν, λ̂, θ 2 ) = −n −1 Σ n i=1 K̂ (T) ψ i (ν, λ̂, θ 2 ).
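The MC approximation of a c.g.f. can be sketched in a few lines. This is a minimal hypothetical illustration, not the paper's SARAR simulation: the draws of the estimating function are standard normal values, for which the exact c.g.f. K(ν) = ν²/2 is known and can be used as a check.

```python
import math, random

def mc_cgf(psi_draws, nu):
    """Monte Carlo approximation of K(nu) = ln E[exp(nu * psi)]:
    the expectation is replaced by an average over simulated draws."""
    m = len(psi_draws)
    return math.log(sum(math.exp(nu * p) for p in psi_draws) / m)
```

At ν = 0 the approximation is exactly zero (any c.g.f. vanishes at the origin), and for Gaussian draws it should be close to ν²/2 when the number of MC runs is large.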
D Additional results for the SAR(1) model

D.1 First-order asymptotics

As is customary in statistical/econometric software, we consider three different spatial weight matrices: the Rook matrix, the Queen matrix, and the Queen matrix with torus. In Figure 1, we display the geometry of Y nt as implied by each considered spatial matrix for n = 100: the plots highlight that different matrices imply different spatial relations. For instance, we see that the Rook matrix implies fewer links than the Queen matrix. Indeed, the Rook criterion defines neighbours by the existence of a common edge between two spatial units, whilst the Queen criterion is less rigid and defines neighbours as spatial units sharing an edge or a vertex. Besides, we may interpret {Y nt } as an n-dimensional random field on the network graph which describes the known underlying spatial structure. Then, W n represents the weighted adjacency matrix (in the spatial econometrics literature, W n is called the contiguity matrix). In Figure 1, we display the geometry of a random field on a regular lattice (undirected graph).
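The Rook and Queen contiguity matrices on a regular lattice can be sketched as follows; `lattice_weights` is a name introduced here for illustration, with row-normalization applied as is standard for W n .

```python
def lattice_weights(r, c, queen=False):
    """Row-normalized contiguity matrix for an r x c regular lattice.
    Rook: neighbours share an edge; Queen: an edge or a vertex."""
    n = r * c
    W = [[0.0] * n for _ in range(n)]
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if queen:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    for i in range(r):
        for j in range(c):
            for di, dj in steps:
                a, b = i + di, j + dj
                if 0 <= a < r and 0 <= b < c:
                    W[i * c + j][a * c + b] = 1.0
    for row in W:                       # row-normalization
        s = sum(row)
        for k in range(n):
            row[k] /= s
    return W
```

A corner cell has two Rook neighbours but three Queen neighbours, which is one way to see that the Queen matrix implies more links.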
We complement the motivating example of our research by illustrating the low accuracy of the routinely applied first-order asymptotics in the setting of the spatial autoregressive process of order one, henceforth SAR(1), a special case of the spatial autoregressive model with spatial autoregressive error (SARAR) of . Since c n0 creates an incidental parameter issue, we eliminate it by the standard differentiation procedure. Given that we have only two periods, the transformed (differentiated) model is formally equivalent to the cross-sectional SAR(1) model, in which c n0 ≡ 0, a priori; see  for a related discussion.
In the MC exercise, we set λ 0 = 0.2, and we estimate it through Gaussian likelihood maximisation. The resulting M-estimator (the maximum likelihood estimator) is consistent and asymptotically normal; see §4.1. We consider the same types of W n as in the motivating example of §2. In Figure 2, we display the MC results. Via QQ-plots, we compare the distribution of λ̂ to the Gaussian asymptotic distribution (as implied by the first-order asymptotic theory).
The plots show that, for both sample sizes n = 24 and n = 100, the Gaussian approximation can be either too thin or too thick in the tails with respect to the "exact" distribution (as obtained via simulation). The more complex the geometry of W n (e.g., W n Queen), the more pronounced the departures from the Gaussian approximation.

D.2 Assumption D(iv) and its relation to the spatial weights
We can obtain M i,T (ψ, P θ 0 ) from (4.4) for a SAR(1) model as in (D.1), where g̃ ii and g ii are the i-th diagonal elements of G n G ′ n and G 2 n (λ 0 ), respectively. Note that S n (λ 0 ) = I n − λ 0 W n and G n (λ 0 ) = W n S −1 n (λ 0 ). To check Assumption D(iv), ||M i,T (ψ, P θ 0 ) − M j,T (ψ, P θ 0 )|| = O(n −1 ), let us first find the expression for the difference between M i,T (ψ, P θ 0 ) and M j,T (ψ, P θ 0 ), where g̃ ii and g̃ jj are the i-th and j-th diagonal elements of G n G ′ n , and g ii and g jj the i-th and j-th diagonal elements of G 2 n (λ 0 ). Then, we rewrite the expression of G 2 n (λ 0 ) to check its diagonal, where a k is the coefficient of W k n . By comparing the diagonal entries of W k n , we can assume that the difference between any two diagonal entries of W k n is at most of order n −1 , for all integers k ≥ 2, so that ||g ii − g jj || = O(n −1 ), for any 1 ≤ i, j ≤ n and i ≠ j.
We do the same for G n G ′ n , where a k 1 k 2 is the coefficient of W k 1 n (W k 2 n ) ′ . Then, we can make a second assumption: the difference between any two diagonal entries of W k 1 n (W k 2 n ) ′ is at most of order n −1 , for all positive integers k 1 and k 2 , so that ||g̃ ii − g̃ jj || = O(n −1 ), for any 1 ≤ i, j ≤ n and i ≠ j. From these two assumptions on the behaviour of the diagonal entries, we notice that W n plays the leading role: its diagonal entries are always identical to each other, so the corresponding differences are always 0. Since the entries of W n are all positive and less than 1, as the exponent becomes larger, all the entries of the powers of W n and W ′ n approach zero. Thus, large powers of W n satisfy the two assumptions.
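As a concrete numerical check of conditions of this type, the following hypothetical sketch (our own, not the paper's code) builds a circulant W n , a ring in which each unit has exactly two neighbours, row-normalized, computes G n (λ 0 ) = W n S −1 n (λ 0 ) via the Neumann series S −1 n = Σ k λ k 0 W k n (valid for |λ 0 | < 1 with a row-normalized W n ), and verifies that the diagonal entries of G 2 n are constant, so their pairwise differences vanish.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def ring_W(n):
    """Circulant W_n: each unit i has neighbours i-1 and i+1 (mod n)."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][(i - 1) % n] = W[i][(i + 1) % n] = 0.5
    return W

def G_matrix(W, lam, terms=200):
    """G = W S^{-1} with S^{-1} = sum_k lam^k W^k (truncated Neumann series)."""
    n = len(W)
    S_inv = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    Wk = [row[:] for row in S_inv]
    for _ in range(terms):
        Wk = [[lam * x for x in row] for row in matmul(Wk, W)]
        S_inv = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(S_inv, Wk)]
    return matmul(W, S_inv)
```

For a circulant W n , all powers are circulant, so the diagonal differences are exactly zero; for irregular lattices they are merely small, which is what the order-n −1 assumptions formalize.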

D.3 First-order asymptotics vs saddlepoint approximation
For the SAR(1) model, we analyse the behaviour of the MLE of λ 0 , whose PP-plots are available in Figure 3. For each type of W n , for n = 100, the plots show that the saddlepoint approximation is closer to the "exact" probability than the first-order asymptotic approximation. For W n Rook, the saddlepoint approximation improves on the routinely-applied first-order asymptotics. In Figure 3, the accuracy gains are evident also for W n Queen and Queen with torus, where the first-order asymptotic theory displays large errors essentially over the whole support (especially in the tails). On the contrary, the saddlepoint approximation is close to the 45-degree line.
Density plots convey the same information as the PP-plots displayed in Figure 3: we display the histogram of the Monte Carlo runs, to which we superpose both the Gaussian and the saddlepoint density approximations, for W n Rook and Queen. The plots illustrate that the saddlepoint technique provides an approximation to the true density which is more accurate than the one obtained using the first-order asymptotics.

D.4 Saddlepoint approximation vs Edgeworth expansion
The Edgeworth expansion derived in Proposition 3 represents the natural alternative to the saddlepoint approximation, since it is also fully analytic. Thus, we compare the performance of the two approximations, looking at their relative errors for the approximation of the tail area probability. We keep the same Monte Carlo design as in Section 6.1, namely n = 24, and we consider different values of z, as in (4.15). Figure 5 displays the absolute value of the relative error, i.e., |approximation/exact − 1|, when W n is Rook, Queen, and Queen with torus. The plots illustrate that the relative error yielded by the saddlepoint approximation is smaller (down to ten times smaller in the case of Rook and Queen with torus) than the relative error entailed by the first-order asymptotic approximation (which is always about 100%). The Edgeworth expansion entails a relative error which is typically higher than the one entailed by the saddlepoint approximation; the expansion can even become negative in some parts of the support, with relative error above 100%.

D.5 Saddlepoint vs parametric bootstrap
The parametric bootstrap represents a (computer-based) competitor, commonly applied in statistics and econometrics. To compare our saddlepoint approximation to the one obtained by bootstrap, we consider different numbers of bootstrap repetitions, labeled B, and summarize the resulting bootstrap density approximations via functional boxplots, with the bootstrap median functional curve (dotted line with crosses); for details about functional boxplots, we refer to  and to the R routine fbplot. We notice that, while the bootstrap median functional curve (representing a typical bootstrap density approximation) is close to the actual density (as represented by the histogram), the range between the quartile curves illustrates that the bootstrap approximation has a non-negligible variability. Clearly, the variability depends on B: the larger B, the smaller the variability. However, larger values of B entail higher computational costs: for B = 499, the bootstrap is almost as fast as the saddlepoint density approximation (computation time of about 7 minutes, on a 2.3 GHz Intel Core i5 processor), but for B = 999 it is three times slower. We refer to Jeganathan et al. (2015) for comments on the computational burden of the parametric bootstrap and on the possibility of using saddlepoint approximations to speed up the computation. For B = 499, zooming in on the tails, we notice that in the right tail and in the center of the density the bootstrap yields a (slightly) more accurate approximation than the saddlepoint method, but the saddlepoint approximation is either inside or extremely close to the bootstrap quartile curves (see the right panel of Figure 6). In the left tail, the saddlepoint density approximation is closer to the true density than the bootstrap typical functional curve, for λ̂ − λ 0 ≤ −0.85. Thus, overall, we cannot conclude that the bootstrap dominates the saddlepoint approximation uniformly (in terms of accuracy improvements over the whole domain).
Even if we are ready to accept a larger computational cost, the accuracy gains yielded by the bootstrap are still not fully clear: also for B = 999, the bootstrap does not dominate the saddlepoint approximation uniformly. Finally, for B = 49, the bootstrap is about eight times faster than the saddlepoint approximation, but this gain in speed comes with a large cost in terms of accuracy. As an illustration, for λ̂ − λ 0 = −0.8 (left tail), the true density is 0.074 and the saddlepoint density approximation is 0.061, while the bootstrap median value is 0.040, with a wide spread between the first and the third quartiles, at 0.009 and 0.108 respectively.
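For comparison, a parametric bootstrap density approximation can be sketched as follows. This is a toy example of our own, not the paper's SAR(1) code: the "model" is N(μ̂, 1), the estimator is the sample mean, and the B bootstrap estimates are smoothed with a Gaussian kernel.

```python
import math, random

def bootstrap_density(mu_hat, n, B, grid, bandwidth=0.05, seed=0):
    """Parametric bootstrap: simulate B samples of size n from the fitted
    model, re-estimate on each, and return a kernel density estimate of the
    B bootstrap estimates, evaluated on the given grid."""
    rng = random.Random(seed)
    boot = []
    for _ in range(B):
        sample = [rng.gauss(mu_hat, 1.0) for _ in range(n)]
        boot.append(sum(sample) / n)         # re-estimate on each draw
    c = 1.0 / (B * bandwidth * math.sqrt(2.0 * math.pi))
    return [c * sum(math.exp(-0.5 * ((z - b) / bandwidth) ** 2) for b in boot)
            for z in grid]
```

The cost of the scheme is B full re-estimations, which is the computational burden discussed above; the variability of the resulting curve across seeds is the bootstrap variability summarized by the functional boxplots.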

D.6 Saddlepoint test for composite hypotheses: plug-in approach
As discussed in §6.2 of the paper, our saddlepoint density and/or tail approximations are helpful for testing simple hypotheses about θ 0 , while testing a composite null hypothesis raises the issue of dealing with nuisance parameters. Among the possibilities available to tackle this problem, we may fix the nuisance parameters at their MLE values, or use the (re-centered) profile estimators, as suggested, e.g., in  and . Combined with the saddlepoint density in (4.13), these techniques yield a ready solution to the nuisance parameter problem, and in our numerical experience they may preserve reasonable accuracy in some cases.
To illustrate this aspect, we consider a numerical exercise on the SAR(1) model, in which we set the nuisance parameter equal to its MLE value. Specifically, we consider a SAR(1) with parameter θ = (λ, σ 2 ) ′ and our goal is to test H 0 : λ = λ 0 = 0 versus H 1 : λ > 0, while leaving σ 2 unspecified. To study the performance of the saddlepoint density approximation for the sampling distribution of λ̂, we perform a MC study, in which we set n = 24, T = 2, and σ 2 0 = 1. For each simulated (panel data) sample, we estimate the model parameter via MLE and get θ̂ = (λ̂, σ̂ 2 ) ′ . Then, we compute the saddlepoint density in (4.13) using σ̂ 2 in lieu of σ 2 0 : for each sample, we have a saddlepoint density approximation. We repeat this procedure 100 times, for W n Rook and Queen, so we obtain 100 saddlepoint density approximations for each W n type. In Figure 7, we display functional boxplots of the resulting saddlepoint density approximations for the MLE λ̂. To have a graphical idea of the saddlepoint approximation accuracy, on the same plot we superpose the histogram of λ̂, which represents the sampling distribution of the estimated parameter. Even a quick inspection of the right and left plots suggests that the resulting saddlepoint density approximation (typically) preserves a good performance. Indeed, we find that the median functional curve (dotted line with crosses, which we consider as the typical saddlepoint density approximation) is close to the histogram and gives a good performance in the tails, for both W n Rook and Queen. The range between the first and third quartile curves (two-dash lines) illustrates the variability of the saddlepoint approximation. When W n is Queen, even though there is a departure of the median curve from the histogram over the x-axis intervals (−0.75, 0) and (0.25, 0.5), the histogram is inside the functional confidence band delimited by the first and third quartile curves.
Thus, we conclude that computing the p-value for testing H 0 using σ̂ 2 in the expression of the saddlepoint density approximation seems to preserve accuracy in this SAR(1) model, even for such a small sample.