Spanning Tests for Markowitz Stochastic Dominance

We derive properties of the cdf of random variables defined as saddle-type points of real-valued continuous stochastic processes. This facilitates the derivation of the first-order asymptotic properties of tests for stochastic spanning given some stochastic dominance relation. We define the concept of Markowitz stochastic dominance spanning, and develop an analytical representation of the spanning property. We construct a non-parametric test for spanning based on subsampling, and derive its asymptotic exactness and consistency. The spanning methodology determines whether introducing new securities or relaxing investment constraints improves the investment opportunity set of investors driven by Markowitz stochastic dominance. In an application to standard data sets of historical stock market returns, we reject market portfolio Markowitz efficiency as well as two-fund separation. Hence, we find evidence that equity management through base assets can outperform the market for investors with Markowitz-type preferences.


Introduction
An essential feature of any model trying to understand asset prices or trading behavior is an assumption about investor preferences, or about how investors evaluate portfolios. The vast majority of models assume that investors evaluate portfolios according to the expected utility framework. Investors are assumed to act as non-satiable and risk averse agents, and their preferences are represented by increasing and globally concave utility functions.
Empirical evidence suggests that investors do not always act as risk averters.
Instead, under certain circumstances, they behave in a much more complex fashion exhibiting characteristics of both risk loving and risk averting. They seem to evaluate wealth changes of assets w.r.t. benchmark cases rather than final wealth positions.
They behave differently towards gains and losses, and they are more sensitive to losses than to gains (loss aversion). The relevant utility function can be either concave for gains and convex for losses (S-shaped) or convex for gains and concave for losses (reverse S-shaped). They seem to transform the objective probability measures to subjective ones using transformations that potentially increase the probabilities of negligible (and possibly averted) events, which, in some cases, share analytical characteristics similar to the aforementioned utility functions. Examples of risk orderings that (partially) reflect such findings are the dominance rules of behavioral finance (see Friedman and Savage (1948), Baucells and Heukamp (2006), Edwards (1996), and the references therein).
Accordingly, stochastic dominance has been used over the last decades in this framework, having more generally evolved into an important concept in the fields of economics, finance and statistics/econometrics (see inter alia Kroll and Levy (1980), McFadden (1989), Levy (1992), Mosler and Scarsini (1993), and Levy (2005)), since it enables inference on the issue of optimal choice in a non-parametric setting. Several statistical tools have been developed to test whether, given some fixed notion of stochastic dominance, a probability distribution of interest (or some random element that represents it) dominates any other similar distribution in a given set, i.e., whether the former is super-efficient over the latter set (see Arvanitis et al. (2018)). Analogous procedures have been developed to test whether this distribution is not dominated by any other member of the given set, i.e., whether it is an efficient element of it (see Linton, Post and Whang (2014)).

There is a large evolving literature on first (FSD) and second (SSD) order stochastic dominance. We can characterize FSD via the choice under uncertainty of every non-satiable investor, while we can characterize SSD by the analogous choice of every risk averse and non-satiable investor (see Hadar and Russell (1969), Hanoch and Levy (1969), and Rothschild and Stiglitz (1970)). Higher order stochastic dominance relations impose more restrictions on the underlying utilities of the set of investors while retaining non-satiety and risk aversion. Dropping global risk aversion, Levy and Levy (2002) formulate the notions of prospect stochastic dominance (PSD) (see also Levy and Wiener (1998)) and of Markowitz stochastic dominance (MSD).

Given a stochastic dominance relation, the concept of stochastic spanning subsumes the aforementioned notion of super-efficiency. It is an idea of Thierry Post, influenced by Mean-Variance spanning in Huberman and Kandel (1987), that was formulated in the context of second order stochastic dominance in Arvanitis et al. (2018).
Yet it is generalizable to arbitrary stochastic dominance relations. Given such a relation, and if the underlying set of efficient elements, i.e., the efficient set, is non-empty, a spanning set is simply any superset of the efficient set. As such, we can use a spanning set to provide an "outer approximation" of the underlying efficient set, and/or, when small enough, to provide a desirable reduction of the initial set of distributions upon which the stochastic dominance ordering is defined, and which could be complicated. In such a case, we can reduce the examination of the optimal choice problem to a potentially easier and more parsimonious one.
Both issues are of interest to financial economics since the underlying distributions often represent the return behaviour of financial assets and the dominance orderings reflect classes of investor preferences (e.g. for the FSD and SSD, as well as the PSD and MSD rules and their relations to classes of utility functions, see Levy and Levy (2002)). Those notions could also be of potential interest in any field of economic theory or decision science that examines optimal choice under uncertainty.
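For reference, the first two orders admit the standard distributional characterizations (with $F_\kappa$, $F_\lambda$ the cdfs of the prospects under comparison):

```latex
\kappa \succeq_{\mathrm{FSD}} \lambda \iff F_{\kappa}(z) \le F_{\lambda}(z) \quad \forall z \in \mathbb{R},
\qquad
\kappa \succeq_{\mathrm{SSD}} \lambda \iff \int_{-\infty}^{z} \big[ F_{\lambda}(u) - F_{\kappa}(u) \big]\, du \ge 0 \quad \forall z \in \mathbb{R}.
```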
For example, if a strict subset of a universe of available assets is known to be spanning w.r.t. a stochastic dominance relation that reflects all preferences with some sort of combination of local risk aversion with local risk seeking behavior (see for example the MSD preorder defined in Section 3.1), any investor with such a disposition towards risk can safely restrict her choice to the spanning set. On the contrary, if it is not spanning, there must exist investors with such preferences that benefit from the enlargement of the investment opportunities from the subset to the superset. This implies that stochastic spanning can be useful in extracting information about optimal portfolio choice. Existing testing procedures for related hypotheses rely on resampling schemes such as the bootstrap and subsampling (Linton et al. (2005)). In order to obtain exactness, we cannot thus rely on standard probabilistic results used in the previous work on tests of super-efficiency, due to the complexity of the aforementioned functional.
Hence, our first contribution is the theoretical study of continuity properties of the cdf of random variables defined as saddle-type points of real-valued stochastic processes. Section 2 of the paper sets up the probabilistic framework, and derives new properties of the law of a random variable defined by a finite number of nested optimizations on a continuous process w.r.t. possibly interdependent parameter spaces. Besides its usefulness for the limit theory of spanning tests developed in this paper, this result is also a non-trivial extension of results concerning suprema of other stochastic processes and can be useful in other econometric settings (see Section 2 for references and examples).
Our second contribution is the following. The results in Arvanitis et al. (2018) concern the concept of stochastic spanning w.r.t. the SSD relation, which essentially represents all preferences with global risk aversion, and are derived in a context of bounded support for the underlying distributions. We expect that analogous, yet possibly more complex, results on the properties of spanning sets, their representation by relevant functionals, the construction of testing procedures, and the derivation of their limit theory hold if we extend to local risk aversion and general supports.
Statistical tests concerning the issue of super-efficiency w.r.t. stochastic dominance rules representing local attitudes towards risk have already appeared in the literature (see for example Post and Levy (2005), or Arvanitis and Topaloglou (2017)), but to our knowledge the concept of spanning has not been studied yet for such dominance relations.
Section 3 investigates the concept of stochastic spanning w.r.t. the MSD preorder in the context of financial portfolio formation. We define the notion and provide an original characterization of spanning by the zero of a functional. Using the principle of analogy, we define the non-parametric test statistic, derive its limit distribution under the null hypothesis, and define a subsampling algorithm for the approximation of the asymptotic critical values. Among other tools, we use the new probabilistic results of Section 2 and a novel combinatorial argument for the derivation of asymptotic exactness when the relevant limit distribution is non-degenerate and a restriction on the significance level holds. We also derive consistency of the subsampling procedure. In contrast to the results in Arvanitis et al. (2018), we allow for unbounded supports for the return distributions, and we suppose that the relevant parameter spaces are simplicial complexes. We explain in Section 3 why those extensions are useful and how we have to modify the theoretical arguments to accommodate them.
Section 4 provides a numerical implementation consisting of a finite set of Linear Programming (LP) and Mixed Integer Programming (MIP) problems, the latter being highly non-linear optimization problems to solve. Inspired by Arvanitis and Topaloglou (2017), who show that the market portfolio is not MSD efficient, we test in an empirical application in Section 5 whether investors with MSD preferences could beat the market through equity management. We use equity portfolios as base assets. We show that the market portfolio is not Markowitz efficient, and the two-fund separation theorem does not hold for MSD investors. Thus, combinations of the market and the riskless asset do not span the portfolios created according to the MSD criterion. We also show that equity managers with MSD preferences could generate portfolios that yield 30 times higher cumulative return than the market over the last 50 years. Standard performance and risk measures show that the optimal MSD portfolios better suit the MSD investors, who are risk averse for losses and risk loving for gains. The optimal MSD portfolio achieves a transfer of probability mass from the left to the right tail of the return distribution when compared to the market portfolio. Its return distribution exhibits less negative skewness, less kurtosis, and less negative tail risk. Finally, using the four-factor model of Carhart (1997) and the five-factor model of Fama and French (2015), we investigate which factors explain these returns. We find that a defensive tilt explains part of the performance of the optimal MSD portfolios, while momentum and profitability do not.
In the final section, we conclude. We present the proofs of the main and the auxiliary results in the Appendix.

Probabilistic Results
Suppose that Λ₁, Λ₂, . . . , Λ_s are separable metric spaces, and let Λ := ∏_{i=1}^{s} Λ_i be equipped with the product topology. Consider the functional oper := opt₁ ∘ opt₂ ∘ · · · ∘ opt_s, where opt_i = sup or inf w.r.t. some non-empty compact Λ*_i ⊆ Λ_i, for i = 1, . . . , s. When i > 1, Λ*_i is allowed to depend on the elements of ∏_{j=1}^{i−1} Λ*_{i−j}. The probabilistic framework follows closely Chapter 2 of Nualart (2006). It consists of a complete probability space (Ω, F, P), where F is generated by some isonormal Gaussian process W = {W(h), h ∈ H} and H is an appropriate Hilbert space.
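A toy numerical illustration of oper for s = 2, opt₁ = sup, opt₂ = inf, with Λ*₂ depending on the first coordinate (the process X and the grids here are invented purely for illustration):

```python
import numpy as np

# Toy deterministic "process" X on [0,1] x [0,1]
def X(l1, l2):
    return np.sin(l1) * np.cos(l2) + 0.5 * l1 * l2

L1 = np.linspace(0.0, 1.0, 51)          # discretized Lambda_1

def L2_of(l1):
    # Lambda*_2 is allowed to depend on l1: a nested, dependent domain
    return np.linspace(0.0, 1.0 - l1, 51)

# oper = sup over l1 of inf over l2 in Lambda*_2(l1)
sup_inf = max(min(X(l1, l2) for l2 in L2_of(l1)) for l1 in L1)

# Note: with a dependent inner domain, reversing the order of the two
# optimizations is not even well defined without re-specifying the domains,
# which is why the "saddle-type" terminology is abusive (cf. footnote 2).
print(sup_inf)
```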
X is some vector valued stochastic process on Λ with sample paths in the space of continuous functions Λ → R^q equipped with the uniform metric. In many applications, X is a Gaussian weak limit of some net of processes. We denote the Malliavin derivative operator (see Nualart (2006)) by D, and by D^{1,2} the completion of the family of Malliavin differentiable random variables w.r.t. the norm (E[z²] + E[‖Dz‖²_H])^{1/2}.
We are interested in the form of the support and the continuity properties of the cdf of the law of the random variable ξ := oper X_λ. The following assumption describes sufficient conditions for the aforementioned law to have a countable number of atoms while being absolutely continuous when restricted between their successive pairs. Given this, the result to be established below allows first for the random variable at hand to be defined by saddle-type functionals,² and second for discontinuities of the resulting cdf. Hence, it generalizes known results concerning the absolute continuity of the distribution of suprema of stochastic processes. For an excellent treatment of those see, inter alia, Propositions 2.1.7 and 2.1.10 of Nualart (2006), and for the discontinuities-related literature on the fibering method and its probabilistic applications, see Lifshits (1983).

Assumption 1. For the process X suppose that: 2. For all λ ∈ Λ, X(λ) ∈ D^{1,2}, the H-valued process DX has a continuous version, and E[sup_{λ∈Λ} ‖DX_λ‖²_H] < +∞.
2 The term "saddle-type" is used here in a somewhat abusive manner, since commutativity between the successive optimization operations does not hold in general.

In the usual case where X is zero-mean Gaussian, we can establish the first condition by strong results that imply the subexponentiality of the distribution of sup_{λ∈Λ} X_λ, like Proposition A.2.7 of van der Vaart and Wellner (1996). Its validity follows from conditions restricting the packing numbers of Λ × R, metrized as a totally bounded metric space by the use of the covariance function of X, to be polynomially bounded, something that is easily established if the Λ_i are subsets of Euclidean spaces for all i. In the same respect, the second condition is easily established as in Nualart (2006) (see page 110). More specifically, if K(λ₁, λ₂) is the aforementioned covariance function, then H is the closed span of {h_λ(·) = K(λ, ·), λ ∈ Λ}, with inner product ⟨h_{λ₁}, h_{λ₂}⟩_H = K(λ₁, λ₂), whence ‖DX_λ‖²_H = K(λ, λ). In this case, the previous along with dominated convergence implies the second condition. The third condition is the most difficult to establish. In the cases that we have in mind, we can derive "outer approximations" of T by analogous, as well as easier to establish, properties of random variables that are stochastically dominated by ξ; see for example the corollary below.
We are now able to state and prove the main probabilistic result.
Theorem 1. Under Assumption 1, the law of ξ has connected support, say supp (ξ), that contains T . If τ ∈ T , the cdf of the law evaluated at τ has a jump discontinuity of size at most P (Ω τ ). If τ 1 , τ 2 are successive elements of T , the law restricted to (τ 1 , τ 2 ) is absolutely continuous w.r.t. the Lebesgue measure. If T is bounded from below then the law restricted to (−∞, inf T ) is absolutely continuous w.r.t. the Lebesgue measure. Dually, if T is bounded from above then the law restricted to (sup T , +∞) is absolutely continuous w.r.t. the Lebesgue measure.
Theorem 1 encompasses the standard absolute continuity results in the aforementioned literature that hold when oper is a composition of suprema (or dually infima), the parameter spaces Λ_i are not dependent, and P(Ω_τ) = 0 for all τ ∈ T. Furthermore, even in the special case where T is a singleton, the result is a generalization of Theorem 2 of Lifshits (1983), since it allows for non-Gaussianity, dependence between the domains of the optimization operators, as well as saddle-type optimizations. The following corollary focuses on this particular case and estimates the size of the potential jump discontinuity by assuming the existence of an auxiliary random variable that is stochastically dominated by ξ.

In the null limit theory for the test statistic, the results above are useful for the construction of an asymptotically exact decision procedure based on a resampling scheme. They do so by providing restrictions on the asymptotic significance level that guarantee the convergence of the critical values to continuity points of the null limiting cdf. In such frameworks, X is usually zero-mean Gaussian, while ξ is conveniently defined as a difference between infima of X defined on different regions of Λ with given properties (see the following sections for explicit derivations of those properties in the case of MSD).
Similar probabilistic structures can be encountered in other econometric applications.
An example concerns the null hypothesis of nesting of a given statistical model by a

A Spanning Test for MSD
We now introduce the concept of stochastic spanning for the MSD relation. We initially provide some order theoretical characterization of the concept, and derive an analytical representation using a functional defined by recursive optimizations.
We then construct a testing procedure using a scaled empirical counterpart of that functional and subsampling. We derive its first order limit theory mainly thanks to Corollary 1.

MSD and Stochastic Spanning
Given (Ω, F, P), suppose that F denotes the cdf of some probability measure on Rⁿ with finite first moment (in comparison to the spanning test for the SSD relation of Arvanitis et al. (2018), we do not assume that the random variables have compact supports). Let G(z, λ, F) := ∫_{Rⁿ} I{λᵀu ≤ z} dF(u), i.e., the cdf of the linear transformation Rⁿ ∋ x ↦ λᵀx, where λ assumes its values in L, a closed non-empty subset of S = {λ ∈ Rⁿ₊ : 1ᵀλ = 1}. Analogously, let K denote some distinguished subcollection of L. In the context of financial econometrics, F usually represents the joint distribution of n base asset returns, and S the set of linear portfolios that can be constructed upon them (the base assets are not restricted to be individual securities but are defined simply as the extreme points of the maximal portfolio set S). The parameter set L represents the collection of feasible portfolios formed by economic, legal, and/or other investment restrictions. We denote generic elements of L by λ, κ, etc. In order
to define the concepts of MSD and subsequently of spanning, we consider the integral differences ∆₁ and ∆₂ appearing in (1). The existence of the mean of the underlying distribution implies that we can allow the limits of integration to assume extended values, hence ∆₁ and ∆₂ in (1) are well defined. Levy and Levy (2002) show that κ ⪰_M λ iff the expected utility of κ is greater than or equal to the expected utility of λ for any utility function in the set of increasing real functions that are concave on the negative part and convex on the positive part (termed reverse S-shaped (at zero) utility functions). Such utility functions represent preferences towards risk that are associated with risk aversion for losses and risk loving for gains. Hence, in financial economics, Markowitz-dominance is the case iff portfolio κ is weakly preferred to portfolio λ by every reverse S-shaped individual investor.
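In the notation of this section, the dominance condition (1) of Levy and Levy (2002) can be sketched as follows (a reconstruction; the threshold regions match the reverse S-shaped characterization above):

```latex
\Delta_{1}(z) := \int_{-\infty}^{z} \big[ G(u,\lambda,F) - G(u,\kappa,F) \big]\, du \;\ge\; 0 \quad \forall\, z \le 0,
\qquad
\Delta_{2}(z) := \int_{z}^{+\infty} \big[ G(u,\lambda,F) - G(u,\kappa,F) \big]\, du \;\ge\; 0 \quad \forall\, z \ge 0,
```

with κ ⪰_M λ holding iff both families of inequalities hold.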
The uncountable system of inequalities in (1) defines an order on L. If those inequalities are satisfied as equalities, the pair (κ, λ) belongs to the possibly non-trivial equivalence part of the order. Strict dominance κ ≻_M λ corresponds to the irreflexive part of the order, and it holds iff at least one of the previous inequalities holds strictly for some z ∈ R, i.e., portfolio κ is strictly preferred to portfolio λ by some reverse S-shaped individual investor. Finally, given the possibility that ∆₁ and/or ∆₂ can change sign as functions of z, the relation is not generally total. When this is the case, we cannot compare κ and λ w.r.t. ⪰_M.

As in the Mean-Variance case, we can define the efficient set of L w.r.t. ⪰_M as the set of maximal elements of the preorder. This means that κ lies in the efficient set iff, for any λ ∈ L, either κ ⪰_M λ or κ is incomparable to λ. The efficient set has the property that, for any λ ∈ L, there exists some κ in the former for which κ ⪰_M λ.
Any superset of the efficient set also has the same property, but the efficient set is the smallest (up to equivalencies) set with it. Spanning sets always exist, since by construction L trivially spans itself (L ⪰_M L). The efficient set minimally (ignoring equivalencies) spans L, in the sense that any other spanning set must be a superset of it. Hence, we can view any spanning subset of L as an "outer approximation" of the efficient set. Due to the complexity of (1) w.r.t. the Mean-Variance case, the mathematical properties of the efficient set are generally difficult to derive, but fortunately, they are approximable by properties of sequences of spanning sets that converge to it (see below).
Furthermore, if K ⪰_M L, the optimal choice of every reverse S-shaped investor lies necessarily inside K. Hence, if K ⊂ L and spanning occurs, we can reduce the problem of optimal choice within L to the analogous problem within K, and the latter is more parsimonious than the former. Dually, if K does not span L, there must exist optimal choices, and thereby investment opportunities, in the increment L − K for some MSD investors. Therefore we can motivate the interest in the verification of spanning by tractability reasons related to optimal portfolio choice, or by the detection of new investment opportunities.
Super-efficiency (Arvanitis and Topaloglou (2017)) corresponds to the existence of a greatest element for ⪰_M, i.e., of a unique (excluding equivalencies) element that weakly Markowitz-dominates every element of L. Given the complexity of (1), greatest elements do not generally exist. This implies that the notion of spanning not only encompasses that of super-efficiency but is also a property of the order that will hold more often.
The above raise the following question: given K, a non-empty subset of L,⁵ does K span L? The following proposition provides an analytical characterization by means of nested optimizations, and the case of super-efficiency is then trivially obtained.
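By analogy with the SSD spanning functional of Arvanitis et al. (2018) and the integral differences ∆₁, ∆₂ above, the characterization can be sketched as follows (a hedged sketch, not the exact display of the proposition): K spans L w.r.t. ⪰_M iff ξ(F) = 0, where

```latex
\xi(F) \;=\; \sup_{\lambda \in L}\, \inf_{\kappa \in K}\, \max\!\Big\{ \sup_{z \le 0} \big(-\Delta_{1}(z;\kappa,\lambda,F)\big),\ \sup_{z \ge 0} \big(-\Delta_{2}(z;\kappa,\lambda,F)\big) \Big\},
```

and ξ(F) ≥ 0 always, since ∆₁ vanishes as z → −∞ and ∆₂ vanishes as z → +∞.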
Corollary 2. Under the scope of the previous lemma, κ is Markowitz super-efficient iff the singleton {κ} spans L.

⁵ We do not look at the issue of the selection of K. Here, the latter is considered as given. In some cases, we can select K by economically relevant information; see for example the application in Arvanitis et al. (2018) for SSD. We leave the issue of the selection of a candidate spanning set, especially when this selection is related to the approximation of the efficient set, for future research.
Given K, it is generally difficult to use the previous proposition directly, since F is usually unknown and/or the optimizations involved are infeasible. However, given the availability of a sample containing information about F, and in conjunction with the principle of analogy, the proposition provides the backbone for the construction of inferential procedures that address MSD spanning.

A Consistent Non-parametric Test

Hypotheses Structure and Test Statistic
We employ Lemma 1 to construct a non-parametric test for MSD spanning. If K ⪰_M L is chosen as the null hypothesis, the hypothesis structure takes the form:⁶ H₀: K spans L w.r.t. ⪰_M, against H₁: its negation. To design the decision rule, we extend our framework as follows. Consider a process (Y_t)_{t∈Z} taking values in Rⁿ; Y_t^i denotes the i-th element of Y_t. The sample of size T is the random element (Y_t)_{t=1,...,T}. In our portfolio framework, it represents the observable returns of the n financial base assets. We denote the unknown cdf of Y_t by F, and the empirical cdf by F_T. We consider the test statistic ξ_T := √T ξ(F_T), the √T-scaled empirical analog of ξ(F). We can equivalently express ξ_T as a usual scaled empirical average; this is instrumental in the numerical implementation of (3) in Section 4. When K is a singleton, the test statistic coincides with the one used in Arvanitis and Topaloglou (2017).
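On finite grids, the empirical statistic can be sketched as follows, using the identities ∫_{−∞}^{z} F(u) du = E(z − X)₊ and ∫_{z}^{+∞} [F₁ − F₂](u) du = E(X₂ − z)₊ − E(X₁ − z)₊ to turn the integral differences into sample averages (the grids and the max-violation form of ξ are illustrative assumptions, not the paper's exact implementation):

```python
import numpy as np

def xi_T(Y, L_grid, K_grid, z_grid):
    """sqrt(T)-scaled empirical MSD spanning statistic on finite grids.

    Y: (T, n) array of base-asset returns; L_grid, K_grid: iterables of
    portfolio weight vectors (each summing to one); z_grid: threshold grid.
    """
    T = Y.shape[0]
    zneg = z_grid[z_grid <= 0]
    zpos = z_grid[z_grid >= 0]

    def violation(lam, kap):
        rl, rk = Y @ lam, Y @ kap
        # empirical Delta_1(z) = mean[(z - r_lam)_+ - (z - r_kap)_+], z <= 0
        d1 = np.array([np.mean(np.maximum(z - rl, 0) - np.maximum(z - rk, 0))
                       for z in zneg])
        # empirical Delta_2(z) = mean[(r_kap - z)_+ - (r_lam - z)_+], z >= 0
        d2 = np.array([np.mean(np.maximum(rk - z, 0) - np.maximum(rl - z, 0))
                       for z in zpos])
        return max((-d1).max(), (-d2).max())  # largest dominance violation

    # sup over lambda in L of inf over kappa in K of the violation
    return np.sqrt(T) * max(min(violation(l, k) for k in K_grid) for l in L_grid)
```

When K = L, each λ can be matched with κ = λ, so the statistic is non-positive; a dominated singleton K yields a strictly positive value.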

Null Limit Distribution
In order to show that our testing procedure is asymptotically meaningful, we need a limit theory for ξ T under the null hypothesis. We derive it using the following assumption.
The mixing rates condition is implied by stationarity and geometric ergodicity.
The latter holds for many stationary models used in the context of financial econometrics, like ARMA, GARCH-type, and stochastic volatility models (see Francq and Zakoian (2011) for several examples). The moment existence condition enables the validity of a mixing CLT. A CLT typically holds under stricter restrictions. The positive definiteness of the long run covariance matrix is for instance satisfied if (Y_t)_{t∈Z} is a vector martingale difference process and the elements of Y₀ are linearly independent random variables. From the compactness of L, the previous implies the following result. In what follows, we denote convergence in distribution by ⇝.
Proposition 2. Suppose that K is closed, Assumption 2 holds, and H₀ is true. Then ξ_T converges in distribution to a limit ξ∞, defined by the corresponding optimizations over the sets Γ_i of a centered Gaussian process G_F with covariance kernel determined by the long run covariance structure of (Y_t)_{t∈Z}, and with P-almost surely uniformly continuous sample paths defined on Rⁿ.⁸

⁷ For example, since the support is bounded, we can cover it by some hypercube of the form [z_l, z_u]ⁿ where we can choose z_l as negative. Obviously, (λ, z_l) ∈ Γ₁, for any λ ∈ L.

The covariance kernel above, and thereby G_F, are well defined due to the mixing condition and the existence of the relevant second moments uniformly over λ ∈ L.

Algorithm. The testing procedure consists of the following steps:

1. Evaluate ξ_T at the original sample value.
2. For 0 < b_T ≤ T, generate the subsample values from the original observations, i.e., the T − b_T + 1 blocks of b_T consecutive observations (Y_t, ..., Y_{t+b_T−1}), t = 1, ..., T − b_T + 1.

3. Evaluate the test statistic on each subsample value, obtaining ξ_{T,b_T,t} for all t, and approximate the critical value by the 1 − α empirical quantile of these subsample statistics.

We derive the first order limit theory via the use of Proposition 2 and of relevant results from the theory of subsampling. We first make the following standard assumption in the subsampling methodology.
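The algorithm above can be sketched generically as follows (the statistic is passed in as a function; the illustrative statistic at the bottom is a placeholder, not the spanning statistic):

```python
import numpy as np

def subsampling_test(Y, stat, b, alpha=0.05):
    """Generic overlapping-blocks subsampling test.

    Y: (T, n) sample; stat: function mapping a sample to its scaled
    statistic; b: subsample size; alpha: nominal level.
    Returns (full-sample statistic, critical value, rejection flag).
    """
    T = Y.shape[0]
    xi_full = stat(Y)
    # Steps 2-3: statistic on every block of b consecutive observations
    xi_sub = np.array([stat(Y[t:t + b]) for t in range(T - b + 1)])
    # Critical value: 1 - alpha empirical quantile of the subsample statistics
    crit = np.quantile(xi_sub, 1.0 - alpha)
    return xi_full, crit, xi_full > crit

# Illustration with a simple placeholder statistic:
# sqrt(sample size)-scaled positive part of the sample mean
stat = lambda S: np.sqrt(S.shape[0]) * max(S.mean(), 0.0)
rng = np.random.default_rng(1)
xi, crit, reject = subsampling_test(rng.normal(size=(500, 1)), stat, b=50)
```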
The assumption does not provide much information on the practical choice of the subsampling rate for fixed T; it is designed to handle issues like asymptotic exactness and consistency. In the following section, along with the numerical implementation for ξ_T, we discuss a method of fixed-T correction for the algorithm above. We show (see the proof of Lemma 2 in the Appendix) that we can obtain such a bound in the form of a non-negative random variable defined as the difference between the suprema, at L and K respectively, of a linear Gaussian process. Hence, we get the needed estimate of the jump size as the probability that the latter random variable attains the value zero.
In order to evaluate this, we essentially use some combinatorial notions that allow the estimation of the proportion of the linear functions for which their unique maximizer over L is a common extreme point of both parameter spaces. Using such notions from the literature, we prove in the Appendix (see Lemma 2) that the probability that the aforementioned bounding random variable attains the value zero is less than or equal to ch_L(K). Then, via the use of Corollary 1, we establish that, when ξ∞ is non-degenerate, the 1 − α quantile is a continuity point for its cdf when α < 1 − ch_L(K). Hence, we immediately obtain the following first order limit theory for the subsampling testing procedure described above via Theorem 3.5.1 in Politis, Romano and Wolf (1999).
Theorem 2. Suppose that Assumptions 2, 3 and 4 hold; then the testing procedure described in Algorithm 3.2 has the following first order properties. When the distribution of ξ∞ is degenerate, the procedure is asymptotically conservative even if the restriction α < 1 − ch_L(K) does not hold. This is reminiscent of the results in Linton et al. (2005) concerning testing procedures for super-efficiency w.r.t.
several stochastic dominance relations. The non-degeneracy of the aforementioned limit distribution is not easy to establish, except for cases such as the one with bounded supports discussed above.
When the distribution of ξ∞ is non-degenerate, the procedure is asymptotically exact if the restriction α < 1 − ch_L(K) holds. The restriction on the significance level is non-binding in usual applications. For example, when L = S and K is a singleton, i.e., when the test is applied for super-efficiency, it implies at worst that α < 1/2, something that is usually satisfied. The closer to binding the restriction becomes, the more extreme points of L = S exist inside K. An extreme case is when n is large, K is finite, and contains n − 1 extreme points. In such a case, the result leads to subsampling tests that tend to asymptotically favor the null hypothesis of spanning. We could handle that by breaking up K into "smaller pieces" and iterating the testing procedure w.r.t. them. For example, we can apply the procedure for any subset of K that contains m points, for m sufficiently small in order to obtain a meaningful significance level. If for some subset we cannot reject spanning, we can infer that we cannot reject spanning for the initial K, since supersets of spanning sets are spanning sets from Definition 2. It is also possible that the structure of the efficient set prohibits such a K from being a spanning set. We leave the study of such questions for future work. In any case, the testing procedure is consistent.
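The suggested iteration over "smaller pieces" of K can be sketched as follows (the spanning-test callable and the point sets are placeholders for the full procedure):

```python
from itertools import combinations

def reject_spanning_for_K(K_points, m, rejects):
    """Iterate a spanning test over all m-point subsets of K.

    rejects(subset) -> True if the test rejects spanning for that subset.
    Since any superset of a spanning set is itself spanning, failing to
    reject for a single m-point subset means spanning cannot be rejected
    for the whole of K; we reject for K only if every subset is rejected.
    """
    return all(rejects(subset) for subset in combinations(K_points, m))
```

For instance, with four candidate points and a (hypothetical) test that never rejects subsets containing point 0, spanning for K cannot be rejected.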
Under some assumptions, we can prove, using among other things the main result above, that an analogous testing procedure based on the block bootstrap is generally asymptotically conservative and consistent.

A Numerical Implementation and Bias Correction
We first describe a potential numerical implementation via the use of a testing procedure asymptotically equivalent to the one of Subsection 3.
where the q_i are defined in (3). From the finiteness of A_i^{(T)}, i = 1, 2, the non-trivial parts of the optimizations involved concern the quantities n_{i,T} defined via sup_{λ∈L} inf_{κ∈K} optimizations, and we can reduce each of the minimizations involved to the solution of linear programming problems.
There is a set of at most T values, say R = {r₁, r₂, ..., r_T}, containing the optimal value of the variable z (see Scaillet and Topaloglou (2010) for the proof). Thus, we solve smaller problems P(r), r ∈ R, in which z is fixed to r; each of the above minimization problems then boils down to a linear problem. Furthermore, via the results in the first Appendix of Arvanitis and Topaloglou (2017), we need to solve both optimization problems appearing above, and we do so by representing them as MIP programs. Again, there is a set of T values, say R′ = {r′₁, r′₂, ..., r′_T}, containing the optimal value of the variable z (see Arvanitis and Topaloglou (2017) for the proof). Thus, we solve smaller problems P(r), r ∈ R′, in which z is fixed to r. Hence, the computational cost of the implementation above consists of card A₁ linear programming problems, card A₂ mixed integer programming problems, and three trivial optimizations.
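For a fixed z = r, the minimization of the empirical lower partial moment over the portfolio simplex reduces to an LP via auxiliary variables s_t ≥ r − λᵀY_t, s_t ≥ 0; a sketch with scipy (the paper's actual programs contain additional terms and constraints):

```python
import numpy as np
from scipy.optimize import linprog

def min_lower_partial_moment(Y, r):
    """min over {lam >= 0, 1'lam = 1} of (1/T) sum_t (r - lam'Y_t)_+.

    Decision vector x = (lam_1..lam_n, s_1..s_T); the constraint
    s_t >= r - lam'Y_t is rewritten as -Y_t'lam - s_t <= -r.
    """
    T, n = Y.shape
    c = np.concatenate([np.zeros(n), np.full(T, 1.0 / T)])    # mean of s_t
    A_ub = np.hstack([-Y, -np.eye(T)])                        # -Y lam - s <= -r
    b_ub = np.full(T, -r)
    A_eq = np.concatenate([np.ones(n), np.zeros(T)])[None, :] # weights sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (n + T), method="highs")
    return res.x[:n], res.fun
```

At the optimum the auxiliary variables equal (r − λᵀY_t)₊, so the LP value coincides with the empirical lower partial moment of the optimal portfolio.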
Secondly, although the tests above have asymptotically correct size, this holds for a range of values of the subsample size b_T. We therefore compute the empirical quantiles q_{T,b_T}(1 − α) for such a range, and estimate the intercept and slope of the OLS regression of q_{T,b_T}(1 − α) on (b_T)^{-1}. We then estimate the bias-corrected (1 − α)-quantile as the OLS predicted value for b_T = T. Since q_{T,b_T}(1 − α) converges in probability to q(ξ∞, 1 − α) and (b_T)^{-1} converges to zero as T → ∞, γ̂_{0;T,1−α} converges in probability to q(ξ∞, 1 − α), and the asymptotic properties are not affected.
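The correction can be sketched as an OLS fit of the subsample quantiles on (b_T)⁻¹, extrapolated to b_T = T (function and variable names are illustrative):

```python
import numpy as np

def bias_corrected_quantile(b_values, quantiles, T):
    """Regress q_{T,b}(1 - alpha) on 1/b by OLS and predict at b = T."""
    x = 1.0 / np.asarray(b_values, dtype=float)
    g1, g0 = np.polyfit(x, np.asarray(quantiles, dtype=float), 1)  # slope, intercept
    return g0 + g1 / T  # predicted critical value at b_T = T

# Example: quantiles that decay in b exactly along q = 1.0 + 12 / b
b = np.array([120, 240, 360, 480])
q = 1.0 + 12.0 / b
```

On this exactly linear example, the extrapolated critical value at T = 5000 recovers 1 + 12/5000.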

Monte Carlo Study
We now design Monte Carlo experiments to evaluate the size and power of our testing procedure in finite samples. We allow for conditional heteroskedasticity, consistent with empirical findings on returns of financial data as observed in the empirical application below. The multivariate return process (Y_t)_{t∈Z} is a vector GARCH(1,1) process, which is transformed to accommodate both spanning (size) and non-spanning cases (power) for K given assets. Such a process permits both temporal and cross-sectional dependence between the random variables stacked in the vector process.
Suppose that (z_t), t ∈ Z, are i.i.d. with mean zero, unit variance, and E[|z_t|^(2+ε)] < ∞, for some ε > 0. We assume that the cdf of z_t is strictly increasing. Furthermore, we define the components of the return process for i = 1, ..., K − 1. Let τ = (0, 0, ..., 1, 0), τ⋆ = (0, 0, 0, ..., 1), and L := {(λ, 0, 0)^Tr, τ, τ⋆}, with λ ∈ R_+^(K−2) and 1^Tr λ = 1. Using this portfolio space, we obtain the following result on Markowitz spanning. We present our Monte Carlo results in Table 1. The number of replications to compute the empirical size and power is 1000 runs. We use a combination of the return processes for i ∈ {1, ..., K − 1}. We use innovations generated by a Student distribution with 5 degrees of freedom. We use three different sample sizes. For T = 300, we get the subsampling distribution of the test statistic.

9 Another example is a process with different positive means and no serial dependence for which the spanning results stated in Proposition 3 also hold. We have checked in unreported simulation results that the spanning test also behaves well in such an example, including the case of Student innovations with infinite variance.

[Table 1: empirical size and power of the test, without and with bias correction.]

Since the market is MSD inefficient, our next research hypothesis is whether two-fund separation holds, i.e., whether all MSD investors can satisfy themselves with combining the T-bill and the market portfolio only. The test of MSD efficiency for a given portfolio developed by Arvanitis and Topaloglou (2017) cannot answer that question, since their approach is limited to the simple case of a spanning test for K being a singleton, and not any linear combination of two assets.
For non-normal distributions, two-fund separation generally does not occur, unless one assumes that preferences are sufficiently similar across investors (see, for example, Cass and Stiglitz (1970)). Our MSD spanning test can analyze two-fund separation without assuming a particular form for the return distribution or utility functions.
We get the subsampling distribution of the test statistic for subsample sizes b_T ∈ {120, 240, 360, 480}. Using OLS regression on the empirical quantiles q_{T,b_T}(1 − α), for significance level α = 0.05, we get the estimate q_T for the critical value. We reject MSD spanning if the test statistic ξ_T is higher than the regression estimate q_T.
In all the considered cases, L = S and α < 3/4 ≤ 1 − c_{h_L(K)} holds. Hence, if our assumption framework is valid, we expect asymptotic exactness to hold. We find that:

• The 6 FF benchmark portfolios: the regression estimate q_T = 15.74 is lower than the value of the test statistic ξ_T = 26.78.

As a final step in this analysis, we test for two-fund separation using the Mean-Variance criterion rather than the MSD criterion. We use the same methodology as for the MSD spanning test above, but we restrict the utility functions to take a quadratic shape. We solve the embedded expected-utility optimization problems (for every given quadratic utility function) using quadratic programming. In contrast to MSD spanning, we cannot reject Mean-Variance spanning at conventional significance levels.
The combined results of the market MSD efficiency and market MSD spanning tests suggest that combining the T-bill and the market portfolio is not optimal for some MSD investors. Investors with reverse S-shaped utility functions could outperform the market by staying away from a buy-and-hold strategy on the market. Active investors often take concentrated positions in assets with high upside potential or follow dynamic strategies like momentum. They may also prefer defensive strategies. These can produce opportunities with positively skewed returns, or at least less negatively skewed ones, which are attractive to MSD investors.

Performance Summary of the MSD portfolios
The rejection of the spanning hypothesis implies that there exists at least one portfolio in L which is weakly preferred to every portfolio in K by at least one reverse S-shaped utility function (see Definition 2). Such a portfolio is by construction efficient, and it is the optimal portfolio λ that maximizes ξ_T for the particular sample value. In what follows, and given this characterization, we analyze the performance of such empirically optimal MSD portfolios through time, compared to the performance of the market portfolio (buy-and-hold strategy).
We resort to backtesting experiments on a rolling window basis. The rolling horizon computations cover the 642-month period from 07/1963 to 12/2016. At each month, we use the data from the previous 30 years (360 monthly observations) to calibrate the procedure. We solve the resulting optimization model for the MSD spanning test and record the optimal portfolio made of the base assets as well as the market portfolio and the T-bill. We determine the realized return of the chosen MSD optimal portfolio from the actual returns of the asset weight allocation picked by the optimizer for that month. Then, we repeat the same procedure for the next one-month rolling window and compute the ex-post realized returns for the period from 07/1963 to 12/2016. Therefore, the MSD optimal portfolios are outcomes of the testing procedure based on an unconditional distribution updated for each rolling window and performance is realized out of the optimization sample (no look-ahead bias).
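The rolling-window protocol above can be sketched as follows; `choose_weights` is a hypothetical placeholder for the MSD spanning optimizer, which we do not reproduce here, and the window length follows the text (360 monthly observations, i.e., 30 years).

```python
import numpy as np

def rolling_backtest(returns, window=360, choose_weights=None):
    # For each month t, calibrate on the previous `window` observations,
    # pick a portfolio, and record its out-of-sample realized return at t.
    T, K = returns.shape
    if choose_weights is None:
        choose_weights = lambda hist: np.full(K, 1.0 / K)  # placeholder rule
    realized = []
    for t in range(window, T):
        w = choose_weights(returns[t - window:t])
        realized.append(returns[t] @ w)
    return np.array(realized)

rng = np.random.default_rng(1)
r = rolling_backtest(rng.normal(0.005, 0.04, size=(402, 5)))
```

Because weights at month t depend only on data up to t − 1, the realized series is free of look-ahead bias, as in the text.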
Let us first compute the cumulative performance of the MSD optimal portfolios and of the market portfolio for the entire sample period from July 1963 to December 2016, based on the optimal portfolio weights obtained for each one-month rolling window. The terminal value of the MSD optimal portfolios is 426 times the initial value at the end of the holding period, while the market portfolio grows only 13.9 times. Hence, the relative performance of MSD type investors is about 30 times that of the market over the evaluated period. Such an increase of roughly 3000% is significant at any significance level (unreported results).
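As a check on the arithmetic above, wealth multiples and the implied relative performance can be computed as follows (the one-year return series is illustrative):

```python
import numpy as np

def wealth_multiple(monthly_returns):
    # Terminal wealth per unit of initial wealth from compounding monthly returns.
    return float(np.prod(1.0 + np.asarray(monthly_returns)))

toy = wealth_multiple([0.01] * 12)   # one year of 1% monthly returns

# A terminal multiple of 426 versus 13.9 implies a relative performance of
# about 426 / 13.9, roughly 30 times, i.e. an increase of about 3000%.
relative = 426.0 / 13.9
```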
To get further insight into the differences between the two investment strategies, we report the first four moments of the realized returns and the Value-at-Risk in Table 2.
We further compute a number of commonly used performance measures: the Sharpe ratio, the downside Sharpe ratio, the return loss and the opportunity cost.
The downside Sharpe ratio, based on the semi-variance (Ziemba (2005)), is considered a more appropriate performance measure than the typical Sharpe ratio, given the asymmetric return distribution of the assets. The return loss indicates how the proportional transaction costs generated by the portfolio turnover affect the portfolio returns. Let trc be the proportional transaction cost, and R_{P,t+1} the realized return of portfolio P at time t + 1. The change in the net-of-transaction-cost wealth NW_P of portfolio P through time depends on trc and the portfolio turnover, and the portfolio return net of transaction costs is given in (15). Let μ_M and μ_MSD be the out-of-sample means of (15) for the market portfolio and the MSD optimal portfolios, and σ_M and σ_MSD the corresponding standard deviations.
Then, the return-loss measure is the additional return needed so that the market performs as well as the MSD optimal portfolios. We follow the literature and use 35 bps for the transaction costs of stocks and bonds.
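The two measures can be sketched as follows, using common textbook definitions as assumptions: a downside Sharpe ratio that replaces the standard deviation by sqrt(2) times the downside semi-deviation (in the spirit of Ziemba), and the usual return-loss form μ_MSD / σ_MSD × σ_M − μ_M; both should be checked against the paper's exact formulas before use.

```python
import numpy as np

def downside_sharpe(returns, rf=0.0):
    # Downside semi-deviation computed from negative excess returns only
    # (one common convention; variants divide by the full sample size).
    excess = np.asarray(returns) - rf
    downside = excess[excess < 0.0]
    semi_sd = np.sqrt(np.mean(downside ** 2))
    return excess.mean() / (np.sqrt(2.0) * semi_sd)

def return_loss(mu_msd, sigma_msd, mu_m, sigma_m):
    # Additional return needed for the market to match the MSD portfolios.
    return mu_msd / sigma_msd * sigma_m - mu_m

ds = downside_sharpe([0.10, -0.05, 0.02, -0.01])
rl = return_loss(0.02, 0.04, 0.01, 0.04)
```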
Finally, the opportunity cost presented in Simaan (2013) gauges the economic significance of the performance difference between two portfolios. Let R_MSD and R_M be the realized returns of the MSD optimal portfolios and of the market portfolio, respectively. Then, the opportunity cost θ is defined as the return that needs to be added to (or subtracted from) the market return R_M so that the investor is indifferent (in utility terms) between the strategies imposed by the two different investment opportunity sets, i.e., E[U(1 + R_M + θ)] = E[U(1 + R_MSD)]. A positive (negative) opportunity cost implies that the investor is better (worse) off if the investment opportunity set allows for MSD type investing. The opportunity cost takes into account the entire probability density function of asset returns, and hence it is suitable for evaluating strategies even when the distribution is not normal. For the calculation of the opportunity cost, we use a reverse S-shaped utility function that satisfies the curvature of Markowitz theory, where c is the coefficient of loss aversion (usually c = 2.25) and a, b > 1. We use several values of a, b in Table 2 to vary the curvature of the utility functions. Table 2 reports the performance and risk measures for the MSD optimal portfolios and the market portfolio. These measures allow us to better understand the differences between the market portfolio and the MSD strategy. The mean is higher for the MSD optimal portfolio and the variance is lower, which results in a higher Sharpe ratio.
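The indifference condition defining θ can be solved numerically. The reverse S-shaped utility below (power gains and losses with loss-aversion coefficient c and curvatures a, b > 1) is our reading of the specification and should be verified against the paper's exact form.

```python
import numpy as np

def utility(x, a=1.5, b=1.5, c=2.25):
    # Reverse S-shaped: convex power on gains, loss-averse power on losses.
    x = np.asarray(x, dtype=float)
    gains = np.clip(x, 0.0, None) ** a
    losses = c * np.clip(-x, 0.0, None) ** b
    return gains - losses

def opportunity_cost(r_m, r_msd, lo=-0.5, hi=0.5):
    # Bisection on theta solving E[U(1 + R_M + theta)] = E[U(1 + R_MSD)];
    # U is increasing in theta, so f is monotone on [lo, hi].
    target = utility(1.0 + np.asarray(r_msd)).mean()
    f = lambda th: utility(1.0 + np.asarray(r_m) + th).mean() - target
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

theta = opportunity_cost([0.02, -0.01, 0.05], [0.02, -0.01, 0.05])
theta2 = opportunity_cost([0.0, 0.0], [0.05, 0.05])
```

With identical return series θ is zero, and with a constant 5% advantage θ recovers 0.05, as the definition requires.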
The skewness is less negative, as expected for a portfolio built for investors whose preferences combine risk aversion for losses and risk loving for gains. The kurtosis and the VaR are lower, as expected when investors want to mitigate the impact of large losses. The MSD portfolio targets and achieves a transfer of probability mass from the left to the right tail of the return distribution when compared to the market portfolio. The opportunity cost is above 70 bps and increases with the curvature of the gain and loss parts of the utility function. Table 3 reports the results of two factor regressions. First, we consider the following linear regression (Carhart four-factor model): R_it − R_Ft = a_i + b_i (R_Mt − R_Ft) + s_i SMB_t + h_i HML_t + r_i MOM_t + e_it, where R_it is the return of the MSD optimal portfolio at period t, R_Ft is the riskless rate, R_Mt is the return on the value-weight (VW) market portfolio, SMB_t is the return on a diversified portfolio of small stocks minus the return on a diversified portfolio of big stocks, HML_t is the difference between the returns on diversified portfolios of high and low BE/ME stocks, MOM_t is the average return on the two high prior return portfolios minus the average return on the two low prior return portfolios, and e_it is a zero-mean residual. If the exposures b_i, s_i, h_i, and r_i to the market, size, value, and momentum factors capture all variation in expected returns, the intercept a_i is zero. We additionally consider the following linear regression (five-factor model): R_it − R_Ft = a_i + b_i (R_Mt − R_Ft) + s_i SMB_t + h_i HML_t + r_i RMW_t + c_i CMA_t + e_it, where R_it, R_Ft, R_Mt, SMB_t, and HML_t are as before, RMW_t is the difference between the returns on diversified portfolios of stocks with robust and weak profitability, CMA_t is the difference between the returns on diversified portfolios of the stocks of low and high investment firms (called conservative and aggressive), and e_it is a zero-mean residual.
If the exposures b_i, s_i, h_i, r_i, and c_i to the market, size, value, profitability, and investment factors capture all variation in expected returns, the intercept a_i is zero. In both factor models, we observe that the market beta is slightly smaller than one (defensive) for the MSD portfolios, as expected. The negative sign of the SMB factor loading and the positive sign of the HML factor loading correspond to an additional defensive tilt. Defensive strategies overweight large value stocks and underweight small growth stocks (see Novy-Marx (2016)).
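A minimal sketch of estimating the four-factor regression by OLS is given below; the factor data are simulated here, whereas a real application would use the Fama-French and momentum series, and the intercept estimate is the alpha a_i.

```python
import numpy as np

def carhart_coefficients(excess_ret, mkt_rf, smb, hml, mom):
    # OLS of R_it - R_Ft on a constant and the four factors;
    # returns [a_i, b_i, s_i, h_i, r_i].
    X = np.column_stack([np.ones_like(mkt_rf), mkt_rf, smb, hml, mom])
    coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
    return coef

rng = np.random.default_rng(2)
n = 642
f = rng.normal(0.0, 0.04, size=(n, 4))                    # simulated factors
y = 0.001 + f @ np.array([0.9, -0.2, 0.3, 0.0]) + rng.normal(0.0, 0.01, n)
coef = carhart_coefficients(y, f[:, 0], f[:, 1], f[:, 2], f[:, 3])
```

The five-factor model is estimated the same way, with RMW and CMA replacing MOM in the design matrix.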

Conclusions
We have derived properties of the cdf of a random variable defined by recursive optimizations applied to a continuous stochastic process w.r.t. possibly dependent parameter spaces. These properties extend previous results and can be useful for deriving the limit theory of tests for stochastic spanning w.r.t. stochastic dominance relations.
As a theoretical application, we have defined the concept of spanning, constructed an analogous test based on subsampling, and derived the first-order limit theory and a numerical implementation for the case of the MSD relation.
We have used the non-parametric test in an empirical application, inspired by Arvanitis and Topaloglou (2017), who show that the market portfolio is not MSD efficient. The spanning test enables us to explore whether MSD equity managers could outperform the market portfolio. First, we test whether the market portfolio is MSD efficient, and then whether the two-fund separation theorem holds for investors with MSD preferences. We use as base assets either the FF size and book-to-market portfolios, a set of momentum portfolios, a set of industry portfolios, or a set of beta or size decile portfolios. Empirical results indicate that the market portfolio is not MSD efficient, and the two-fund separation theorem does not hold for MSD investors. Thus, the combination of the market and the riskless asset does not span the portfolios created according to the MSD criterion. Hence, there exist MSD investors that could benefit from investment opportunities involving assets beyond portfolios constructed solely from the market portfolio and the safe asset. We verify this by showing that equity managers with MSD preferences could generate portfolios that yield a 30 times higher cumulative return than the market over the last 50 years.
The return distribution of the MSD optimal portfolio is less negatively skewed, less leptokurtic, and thinner left-tailed than that of the market portfolio. Finally, using the four-factor model of Carhart (1997) and the five-factor model of Fama and French (2015), we investigate which factors explain these returns. We find that a defensive tilt explains part of the performance of the optimal MSD portfolios, while momentum and profitability do not.
The derivations and methodology used above can also be explored for other forms of stochastic dominance relations, such as first- or third-order, or Prospect stochastic dominance. We leave such issues for future research.
Proof of Corollary 1. It follows directly from Theorem 1, since the relation between ξ and η implies that supp(ξ) is the closure of (c, +∞) and also that P(ξ = c) ≤ P(η = c).
Proof of Proposition 2. The results in the auxiliary Lemma 1, together with the compactness of K, allow the application of Theorem 3.4 (ch. 5, p. 338) of Molchanov (2006). For some λ⋆ ∈ L − K and any κ ∈ K, there exists some i and z⋆ ∈ A_i such that Δ_i(z⋆, λ⋆, κ, F) > 0. Then, we have that

Auxiliary Lemmata
The following are auxiliary lemmata used for the derivation of the proofs of Proposition 2 and Theorem 2.
and it is only then that ξ_∞ can have degenerate variance. Thereby, T = {0} and we can obtain a lower bound for ξ_∞. From the integration by parts formula for the Lebesgue-Stieltjes integral and Assumption 2, we obtain a bound in which Z ∼ N(0_{n×1}, V). Hence, ξ_∞ ≥ η_∞ ≥ 0.
The previous inequality implies the applicability of Corollary 1 for c = 0. We obtain the result by estimating an upper bound for P (η ∞ = 0). From Assumption 2 and the non-degeneracy of V the latter probability equals exactly the probability that the maximum of the random vector Z occurs at a coordinate that represents an extreme point of S to which corresponds a common effective extreme point for L