Estimation of Large Dimensional Conditional Factor Models in Finance

This chapter surveys recent econometric methodologies for inference in large dimensional conditional factor models in finance. Changes in the business cycle and asset characteristics induce time variation in factor loadings and risk premia that needs to be accounted for. The growing trend in the use of disaggregated data for individual securities motivates our focus on methodologies for a large number of assets. The chapter starts with a historical perspective on conditional factor models with a small number of assets for comparison purposes. Then, it outlines the concept of an approximate factor structure in the presence of conditional information, and reviews an arbitrage pricing theory for large dimensional factor models in this framework. For inference, we distinguish between two different cases depending on whether factors are observable or not. We focus on diagnosing model specification, estimating conditional risk premia, and testing asset pricing restrictions under increasing cross-sectional and time series dimensions. At the end of the chapter, we provide new empirical findings based on a broad set of factor models and contrast analyses based on individual stocks and standard sets of portfolios. We also discuss the impact on computing the time-varying cost of equity for a firm, and summarize differences between results for developed and emerging markets in an international setting.


Introduction
The objective of this chapter is to provide an econometric methodology for inference in conditional factor models in finance. We focus on diagnosing model specification, estimating conditional risk premia, and testing asset pricing restrictions under increasing cross-sectional and time series dimensions.
Risk premia measure the financial compensation required by investors for bearing systematic risk. The workhorse to estimate equity risk premia in a linear multi-factor setting is the two-pass cross-sectional regression method developed by Black, Jensen, and Scholes (1972) and Fama and MacBeth (1973). A series of papers address its large and finite sample properties for linear factor models with time-invariant coefficients, see e.g. Shanken (1985, 1992), Jagannathan and Wang (1998), Shanken and Zhou (2007), Kan, Robotti and Shanken (2013), and the review paper of Jagannathan, Skoulakis and Wang (2009). That early literature did not formally address statistical inference for equity risk premia in conditional linear factor models despite its empirical relevance.
In this chapter, we study how we can infer the time-varying behaviour of equity risk premia from large stock return databases under conditional linear factor models. Our approach is inspired by the recent trend in macro-econometrics and forecasting methods trying to extract cross-sectional and time-series information simultaneously from large panels (see e.g. Stock and Watson (2002a,b), Bai (2003, 2009), Bai and Ng (2002, 2006), Forni, Hallin, Lippi and Reichlin (2000), Pesaran (2006)). Ludvigson and Ng (2007, 2009) exemplify this promising route when studying bond risk premia. Connor, Hagmann, and Linton (2012) show that large cross-sections exploit data more efficiently in a semiparametric characteristic-based factor model of stock returns. The theoretical framework underlying the Arbitrage Pricing Theory (APT) also inspires our approach relying on individual stock returns. In this setting, approximate factor structures with nondiagonal error covariance matrices (Chamberlain and Rothschild (1983)) answer the potential empirical mismatch of exact factor structures with diagonal error covariance matrices underlying the original APT of Ross (1976). Under weak cross-sectional dependence among idiosyncratic error terms, such approximate factor models generate no-arbitrage restrictions in large economies where the number of assets grows to infinity. This chapter develops an econometric methodology tailored to the APT framework.
Indeed, we let the number of assets grow to infinity mimicking the large economies of financial theory.
As already mentioned, empirical work in asset pricing vastly relies on linear multi-factor models with either time-invariant coefficients (unconditional models) or time-varying coefficients (conditional models).
The factor structure is often based on observable variables (empirical factors) and supposed to be rich enough to extract systematic risks, while idiosyncratic risk is left over to the error term. Linear factor models are rooted in the Arbitrage Pricing Theory (Ross (1976), Chamberlain and Rothschild (1983)) or come from a loglinearization of nonlinear consumption-based models (Campbell (1996)). A central and practical issue is to determine whether one or more factors are omitted in the chosen specification. If the set of observable factors is correctly specified, the errors are weakly cross-sectionally correlated, namely the covariance matrix of the error terms in the factor model has a rapidly vanishing largest eigenvalue. If the set of observable factors is not correctly specified, the no-arbitrage restrictions derived from the APT do not hold, and the risk premia estimated by the two-pass regression approach are meaningless. Even if the omitted factors are not priced, i.e., their associated risk premia are nil, direct computations of the limits of first-pass and second-pass estimators under misspecification show that second-pass estimates do not converge to the risk premia of the priced factors, and that biases on betas and risk premia do not compensate each other. Besides, since the no-arbitrage restrictions do not hold, we cannot simply say that the risk premia are the expected factor returns for models with traded factors. Hence detecting an omitted factor is also important in that case to produce correct expected excess returns from the no-arbitrage restrictions. Given the large menu of factors available in the literature (the factor zoo of Cochrane (2011)), we need a simple diagnostic criterion to decide whether we can feel comfortable with the chosen set of observable factors before proceeding further in the empirical analysis of large cross-sectional equity data sets under the APT setting.
For example, if the factor model passes the diagnostic, and we reject that alphas are zero using a GRS-type statistic (Gibbons et al. (1989)), it will not be because of an omitted factor. This chapter also aims at providing such a diagnostic criterion. The outline of this chapter is as follows. In Section 2, we consider a general framework of conditional linear factor models for asset returns in large economies. Section 3 presents inference in models with observable factors. Those two sections are largely inspired by Gagliardini, Ossola, and Scaillet (2016, GOS) and Gagliardini, Ossola, and Scaillet (2019, GOS2). We focus on diagnosing model specification, estimating conditional risk premia, and testing asset pricing restrictions under increasing cross-sectional and time series dimensions. In Section 4, we investigate models with unobservable factors. We look at empirical findings in Section 5. There we contrast analyses based on individual stocks and standard sets of portfolios. We also summarize differences between results for developed and emerging markets in an international setting.

Large dimensional factor models
In this section, we consider a conditional linear factor model with time-varying coefficients. We work in a multi-period economy (Hansen and Richard (1987)) under an approximate factor structure (Chamberlain and Rothschild (1983)) with a continuum of assets as in GOS. Such a construction is close to the setting advocated by Al-Najjar (1995, 1999a) in a static framework with an exact factor structure. He discusses several key advantages of using a continuum economy in arbitrage pricing and risk decomposition. A key advantage is the robustness of factor structures to asset repackaging (Al-Najjar (1999b); see GOS for a proof).
Let F_t, with t = 1, 2, ..., be the information available to investors. Without loss of generality, the continuum of assets is represented by the interval [0, 1]. The excess returns R_t(γ) of asset γ ∈ [0, 1] at dates t = 1, 2, ... satisfy the conditional linear factor model:

R_t(γ) = a_t(γ) + b_t(γ)'f_t + ε_t(γ),   (1)

where vector f_t gathers the values of K factors at date t. The intercept a_t(γ) and factor sensitivities b_t(γ) are F_{t−1}-measurable. The error terms ε_t(γ) have mean zero and are uncorrelated with the factors conditionally on information F_{t−1}, and satisfy a weak cross-sectional dependence condition in the form of an upper bound on the largest eigenvalue of the error variance-covariance matrix (Assumption APR.3 in GOS). Moreover, we exclude asymptotic arbitrage opportunities in the economy: there are no portfolios that approximate arbitrage opportunities when the number of assets increases. In this setting, GOS show that the following asset pricing restriction holds:

a_t(γ) = b_t(γ)'ν_t,   (2)

almost surely in probability, where the random vector ν_t ∈ R^K is unique and F_{t−1}-measurable. The vector λ_t := ν_t + E[f_t | F_{t−1}] is the vector of the conditional risk premia.
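To fix ideas, a small simulation sketch illustrates the factor model and the pricing restriction under hypothetical parameter values; for simplicity, ν_t and E[f_t | F_{t−1}] are held constant over time, so the model is effectively time-invariant and expected excess returns reduce to b_i'λ with λ = ν + E[f_t]:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, K = 500, 240, 3

mu_f = np.array([0.005, 0.002, 0.001])   # E[f_t | F_{t-1}], held constant (hypothetical)
nu = np.array([0.003, 0.001, 0.002])     # nu_t, held constant (hypothetical)
b = rng.normal(1.0, 0.5, size=(n, K))    # factor sensitivities b_i

f = mu_f + 0.04 * rng.standard_normal((T, K))   # factor realizations
eps = 0.10 * rng.standard_normal((T, n))        # idiosyncratic errors (iid, hence weakly dependent)

a = b @ nu                     # asset pricing restriction: a_i = b_i' nu
R = a + f @ b.T + eps          # factor model: R_{i,t} = a_i + b_i' f_t + eps_{i,t}

lam = nu + mu_f                # conditional risk premia: lambda = nu + E[f_t]
# model-implied expected excess returns: E[R_i] = b_i' lambda
print(np.abs(R.mean(axis=0) - b @ lam).max())
```

The printed quantity is pure sampling error, shrinking as T grows.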
To have an empirically workable version of Equations (1) and (2), we define how the conditioning information is generated and how the model coefficients depend on it via simple functional specifications.
The conditioning information F_{t−1} contains Z_{t−1} and Z_{t−1}(γ), for all γ ∈ [0, 1], where the vector of lagged instruments Z_{t−1} ∈ R^p is common to all stocks, the vector of lagged instruments Z_{t−1}(γ) ∈ R^q is specific to stock γ, and Z^t = {Z_t, Z_{t−1}, ...}. Vector Z_{t−1} may include the constant and past observations of the factors and some additional variables such as macroeconomic variables. Vector Z_{t−1}(γ) may include past observations of firm characteristics and stock returns. To end up with a linear regression model, we assume that: (i) the vector of factor loadings b_t(γ) is a linear function of lagged instruments Z_{t−1} (Shanken (1990), Ferson and Harvey (1991), Dumas and Solnik (1995)) and Z_{t−1}(γ) (Avramov and Chordia (2006)); (ii) the vector of risk premia λ_t is a linear function of lagged instruments Z_{t−1} (Dumas and Solnik (1995), Cochrane (1996), Jagannathan and Wang (1996)); (iii) the conditional expectation of f_t given the information F_{t−1} depends on Z_{t−1} only and is linear (as, e.g., if Z_t follows a Vector Autoregressive (VAR) model of order 1).
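Specifications (i) and (ii) amount to writing b_{i,t} = B_i Z_{t−1} + C_i Z_{i,t−1} and λ_t = Λ Z_{t−1} for parameter matrices B_i, C_i and Λ; a minimal sketch, where all matrix values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
K, p, q = 3, 2, 1                     # number of factors, common and specific instruments

B_i = rng.normal(size=(K, p))         # hypothetical loading coefficients on Z_{t-1}
C_i = rng.normal(size=(K, q))         # hypothetical loading coefficients on Z_{i,t-1}
Lam = rng.normal(size=(K, p))         # risk-premia coefficients: lambda_t = Lam Z_{t-1}

Z_lag = np.array([1.0, 0.3])          # common instruments (constant + one lagged variable)
Zi_lag = np.array([-0.1])             # stock-specific instrument

b_it = B_i @ Z_lag + C_i @ Zi_lag     # time-varying loadings, specification (i)
lam_t = Lam @ Z_lag                   # time-varying risk premia, specification (ii)
```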
To ensure that cross-sectional limits exist and are invariant to reordering of the assets, we introduce a sampling scheme as in GOS. We formalize it so that observable assets are random draws from an underlying population (Andrews (2005)). In particular, we rely on a sample of n assets by randomly drawing i.i.d.
indices γ_i from the population according to a probability distribution G on [0, 1]. For any n, T ∈ N, the excess returns are R_{i,t} = R_t(γ_i). Similarly, let a_{i,t} = a_t(γ_i) and b_{i,t} = b_t(γ_i) be the coefficients, ε_{i,t} = ε_t(γ_i) be the error terms, and Z_{i,t} = Z_t(γ_i) be the stock-specific instruments. By random sampling, we get a random coefficient panel model (e.g. Hsiao (2003), Chapter 6). Such a formalisation is key to reconcile finance theory and econometric modelling. Without random draws, cross-sectional averages such as (1/n) Σ_i b_i correspond to deterministic sequences since the b_i's are then parameters. Working with the standard arbitrage pricing theory with approximate factor models raises three issues, as discussed in GOS. First, cross-sectional limits depend in general on the ordering of the financial assets, and there is no natural ordering between assets (firms). Second, we can exploit neither a law of large numbers to guarantee the existence of those limits, nor a central limit theorem to get distributional results. Third, the asset pricing restrictions derived under no arbitrage are not testable, the so-called Shanken critique (Shanken (1982)).
In available datasets, we do not observe asset returns for all firms at all dates due to entry and exit from the panel. Thus, we account for the unbalanced nature of the panel through a collection of indicator variables I i,t , for any asset i at time t. We define I i,t = 1 if the return of asset i is observable at date t, and 0 otherwise (Connor and Korajczyk (1987)). In GOS and GOS2, we assume independence between the observability and return generating processes conditionally on observed variables, which amounts to a missing-at-random hypothesis (Rubin (1976)). A more general assumption would imply model nonlinearities.
Through appropriate redefinitions of the regressors and coefficients, GOS show that we can rewrite the model in Equations (1) and (2) as a generic random coefficient panel model:

R_{i,t} = β_i'x_{i,t} + ε_{i,t},   (4)

where the regressor x_{i,t} = (x_{1,i,t}', x_{2,i,t}')' has dimension d = d_1 + d_2 and includes the vectors x_{1,i,t} = (vech[X_t]', Z_{t−1}' ⊗ Z_{i,t−1}')' and x_{2,i,t} = (Z_{t−1}' ⊗ f_t', Z_{i,t−1}' ⊗ f_t')'. In vector x_{2,i,t}, the first components with common instruments take the interpretation of scaled factors (Cochrane (2005)), while the second components do not since they depend on i. The symmetric matrix X_t = [X_{t,k,l}] ∈ R^{p×p} is such that X_{t,k,l} = Z²_{t−1,k}, if k = l, and X_{t,k,l} = 2 Z_{t−1,k} Z_{t−1,l}, otherwise, k, l = 1, ..., p, where Z_{t,k} denotes the kth component of the vector Z_t. The vector-half operator vech[·] stacks the elements of the lower triangular part of a p × p matrix as a p(p + 1)/2 × 1 vector (see Chapter 2 in Magnus and Neudecker (2007) for properties of this matrix tool). The vector of coefficients β_i is a function of asset-specific and common instrument parameters defining the dynamics of a_{i,t} and b_{i,t} in (2) and (3). We give their explicit forms in Section 3.2 where we first need them. Those forms are compatible with the restrictions from asymptotic no arbitrage. In matrix notation, for any asset i, we have

R_i = X_i β_i + ε_i,

where R_i and ε_i are T × 1 vectors and X_i is the T × d matrix stacking the regressors x_{i,t}'. Regression (4) contains both explanatory variables that are common across assets (scaled factors) and asset-specific regressors. It includes models with time-invariant coefficients as a particular case. In such a case, the regressor reduces to x_t = (1, f_t')' and is common across assets, and the regression coefficient vector is β_i = (a_i, b_i')'.
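The matrix X_t and the vech[·] operator are easy to code; the factor 2 on the off-diagonal entries is what makes vech[A]'vech[X_t] = Z_{t−1}'A Z_{t−1} for any symmetric matrix A, so quadratic forms in the common instruments enter the regression linearly (a sketch):

```python
import numpy as np

def vech(A):
    """Stack the lower-triangular part (including the diagonal) of a p x p
    matrix into a p(p+1)/2 vector."""
    return A[np.tril_indices(A.shape[0])]

def X_matrix(Z):
    """Symmetric matrix X_t built from the common instruments Z_{t-1}:
    entries Z_k^2 on the diagonal and 2 Z_k Z_l off the diagonal."""
    X = 2.0 * np.outer(Z, Z)
    np.fill_diagonal(X, Z ** 2)
    return X

Z = np.array([1.0, 0.5, -0.2])        # hypothetical instruments (constant + two variables)
v = vech(X_matrix(Z))                 # length p(p+1)/2 = 6

# quadratic-form identity for an arbitrary symmetric matrix A
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
A = (A + A.T) / 2
assert np.isclose(vech(A) @ v, Z @ A @ Z)
```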

Inference in models with observable factors
In Section 3.1, we first develop the diagnostic criterion for omitted factors before looking at the determination of the number of omitted factors. In Section 3.2, we discuss how to estimate risk premia via the two-pass regression methodology. We dedicate Section 3.3 to testing asset pricing restrictions. Throughout this chapter, we assume joint asymptotics in which the cross-sectional dimension n and the time series dimension T grow to infinity such that

C_1 n^γ ≤ T ≤ C_2 n^γ̄,  for some constants C_1, C_2 > 0,   (6)

with 0 < γ ≤ γ̄ ≤ ∞. Our asymptotics accommodate, among others, schemes such that T is much smaller than n (i.e., γ̄ < 1), or n and T are comparable (γ = γ̄ = 1). We omit technical details and refer the reader to GOS and GOS2, which give all required assumptions and proofs.

Model diagnostic
In order to build the diagnostic criterion for the set of observable factors, we consider the following rival models: M 1 : the linear regression model (4), where the errors (ε i,t ) are weakly cross-sectionally dependent, and M 2 : the linear regression model (4), where the errors (ε i,t ) satisfy a factor structure.
Under model M_1, the observable factors fully capture the systematic risk, and the error terms do not feature pervasive forms of cross-sectional dependence (see GOS2 for a formal definition). This zero-factor case in the error terms should hold when we choose factors and instruments in a time-varying setting to build the variables x_{i,t}, so that their explanatory power for excess returns achieves weak cross-sectional correlation in the noise terms. Working with weak cross-sectional dependence, namely an approximate factor structure, avoids the stronger assumption of zero cross-sectional correlations, namely an exact factor structure. Under model M_2, the following error factor structure holds:

ε_{i,t} = θ_i'h_t + u_{i,t},

where the m × 1 vector h_t includes unobservable (i.e., latent or hidden) factors, and the u_{i,t} are weakly cross-sectionally correlated. The latent factors may include scaled factors to cover latent time-varying factor loadings with common instruments. Such scaled factors may come from misspecification of the functional form of the time-varying betas. Since the factors h_t are unobservable by definition, we cannot tell from the output of the diagnostic criterion whether they are pure or scaled factors. We cannot allow for latent time-varying factor loadings with stock-specific instruments in our setting because of identification issues in disentangling time-varying loadings and latent factors. This lack of identification means that we cannot estimate a generic time-varying unobservable structure from the spectral properties of a covariance matrix alone. A recent proposal in the direction of a functional specification for a time-varying θ_{i,t} is the Instrumented Principal Components Analysis (IPCA) of Kelly et al. (2017), which we review in Section 4 together with other inference approaches for latent factor models with time-varying betas. IPCA works with linear loading specifications, with balanced panels, and without observable factors.
The m × 1 vector θ_i corresponds to the factor loadings, and the number m of common factors is assumed unknown. In vector notation, we have:

ε_i = H θ_i + u_i,   (7)

where H is the T × m matrix of unobservable factor values, and u_i is a T × 1 vector. In Equation (7), the θ_i's and h_t's are also called interactive fixed effects in the panel literature (Pesaran (2006), Bai (2009), Moon and Weidner (2015)). King et al. (1994) use them to capture the correlation between the unanticipated innovations in observable descriptors of economic performance (e.g. industrial production, inflation, etc.) and stock returns. Gobillon and Magnac (2016) use them to get treatment effect estimates in regional policy evaluation and to characterize the generic bias induced by the popular difference-in-differences procedure. Diagnosing the absence of omitted interactive effects is clearly important when applying the difference-in-differences procedure.
To compute the diagnostic criterion that checks whether the error terms are weakly cross-sectionally correlated or share at least one common factor, we estimate the generic panel model (4) by OLS applied asset by asset, and we get the estimators

β̂_i = Q̂_{x,i}^{-1} (1/T_i) Σ_t I_{i,t} x_{i,t} R_{i,t},   where   Q̂_{x,i} = (1/T_i) Σ_t I_{i,t} x_{i,t} x_{i,t}'   and   T_i = Σ_t I_{i,t}.   (9)

In available panels, the random sample size T_i for asset i can be small, and the inversion of matrix Q̂_{x,i} can be numerically unstable. To avoid unreliable estimates of β_i, we apply a trimming approach as in GOS: we keep asset i in the cross-section only if the indicator 1^χ_i = 1{CN(Q̂_{x,i}) ≤ χ_{1,T}, τ_{i,T} ≤ χ_{2,T}} equals one, where CN(Q̂_{x,i}) = μ_1(Q̂_{x,i})/μ_d(Q̂_{x,i}) is the condition number of the d × d matrix Q̂_{x,i}, μ_1(Q̂_{x,i}) and μ_d(Q̂_{x,i}) are its largest, resp. its smallest, eigenvalue, and τ_{i,T} = T/T_i. We assume that the two sequences χ_{1,T} > 0 and χ_{2,T} > 0 diverge asymptotically. The first trimming condition {CN(Q̂_{x,i}) ≤ χ_{1,T}} keeps in the cross-section only assets for which the time-series regression is not too badly conditioned. A too large value of CN(Q̂_{x,i}) indicates multicollinearity problems and ill-conditioning (Belsley et al. (2004), Greene (2008)). The second trimming condition {τ_{i,T} ≤ χ_{2,T}} keeps in the cross-section only assets for which the time series is not too short. We also use both trimming conditions in the proofs of the asymptotic results.
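A sketch of the first-pass OLS with the two trimming conditions on an unbalanced panel follows; the threshold values for χ_{1,T} and χ_{2,T} are illustrative, not those used in GOS:

```python
import numpy as np

def first_pass_ols(R, X, I, chi1=15.0, chi2=12.0):
    """Asset-by-asset OLS on an unbalanced panel with trimming.

    R: (T, n) excess returns; X: (T, n, d) regressors x_{i,t};
    I: (T, n) observability indicators. chi1, chi2 are illustrative
    trimming thresholds for the condition number and tau_{i,T} = T/T_i."""
    T, n = R.shape
    d = X.shape[2]
    beta = np.full((n, d), np.nan)
    keep = np.zeros(n, dtype=bool)
    for i in range(n):
        obs = I[:, i].astype(bool)
        Ti = obs.sum()
        if Ti < d:
            continue
        Xi = X[obs, i, :]
        Q = Xi.T @ Xi / Ti                      # hat Q_{x,i}
        mu = np.linalg.eigvalsh(Q)
        cond = mu[-1] / mu[0] if mu[0] > 1e-12 else np.inf
        if cond <= chi1 and T / Ti <= chi2:     # trimming indicator 1^chi_i
            keep[i] = True
            beta[i] = np.linalg.solve(Q, Xi.T @ R[obs, i] / Ti)
    return beta, keep

# usage on a small simulated unbalanced panel
rng = np.random.default_rng(0)
T, n, d = 200, 50, 3
X = rng.standard_normal((T, n, d))
beta0 = rng.standard_normal((n, d))
I = rng.random((T, n)) < 0.9                    # about 10% missing observations
R = np.einsum('tid,id->ti', X, beta0) + 0.01 * rng.standard_normal((T, n))
beta_hat, keep = first_pass_ols(R, X, I)
```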
We consider the following diagnostic criterion:

ξ = μ_1( (1/(nT)) Σ_i 1^χ_i ε̄_i ε̄_i' ) − g(n, T),   (10)

where the vector ε̄_i of dimension T gathers the values ε̄_{i,t} = I_{i,t} ε̂_{i,t}, and the penalty g(n, T) is such that g(n, T) → 0 and C²_{n,T} g(n, T) → ∞, when n, T → ∞, for C²_{n,T} = min{n, T}. Bai and Ng (2002) consider several simple potential candidates for the penalty g(n, T). In vector ε̄_i, the unavailable residuals are replaced by zeros. Then we use the following model selection rule: we select M_1 if ξ < 0, and we select M_2 if ξ > 0, since (a) Pr(ξ < 0 | M_1) → 1, and (b) Pr(ξ > 0 | M_2) → 1, when n, T → ∞ under the asymptotics (6) with γ̄ ≤ 1. This characterizes an asymptotically valid model selection rule, which treats both models symmetrically. The model selection rule is valid since (a) and (b) imply Pr(M_1 | ξ < 0) = Pr(ξ < 0 | M_1) Pr(M_1) [Pr(ξ < 0 | M_1) Pr(M_1) + Pr(ξ < 0 | M_2) Pr(M_2)]^{−1} → 1, as n, T → ∞, by Bayes' theorem. Similarly, we have Pr(M_2 | ξ > 0) → 1. The diagnostic criterion (10) does not deliver a testing procedure since we do not use a critical region based on an asymptotic distribution and a chosen significance level. The zero threshold corresponds to an implicit critical value yielding a test size asymptotically equal to zero since Pr(ξ < 0 | M_1) → 1. The selection procedure is conservative in diagnosing zero factors by construction. We do not allow a type I error under M_1 asymptotically, and really want to ensure that there is no omitted factor as required in the APT setting. This also means that we will not suffer from false discoveries related to a multiple testing problem (see e.g. Barras et al. (2010)) in our empirical application, where we consider a large variety of factor models on monthly and quarterly data. However, a possibility to obtain p-values is to use a randomisation procedure as in Trapani (2018) (see Bandi and Corradi (2014) and Corradi and Swanson (2006) for recent applications in econometrics).
This type of procedure controls for an error of the first type, conditional on the information provided by the sample and under a randomness induced by auxiliary experiments.
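A direct implementation of the criterion (10) computes the largest eigenvalue of the scaled residual covariance and subtracts a penalty; in this sketch, the penalty is one of the Bai and Ng (2002) candidates, g(n, T) = ((n + T)/(nT)) ln(nT/(n + T)), and other admissible choices exist:

```python
import numpy as np

def diagnostic_xi(eps_bar, keep):
    """Criterion xi = mu_1((1/(nT)) sum_i 1^chi_i eps_bar_i eps_bar_i') - g(n, T).

    eps_bar: (T, n) residuals, with unavailable entries set to zero;
    keep: (n,) boolean trimming indicators 1^chi_i."""
    T, n = eps_bar.shape
    E = eps_bar[:, keep]
    S = E @ E.T / (n * T)                 # T x T matrix with the same nonzero spectrum
    mu1 = np.linalg.eigvalsh(S)[-1]       # largest eigenvalue
    g = (n + T) / (n * T) * np.log(n * T / (n + T))   # a Bai-Ng (2002) penalty
    return mu1 - g                        # select M1 if negative, M2 if positive

# illustration on simulated residuals
rng = np.random.default_rng(0)
n, T = 200, 100
keep = np.ones(n, dtype=bool)
eps_weak = rng.standard_normal((T, n))                       # no omitted factor
eps_strong = np.outer(rng.standard_normal(T),
                      rng.standard_normal(n)) + eps_weak     # one omitted factor
```

With these inputs, the criterion is negative for the weakly dependent residuals and positive once a pervasive factor is added.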
The proof of the validity of the selection rule in GOS2 shows that the largest eigenvalue in (10) vanishes at a faster rate than the penalization term under M_1 when n and T go to infinity. Under M_1, we expect a vanishing largest eigenvalue because of the lack of a common signal in the error terms. The negative penalizing term −g(n, T) dominates in (10), and this explains why we select the first model when ξ is negative. On the contrary, the largest eigenvalue remains bounded from below away from zero under M_2 when n and T go to infinity. Under M_2, we have at least one non-vanishing eigenvalue because of a common signal due to omitted factors. The largest eigenvalue dominates in (10), and this explains why we select the second model when ξ is positive. We can interpret the criterion (10) as the adjusted gain in fit from including a single additional (unobservable) factor in model M_1. We can rewrite (10) as ξ = SS_0 − SS_1 − g(n, T), where SS_0 = (1/(nT)) Σ_i Σ_t 1^χ_i ε̄²_{i,t} is the sum of squared errors and SS_1 = min_{H,Θ} (1/(nT)) Σ_i Σ_t 1^χ_i (ε̄_{i,t} − θ_i h_t)², where the minimization is w.r.t. the vector H = (h_1, ..., h_T)' ∈ R^T of factor values and Θ = (θ_1, ..., θ_n)' ∈ R^n of factor loadings in a one-factor model, subject to the normalization constraint H'H/T = 1. Indeed, the largest eigenvalue μ_1((1/(nT)) Σ_i 1^χ_i ε̄_i ε̄_i') corresponds to the difference between SS_0 and SS_1. Furthermore, the criterion ξ is equal to the difference of the penalized criteria for the zero- and one-factor models defined in Bai and Ng (2002) applied on the residuals: ξ = PC(0) − PC(1), where PC(0) = SS_0 and PC(1) = SS_1 + g(n, T).
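In the balanced case without trimming, the correspondence between μ_1 and SS_0 − SS_1 can be checked numerically: the best one-factor least-squares fit is a rank-one truncated SVD (by the Eckart-Young theorem), so the gain in fit equals the largest eigenvalue (a sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 60, 150
E = rng.standard_normal((T, n))              # residual matrix (balanced, no trimming)

# mu_1 of (1/(nT)) sum_i eps_i eps_i'
mu1 = np.linalg.eigvalsh(E @ E.T / (n * T))[-1]

# SS_0 - SS_1: gain in fit from the best one-factor approximation,
# computed via a rank-one truncated SVD instead of iterating over H and Theta
s = np.linalg.svd(E, compute_uv=False)
SS0 = (s ** 2).sum() / (n * T)
SS1 = (s[1:] ** 2).sum() / (n * T)
assert np.isclose(mu1, SS0 - SS1)
```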
The proof of the validity of the selection rule in GOS2 exploits an asymptotic upper bound on the largest eigenvalue of a symmetric matrix based on similar arguments as in Geman (1980), Yin et al. (1988), and Bai and Yin (1993), without exploiting distributional results from random matrix theory, which are valid when n is comparable with T. This exemplifies a key difference with the proportional asymptotics used in Onatski (2010) or Ahn and Horenstein (2013) for balanced panels without observable factors. The asymptotic setting in GOS2 accommodates the condition T/n = o(1) by having γ̄ < 1 in (6), which agrees with the "large n, small T" case that we face in empirical applications (for example, ten thousand individual stocks monitored over forty-five years of either monthly or quarterly returns). Another key difference of GOS2 w.r.t. the rest of the literature is the handling of unbalanced panels. We need to address explicitly the presence of the observability indicators I_{i,t} and the trimming devices 1^χ_i in the proofs of the asymptotic results. The recent literature on the properties of two-pass regressions for fixed n and large T shows that the presence of useless factors (Kan and Zhang (1999a,b), Gospodinov et al. (2014)) or weak factor loadings invalidates standard inference. If such useless or weak latent factors are present under M_2, we face an identification issue. We cannot distinguish such a specification from M_1 since it corresponds to a particular approximate factor structure. Again, the selection rule remains the same since the probability of taking the right decision still approaches 1. In a "large n, large T" setting, the estimates of the risk premia are unchanged since we keep an approximate factor structure and risk remuneration is only attached to the strong factors in an APT framework. Here, the presence of weak factors affects the pattern of the weak cross-sectional dependence, and this only impacts the variance estimator obtained by thresholding in the next section.
On the contrary, if we have weak factors among the observable factors, Anatolyev and Mikusheva (2018) show that the conventional two-pass estimation procedure delivers inconsistent estimates of the risk premia. In the time-invariant case, they propose a modified procedure based on sample-splitting instrumental variables estimation at the second pass, and examine its asymptotic distribution. In the previous lines, we have studied a diagnostic criterion to check whether the error terms are weakly cross-sectionally correlated or share at least one unobservable common factor. Hereafter we aim at answering: do we have one, two, or more omitted factors? The design of the diagnostic criterion to check whether the error terms share exactly k unobservable common factors or share at least k + 1 unobservable common factors follows the same mechanics. We consider the following rival models: M 1 (k) : the linear regression model (4), where the errors (ε i,t ) satisfy a factor structure with exactly k unobservable factors, and M 2 (k) : the linear regression model (4), where the errors (ε i,t ) satisfy a factor structure with at least k + 1 unobservable factors.
The following model selection rule extends the previous one: we select M_1(k) if ξ(k) < 0, and we select M_2(k) if ξ(k) > 0, where ξ(k) = μ_{k+1}((1/(nT)) Σ_i 1^χ_i ε̄_i ε̄_i') − g(n, T). The proof of the validity of that second selection rule in GOS2 is more complicated than the proof of the first one. We need additional arguments to derive an asymptotic upper bound when we look at the (k + 1)th eigenvalue of a symmetric matrix, and this further complexity explains why we have developed the first selection rule as a special case. We rely on the Courant-Fischer min-max theorem, which represents eigenvalues as solutions of constrained quadratic optimization problems.
We know that the largest eigenvalue μ_1(A) of a symmetric positive semi-definite matrix A is equal to its operator norm. There is no such useful norm interpretation for the smaller eigenvalues μ_k(A), k ≥ 2. Hence, we cannot directly exploit standard inequalities or bounds associated with a norm when we investigate the asymptotic behavior of the spectrum beyond its largest element. We cannot exploit distributional results from random matrix theory either, since we also allow for T/n = o(1). The slow convergence rate √T of the individual estimates β̂_i also complicates the proof. In the presence of homogeneous regression coefficients β_i = β for all i, the analysis of the estimate β̂ in Bai (2009) is more straightforward due to the small asymptotic contribution of (β̂ − β). Hence our results also apply to diagnose the absence of omitted interactive effects before applying a difference-in-differences procedure, in order to avoid bias. The approach of Onatski (2010) requires the convergence of the upper edge of the spectrum (i.e., the first k largest eigenvalues of the covariance matrix, with k/T = o(1)) to a constant, while the approach of Ahn and Horenstein (2013) requires an asymptotic lower bound on the eigenvalues. Extending these approaches to residuals of an unbalanced panel when T/n = o(1) looks challenging.
We can use the results of the selection rule in order to estimate the number of unobservable factors. It suffices to choose the minimum k such that ξ(k) < 0. GOS2 state the consistency of that estimate even in the presence of a degenerate distribution of the eigenvalues, and without needing to give conditions on the growth rate of the maximum possible number k_max of factors as in Onatski (2010) and Ahn and Horenstein (2013). We believe that this is a strong advantage since there are many possible choices for k_max, and the estimated number of factors is sometimes sensitive to the choice of k_max (see the simulation results in those papers).
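The rule "choose the minimum k such that ξ(k) < 0" translates into a short scan over the ordered eigenvalues; in this sketch, the same Bai-Ng penalty as before is reused, and kmax_scan is only a loop guard, not a tuning parameter:

```python
import numpy as np

def n_omitted_factors(eps_bar, keep, kmax_scan=10):
    """Smallest k with xi(k) < 0, where
    xi(k) = mu_{k+1}((1/(nT)) sum_i 1^chi_i eps_bar_i eps_bar_i') - g(n, T)."""
    T, n = eps_bar.shape
    E = eps_bar[:, keep]
    mu = np.linalg.eigvalsh(E @ E.T / (n * T))[::-1]    # eigenvalues, descending
    g = (n + T) / (n * T) * np.log(n * T / (n + T))     # a Bai-Ng (2002) penalty
    for k in range(kmax_scan + 1):
        if mu[k] - g < 0:        # xi(k) < 0: select M1(k), i.e. k omitted factors
            return k
    return kmax_scan

# two strong omitted factors are recovered from simulated residuals
rng = np.random.default_rng(0)
T, n = 100, 200
U = rng.standard_normal((T, n))                              # no omitted factor
E2 = rng.standard_normal((T, 2)) @ rng.standard_normal((2, n)) + U
```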

Estimation of conditional risk premia
In the linear regression (4), the coefficients associated to x_{1,i,t} and x_{2,i,t} are β_i = (β_{1,i}', β_{2,i}')', given in Equations (12) as functions of the parameter matrices driving the time variation of a_{i,t} and b_{i,t}; the vec[·] operator stacks the elements of an m × n matrix as an mn × 1 vector.
The duplication matrix D_p is such that vec[A] = D_p vech[A], for any symmetric matrix A ∈ R^{p×p} (see Chapter 3 in Magnus and Neudecker (2007)). The commutation matrix W_{p,q} is such that vec[A'] = W_{p,q} vec[A], for any matrix A ∈ R^{p×q}, and W_p := W_{p,p}. When Z_t = 1 and Z_{i,t} = 0, we have p = 1 and q = 0, and the model in (4) reduces to a factor model with time-invariant coefficients and a regressor x_t common across assets.
In Equations (12), the d_1 × 1 vector β_{1,i} is a linear transformation of the d_2 × 1 vector β_{2,i}. This clarifies that the asset pricing restriction (2) implies a constraint on the distribution of the random vector β_i via its support. The coefficients of the linear transformation depend on the matrix Λ − F. For the purpose of estimating the loading coefficients of the risk premia in matrix Λ, we rewrite the parameter restrictions as:

β_{1,i} = β_{3,i} ν,   where   ν = vec[Λ − F].   (13)

Furthermore, we can relate the d_1 × Kp matrix β_{3,i} to the vector β_{2,i} (see GOS):

vec[β_{3,i}'] = J_a β_{2,i},   (14)

where J_a is a d_1 pK × d_2 block-diagonal matrix of constants. Equation (14) is instrumental in deriving the asymptotic results. In the time-invariant setting, β_{1,i} = a_i, β_{2,i} = β_{3,i} = b_i, and the matrix J_a reduces to an identity matrix. Hence, Equations (13) and (14) in the time-varying case are the counterparts of the time-invariant restriction a_i = b_i'ν. Let us now describe the two-pass approach to estimate the factor risk premia. The first pass consists in computing the time-series OLS estimators β̂_i, as described in the previous subsection (see Equation (9)). The second pass consists in computing a cross-sectional estimator of ν by regressing the β̂_{1,i} on the β̂_{3,i}, keeping non-trimmed assets only. We use a multivariate WLS approach, with weights ŵ_i estimated from the precision of the first-pass estimates. As a first step, we use the multivariate OLS estimator with unit weights,

ν̂_1 = ( Σ_i 1^χ_i β̂_{3,i}' β̂_{3,i} )^{-1} Σ_i 1^χ_i β̂_{3,i}' β̂_{1,i},

and the WLS estimator is

ν̂ = ( Σ_i ŵ_i 1^χ_i β̂_{3,i}' β̂_{3,i} )^{-1} Σ_i ŵ_i 1^χ_i β̂_{3,i}' β̂_{1,i}.

The weighting accounts for the statistical precision of the first-pass estimates and includes trimming. The final estimator of the risk premia is λ̂_t = Λ̂ Z_{t−1}, where we deduce Λ̂ from the relationship vec[Λ̂] = ν̂ + vec[F̂], with the estimator F̂ obtained by a SUR regression of the factors on the lagged common instruments. In the time-invariant case, the estimator of the risk premia vector simplifies to λ̂ = ν̂ + (1/T) Σ_t f_t. Hence, we estimate the model coefficients a_i and b_i by time-series OLS regressions, and the risk premia by a cross-sectional WLS regression of the â_i's on the b̂_i's, augmented by the factor mean.
Moreover, under conditional homoskedasticity σ_{ii,t} = σ_{ii} and a balanced panel, v_i is directly proportional to σ_{ii}, and we can simply pick the weights as ŵ_i = σ̂_{ii}^{-1} (Shanken (1992)). In the time-invariant case, we can avoid the trimming on the condition number. Starting from the asset pricing restriction E[R_{i,t}] = b_i'λ in the time-invariant case, another estimator of λ regresses the sample averages R̄_i = (1/T_i) Σ_t I_{i,t} R_{i,t} cross-sectionally on the estimated betas:

λ̃ = ( Σ_i ŵ_i 1^χ_i b̂_i b̂_i' )^{-1} Σ_i ŵ_i 1^χ_i b̂_i R̄_i.

This estimator is numerically equivalent to λ̂ in the balanced case, where I_{i,t} = 1 for all i and t. In the unbalanced case, it is equal to ν̂ plus a weighted average of the observed factors based on the estimated betas. Estimator λ̃ is often studied in the literature (see, e.g., Shanken (1992), Kandel and Stambaugh (1995), Jagannathan and Wang (1998)), and is also consistent. Estimating E[f_t] with a simple average of the observed factors instead of a weighted average based on estimated betas simplifies the form of the asymptotic distribution in the unbalanced case. This explains our preference for λ̂ over λ̃.
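In the time-invariant, balanced case, the whole two-pass procedure fits in a few lines; the sketch below uses unit second-pass weights (cross-sectional OLS rather than WLS) and no trimming:

```python
import numpy as np

def two_pass(R, f):
    """Two-pass risk premia estimation, time-invariant balanced case.
    R: (T, n) excess returns; f: (T, K) observable factors.
    Returns lambda_hat = nu_hat + (1/T) sum_t f_t."""
    T, _ = R.shape
    X = np.column_stack([np.ones(T), f])                    # x_t = (1, f_t')'
    B, *_ = np.linalg.lstsq(X, R, rcond=None)               # first pass, asset by asset
    a_hat, b_hat = B[0], B[1:].T                            # intercepts and betas
    nu_hat, *_ = np.linalg.lstsq(b_hat, a_hat, rcond=None)  # second pass: a_i on b_i
    return nu_hat + f.mean(axis=0)

# simulated check with known premia (hypothetical values)
rng = np.random.default_rng(0)
T, n, K = 1000, 300, 2
mu_f = np.array([0.03, 0.01])
lam = np.array([0.05, 0.02])                                # true lambda
b = rng.normal(1.0, 0.5, size=(n, K))
f = mu_f + 0.1 * rng.standard_normal((T, K))
R = b @ (lam - mu_f) + f @ b.T + 0.05 * rng.standard_normal((T, n))
lam_hat = two_pass(R, f)
```

The recovered premia are close to the true λ up to sampling error dominated by the factor mean estimate, consistent with the √T rate discussed below.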
The estimator ν̂ has a fast convergence rate √(nT) and features an asymptotic bias term. Both β̂ 1,i and β̂ 3,i in the definition of ν̂ contain an estimation error; for β̂ 3,i , this is the well-known Errors-In-Variables (EIV) problem. The EIV problem does not impede consistency since we let T grow to infinity. However, it induces a bias term B̂ ν /T that recenters the asymptotic distribution of ν̂ (see GOS for details). Ang, Liu, and Schwarz (2010) look at a maximum likelihood analysis with a single asymptotic treatment (large T , n fixed) and a balanced panel under a particular approximate Gaussian factor structure (block-diagonal covariance matrix of the residuals) and time-invariant coefficients. Their setting further assumes that the factors have zero mean. Such an assumption gives λ̂ = ν̂ in a time-invariant setting. Under a zero mean (or a known mean, i.e., not to be estimated), the asymptotic variance of λ̂ corresponds to the asymptotic variance Σ ν of ν̂, and the rate of convergence is √(nT). On the contrary, if we do not know the mean of the factors and need to estimate it, we have λ̂ = ν̂ + (1/T) Σ t f t . The asymptotic variance of λ̂ then corresponds to the asymptotic variance Σ f of the sample average of the factors, and the rate of convergence is √T. Jagannathan and Wang (2002) is an early reference on the impact of knowing or not the mean of the factors for asymptotic analysis. With an unknown mean, only the variability of the factors drives the asymptotic distribution of λ̂, since the estimation error of ν̂ vanishes at the faster rate √(nT). This result is an oracle property for λ̂, namely that its asymptotic distribution is the same irrespective of the knowledge of ν. This property is in sharp difference with the single asymptotics with a fixed n and T → ∞. In the balanced case and with homoskedastic errors in the time-invariant case, Theorem 1 of Shanken (1992) shows that the rate of convergence of λ̂ is √T and that its asymptotic variance is Σ λ,n = Σ f + (1/n) Σ ν,n , for fixed n and T → ∞.
The two components in Σ λ,n come from estimation of E[f t ] and ν, respectively (see also Theorem 1 in Jagannathan and Wang (1998), or Theorem 3.2 in Jagannathan, Skoulakis, and Wang (2009)). Letting n → ∞ gives Σ f under weak cross-sectional dependence. Thus, exploiting the full cross-section of assets improves efficiency asymptotically, and the positive definite matrix Σ λ,n − Σ f corresponds to the efficiency gain. Using a large number of assets instead of a small number of portfolios does help to eliminate the contribution coming from estimation of ν.
GOS suggest exploiting the analytical bias correction B̂ ν /T and using the estimator ν̂ B = ν̂ − B̂ ν /T instead of ν̂. In the time-invariant setting, λ̂ B = ν̂ B + (1/T) Σ t f t delivers a bias-free estimator of λ at order 1/T , which shares the same root-T asymptotic distribution as λ̂. We can relate that suggestion to bias-corrected estimation accounting for the well-known incidental parameter problem (Neyman and Scott (1948)) in the panel literature (see Lancaster (2000) for a review). To highlight the main idea, let us focus on the model with time-invariant coefficients. We can write the factor model under the restriction a i = b i ν as R i,t = b i (ν + f t ) + ε i,t . In the likelihood setting of Hahn and Newey (2004) (see also Hahn and Kuersteiner (2002)), the b i correspond to the individual fixed effects and ν to the common parameter of interest.
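The order-1/T bias induced by the EIV problem is easy to visualize by simulation; the following minimal Monte Carlo sketch (one factor, homoskedastic errors, illustrative parameter values of our choosing) shows the bias of the second-pass estimator shrinking as T grows:

```python
import numpy as np

rng = np.random.default_rng(1)
NU = 0.5  # true cross-sectional coefficient (illustrative value)

def nu_hat_once(T, n=100, sigma=2.0):
    """One Monte Carlo draw of the second-pass estimator in a one-factor
    model; the first-pass betas carry an O(1/sqrt(T)) estimation error,
    which biases the second pass at order 1/T (the EIV problem)."""
    f = rng.normal(size=(T, 1))
    b = 1.0 + 0.5 * rng.normal(size=(1, n))
    R = NU * b + f @ b + sigma * rng.normal(size=(T, n))
    X = np.column_stack([np.ones(T), f])
    coef, *_ = np.linalg.lstsq(X, R, rcond=None)
    a_hat, b_hat = coef[0], coef[1:]
    return float(np.linalg.solve(b_hat @ b_hat.T, b_hat @ a_hat))

# Average over replications: the bias of nu_hat shrinks roughly like 1/T
bias_T50 = np.mean([nu_hat_once(50) for _ in range(300)]) - NU
bias_T400 = np.mean([nu_hat_once(400) for _ in range(300)]) - NU
```

In this design the bias is an attenuation toward zero; the analytical correction B̂ ν /T of GOS removes it without simulation.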
Available results on the fixed-effect approach tell us that: (i) the ML estimator of ν is inconsistent if n goes to infinity while T is held fixed; (ii) the ML estimator of ν is asymptotically biased even if T grows at the same rate as n; (iii) an analytical bias correction may yield an estimator of ν that is root-(nT) asymptotically normal and centered at the truth if T grows faster than n^{1/3}. The two-pass estimators ν̂ and ν̂ B exhibit the same features in our setting. Finally, let us discuss confidence intervals. Their construction for components of Λ̂ with valid asymptotic coverage is straightforward through the use of standard HAC estimators, such as in Newey and West (1994) or Andrews and Monahan (1992). The construction of confidence intervals for the components of ν̂ is more difficult. Indeed, the asymptotic variance involves a limiting double cross-sectional sum scaled by n, and not n². A naive approach consists in replacing unknown quantities by any consistent estimator, but this does not work here. To handle this, GOS rely on recent proposals in the statistical literature on consistent estimation of large-dimensional sparse covariance matrices by hard thresholding (Bickel and Levina (2008), El Karoui (2008); see also Fan, Liao, and Mincheva (2011)).
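Hard thresholding itself is a one-line operation; the sketch below applies it to a sample covariance matrix, with the threshold tau treated as a given tuning constant (the data-driven choices discussed in Bickel and Levina (2008) are not reproduced):

```python
import numpy as np

def hard_threshold(S, tau):
    """Hard thresholding of a sample covariance matrix: off-diagonal
    entries smaller than tau in absolute value are set to zero, and the
    diagonal is kept. tau is a tuning constant, typically of order
    sqrt(log(n)/T) in the sparse-covariance literature."""
    S_thr = np.where(np.abs(S) >= tau, S, 0.0)
    np.fill_diagonal(S_thr, np.diag(S))  # never threshold the variances
    return S_thr
```

Under sparsity of the true covariance matrix, the thresholded estimator is consistent in spectral norm even when n is large relative to T, which is what makes feasible inference on ν̂ possible.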

Testing asset pricing restrictions
From (13), the null hypothesis underlying the asset pricing restriction (2) is H 0 : β 1 (γ) = β 3 (γ) ν, where β 1 (γ) and β 3 (γ) are defined as β 1,i and β 3,i in Equations (12) and (13), replacing B (γ) and C (γ) for B i and C i . This null hypothesis is written on the continuum of assets. Since we estimate ν via the WLS cross-sectional regression of the estimates β̂ 1,i on the estimates β̂ 3,i , GOS suggest a test based on the weighted sum of squared residuals (SSR) of the cross-sectional regression. The weighted SSR is Q̂ e = (1/n) Σ i ê i ' ŵ i ê i , with residuals ê i = β̂ 1,i − β̂ 3,i ν̂. Let us now introduce the statistic ξ̂ nT = T √n (Q̂ e − B̂ ξ /T), where the recentering term simplifies to B̂ ξ = d 1 thanks to the weighting scheme. Under the null hypothesis H 0 and asymptotics (6) with 1/γ < 2, GOS prove ξ̂ nT ⇒ N (0, Σ ξ ), and show how to get a feasible testing procedure by exploiting a consistent estimate of the asymptotic variance Σ ξ .
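With the recentering term B̂ ξ = d 1, the statistic can be sketched as follows in the time-invariant scalar case d 1 = 1; the studentization by a consistent estimate of Σ ξ is omitted, so this is only the recentered numerator:

```python
import numpy as np

def xi_stat(a_hat, b_hat, nu_hat, w, T):
    """Recentered weighted-SSR statistic, a sketch for d_1 = 1: with
    inverse-variance weights, each T * w_i * e_i^2 is approximately
    chi2(1) under H0, so the recentering term is d_1 = 1."""
    n = a_hat.shape[0]
    e = a_hat - b_hat.T @ nu_hat            # second-pass residuals
    Q_e = np.mean(w * e * e)                # weighted SSR
    return T * np.sqrt(n) * (Q_e - 1.0 / T)
```

Under H0 the statistic is asymptotically centered at zero; the feasible test of GOS divides it by the square root of an estimate of Σ ξ obtained by hard thresholding.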
Finally, GOS derive a test for the null hypothesis when the factors come from tradable assets, i.e., are portfolio excess returns: H 0 : β 1 (γ) = 0, for almost all γ, against the alternative hypothesis that the set of assets with β 1 (γ) ≠ 0 has positive measure. We only have to substitute Q̂ a = (1/n) Σ i β̂ 1,i ' ŵ i β̂ 1,i for Q̂ e . Since the constrained form of β 1,i in (13) comes from (2), we take directly into account the no-arbitrage restrictions imposed by the model specification. This gives an extension of Gibbons, Ross, and Shanken (1989) to the conditional case with double asymptotics.
Implementing the original GRS test, which uses a weighting matrix corresponding to the inverse of an estimated large variance-covariance matrix, quickly becomes problematic. We face a large number nd 1 of restrictions; each β 1,i is of dimension d 1 × 1, and the estimated covariance matrix to invert is of dimension nd 1 × nd 1 .
We expect to compensate the potential loss of power induced by a diagonal weighting via the larger number of restrictions, since we use a large number n of assets. Monte Carlo simulations in GOS show that the test exhibits good power properties against both risk-based and non-risk-based alternatives (e.g., MacKinlay (1995)), even with a thousand assets and a time-series dimension similar to the one in the empirical analysis. Fan et al. (2015) discuss power enhancement in high-dimensional cross-sectional tests.
Finally, let us mention that Ma et al. (forthcoming, 2019) have recently developed a test of the nullity of the alphas when the alphas and betas are taken as smooth functions of time in a "large n, large T" setting (see Li and Yang (2011) and Ang and Kristensen (2012) for the "small n, large T" case).

Inference in models with unobservable factors
In this section, we review methodologies for inference in large-dimensional conditional factor models when the factor values are unobserved by the econometrician. In this setting, we cannot use standard Principal Component Analysis (PCA) to extract the factor space, since PCA assumes either constant factor loadings (Stock and Watson (2002a,b), Bai (2003, 2009), Bai and Ng (2002, 2006)) or at most small instabilities in the factor loadings (Bates et al. (2013)). Intuitively, the invalidity of standard PCA in a conditional framework comes from the fact that a factor with a time-varying loading can be confused with multiple static factors.
The model specification is R i,t = a i,t + b i,t ' f t + ε i,t (17), where f t is the K-dimensional vector of the unobservable factor values. Several estimation approaches are based on assuming that the intercepts a i,t and the factor loadings b i,t are either parametric or nonparametric functions of lagged time-varying observable variables, with or without imposing the no-arbitrage restrictions.
Among the parametric approaches, Kelly et al. (2017) model the coefficients as linear functions of characteristics plus some noise term: a i,t = Z i,t ' A + ν i,t and b i,t = B Z i,t + η i,t (Equations (18) and (19)), where Z i,t is a vector of observed characteristics, A and B are a vector and a matrix of unknown parameters, and ν i,t and η i,t are unobservable noise terms. By plugging (18) and (19) into (17), the Instrumented Principal Component Analysis (IPCA) estimator of Kelly et al. (2017) is obtained by minimizing a LS criterion w.r.t. the parameters A, B and the factor values f t , t = 1, ..., T , subject to static normalization restrictions (e.g., that the matrix B'B is diagonal). They propose an iterative numerical procedure to perform the optimization. In the nonparametric setting, an early contribution is the model considered by Connor and Linton (2007) and Connor et al. (2012), in which the factor loadings are functions of observed covariates: b i,k,t = b k (Z k,i,t−1 ), where b k (·) is an unknown smooth function of the observable variable Z k,i,t−1 , for k = 1, ..., K. An extension takes b i,t = (b i,1 (Z t ), ..., b i,K (Z t ))', where the b i,k (·) are smooth functions and Z t is a vector of observed variables, common across assets. Another approach (Pelger) builds on characteristics-based portfolio returns and minimizes a penalized criterion including the L 1 norm ‖θ‖ 1 of the parameter vector θ = (θ b , θ f )'. The inferential theory for this estimator is unknown.
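The alternating least squares idea behind the IPCA iterations can be sketched for a stripped-down version of (18)-(19) with exact linear betas and no intercepts or noise terms; the normalization restrictions are ignored, so the recovered B and f t are only identified up to an invertible rotation:

```python
import numpy as np

def ipca_als(R, Z, K, n_iter=50):
    """Alternating least squares for a simplified IPCA-type model
    R[t,i] = Z[t,i,:] @ B.T @ f[t] (betas exactly linear in the lagged
    characteristics), a sketch of the iterative procedure; the
    normalization of B and f is left unresolved."""
    T, n, m = Z.shape
    rng = np.random.default_rng(0)
    B = rng.normal(size=(K, m))                    # K x m loading map
    for _ in range(n_iter):
        # Step 1: given B, recover f_t by cross-sectional OLS at each t
        F = np.stack([np.linalg.lstsq(Z[t] @ B.T, R[t], rcond=None)[0]
                      for t in range(T)])
        # Step 2: given the f_t, recover vec(B) by pooled OLS with
        # regressors f_t (x) z_{i,t} (Kronecker product)
        X = np.stack([np.kron(F[t], Z[t, i])
                      for t in range(T) for i in range(n)])
        vecB, *_ = np.linalg.lstsq(X, R.reshape(-1), rcond=None)
        B = vecB.reshape(K, m)
    return B, F
```

Each step is an exact least squares problem, so the criterion decreases monotonically across iterations.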
In the rest of this section, we review a recent proposal for inference in time-varying statistical factor models developed by Gagliardini and Ma (2019). As the focus of these authors is on the problem of conducting inference on the conditional factor space, including its dimension, the adopted nonparametric framework is general regarding the beta dynamics and encompasses the linear and nonlinear beta specifications discussed above. Here we review the main results in the simpler framework with a constant number of conditional factors, and refer the reader to Gagliardini and Ma (2019) for the more general setting, the regularity conditions, and the derivation of the results.
After imposing the no-arbitrage restrictions a i,t = b i,t ' ν t (see Equation (2)), model (17) becomes R i,t = b i,t ' (ν t + f t ) + ε i,t . Gagliardini and Ma (2019) assume that the m-dimensional lagged instruments Z i,t−1 are cross-sectionally uncorrelated with the errors and correlated with the betas under a full rank condition: the m × K matrix Γ t , defined as the limit cross-sectional average of Z i,t−1 b i,t ', has full column rank K for all t (22), which implies the order condition m ≥ K. Being the limit of a cross-sectional average of predetermined variables, the matrix Γ t is measurable w.r.t. the information set G t−1 of aggregate shocks at time t − 1, i.e., the non-diversifiable shocks (see Gagliardini and Ma (2019) for more details). It is assumed that G t is generated by the vector process Z t , and that the econometrician observes Z t . Under (22), the limit cross-sectional average ξ t of the products Z i,t−1 R i,t satisfies ξ t = Γ t g t (23), where g t = ν t + f t . Process ξ t is identifiable from population moments. Its conditional variance given G t−1 is V [ξ t |G t−1 ] = Γ t V [g t |G t−1 ] Γ t '. Thus, the number of non-zero eigenvalues of V [ξ t |G t−1 ] equals the number of factors K, and the associated eigenvectors span the column space of matrix Γ t . This allows identifying g t from (23) up to a non-singular transformation matrix which is G t−1 -measurable. In fact, the conditional factor space in model (17) is identifiable up to transformations f t → c t−1 + A t−1 f t , where c t−1 and A t−1 are G t−1 -measurable. Gagliardini and Ma (2019) show how to choose a convenient normalization of the factor space in order to get a closed-form expression for g t . Specifically, they normalize the latent factors such that E[f t |G t−1 ] = 0 and Γ t = J t , where J t is the matrix whose columns are the K normalized eigenvectors of V [ξ t |G t−1 ] associated with the non-zero eigenvalues. Under this normalization, g t = (J t ' Ω t J t )^{-1} J t ' Ω t ξ t , where Ω t is any m × m positive definite matrix measurable w.r.t. G t−1 . In a setting with n, T → ∞, Gagliardini and Ma (2019) define consistent estimators of the conditional factor space and of its dimension by replacing population (cross-sectional, or conditional) expectations with sample analogues.
The number of factors is estimated by maximizing the ratio of consecutive eigenvalues of the estimated conditional variance, k̂ t = arg max k µ k (V̂ [ξ t |G t−1 ]) / µ k+1 (V̂ [ξ t |G t−1 ]), where µ k (·) denotes the k-th largest eigenvalue of a symmetric matrix. Estimator k̂ t exploits the idea of the eigenvalue ratio test, but in a different context than Ahn and Horenstein (2013), since V̂ [ξ t |G t−1 ] is not a large-dimensional sample variance-covariance matrix.
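The eigenvalue-ratio idea can be sketched as follows; the input V stands for an estimate of V [ξ t |G t−1 ], and k_max is a user-chosen bound on the candidate number of factors:

```python
import numpy as np

def num_factors_eigenratio(V, k_max):
    """Estimate the number of factors by maximizing the ratio of
    consecutive eigenvalues of a symmetric positive semi-definite
    matrix V, in the spirit of the eigenvalue-ratio criterion."""
    mu = np.sort(np.linalg.eigvalsh(V))[::-1]      # eigenvalues, descending
    ratios = mu[:k_max] / mu[1:k_max + 1]
    return int(np.argmax(ratios)) + 1              # arg max over k = 1..k_max
```

When V has K eigenvalues well separated from the rest, the ratio spikes at k = K.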
In the framework of Kelly et al. (2017), Equation (19) implies a constraint on the time variation of Γ t . In Gu et al. (2019), the latent factors are recovered by an autoencoder applied to characteristic-managed portfolio returns x t . Hence, for large n, the autoencoder mapping for the latent factor essentially amounts to fixing a normalization of the latent factor such that some K × K block of Q Z,t−1^{-1} Γ t is time-invariant, so that we can write g t as a time-invariant function of x t . This function is linear. The methodology of Gagliardini and Ma (2019) does not impose constraints on the dynamics of Γ t and deploys the structural linear link between ξ t and g t conditional on G t−1 .
Among the possible extensions of the model setting, we can further impose a group structure on the latent factor space in order to accommodate the presence of both common pervasive factors and group-specific pervasive factors. The former affect all series in the panel, while the latter have an impact on subgroups of assets only. The subgroups can correspond to, e.g., economic sectors, asset classes, markets, or countries. Andreou et al. (2019) develop inference procedures in a "large n, large T" setting for estimating the common and group-specific numbers of factors and the corresponding spanned factor spaces.
Finally, let us mention that there is also work on inference for large-dimensional models with unobservable factors with high frequency data (Fan et al. (2016a), Ait-Sahalia and Xiu (2017), Pelger (2019a,b)), but extensions to the conditional case with instruments still need to be developed there. Fan and Kim (2018) discuss how to robustify such methods and Kim and Fan (2019) how to impose a dynamic parametric structure based on a factor GARCH-Itô process for prediction. Li et al. (2019) develop tests for deciding whether a large cross-section of asset prices obey an exact factor structure at the times of factor jumps with infill asymptotics.
Besides, if we face a short time series panel (for example a 5-year window) without the availability of high-frequency data, asymptotics with fixed T and large n are better suited. Keeping T fixed impedes consistent estimation of the risk premia, and inference has to focus on ex-post risk premia (Shanken (1992)).
An example of work in that direction is Zaffaroni (2019).

Empirical findings
In this section, we provide some empirical findings based on a large number of financial factor models. We provide contrast analysis based on monthly returns of individual stocks and standard sets of portfolios.

Data description and factor models
Our dataset includes monthly excess returns on stocks from the CRSP database. We proxy the risk-free rate with the monthly 30-day T-bill beginning-of-month yield. We exclude financial firms (Standard Industrial Classification codes between 6000 and 6999) as in Fama and French (2008). The dataset after matching CRSP and Compustat contents comprises n = 10,827 stocks and covers the period from July 1963 to December 2017, with T = 654 months. Table 1 provides the distribution of stocks w.r.t. T i , the number of return observations available for each asset. About half of the stocks in the panel have more than 120 monthly return observations. We observe the complete time series of observations for only 2% of the stocks. Table 2 provides the distribution of stocks w.r.t. the industry classification of Ferson and Harvey (1999). The two most frequent industry categories are Professional Services (2,282 stocks) and Healthcare (1,194), while the two least frequent ones are Aerospace (64) and Paper (129).
For comparison purposes with a standard methodology for small n, we consider i) the 25 Fama-French (FF) portfolios and ii) the 44 industry (Indu.) portfolios excluding four financial sectors (banking, insurance, real estate, and trading) as base assets.
We consider several linear factor models that involve financial variables (see GOS2 for models with macroeconomic variables). Table 3 lists the financial models, the factors, the number of parameters to estimate, and the trimmed cross-sectional dimensions n χ , considering time-invariant and time-varying specifications. The three factors of Fama and French (1993) are the monthly excess return r m,t on the CRSP NYSE/AMEX/Nasdaq value-weighted market portfolio over the risk-free rate, and the monthly returns on zero-investment factor-mimicking portfolios for size and book-to-market, denoted by r smb,t and r hml,t .
We denote the monthly returns on the momentum portfolio by r mom,t (Carhart (1997)). The profitability and investment factors of Fama and French (2015) are the differences between monthly returns on diversified portfolios of stocks with robust and weak profitability, and with low and high investment, denoted by r rmw,t and r cma,t . We have downloaded the time series of these factors from the website of Kenneth French. We also consider a model with long-only factors, which should be more immune to market imperfections (e.g., transaction costs). We build the long-only factors from the six FF research portfolios available on the website of Kenneth French. The excess return of the "Small" factor (denoted by r s,t ) is the average excess return of the three small portfolios, and the excess return of the "Value" factor (denoted by r h,t ) is the average excess return of the two value portfolios. Furthermore, we include the quality minus junk (qmj t ) and betting against beta (bab t ) factors as described in Asness et al. (2019) and Frazzini and Pedersen (2014). The factor return qmj t is the average return on the two high-quality portfolios minus the average return on the two low-quality (junk) portfolios. The betting against beta factor is a portfolio that is long low-beta securities and short high-beta securities. We have downloaded these data from the website of AQR. As additional specifications, we consider, from the website of Kenneth French, the two reversal factors, which are monthly returns on portfolios for short-term and long-term reversals, denoted by r strev,t and r ltrev,t .
To account for time-varying alphas, betas and risk premia, we use a conditional specification based on one common variable and a firm-level variable. We take the instruments Z t−1 = (1, divY t−1 ) , where divY t−1 is the lagged dividend yield and the asset specific instrument bm i,t−1 corresponds to the lagged book-to-market equity of firm i. We compute the book-to-market equity of firm i as defined in logarithmic terms by Fama and French (2008). We compute the firm characteristics from Compustat as in the appendix of Fama and French (2008). We consider all the assets for which the book-to-market equity is always positive over the sample period, as in Fama and French (2008). The number of assets reduces to n = 8, 570 for the estimation of the time-varying specifications. We refer to Avramov and Chordia (2006) for convincing theoretical and empirical arguments in favor of the chosen conditional specification. In Table 3, the vector x i,t has maximum dimension d = 23 (CAR and REV model), and parsimony explains why we have not included e.g. the size of firm i as an additional stock specific instrument. We have downloaded time series of portfolio characteristics from the website of Kenneth French.
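As an illustration of how such a conditional specification translates into first-pass regressors, the sketch below stacks the instruments and their interactions with the factors; the exact ordering and set of interactions (hence the resulting dimension d) follow our own convention, not necessarily the one behind Table 3:

```python
import numpy as np

def conditional_regressors(f_t, Z_lag, Zi_lag):
    """Build a stacked regressor vector x_{i,t} for a conditional
    specification with alphas and betas linear in a common instrument
    vector Z_{t-1} (e.g., (1, divY)) and a stock-specific instrument
    Z_{i,t-1} (e.g., bm); one convention among several."""
    # intercept part: a_{i,t} linear in the instruments
    alpha_part = np.concatenate([Z_lag, Zi_lag])
    # loading part: each factor interacted with the same instruments
    beta_part = np.concatenate([np.kron(f_t, Z_lag), np.kron(f_t, Zi_lag)])
    return np.concatenate([alpha_part, beta_part])
```

With K = 2 factors, a two-dimensional common instrument, and one stock-specific instrument, this convention yields a nine-dimensional regressor vector per observation.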

Time-invariant specifications
Let us first focus on the time-invariant specifications (i.e. Z t = 1 and Z i,t = 0) in order to benchmark the results of the next section for the time-varying specifications. We use χ 1,T = 15 as advocated by Greene (2008), together with χ 2,T = 546/60. The number of assets whose condition number is below 15 is 7, 754 for each model specification.
First, we compute the diagnostic criterion and the number k of omitted factors. Table 4 reports the contribution in percentage of the first eigenvalue µ 1 to the variance of the normalized residuals (1/(n χ T)) Σ i 1 i^χ ε̂ i ' ε̂ i , which is equal to one by construction under our variance scaling for each time series of residuals. We also report the selected number of omitted factors k, the contribution of the first k eigenvalues, i.e., Σ_{j=1}^{k} µ j , and the incremental contribution of the (k + 1)-th eigenvalue µ k+1 . For each model, we also specify the numerical value of the penalisation function g (n χ , T), as defined in GOS2. The number k of omitted factors is at least one for the most popular financial models, e.g., the CAPM (Sharpe (1964)) and the three-factor Fama-French model (FF). On the contrary, for the four-factor Carhart (1997) model (CAR), the five-factor Fama-French model (5FF), quality minus junk (QMJ), and models involving the reversal factors, we find no omitted latent factor. We observe that adding observable factors helps to reduce the contribution of the first eigenvalue µ 1 to the variance of the residuals. However, when we face latent factors, the omitted systematic contribution Σ_{j=1}^{k} µ j only accounts for a small proportion of the residual variance. For instance, we find k = 1 omitted factor in the CAPM. That latent factor only contributes µ 1 = 2.39% of the residual variance. Figures 1, 3, and 5 summarize this information graphically by displaying the penalized scree plots and the plots of cumulated eigenvalues for the CAPM, the three-factor Fama-French model, and the four-factor CAR model. For instance, µ 2 = 1.54% lies below the horizontal line g (n χ , T) = 1.55% in Panel A for the time-invariant CAPM, so that k = 1. In Panel B for the time-invariant CAPM, the vertical bar µ 1 + µ 2 = 3.93% is divided into the contribution of µ 1 = 2.39% (light grey area) and that of µ 2 = 1.54% (dark grey area).
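The selection rule behind the penalized scree plot can be sketched as counting the eigenvalue contributions of the normalized residuals that exceed the penalty g (n χ , T); the penalty value is taken as given rather than computed as in GOS2:

```python
import numpy as np

def num_omitted_factors(E, g):
    """Select the number of omitted factors as the count of eigenvalue
    contributions of the normalized residuals exceeding a penalty g; a
    sketch of a penalized scree criterion for residuals E (T x n)."""
    T, n = E.shape
    E = E / E.std(axis=0)                       # unit-variance scaling
    mu = np.linalg.eigvalsh(E.T @ E / (n * T))  # eigenvalue contributions
    mu = np.sort(mu)[::-1]                      # contributions sum to ~1
    return int(np.sum(mu > g))
```

With one strong common component left in the residuals, a single contribution stands clearly above the penalty line, mimicking the k = 1 finding for the CAPM.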
Figure 2, Panel A, displays the scree plot of squared eigenvalues for the CAPM and the square g² (n χ , T) of the penalisation function, relative to the squared Frobenius norm Σ_{l=1}^{T} µ l ² of the normalized residual matrix (1/(nT)) Σ i 1 i^χ ε̂ i ε̂ i '. By construction, the conclusion on the number of omitted factors is the same as for the scree plot shown in Figure 1. For example, the sum of the squares of the first two eigenvalues accounts for 21.45% of the squared Frobenius norm for the time-invariant CAPM. Thus, the two latent factors are much more representative of the off-diagonal components. We conclude similarly for the time-invariant FF model (see Figure 4), even if the correlation explanation provided by the single omitted factor is lower.
Tables 5-8 gather the estimated annual risk premia and the estimates of the components of ν, with the corresponding confidence intervals at the 95% level, for the ten time-invariant models listed in Table 3. For individual stocks, we use bias-corrected estimates for λ and ν. In order to build the confidence intervals, we use the HAC estimators Σ̂ f , defined as in Newey and West (1994), and Σ̂ ν , defined in GOS. When we consider the 25 FF and 44 Indu. portfolios as base assets, we use asymptotics for fixed n and T → ∞. In particular, we compute the estimates of the variance-covariance matrices Σ λ,n and Σ ν,n defined in GOS. The estimated risk premia for the market factor are of the same magnitude and all positive across the three universes of assets and all financial models. In Table 7, for the four-factor CAR model and the individual stocks, the size factor is positively remunerated (3.5430%) and significantly different from zero. The value factor commands a significant negative reward (-4.9265%). The momentum factor is largely remunerated (8.0947%) and significantly different from zero. For the 25 FF portfolios, the size factor is not significantly positively remunerated, while the value factor is significantly positively remunerated (2.5028% and 4.1996%). The momentum factor bears a significant positive reward (34.6689%). For λ m , λ smb , λ hml , we obtain similar inferential results when we consider the Fama-French model in Table 8. Our point estimates of λ m , λ smb , and λ hml for large n agree with Ang et al. (2010). Our point estimates and confidence intervals for λ m , λ smb , and λ hml agree with the results reported by Shanken and Zhou (2007) for the 25 FF portfolios. The large, but imprecise, estimate of the momentum premium when n = 25 comes from the estimate of ν mom (26.7559%), which is much larger and less accurate than the estimates of ν m , ν smb , and ν hml (0.9447%, -0.0225%, -0.3662%).
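The HAC estimator Σ̂ f used for the factor-mean component of the confidence intervals can be sketched with Bartlett kernel weights; the automatic bandwidth selection of Newey and West (1994) is taken as given via the lag input L:

```python
import numpy as np

def newey_west(F, L):
    """Newey-West HAC estimator of the long-run variance of the sample
    mean of the factors F (T x K), with Bartlett kernel weights and L
    lags; the bandwidth choice is treated as given."""
    T, K = F.shape
    U = F - F.mean(axis=0)
    S = U.T @ U / T                                  # lag-0 term
    for l in range(1, L + 1):
        G = U[l:].T @ U[:-l] / T                     # lag-l autocovariance
        S += (1.0 - l / (L + 1)) * (G + G.T)         # Bartlett weight
    return S
```

For serially uncorrelated factors the estimator is close to the sample covariance matrix; the kernel weights matter when factor returns are autocorrelated.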
Moreover, while the estimates of ν m , ν smb , and ν hml are statistically not significant for the 25 FF portfolios, they are statistically different from zero for individual stocks. In particular, the estimate of ν hml is large and negative. This explains the negative estimate of the value premium for individual stocks displayed in Table 7: individual stocks yield negative estimates of the coefficient ν hml and of the value premium λ hml (albeit the latter is not statistically significant). In Table 6, the 5FF model also exhibits large differences between estimated risk premia on individual stocks, FF portfolios, and Indu. portfolios. For example, we get a significant λ rmw = 5.3198% for the FF portfolios and an insignificant λ rmw = 0.8911% for individual stocks. On the contrary, we get an insignificant λ cma = 0.8787% for the FF portfolios (with a large confidence interval) and a significant λ cma = -3.2867% for individual stocks. The estimated risk premia on the Indu. portfolios exhibit large confidence intervals. For example, we get insignificant λ rmw = 2.3817% and λ cma = -0.3614%.
The size, value, and momentum factors are tradable in theory. In practice, their implementation faces transaction costs due to rebalancing and short selling. A nonzero ν might capture these market imperfections (Cremers et al. (2012)). When we use a time-invariant model with long-only factors derived from the FF methodology, Table 8 shows that we also get zero estimates with the FF portfolios, except for value, and nonzero estimates for market and value with the Indu. portfolios and the individual stocks. Market imperfections are probably not the key drivers here (see Frazzini et al. (2012) for empirical support based on live trading data from a large institutional money manager).
A potential explanation of the discrepancies revealed in Tables 5-8 between individual stocks and the FF portfolios is the much larger heterogeneity of the factor loadings for the former. As already discussed in Lewellen et al. (2010), the FF portfolio betas are all concentrated in the middle of the cross-sectional distribution obtained from the individual stocks. Creating portfolios with an ad hoc methodology distorts information by shrinking the dispersion of betas. The estimation results for the momentum factor on the FF portfolios exemplify the problems related to a small number of portfolios exhibiting a tight factor structure.
Another potential explanation of the discrepancy revealed in Tables 5-8 is the effect of model misspecification on the risk premia because of omitted factors as observed in Table 4 for the three-factor FF model.

Time-varying specifications
We use χ 1,T = 15 and χ 2,T = 546/60. The number of assets whose condition number is below 15 is often between 2,000 and 3,000, for instance 2,578 for the four-factor CAR model.
For the time-varying specifications of Table 3, we still find one omitted factor for the CAPM and the 4-factor MOM and REV model in Table 4. The other time-varying models pass the diagnostic criterion.
As already discussed in the Introduction, this diagnostic step is crucial to decide whether we can feel comfortable with the chosen specification. Time variation in the coefficients (Jagannathan and Wang (1996), Lewellen and Nagel (2006), Boguth et al. (2011)) explains the discrepancy between the time-invariant estimate and the average over time. After trimming, we compute the risk premia on n χ = 2,549 individual assets in the four-factor CAR model. The observed discrepancy w.r.t. the average over time is only marginally explained by the larger size of the stock universe used for the time-invariant estimates. The risk premia for the factors feature a counter-cyclical pattern most of the time. Indeed, these risk premia increase during economic contractions and decrease during economic booms. Gomes et al. (2003) and Zhang (2005) construct equilibrium models exhibiting a counter-cyclical behavior in size and book-to-market effects. Furthermore, the time-varying estimates of the value premium are negative most of the time, and might take positive values around recessions, where the confidence intervals are large. Growth firms are riskier in boom times because of their in-the-money growth options; value firms are riskier in recession times because of default risk. However, empirical evidence for such an interpretation is mixed. Some papers find that distress is related to size and book-to-market effects (Griffin and Lemmon (2002), Vassalou and Xing (2004)), while other papers find the opposite (Dichev (1998), Campbell et al. (2008)). Chava and Purnanandam (2010) find support for a positive relation and argue that conclusions regarding the risk-return trade-off can change significantly depending on how the expected return is measured. Gomes and Schmid (2010) and Garlappi and Yan (2011) argue that financial leverage provides a rationale for a positive relation.
The time-varying estimates of the size premium are most of the time slightly positive.

Asset pricing restriction tests
As already discussed in Lewellen et al. (2010), the 25 FF portfolios have four-factor CAR market and momentum betas close to one and zero, respectively. As depicted in Figure 1 of Lewellen, Nagel, and Shanken (2010), this empirical concentration implies that it is easy to get artificially large estimates ρ̂² of the cross-sectional R² for the three-factor FF and four-factor CAR models. On the contrary, the observed heterogeneity in the betas coming from the individual stocks impedes this. This suggests that it is much harder to find factors that explain the cross-sectional variation of expected excess returns on individual stocks than on portfolios. Reporting a large ρ̂², or a small SSR Q̂ e , when n is large is much more impressive than when n is small.
Tables 9 and 10 gather the results for the tests of the asset pricing restrictions in factor models with time-invariant coefficients. When n is large, we prefer working with test statistics based on the SSR Q̂ e instead of ρ̂², since the population R² is not well-defined with tradable factors under the null hypothesis (its denominator is zero). For the individual stocks, we compute the feasible test statistics based on Q̂ e and Q̂ a and hard thresholding to get consistent estimates Σ̂ ξ of the covariance matrices, as well as their associated one-sided p-values. Our Monte Carlo simulations show that we need to set a stronger trimming level χ 2,T to compute the test statistic than to estimate the risk premium. We use χ 2,T = 546/240. For the 25 FF and 44 Indu.
portfolios, we compute weighted test statistics (Gibbons et al. (1989)) as well as their associated p-values.
For individual stocks, the test statistics reject both null hypotheses H 0 : a(γ) = b(γ) ν and H 0 : a(γ) = 0 for all specifications at 1% level. Similar conclusions are obtained when using the 25 FF portfolios as base assets. For the 44 Indu. portfolios, we do not reject the null hypothesis H 0 : a(γ) = b(γ) ν, but we reject H 0 : a(γ) = 0.
Tables 11 and 12 gather the results for tests of the asset pricing restrictions in time-varying specifications.
We do not report results for the FF long-only model since multicollinearity problems prevent us from estimating and testing that model. Contrary to the time-invariant case, we do not report the values of the weighted test statistics (Gibbons et al. (1989)) computed for portfolios because of numerical instability in the inversion of the covariance matrix. Instead, we report the values of the test statistics T Q̂ e and T Q̂ a . For individual stocks, the test statistics reject both null hypotheses H 0 : a(γ) = b(γ) ν and H 0 : a(γ) = 0 for all specifications at the 1% level.
In addition, we compare the cross-sectional distributions of β̂_1,i'β̂_1,i, the idiosyncratic risk (square root of the residual variance), and the estimated time-series coefficient of determination ρ̂²_i (ratio of explained variance to total variance) for the time-varying specifications assuming the four-factor CAR model for the excess returns. We can view those estimates as measures of limits-to-arbitrage and of missing factor impact (Pontiff (2006), Lam and Wei (2011), Ang et al. (2009)). For each asset (either stock or portfolio) i, we compute four measures: (i) the estimated time-series coefficient of determination ρ̂²_i, (ii) the estimated adjusted coefficient of determination ρ̂²_ad,i, (iii) the idiosyncratic risk IdiVol_i, and (iv) the systematic risk SysRisk_i. The estimates ρ̂²_i of individual stocks tend to be a bit larger in the time-varying model than in the time-invariant one, as a result of the explanatory power that we gain by allowing for beta dynamics. Figures 13 and 14 show that the use of the FF portfolios also shrinks the dispersion of ρ̂²_i, IdiVol_i, and SysRisk_i by a large amount. The distributions for the individual stocks and the 44 Indu. portfolios are comparable and share a wide support. Figure 15 plots the cross-sectional distributions of β̂_1,i'β̂_1,i for the three universes of assets. We observe a huge heterogeneity in β̂_1,i'β̂_1,i for the individual stocks in Figure 15, similar to the one observed for IdiVol_i in Figure 14. We may face the presence of limits-to-arbitrage and missing factors in that case. On the contrary, the estimates β̂_1,i'β̂_1,i are concentrated close to zero for the 25 FF and 44 Indu. portfolios. The 25 FF portfolios exhibit small β̂_1,i'β̂_1,i, small idiosyncratic risks, and large estimates ρ̂²_i compared to individual stocks, as expected from the previous empirical results. Unreported preliminary results based on linear quantile regressions reveal that stocks with small size tend to yield large β̂_1,i'β̂_1,i, large idiosyncratic risks, and small estimates ρ̂²_i.
We also find that firms with short observation periods tend to be associated with large values of both idiosyncratic and systematic risks (with a larger proportion of systematic risk to total risk), as well as small market capitalization.
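For a single asset, the measures ρ̂²_i, IdiVol_i, and SysRisk_i can be read off a time-series OLS regression of excess returns on the factors; a minimal sketch under the variance decomposition used above (function names are ours):

```python
import numpy as np

def asset_risk_measures(excess_returns, factors):
    # Time-series OLS with intercept; returns (rho2, idio_vol, sys_risk):
    # the R^2, the square root of the residual variance, and the square
    # root of the explained (systematic) variance.
    excess_returns = np.asarray(excess_returns, dtype=float)
    T = len(excess_returns)
    X = np.column_stack([np.ones(T), factors])
    coef, *_ = np.linalg.lstsq(X, excess_returns, rcond=None)
    resid = excess_returns - X @ coef
    total_var = excess_returns.var()
    idio_var = resid.var()
    rho2 = 1.0 - idio_var / total_var
    return rho2, np.sqrt(idio_var), np.sqrt(max(total_var - idio_var, 0.0))
```

By construction IdiVol_i² + SysRisk_i² equals the total return variance, so the three measures give a complete decomposition of each asset's risk.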

Time-varying cost of equity
We can use the results in Section 3.2 for estimation and inference on the cost of equity in conditional factor models. We can estimate the time-varying cost of equity CE_{i,t} = r_{f,t} + b_{i,t}'λ_t of firm i with ĈE_{i,t} = r_{f,t} + b̂_{i,t}'λ̂_t, where r_{f,t} is the risk-free rate. We have b̂_{i,t}'λ̂_t = β̂_i'ψ̂_{i,t}, where ψ̂_{i,t} = (λ̂_t ⊗ Z_{t-1}, λ̂_t ⊗ Z_{i,t-1}). Standard results on OLS imply that the estimator β̂_i is asymptotically normal and independent of the estimator Λ̂. Then, from the asymptotic normality results for the estimator Λ̂, we deduce the asymptotic distribution of ĈE_{i,t}.
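Given estimated loadings and risk premia parameters, the point estimate ĈE_{i,t} is a simple plug-in. A minimal sketch, assuming (as in the conditional specification above) loadings linear in the lagged instruments, b_{i,t} = B_i Z_{t-1} + C_i Z_{i,t-1}, and risk premia λ_t = Λ Z_{t-1}; shapes and names below are ours:

```python
import numpy as np

def time_varying_cost_of_equity(r_f, B_i, C_i, Lam, Z_common, Z_firm):
    # Plug-in estimate CE_{i,t} = r_{f,t} + b_{i,t}' lambda_t, with
    # b_{i,t} = B_i Z_{t-1} + C_i Z_{i,t-1} and lambda_t = Lam Z_{t-1}.
    # Assumed shapes: B_i (K, p), C_i (K, q), Lam (K, p),
    # Z_common (p,) common instruments, Z_firm (q,) firm-level instruments.
    b_it = B_i @ Z_common + C_i @ Z_firm   # conditional factor loadings
    lam_t = Lam @ Z_common                 # conditional risk premia
    return r_f + float(b_it @ lam_t)
```

Confidence intervals then follow from the asymptotic normality of β̂_i and Λ̂ via the delta method, exploiting their asymptotic independence.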

International equity data sets
All of the empirical findings on factor structure, asset pricing restriction tests, risk premia estimation, and cost of equity discussed so far are based on evidence from a large cross-sectional equity data set for the U.S. market. It is interesting to examine these important issues in a global context. Market integration and currency risk are two main factors that distinguish international financing and investment decisions from domestic ones. Second, CLS do not reject the asset pricing restrictions implied by these conditional asset pricing models in a large proportion of countries. That finding does not stem from a lack of power or from misspecification.
Third, CLS estimate the dynamics of the various risk premia across time and countries. They find heterogeneity in the level and volatility of risk premia across countries and across time. Specifically, value and momentum have more volatile risk premia than profitability and investment. More importantly, the magnitude and dynamics of factor non-tradability differ substantially between DMs and EMs.

Concluding remarks
This chapter has reviewed recent advances in econometrics for conditional factor models estimated on data sets with "large n, large T " in finance. The tools studied above are simple to implement and often similar to the ones used in a "small n, large T " setting. The asymptotic treatment however differs substantially.
We believe that extracting information directly from disaggregated data in finance will become increasingly popular in the upcoming years. The current big data trend favours the development of new econometric tools, the collection of data sets at the individual level, and the improvement of computation and storage power.

Table notes. We report the frequency counts of the individual stocks w.r.t. their buckets of sample size T_i, and the frequency counts of the individual stocks w.r.t. their industry category. For each financial model, we report the list of observable factors and the trimmed cross-sectional dimension n_χ for estimation from monthly data; we use χ_{1,T} = 15 and χ_{2,T} = 546/60. For the time-invariant specifications, we report the number of observable factors K; for the time-varying specifications, we give the dimension d of vector x_{i,t} using Z_{t-1} = (1, divY_{t-1}) and Z_{i,t-1} = bm_{i,t-1}:

Model | Observable factors | K | n_χ (time-invariant) | d | n_χ (time-varying)
MOM and REV | r_m,t, r_mom,t, r_strev,t, r_ltrev,t | 4 | 7,754 | 17 | 2,569
5FF | r_m,t, r_smb,t, r_hml,t, r_rmw,t, r_cma,t | 5 | 7,754 | 20 | 1,928
FF and REV | r_m,t, r_smb,t, r_hml,t, r_strev,t, r_ltrev,t | 5 | 7,754 | 20 | 2,460
CAR and REV | r_m,t, r_smb,t, r_hml,t, r_mom,t, r_strev,t, r_ltrev,t | 6 | 7,754 | 23 | 2,019

A further table shows the contribution of the first eigenvalue μ_1 to the variance of the normalised residuals, the number of omitted factors k, the contributions of the first k and of the (k+1)-th eigenvalues, and the penalty term; Panels A and B report the results for the time-invariant and time-varying specifications, respectively. Another table contains the estimates and the corresponding confidence intervals of the annualized risk premia and of the components of vector ν for the time-invariant specifications CAR and REV, and FF and REV, estimated using the three different sets of base assets. We compute the statistics based on Σ̂ and Tâ'V̂_a⁻¹â (Gibbons, Ross and Shanken (1989)), where ê and â are n × 1 vectors with elements ê_i and â_i.

Figure 7: Path of estimated annualized risk premia with n = 8,570 in the four-factor CAR model. The figure plots the path of estimated annualized risk premia λ̂_m,t, λ̂_smb,t, λ̂_hml,t, and λ̂_mom,t and their pointwise confidence intervals at the 95% probability level in the four-factor CAR model. We also report the time-invariant (dashed horizontal line) and the average conditional estimate (solid horizontal line). We consider all stocks as base assets (n = 8,570 and n_χ = 2,549). The vertical shaded areas denote recessions determined by the National Bureau of Economic Research (NBER). The recessions start at the peak of a business cycle and end at the trough.

Figure 8: Path of estimated annualized risk premia with n = 25 in the four-factor CAR model. The figure plots the path of estimated annualized risk premia λ̂_m,t, λ̂_smb,t, λ̂_hml,t, and λ̂_mom,t and their pointwise confidence intervals at the 95% probability level in the four-factor CAR model. We use the returns of the 25 FF portfolios. We also report the time-invariant (dashed horizontal line) and the average conditional estimate (solid horizontal line). The vertical shaded areas denote recessions determined by the National Bureau of Economic Research (NBER).
Figure 9: Path of estimated annualized risk premia with n = 44 in the four-factor CAR model. The figure plots the path of estimated annualized risk premia λ̂_m,t, λ̂_smb,t, λ̂_hml,t, and λ̂_mom,t and their pointwise confidence intervals at the 95% probability level in the four-factor CAR model. We use the returns of the 44 industry portfolios. We also report the time-invariant (dashed horizontal line) and the average conditional estimate (solid horizontal line). The vertical shaded areas denote recessions determined by the National Bureau of Economic Research (NBER).
Figure 10: Path of estimated annualized ν_t with n = 8,570 in the four-factor CAR model. The figure plots the path of estimated annualized ν̂_m,t, ν̂_smb,t, ν̂_hml,t, and ν̂_mom,t and their pointwise confidence intervals at the 95% probability level in the four-factor CAR model. We also report the time-invariant (dashed horizontal line) and the average conditional estimate (solid horizontal line). We consider all stocks as base assets (n = 8,570 and n_χ = 2,549). The vertical shaded areas denote recessions determined by the National Bureau of Economic Research (NBER). The recessions start at the peak of a business cycle and end at the trough.
Figure 11: Path of estimated annualized ν_t with n = 25 in the four-factor CAR model. The figure plots the path of estimated annualized ν̂_m,t, ν̂_smb,t, ν̂_hml,t, and ν̂_mom,t and their pointwise confidence intervals at the 95% probability level in the four-factor CAR model. We use the returns of the 25 FF portfolios. We also report the time-invariant (dashed horizontal line) and the average conditional estimate (solid horizontal line). The vertical shaded areas denote recessions determined by the National Bureau of Economic Research (NBER).
Figure 12: Path of estimated annualized ν_t with n = 44 in the four-factor CAR model. The figure plots the path of estimated annualized ν̂_m,t, ν̂_smb,t, ν̂_hml,t, and ν̂_mom,t and their pointwise confidence intervals at the 95% probability level in the four-factor CAR model. We use the returns of the 44 industry portfolios. We also report the time-invariant (dashed horizontal line) and the average conditional estimate (solid horizontal line). The vertical shaded areas denote recessions determined by the National Bureau of Economic Research (NBER).

Figure 13: Cross-sectional distributions of ρ̂²_i, ρ̂²_ad,i, IdiVol_i, and SysRisk_i for the time-invariant four-factor CAR model. The figure displays the cross-sectional distributions of (i) the estimated coefficients of determination ρ̂²_i, (ii) the estimated adjusted coefficients of determination ρ̂²_ad,i, (iii) the idiosyncratic risks IdiVol_i, and (iv) the systematic risks SysRisk_i for the individual stocks (box-plots), the 25 FF portfolios (red triangles) and the 44 Indu. portfolios (blue stars). Estimates are for the time-invariant four-factor CAR model. For comparison purposes, the cross-sectional distribution for individual stocks refers to the n_χ = 2,549 stocks that are used in the estimation of the time-varying model after trimming.

Figure 14: Cross-sectional distributions of ρ̂²_i, ρ̂²_ad,i, IdiVol_i, and SysRisk_i for the time-varying four-factor CAR model. The figure displays the cross-sectional distributions of (i) the estimated coefficients of determination ρ̂²_i, (ii) the estimated adjusted coefficients of determination ρ̂²_ad,i, (iii) the idiosyncratic risks IdiVol_i, and (iv) the systematic risks SysRisk_i for the n_χ = 2,549 individual stocks (box-plots), the 25 FF portfolios (red triangles) and the 44 Indu. portfolios (blue stars). Estimates are for the time-varying four-factor CAR model.
The figure plots the path of estimated annualized costs of equity for Microsoft, Apple, Walt Disney, and Sony, and their pointwise confidence intervals at the 95% probability level. We use the time-varying four-factor CAR model estimated on individual stocks (n = 8,570, n_χ = 2,549). We also report the average conditional estimate (solid horizontal line).