Statistical Estimation of High-Dimensional Portfolio

Hiroyuki Oka*, Hiroshi Shiraishi
Keio University Graduate School of Science and Technology

Abstract

We introduce an estimator of Markowitz's mean-variance optimal portfolio based on a $d \times n$ data matrix in a high-dimensional setting, where $d$ is the number of assets and $n$ is the sample size. When $d/n$ converges to a limit in $(0, 1)$, we show that the traditional estimator is inconsistent and propose a consistent estimator.

[Figure: efficient frontier. x-axis: portfolio standard deviation; y-axis: portfolio mean.]
[Figure: density plots in three panels, n = 100, n = 500, n = 1000.]

Background

Let $X$ be the vector of asset returns (a random vector) and let $w$ be the vector of portfolio weights. The optimal portfolio weights are the solution of the following optimization problem:

$$\max_{w \in \mathbb{R}^d} \; u(w) = E[w^\top X] - \frac{1}{2\gamma}\,\mathrm{Var}(w^\top X) \quad \text{subject to} \quad w^\top 1_d = 1.$$

Here, $\gamma$ is a positive constant that depends on the individual investor.
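The optimization problem above can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes $\mu$ and $\Sigma$ are known and uses the closed-form solution $w = \gamma\,\Sigma^{-1}(\mu - \lambda 1_d)$ obtained from the Lagrangian first-order condition, with $\lambda$ chosen so that the weights sum to one. The function names and the 3-asset inputs are hypothetical.

```python
import numpy as np

def utility(w, mu, sigma, gamma):
    """Mean-variance objective u(w) = w'mu - (1/(2*gamma)) * w' Sigma w."""
    return w @ mu - (w @ sigma @ w) / (2.0 * gamma)

def optimal_weights(mu, sigma, gamma):
    """Maximize u(w) subject to sum(w) = 1 via the first-order condition."""
    ones = np.ones_like(mu)
    sinv_mu = np.linalg.solve(sigma, mu)     # Sigma^{-1} mu
    sinv_one = np.linalg.solve(sigma, ones)  # Sigma^{-1} 1_d
    # Lagrange multiplier chosen so that the weights sum to one
    lam = (ones @ sinv_mu - 1.0 / gamma) / (ones @ sinv_one)
    return gamma * (sinv_mu - lam * sinv_one)

# Hypothetical 3-asset inputs, for illustration only
mu = np.array([0.05, 0.08, 0.12])
sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
gamma = 0.5

w = optimal_weights(mu, sigma, gamma)
print("weights:", w)
print("portfolio mean:", w @ mu)
print("portfolio variance:", w @ sigma @ w)
```

Larger values of `gamma` put more weight on the expected return relative to the variance penalty, so varying `gamma` traces out different optimal portfolios.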
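Using $E[X] = \mu$ and $\mathrm{Var}(X) = \Sigma$, the expected value $\mu_{\mathrm{opt}}$ and variance $\sigma^2_{\mathrm{opt}}$ of the optimal portfolio return are expressed as follows:

$$\mu_{\mathrm{opt}}(\gamma; \theta) = \gamma\left(\theta_1 - \frac{\theta_2^2}{\theta_3}\right) + \frac{\theta_2}{\theta_3}, \qquad \sigma^2_{\mathrm{opt}}(\gamma; \theta) = \gamma^2\left(\theta_1 - \frac{\theta_2^2}{\theta_3}\right) + \frac{1}{\theta_3}.$$

The set of points $(\sigma_{\mathrm{opt}}, \mu_{\mathrm{opt}})$ is called the "efficient frontier". Here, $\theta$ is

$$\theta = \begin{pmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \end{pmatrix} = \begin{pmatrix} \mu^\top \Sigma^{-1} \mu \\ 1_d^\top \Sigma^{-1} \mu \\ 1_d^\top \Sigma^{-1} 1_d \end{pmatrix}.$$

Fig. 1 Efficient frontier: Fig. 1 shows a plot of portfolios. Black points (·) are portfolios in the feasible region, obtained by varying $w$. Red circles (◦) are optimal portfolios, obtained by varying $\gamma$. The figure shows that a rational investor prefers lower risk for the same mean and higher return for the same risk.

Purpose: estimate the efficient frontier in high dimensions
• $d$ fixed, $n \to \infty$ ⇒ $S^{-1}$ is consistent
• $n, d \to \infty$ ⇒ $S^{-1}$ is inconsistent

Methods

To estimate the optimal portfolio, we need to estimate the optimal portfolio parameter $\theta$. Assume that the $n$ data vectors $X_1, \ldots, X_n$ follow an unknown distribution with mean vector $\mu$ and covariance matrix $\Sigma$. Then the estimator $\tilde{\theta}$ of the parameter $\theta$ is defined as follows:

$$\tilde{\theta} = \begin{pmatrix} \tilde{\theta}_1 \\ \tilde{\theta}_2 \\ \tilde{\theta}_3 \end{pmatrix} = \begin{pmatrix} \bar{X}^\top S^{-1} \bar{X} \\ 1_d^\top S^{-1} \bar{X} \\ 1_d^\top S^{-1} 1_d \end{pmatrix}.$$

For the mathematical argument we impose some assumptions. In particular, we derive the asymptotic properties of the optimal portfolio under the following regime:

$$n \to \infty, \quad d \to \infty, \quad \frac{d}{n} \to \rho \in (0, 1).$$

Replacing $\theta$ with an estimator $\hat{\theta}$ gives the efficient frontier estimators

$$\hat{\mu}_{\mathrm{opt}}(\gamma; \theta) = \gamma\left(\hat{\theta}_1 - \frac{\hat{\theta}_2^2}{\hat{\theta}_3}\right) + \frac{\hat{\theta}_2}{\hat{\theta}_3}, \qquad \hat{\sigma}^2_{\mathrm{opt}}(\gamma; \theta) = \gamma^2\left(\hat{\theta}_1 - \frac{\hat{\theta}_2^2}{\hat{\theta}_3}\right) + \frac{1}{\hat{\theta}_3}.$$

Fig. 2 Consistency of estimators: Fig. 2 shows histograms of the generated $\tilde{\theta}_1$ and $\hat{\theta}_1$. The red part is the histogram of $\hat{\theta}_1$ and the blue part is the histogram of $\tilde{\theta}_1$. The panels correspond to $n = 100, 500, 1000$ from left to right, and they show that $\hat{\theta}_1$ converges to the true value $\theta_1$.

Fig. 3 Asymptotic normality of $\sqrt{n}(\hat{\theta}_1 - \theta_1)$: Fig. 3 shows the histogram of $\sqrt{n}(\hat{\theta}_1 - \theta_1)$ together with the density of the asymptotic normal distribution (blue curve).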
The panels correspond to $n = 100, 500, 1000$ from left to right, and they show that $\sqrt{n}(\hat{\theta}_1 - \theta_1)$ converges to the target normal distribution.

Simulation Study

• fix $d/n = 0.8$
• increase $n = 100, 500, 1000$, and assume $X_t \overset{\text{i.i.d.}}{\sim} N_d(\mu, \Sigma)$
• the components of $\mu$ are obtained by dividing $[-1, 1]$ into $d$ equal parts
• $\Sigma$ has diagonal entries 1 and off-diagonal entries 0.5

Under these conditions, we generate $\tilde{\theta}_1$ and $\hat{\theta}_1$ 10,000 times and confirm the theoretical results.

Background

When an investor invests in $d$ financial products, he or she considers how to maximize the portfolio return for a given level of risk, defined as the variance. This defines the optimal portfolio.

Because $E[X] = \mu$ and $\mathrm{Var}(X) = \Sigma$ are generally unknown, the optimal portfolio has to be estimated from the $d \times n$ data matrix $(X_1, \ldots, X_n)$. We estimate $\mu$ by the sample mean vector $\bar{X}$ and $\Sigma$ by the sample covariance matrix $S$:

$$\bar{X} = \frac{1}{n} \sum_{t=1}^{n} X_t, \qquad S = \frac{1}{n} \sum_{t=1}^{n} (X_t - \bar{X})(X_t - \bar{X})^\top.$$

In recent years, as markets have expanded, the number of assets $d$ has grown. However, it is known that the estimator $S^{-1}$ deteriorates as the dimension $d$ grows.

Results

We present $(n, d)$-asymptotic properties of estimators of the optimal portfolio parameter $\theta$. It is known that when $X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} (\mu, \Sigma)$ and Assumptions 1 to 4 are satisfied, $\tilde{\theta}$ converges to the following values as $n$ goes to infinity:

$$\tilde{\theta}_1 \xrightarrow{\text{a.s.}} \frac{1}{1 - \rho}\,\alpha_1 + \frac{\rho}{1 - \rho}, \qquad \tilde{\theta}_2 \xrightarrow{\text{a.s.}} \frac{1}{1 - \rho}\,\alpha_2, \qquad \tilde{\theta}_3 \xrightarrow{\text{a.s.}} \frac{1}{1 - \rho}\,\alpha_3.$$

This shows that the estimator based on $\bar{X}$ and $S$ overestimates $\theta$: each component is inflated by the factor $1/(1 - \rho)$, and $\tilde{\theta}_1$ has an additional bias term. The estimator therefore needs to be corrected, and we propose the following estimator $\hat{\theta}$.

Result 1. Define the estimator $\hat{\theta} = (\hat{\theta}_1, \hat{\theta}_2, \hat{\theta}_3)$ by

$$\hat{\theta}_1 = \left(1 - \frac{d}{n}\right)\tilde{\theta}_1 - \frac{d}{n}, \qquad \hat{\theta}_2 = \left(1 - \frac{d}{n}\right)\tilde{\theta}_2, \qquad \hat{\theta}_3 = \left(1 - \frac{d}{n}\right)\tilde{\theta}_3.$$

Then $\hat{\theta}$ is a consistent estimator of $\theta$.

The estimator $\hat{\theta}$ is also asymptotically normal.

Result 2. Suppose $X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} (\mu, \Sigma)$ satisfy Assumptions 1 to 4. Then $\sqrt{n}(\hat{\theta} - \theta)$ converges in distribution to a normal distribution as $n$ goes to infinity.
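$$\sqrt{n}(\hat{\theta} - \theta) \xrightarrow{D} N_3(0, \Omega) \qquad ((n, d)\text{-asymptotic}),$$

where $\Omega$ is the following matrix (entries marked $*$ are determined by symmetry):

$$\Omega = \frac{1}{1 - \rho} \begin{pmatrix} 2\alpha_1^2 + 4\alpha_1 + 2\rho & * & * \\ 2\alpha_1\alpha_2 & \alpha_2^2 + \alpha_1\alpha_3 + \alpha_3 & * \\ 2\alpha_1\alpha_3 & 2\alpha_2\alpha_3 & 2\alpha_3^2 \end{pmatrix}.$$

Using this estimator $\hat{\theta}$, we construct the efficient frontier estimators $\hat{\mu}_{\mathrm{opt}}$ and $\hat{\sigma}^2_{\mathrm{opt}}$ by substituting $\hat{\theta}$ for $\theta$ in the expressions for $\mu_{\mathrm{opt}}$ and $\sigma^2_{\mathrm{opt}}$. These efficient frontier estimators are consistent.

Result 3. Suppose Assumptions 1 to 4 hold. Then $\hat{\mu}_{\mathrm{opt}}$ and $\hat{\sigma}^2_{\mathrm{opt}}$ converge to the following values as $n$ goes to infinity:

$$\hat{\mu}_{\mathrm{opt}} \xrightarrow{\text{a.s.}} \mu_{\mathrm{opt}}, \qquad \hat{\sigma}^2_{\mathrm{opt}} \xrightarrow{\text{a.s.}} \sigma^2_{\mathrm{opt}} \qquad ((n, d)\text{-asymptotic}).$$

Future Work

• Derive the asymptotic normality of $\hat{\mu}_{\mathrm{opt}}$ and $\hat{\sigma}^2_{\mathrm{opt}}$
• Derive confidence intervals and tests for the efficient frontier
• Analyze the efficient frontier using actual stock price data

Assumption 1. Let $Z_1, \ldots, Z_n \overset{\text{i.i.d.}}{\sim} (0, I_d)$. Assume that the entries of $Z_t$ are independent with finite moments of order $4 + \varepsilon$.

Assumption 2. The data vectors $X_t$ can be expressed as $X_t = \Sigma^{1/2} Z_t + \mu$ ($\mu \in \mathbb{R}^d$, $\Sigma > 0$).

Assumption 3. $d$ is a function of $n$, and $d/n \to \rho \in (0, 1)$ as $n \to \infty$. We call this limit operation "$(n, d)$-asymptotic".

Assumption 4. $\theta$ converges to constants $\alpha_1, \alpha_2, \alpha_3$:

$$\theta_1 \to \alpha_1, \qquad \theta_2 \to \alpha_2, \qquad \theta_3 \to \alpha_3 \qquad (d \to \infty),$$

where $\alpha_1, \alpha_3 > 0$, $\alpha_2 \in \mathbb{R}$, and $\alpha_1\alpha_3 - \alpha_2^2 > 0$.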
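The comparison between the plug-in estimator $\tilde{\theta}$ and the corrected estimator $\hat{\theta}$ of Result 1 can be sketched in a small Monte Carlo experiment. This is only an illustration under stated assumptions, not the authors' simulation code: the data are stored as an $n \times d$ array (rows are observations), "dividing $[-1, 1]$ into $d$ equal parts" is read as $d$ equally spaced points, and $n$ and the number of replications are reduced to keep the run short. The function names `theta_from` and `estimators` are hypothetical.

```python
import numpy as np

def theta_from(mean, cov):
    """Return (m' C^{-1} m, 1' C^{-1} m, 1' C^{-1} 1) for mean m and covariance C."""
    ones = np.ones_like(mean)
    cinv_m = np.linalg.solve(cov, mean)
    cinv_1 = np.linalg.solve(cov, ones)
    return np.array([mean @ cinv_m, ones @ cinv_m, ones @ cinv_1])

def estimators(X):
    """Plug-in and corrected estimators from an n x d array of returns."""
    n, d = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)  # 1/n normalisation, as in the text
    tilde = theta_from(xbar, S)
    # Correction of Result 1: scale by (1 - d/n), subtract d/n from the first entry
    hat = (1.0 - d / n) * tilde - np.array([d / n, 0.0, 0.0])
    return tilde, hat

rng = np.random.default_rng(0)
n, d = 200, 160                                   # d/n = 0.8 (reduced n, assumption)
mu = np.linspace(-1.0, 1.0, d)                    # assumed reading of the mu design
Sigma = 0.5 * np.eye(d) + 0.5 * np.ones((d, d))   # diagonal 1, off-diagonal 0.5
theta = theta_from(mu, Sigma)
L = np.linalg.cholesky(Sigma)

reps = 50                                         # far fewer than 10,000, for speed
tilde_draws = np.empty((reps, 3))
hat_draws = np.empty((reps, 3))
for r in range(reps):
    X = mu + rng.standard_normal((n, d)) @ L.T    # rows are N_d(mu, Sigma) draws
    tilde_draws[r], hat_draws[r] = estimators(X)

print("true theta     :", theta)
print("mean plug-in   :", tilde_draws.mean(axis=0))
print("mean corrected :", hat_draws.mean(axis=0))
```

With $d/n = 0.8$ the plug-in averages should sit far above the true values, while the corrected averages should be much closer, in line with the stated limits of $\tilde{\theta}$.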