Adjusting the Tests for Skewness and Kurtosis for Distributional Misspecifications

Anil K. Bera, University of Illinois at Urbana-Champaign
Gamini Premaratne, National University of Singapore

Abstract

The standard $\sqrt{b_1}$ test is widely used for testing skewness. However, several studies have demonstrated that this test is not reliable for discriminating between symmetric and asymmetric distributions in the presence of excess kurtosis. The main reason for the failure of the standard $\sqrt{b_1}$ test is that its variance formula is derived under the assumption of no excess kurtosis. In this paper we theoretically derive an adjustment to the $\sqrt{b_1}$ test under the framework of Rao's score (or the Lagrange multiplier) test principle. Our adjusted test automatically corrects the variance formula and does not lead to over- or under-rejection of the correct null hypothesis. In a similar way, we also suggest an adjusted test for kurtosis in the presence of asymmetry. These tests are then applied to both simulated and real data. The finite sample performances of the adjusted tests are far superior to those of their unadjusted counterparts.

Published: 2001
URL: http://www.business.uiuc.edu/Working_Papers/papers/01-0116.pdf



Correspondence to: Anil K. Bera, Department of Economics, University of Illinois, 1206 S. 6th Street, Champaign, IL 61820, U.S.A.; email: [email protected]

1 Introduction

Suppose we have $n$ independent observations $y_1, y_2, \ldots, y_n$ on a random variable $Y$. For simplicity, assume that they are measured from their mean. Then the sample skewness and kurtosis are defined, respectively, as

$$\sqrt{b_1} = \frac{m_3}{m_2^{3/2}} \tag{1}$$

and

$$b_2 = \frac{m_4}{m_2^2}, \tag{2}$$

where $m_j = n^{-1}\sum_{i=1}^{n} y_i^j$, $j = 2, 3, 4$. The use of these coefficients in the statistics literature goes back a long way. Pearson (1895) suggested that one use these coefficients to identify a density within a family of distributions. Fisher (1930) and Pearson (1930) formulated formal tests of normality based on $\sqrt{b_1}$ and $b_2$ separately. Using the Pearson (1895) family of distributions and the Rao (1948) score (RS) test principle, Bera and Jarque (1981) derived the following omnibus statistic for testing normality:

$$RS = n\left[\frac{(\sqrt{b_1})^2}{6} + \frac{(b_2 - 3)^2}{24}\right], \tag{3}$$

jointly based on $\sqrt{b_1}$ and $b_2$ [see also D'Agostino and Pearson (1973), Cox and Hinkley (1974, p. 42), Bowman and Shenton (1975), and Jarque and Bera (1987)]. Under the null hypothesis of normality, RS is asymptotically distributed as $\chi^2_2$.
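The definitions in (1) to (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the data are centred at their sample mean inside the code rather than assumed to be measured from the mean.

```python
def central_moment(y, j):
    """j-th sample moment about the sample mean: m_j = n^{-1} * sum (y_i - ybar)^j."""
    n = len(y)
    ybar = sum(y) / n
    return sum((v - ybar) ** j for v in y) / n


def skewness_kurtosis(y):
    """Return (sqrt(b1), b2) as defined in equations (1) and (2)."""
    m2 = central_moment(y, 2)
    sqrt_b1 = central_moment(y, 3) / m2 ** 1.5
    b2 = central_moment(y, 4) / m2 ** 2
    return sqrt_b1, b2


def omnibus_rs(y):
    """Omnibus normality statistic of equation (3); chi-square(2) under normality."""
    n = len(y)
    sqrt_b1, b2 = skewness_kurtosis(y)
    return n * (sqrt_b1 ** 2 / 6.0 + (b2 - 3.0) ** 2 / 24.0)
```

For example, `omnibus_rs([-2, -1, 0, 1, 2])` uses $m_2 = 2$, $m_3 = 0$ and $m_4 = 6.8$, so $\sqrt{b_1} = 0$ and $b_2 = 1.7$.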

It is true that the skewness and kurtosis coefficients $\sqrt{b_1}$ and $b_2$ jointly provide a powerful scheme for assessing normality against a wide variety of alternatives. In many practical applications, researchers use $n(\sqrt{b_1})^2/6$ and $n(b_2 - 3)^2/24$ separately as $\chi^2_1$ statistics for assessing asymmetry and excess kurtosis [for instance, see Affleck and McDonald (1989), Richardson and Smith (1993), and Christie-David and Chaudhury (2001)]. The purpose of this paper is to demonstrate that $n(\sqrt{b_1})^2/6$ and $n(b_2 - 3)^2/24$ are not reliable measures of asymmetry and excess kurtosis, respectively, that can be used as proper test statistics. For instance, the asymptotic variance $6/n$ of $\sqrt{b_1}$ is valid only under the assumption of normality, and the resulting test is not valid when the population kurtosis measure $\beta_2 \neq 3$. Similarly, the test statistic $n(b_2 - 3)^2/24$ will not provide reliable inference under asymmetry. Bera and John (1983, p. 104) clearly stated that $n(\sqrt{b_1})^2/6$ and $n(b_2 - 3)^2/24$ are not pure tests of skewness and kurtosis, since the asymptotic distribution of these statistics is derived using the full normality assumption. What is needed is to obtain the correct variances of $\sqrt{b_1}$ and $b_2$ that are valid under distributional misspecification. In this paper we derive correct variance formulae using White's (1982) approach to inference under misspecified models. To do this, we go back to the original derivation of (3) as given in Bera and Jarque (1981) and Jarque and Bera (1987), using the Pearson family of distributions, and apply White's (1982) modification.

The plan of this paper is as follows. In the next section we derive the Rao (1948) score test and White's (1982) modification to it. Section 3 discusses the adjusted RS test for asymmetry in the presence of excess kurtosis, followed by a finite sample performance study in Section 4. We do the same in Sections 5 and 6 for the test of excess kurtosis allowing for asymmetry. Section 7 presents an empirical illustration. In the last section we offer a conclusion.

2 Rao's (1948) Score Test and White's (1982) Modification

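Suppose there are $n$ independent observations $y_1, y_2, \ldots, y_n$ with identical density function $f(y; \theta)$, where $\theta$ is a $p \times 1$ parameter vector with $\theta \in \Theta \subseteq \Re^p$. It is assumed that $f(y; \theta)$ satisfies the regularity conditions stated in Rao (1973, p. 364) and Serfling (1980, p. 144). The log-likelihood function, the score function, and the information matrix are then defined, respectively, as

$$l(\theta) = \sum_{i=1}^{n} \ln f(y_i; \theta) \tag{4}$$

$$s(\theta) = \frac{\partial l(\theta)}{\partial \theta} \tag{5}$$

$$I(\theta) = -E\left[\frac{\partial^2 l(\theta)}{\partial \theta\, \partial \theta'}\right]. \tag{6}$$

Let the hypothesis to be tested be $H_0: h(\theta) = 0$, where $h(\theta)$ is an $r \times 1$ vector function of $\theta$ with $r \leq p$. It is assumed that $H(\theta) = \partial h(\theta)/\partial \theta$ has full rank, i.e., $\mathrm{rank}[H(\theta)] = r$. Rao's (1948) score statistic for testing $H_0$ can be written as

$$RS = s(\tilde\theta)'\, I(\tilde\theta)^{-1} s(\tilde\theta), \tag{7}$$

where $\tilde\theta$ is the restricted maximum likelihood estimator (MLE) of $\theta$. Under $H_0$, RS is asymptotically distributed as $\chi^2_r$. RS is not a valid test statistic when the true data generating process (DGP), $g(y)$, differs from $f(y; \theta)$. This is because some of the standard results break down under distributional misspecification. For instance, consider the information matrix equality

$$E_f\left[\frac{\partial \ln f(y; \theta)}{\partial \theta} \cdot \frac{\partial \ln f(y; \theta)}{\partial \theta'}\right] = E_f\left[-\frac{\partial^2 \ln f(y; \theta)}{\partial \theta\, \partial \theta'}\right], \tag{8}$$

in which $E_f[\cdot]$ denotes expectation under $f(y; \theta)$. Let us define

$$J(\theta_g) = E_g\left[\frac{\partial \ln f(y; \theta)}{\partial \theta} \cdot \frac{\partial \ln f(y; \theta)}{\partial \theta'}\right] \tag{9}$$

$$K(\theta_g) = E_g\left[-\frac{\partial^2 \ln f(y; \theta)}{\partial \theta\, \partial \theta'}\right], \tag{10}$$

where $\theta_g$ minimizes the Kullback-Leibler information criterion [see White (1982)]

$$I_{KL} = E_g\left[\ln \frac{g(y)}{f(y; \theta)}\right]. \tag{11}$$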

One can easily see that $J(\theta_g) \neq K(\theta_g)$, in general. Due to this divergence between $J$ and $K$, and because the expectation in (6) is taken under $f(y; \theta)$ instead of under the DGP $g(y)$, in some cases the standard RS test in (7) is not valid. White (1982) suggested the following robust form of the RS statistic [see also Kent (1982)]:

$$RS^* = \frac{1}{n}\, s(\tilde\theta)' K(\tilde\theta)^{-1} H(\tilde\theta) \left[H(\tilde\theta)' B(\tilde\theta) H(\tilde\theta)\right]^{-1} H(\tilde\theta)' K(\tilde\theta)^{-1} s(\tilde\theta), \tag{12}$$

where $B(\theta) = K(\theta)^{-1} J(\theta) K(\theta)^{-1}$ and $\tilde\theta$ denotes the quasi MLE (QMLE). Under $H_0$, $RS^*$ is asymptotically distributed as $\chi^2_r$ even under distributional misspecification, i.e., when the assumed density $f(y; \theta)$ does not coincide with the true DGP $g(y)$.

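For our purposes we need only a special case of $RS^*$. Let us partition $\theta$ as $\theta = (\theta_1', \theta_2')'$, where $\theta_1$ and $\theta_2$ are, respectively, $(p - r) \times 1$ and $r \times 1$ vectors. Let $H_0: \theta_2 = \theta_{20}$, where $\theta_{20}$ is a known quantity. Let $\tilde\theta_1$ be the QMLE under $H_0$, and let $\tilde\theta = (\tilde\theta_1', \theta_{20}')'$. We partition the score vector and other matrices as follows:

$$s(\theta) \equiv s(\theta_1, \theta_2) = [s_1(\theta)', s_2(\theta)']',$$

$$I(\theta) = \begin{bmatrix} I_{11}(\theta) & I_{12}(\theta) \\ I_{21}(\theta) & I_{22}(\theta) \end{bmatrix}, \qquad I^{-1}(\theta) = \begin{bmatrix} I^{11}(\theta) & I^{12}(\theta) \\ I^{21}(\theta) & I^{22}(\theta) \end{bmatrix}, \tag{13}$$

and so on. Under these notations the standard RS statistic in (7) for our special case is given by

$$RS = s_2(\tilde\theta_1, \theta_{20})'\, I^{22}(\tilde\theta)\, s_2(\tilde\theta_1, \theta_{20}).$$

This can also be written as

$$RS = s_2(\tilde\theta_1, \theta_{20})'\, K^{22}(\tilde\theta)\, s_2(\tilde\theta_1, \theta_{20}) = \tilde s_2' \tilde K^{22} \tilde s_2 \tag{14}$$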

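after dropping the arguments. Now let us consider $RS^*$ in (12). Here $h(\theta) = \theta_2 - \theta_{20}$, $s(\tilde\theta) = (0', s_2(\tilde\theta)')'$, and $H(\theta) = \partial h(\theta)/\partial \theta' = [0_{r \times (p-r)} \;\; I_r]$. Hence,

$$s(\tilde\theta)' K^{-1}(\tilde\theta) H(\tilde\theta)' = \begin{pmatrix} 0' & s_2(\tilde\theta)' \end{pmatrix} \begin{pmatrix} \tilde K^{11} & \tilde K^{12} \\ \tilde K^{21} & \tilde K^{22} \end{pmatrix} \begin{pmatrix} 0 \\ I_r \end{pmatrix} = s_2(\tilde\theta)' \tilde K^{22}. \tag{15}$$

We again drop the arguments of $K^{-1}$ and other matrices whenever convenient, and "$\sim$" denotes that the quantity is evaluated at $\theta = \tilde\theta$. Similarly,

$$H \tilde B H' = \begin{pmatrix} 0 & I_r \end{pmatrix} \begin{pmatrix} \tilde B_{11} & \tilde B_{12} \\ \tilde B_{21} & \tilde B_{22} \end{pmatrix} \begin{pmatrix} 0 \\ I_r \end{pmatrix} = \tilde B_{22}. \tag{16}$$

Hence,

$$RS^* = \frac{1}{n}\, \tilde s_2' \tilde K^{22} \tilde B_{22}^{-1} \tilde K^{22} \tilde s_2. \tag{17}$$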

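Note that $K^{22} = (K_{22} - K_{21} K_{11}^{-1} K_{12})^{-1}$. In one of our special cases, as will be seen later, $K_{21} = 0$, i.e., $K^{22} = K_{22}^{-1}$. We can then simplify $B$ as

$$B = \begin{pmatrix} K^{11} & 0 \\ 0 & K^{22} \end{pmatrix} \begin{pmatrix} J_{11} & J_{12} \\ J_{21} & J_{22} \end{pmatrix} \begin{pmatrix} K^{11} & 0 \\ 0 & K^{22} \end{pmatrix} \tag{18}$$

$$= \begin{pmatrix} K^{11} J_{11} K^{11} & K^{11} J_{12} K^{22} \\ K^{22} J_{21} K^{11} & K^{22} J_{22} K^{22} \end{pmatrix}, \tag{19}$$

and hence, $B_{22} = K^{22} J_{22} K^{22} = K_{22}^{-1} J_{22} K_{22}^{-1}$. Therefore, from (17),

$$RS^* = \frac{1}{n}\, \tilde s_2' \tilde K_{22}^{-1} \left(\tilde K_{22}^{-1} \tilde J_{22} \tilde K_{22}^{-1}\right)^{-1} \tilde K_{22}^{-1} \tilde s_2 = \frac{1}{n}\, \tilde s_2' \tilde J_{22}^{-1} \tilde s_2. \tag{20}$$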

Therefore, when the matrix $K(\theta)$ is block diagonal, i.e., $K_{12} = 0$, the robust version of the RS statistic requires calculation of $J_{22}(\theta)$ only. Also, comparing (14) and (20), we see that RS and $RS^*$ have similar forms except that the former uses the Hessian, whereas the latter uses the outer product of the gradient for the underlying information matrix.
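To make the contrast between (14) and (20) concrete, here is a minimal sketch for the scalar case ($r = 1$) with $K_{12} = 0$. It assumes the user supplies the per-observation scores for the tested parameter, evaluated at the restricted estimate, together with the per-observation information implied by the assumed model. The function name and arguments are hypothetical, and $J_{22}$ is estimated by the sample average of squared scores.

```python
def score_stats(scores, info_per_obs):
    """Standard and robust score statistics for one restriction (r = 1).

    scores       : per-observation scores d_i for the tested parameter,
                   evaluated at the restricted (quasi) MLE.
    info_per_obs : per-observation information implied by the assumed model
                   (assumes the information matrix is block diagonal).
    """
    n = len(scores)
    s2 = sum(scores)                         # total score, as in (5)
    j22 = sum(d * d for d in scores) / n     # outer-product estimate of J22
    rs_standard = s2 * s2 / (n * info_per_obs)  # model-based variance
    rs_robust = s2 * s2 / (n * j22)             # equation (20)
    return rs_standard, rs_robust
```

The two statistics coincide only when the outer-product estimate agrees with the model-implied information, which is what the information matrix equality (8) would deliver under correct specification.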

3 An Adjusted Test for Asymmetry

Let us start with the Pearson (1895) system of distributions,

$$\frac{d \ln f(y)}{dy} = \frac{c_1 - y}{c_0 - c_1 y + c_2 y^2}, \tag{21}$$

where we write $f(y; \theta)$ simply as $f(y)$ with $\theta = (c_0, c_1, c_2)'$. The normal distribution is a special case of this when $c_1 = c_2 = 0$. As discussed in Premaratne and Bera (2000), $c_1$ and $c_2$ could be treated, respectively, as the "asymmetry" and "kurtosis" parameters. Suppose we are interested only in testing asymmetry, ignoring excess kurtosis. Then we can start with the above system with $c_2 = 0$, i.e.,

$$\frac{d \ln f(y)}{dy} = \frac{c_1 - y}{c_0 - c_1 y}. \tag{22}$$

By integrating (22), it can be shown that (22) leads to the gamma density, which is a skewed distribution. Under $c_1 = 0$, (22) becomes the normal density with mean zero and variance $c_0$, which is symmetric. Let us derive the Rao score test for $c_1 = 0$ in (22). That will give us a test for asymmetry without allowing for excess kurtosis. Let us denote

$$\psi(\theta, y) = \int \frac{c_1 - y}{c_0 - c_1 y}\, dy = \int \frac{c_1 - y}{\Delta}\, dy,$$

with $\theta = (c_0, c_1)'$ and $\Delta = c_0 - c_1 y$. Then by integrating (22) for the $i$th observation, we have

$$\ln f(y_i) = \text{const} + \psi(\theta, y_i),$$

i.e.,

$$f(y_i) = \text{const} \cdot \exp(\psi(\theta, y_i)),$$

or

$$f(y_i) = \frac{\exp(\psi(\theta, y_i))}{\int_{-\infty}^{\infty} \exp(\psi(\theta, y))\, dy}. \tag{23}$$

Note that in the denominator of (23), we do not use a subscript for $y$, since this term is purely a constant. Therefore, the log-likelihood function $l(\theta) = \ln \prod_{i=1}^{n} f(y_i)$ can be written as

$$l(\theta) = -n \ln\left[\int_{-\infty}^{\infty} \exp \psi(\theta, y)\, dy\right] + \sum_{i=1}^{n} \psi(\theta, y_i) = l_1(\theta) + l_2(\theta) \quad \text{(say)}. \tag{24}$$

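To get the Rao score test, let us first derive the score functions $\partial l(\theta)/\partial c_0$ and $\partial l(\theta)/\partial c_1$ under the null hypothesis $H_0: c_1 = 0$. One can easily see that

$$\left.\frac{\partial l_1(\theta)}{\partial c_1}\right|_{c_1 = 0} = 0, \tag{25}$$

$$\frac{\partial l_2(\theta)}{\partial c_1} = \sum_{i=1}^{n} \int \frac{\Delta_i - (c_1 - y_i)(-y_i)}{\Delta_i^2}\, dy_i. \tag{26}$$

That is,

$$\left.\frac{\partial l_2(\theta)}{\partial c_1}\right|_{c_1 = 0} = \sum_{i=1}^{n} \int \frac{c_0 - y_i^2}{c_0^2}\, dy_i = \sum_{i=1}^{n} \left(\frac{y_i}{c_0} - \frac{y_i^3}{3 c_0^2}\right). \tag{27}$$

When evaluated at the restricted MLE $\tilde\theta = (\tilde c_0, 0)'$, this becomes

$$\frac{\partial l_2(\tilde\theta)}{\partial c_1} = \frac{n m_1}{m_2} - \frac{n m_3}{3 m_2^2},$$

where $m_j = \sum_{i=1}^{n} y_i^j / n$, $j = 1, 2, 3$, and $\tilde c_0 = m_2$ under the null. Therefore, we have

$$\frac{\partial l(\tilde\theta)}{\partial c_1} = \frac{\partial l_1(\tilde\theta)}{\partial c_1} + \frac{\partial l_2(\tilde\theta)}{\partial c_1} = \frac{n m_1}{m_2} - \frac{n m_3}{3 m_2^2}. \tag{28}$$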

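To get the information matrix $I(\theta)$ we use the $J$ matrix, where we take the expectation with respect to the assumed density $f(y)$, and denote it by $J_f$, i.e.,

$$J_f = E_f\left[\frac{\partial \ln f(y; \theta)}{\partial \theta} \cdot \frac{\partial \ln f(y; \theta)}{\partial \theta'}\right]. \tag{29}$$

From (23), we can easily see that, under $H_0: c_1 = 0$,

$$\frac{\partial \ln f(y)}{\partial c_0} = -\frac{\mu_2}{2 c_0^2} + \frac{y^2}{2 c_0^2} = -\frac{1}{2 \mu_2} + \frac{y^2}{2 \mu_2^2} \tag{30}$$

$$\frac{\partial \ln f(y)}{\partial c_1} = \frac{y}{c_0} - \frac{y^3}{3 c_0^2} = \frac{y}{\mu_2} - \frac{y^3}{3 \mu_2^2}, \tag{31}$$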

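where we replace $c_0$ by $\mu_2 = E_f(y^2)$, with the density $f(y)$ being $N(0, c_0)$. Using (30) and (31), the $J_f$ matrix of (29) can be calculated as follows:

$$J_f = E_f \begin{bmatrix} \left(\dfrac{\partial \ln f}{\partial c_0}\right)^2 & \dfrac{\partial \ln f}{\partial c_0} \dfrac{\partial \ln f}{\partial c_1} \\ * & \left(\dfrac{\partial \ln f}{\partial c_1}\right)^2 \end{bmatrix}$$

$$= E_f \begin{bmatrix} \dfrac{1}{4 \mu_2^2} + \dfrac{y^4}{4 \mu_2^4} - \dfrac{y^2}{2 \mu_2^3} & -\dfrac{y}{2 \mu_2^2} + \dfrac{y^3}{2 \mu_2^3} + \dfrac{y^3}{6 \mu_2^3} - \dfrac{y^5}{6 \mu_2^4} \\ * & \dfrac{y^2}{\mu_2^2} + \dfrac{y^6}{9 \mu_2^4} - \dfrac{2 y^4}{3 \mu_2^3} \end{bmatrix}. \tag{32}$$

Under symmetry, $E_f(y) = E_f(y^3) = E_f(y^5) = 0$, and under $H_0: c_1 = 0$, $E_f(y^2) = \mu_2$, $E_f(y^4) = 3 \mu_2^2$, $E_f(y^6) = 15 \mu_2^3$. Therefore,

$$J_f = \begin{bmatrix} \dfrac{1}{4 \mu_2^2} + \dfrac{3 \mu_2^2}{4 \mu_2^4} - \dfrac{\mu_2}{2 \mu_2^3} & 0 \\ 0 & \dfrac{\mu_2}{\mu_2^2} + \dfrac{15 \mu_2^3}{9 \mu_2^4} - \dfrac{6 \mu_2^2}{3 \mu_2^3} \end{bmatrix} = \begin{bmatrix} \dfrac{1}{2 \mu_2^2} & 0 \\ 0 & \dfrac{2}{3 \mu_2} \end{bmatrix}. \tag{33}$$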

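From (28) we have the estimated score function $s_2(\tilde\theta)$ under the null hypothesis as

$$s_2(\tilde\theta) = \frac{\partial l(\tilde\theta)}{\partial c_1} = -\frac{n m_3}{3 m_2^2} \tag{34}$$

by putting the sample mean $m_1 = 0$. Since the information matrix in (33) is block diagonal, the standard score test is

$$RS_{c_1} = \frac{n}{9} \frac{m_3^2}{m_2^4} \cdot \frac{3}{2} \hat\mu_2 = \frac{n}{6} \left(\frac{m_3}{m_2^{3/2}}\right)^2 = \frac{n}{6} (\sqrt{b_1})^2. \tag{35}$$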

Note that while deriving this test, we have taken the density (23) as the DGP and have fully utilized it while taking the expectation in (32). If $f(y)$ is not the DGP, the test $RS_{c_1}$ will not be valid; in particular, the asymptotic variance formula in (35), $V(\sqrt{b_1}) = 6/n$, is not correct. For instance, in the presence of excess kurtosis, there will be proportionately more outliers. As Pearson (1963) showed, as we move from short-tailed to very long-tailed distributions, the contribution to the moments will come increasingly from the extreme tails of the frequency function. As a result, the variance of a statistic such as $\sqrt{b_1}$ will increase, and for fat-tailed distributions, $6/n$ will underestimate the variance.

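Let us try to use the modified score statistic in (17). For that we need the $J_g$ and $K_g$ matrices. As we noted earlier, if $K_{12} = 0$, then the modified score test takes a very simple form (20). After some tedious algebra, we can see that

$$K_{12} = E_g\left[\frac{\partial^2 \ln f}{\partial c_0\, \partial c_1}\right] = 0. \tag{36}$$

To calculate $RS^*$ in (20), we need to find

$$J_{22} = E_g\left[\frac{\partial \ln f}{\partial c_1}\right]^2. \tag{37}$$

From our earlier derivation,

$$\frac{\partial \ln f}{\partial c_1} = \frac{y}{c_0} - \frac{y^3}{3 c_0^2}.$$

Hence,

$$\left(\frac{\partial \ln f}{\partial c_1}\right)^2 = \frac{y^2}{c_0^2} + \frac{y^6}{9 c_0^4} - \frac{2 y^4}{3 c_0^3},$$

and

$$E_g\left(\frac{\partial \ln f}{\partial c_1}\right)^2 = \frac{\mu_2}{c_0^2} + \frac{\mu_6}{9 c_0^4} - \frac{2 \mu_4}{3 c_0^3} = \frac{1}{\mu_2} + \frac{\mu_6}{9 \mu_2^4} - \frac{2 \mu_4}{3 \mu_2^3}.$$

Therefore,

$$\tilde J_{22} = \frac{1}{m_2} + \frac{m_6}{9 m_2^4} - \frac{2 m_4}{3 m_2^3}. \tag{38}$$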

Using (34) and (38), $RS^*$ in (20) for testing $H_0: c_1 = 0$ can be expressed as

$$RS^*_{c_1} = \frac{n}{9} \cdot \frac{m_3^2 m_2^{-4}}{m_2^{-1} + \frac{1}{9} m_6 m_2^{-4} - \frac{2}{3} m_4 m_2^{-3}} = \frac{n\, m_3^2 m_2^{-3}}{9 + m_6 m_2^{-3} - 6 m_4 m_2^{-2}} = \frac{n (\sqrt{b_1})^2}{9 + m_6 m_2^{-3} - 6 m_4 m_2^{-2}}. \tag{39}$$

Note that if we impose the normality assumption, then the population counterpart of the denominator in (39) is

$$9 + \mu_6 \mu_2^{-3} - 6 \mu_4 \mu_2^{-2} = 9 + 15 - 6 \cdot 3 = 9 + 15 - 18 = 6,$$

as in (35). Basically, the construction of the adjusted statistic $RS^*_{c_1}$ indicates that, asymptotically, an estimate of the variance of $\sqrt{b_1}$ that is valid under excess kurtosis is

$$\frac{1}{n}\left[9 + m_6 m_2^{-3} - 6 m_4 m_2^{-2}\right]. \tag{40}$$

Let us again consider the term

$$9 + \mu_6 \mu_2^{-3} - 6 \mu_4 \mu_2^{-2}. \tag{41}$$

It differs from the variance formula under normality by

$$9 + \mu_6 \mu_2^{-3} - 6 \mu_4 \mu_2^{-2} - 6 = 3 + \mu_6 \mu_2^{-3} - 6 \mu_4 \mu_2^{-2}. \tag{42}$$

Let us define the generalized notion of kurtosis as

$$\beta_{2r} = \frac{\mu_{2r+2}}{\mu_2^{r+1}}, \qquad r = 1, 2, \ldots,$$

so that

$$\beta_2 = \frac{\mu_4}{\mu_2^2}, \qquad \beta_4 = \frac{\mu_6}{\mu_2^3}.$$

Then (42) is

$$3 + \beta_4 - 6 \beta_2.$$

It can be shown that for leptokurtic distributions, $3 + \beta_4 - 6 \beta_2 > 0$. For example, with the $t_7$ density we have $\mu_2 = 7/5$, $\mu_4 = 49/5$, $\mu_6 = 343$, so that $\beta_2 = 5$ and $\beta_4 = 125$; then $3 + \beta_4 - 6 \beta_2 = 98$. Therefore, the usual variance formula will underestimate the variance of $\sqrt{b_1}$, and we will reject the null hypothesis of symmetry too frequently in the presence of excess kurtosis.
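The $t_7$ arithmetic above can be checked exactly. The sketch below uses the standard closed form for the even moments of Student's $t$ distribution, $E(T^{2k}) = \nu^k \prod_{j=1}^{k} (2j - 1)/(\nu - 2j)$ for $\nu > 2k$; the function name is hypothetical.

```python
from fractions import Fraction


def t_even_moment(nu, k):
    """E[T^(2k)] for Student's t with nu degrees of freedom (requires nu > 2k)."""
    if nu <= 2 * k:
        raise ValueError("moment of order 2k exists only for nu > 2k")
    out = Fraction(nu) ** k
    for j in range(1, k + 1):
        out *= Fraction(2 * j - 1, nu - 2 * j)
    return out


mu2, mu4, mu6 = (t_even_moment(7, k) for k in (1, 2, 3))
beta2 = mu4 / mu2 ** 2          # population kurtosis
beta4 = mu6 / mu2 ** 3          # generalized kurtosis of the next order
excess_over_normal = 3 + beta4 - 6 * beta2   # expression (42)
n_times_variance = 9 + beta4 - 6 * beta2     # expression (41)
print(mu2, mu4, mu6, beta2, beta4, excess_over_normal, n_times_variance)
```

So for $t_7$ data the asymptotic variance of $\sqrt{b_1}$ is $104/n$ rather than $6/n$.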

We can obtain the formula (40) or its population counterpart

$$\frac{1}{n}\left(9 + \mu_6 \mu_2^{-3} - 6 \mu_4 \mu_2^{-2}\right) \tag{43}$$

in an alternative way.

Let $g(w) = g(w_1, w_2, \ldots, w_k)$ be a continuous function of random variables $w_1, w_2, w_3, \ldots, w_k$. A general formula for the asymptotic variance of $g(w)$ is [see, for example, Stuart and Ord (1994, p. 350)]

$$V[g(w)] = \sum_{i=1}^{k} [g_i'(w)]^2 V(w_i) + \sum_{i \neq j}^{k} \sum_{j=1}^{k} g_i'(w)\, g_j'(w)\, \mathrm{Cov}(w_i, w_j) + o(n^{-1}), \tag{44}$$

where $g_i'(w) = \partial g(w)/\partial w_i$, $i = 1, 2, \ldots, k$, are in terms of population values. We take $g(w_1, w_2) = m_3 / m_2^{3/2}$. Hence, using (44),

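$$V(\sqrt{b_1}) = \left(\frac{\partial g}{\partial m_3}\right)^2 V(m_3) + \left(\frac{\partial g}{\partial m_2}\right)^2 V(m_2) + 2 \left(\frac{\partial g}{\partial m_3}\right) \left(\frac{\partial g}{\partial m_2}\right) \mathrm{Cov}(m_2, m_3).$$

Under symmetry, we utilize $\mu_3 = \mu_5 = 0$ and have

$$\frac{\partial g}{\partial m_3} = \frac{1}{\mu_2^{3/2}}, \qquad \frac{\partial g}{\partial m_2} = -\frac{3 \mu_3}{2 \mu_2^{5/2}} = 0,$$

so the terms involving $V(m_2)$ and $\mathrm{Cov}(m_2, m_3)$ drop out. Moreover,

$$V(m_3) = \frac{1}{n}\left(\mu_6 - \mu_3^2 + 9 \mu_2^3 - 6 \mu_2 \mu_4\right) = \frac{1}{n}\left(\mu_6 + 9 \mu_2^3 - 6 \mu_2 \mu_4\right),$$

$$\mathrm{Cov}(m_2, m_3) = \frac{1}{n}\left(\mu_5 - \mu_2 \mu_3 + 6 \mu_2 \mu_1 \mu_2 - 2 \mu_1 \mu_4 - 3 \mu_3 \mu_2\right) = 0.$$

Therefore,

$$V(\sqrt{b_1}) = \left(\frac{1}{\mu_2^{3/2}}\right)^2 \frac{1}{n}\left(\mu_6 + 9 \mu_2^3 - 6 \mu_2 \mu_4\right) = \frac{1}{n}\left(9 + \mu_6 \mu_2^{-3} - 6 \mu_4 \mu_2^{-2}\right), \tag{45}$$

which is the same as (43). Godfrey and Orme (1991) used this approach to develop their test for skewness, which is asymptotically valid under non-normality.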
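The standard statistic (35), the adjusted statistic (39), and the variance estimate (40) differ only in the denominator, which the following sketch makes explicit. The function name is hypothetical, and the data are centred at the sample mean inside the code.

```python
def skewness_tests(y):
    """Return (RS_c1, RS*_c1, estimated variance of sqrt(b1)).

    RS_c1  = n * b1 / 6                                  (equation 35)
    RS*_c1 = n * b1 / (9 + m6/m2^3 - 6*m4/m2^2)          (equation 39)
    The variance estimate is the bracket in (40) divided by n.
    """
    n = len(y)
    ybar = sum(y) / n
    d = [v - ybar for v in y]
    m2, m3, m4, m6 = (sum(v ** j for v in d) / n for j in (2, 3, 4, 6))
    b1 = m3 ** 2 / m2 ** 3
    # 9*m2 times the sample average of (y/m2 - y^3/(3*m2^2))^2, hence non-negative
    denom = 9.0 + m6 / m2 ** 3 - 6.0 * m4 / m2 ** 2
    return n * b1 / 6.0, n * b1 / denom, denom / n
```

Both statistics are compared with $\chi^2_1$ critical values asymptotically; for fat-tailed data the adjusted denominator tends to exceed 6, which shrinks the statistic.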

4 Comparison of the Standard and Adjusted Tests for Asymmetry: Some Simulation Results

For our simulation study, we generated data under a variety of scenarios. To study the size properties, we first generated data under the normal distribution, for which the standard $\sqrt{b_1}$ provides the ideal test. We used the tabulated critical values for the standard $\sqrt{b_1}$ test [see Pearson and Hartley (1976, p. 183)]. Some of these critical values are interpolated. For our adjusted test $RS^*_{c_1}$ in (39), we used the asymptotic $\chi^2_1$ critical values. The standard test is somewhat oversized, whereas the adjusted test has a size very close to the nominal levels of 1% and 5%. In fact, its estimated sizes are below the nominal levels. This may be due to the fact that here we are using an "unnecessary" adjustment [see Equations (42) and (45)] to the variance of $\sqrt{b_1}$ and thereby overestimate the variance, which leads to under-rejection. The estimated sizes of the modified test look better as we increase the sample size.

The next results are for Student's $t_7$-distribution. As explained earlier, for this case, the standard test grossly underestimates the variance, leading to a false rejection of symmetry too often. The quantity in (43) is $(9 + \mu_6 \mu_2^{-3} - 6 \mu_4 \mu_2^{-2})/n = 104/n$, which is much larger than the standard value $6/n$. The rejection probabilities for our adjusted test are again close to the nominal levels of 1% and 5% and are, in fact, lower than those values. As we noticed for the data generated under normality, the rejection probabilities, in general, get closer to the nominal levels as the sample size increases. The next DGP, Beta(2,2), is a platykurtic (i.e., $\beta_2 < 3$) symmetric distribution. Here, possibly $6/n$ overestimates the variance, and the standard test has much lower estimated sizes; the adjusted test also has lower sizes, but these are much closer to the 1% and 5% values. We then generated data using the procedure of Ramberg, Dudewicz, Tadikamalla, and Mykytka (1979). Their technique can generate data for any given level of skewness and kurtosis. We generated data using $\sqrt{\beta_1} = 0$, $\beta_2 = 7.0$. The results for this distribution are qualitatively similar to those obtained for the $t_7$ distribution. Since for this distribution the excess kurtosis is larger than for the $t_7$ density, the performance of the standard test is even worse, whereas the behavior of the modified test does not change. The last two DGPs are symmetric leptokurtic distributions, namely, the Laplace and logistic distributions. Again, the standard test rejects the correct hypothesis of symmetry too often, whereas our modification corrects this over-rejection.
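A small Monte Carlo sketch of the size comparison under $t_7$ data follows. It is not the paper's simulation design: as a simplifying assumption it applies the asymptotic $\chi^2_1$ 5% critical value (about 3.841) to both statistics, whereas the study above used tabulated critical values for the standard test. The seed, replication count, and function names are illustrative choices.

```python
import math
import random

CHI2_1_CRIT_5PCT = 3.841  # approximate upper 5% point of chi-square(1)


def draw_t(rng, df):
    """One Student-t draw built as N(0,1) / sqrt(chi-square(df) / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
    return z / math.sqrt(chi2 / df)


def skewness_stats(y):
    """Standard statistic (35) and adjusted statistic (39)."""
    n = len(y)
    ybar = sum(y) / n
    d = [v - ybar for v in y]
    m2, m3, m4, m6 = (sum(v ** j for v in d) / n for j in (2, 3, 4, 6))
    b1 = m3 ** 2 / m2 ** 3
    return n * b1 / 6.0, n * b1 / (9.0 + m6 / m2 ** 3 - 6.0 * m4 / m2 ** 2)


def rejection_rates(df=7, n=100, reps=1000, seed=12345):
    """Share of replications in which each test rejects symmetry at 5%."""
    rng = random.Random(seed)
    std_rej = adj_rej = 0
    for _ in range(reps):
        y = [draw_t(rng, df) for _ in range(n)]
        rs_std, rs_adj = skewness_stats(y)
        std_rej += rs_std > CHI2_1_CRIT_5PCT
        adj_rej += rs_adj > CHI2_1_CRIT_5PCT
    return std_rej / reps, adj_rej / reps


if __name__ == "__main__":
    print(rejection_rates())
```

Because the data are symmetric, both rates estimate the size of the test; the standard test should reject far more often than the nominal 5%.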

Table 1: Estimated Sizes of the Skewness Tests with 5000 Replications

DGP and             Standard Test        Modified Test
Sample Sizes        1%       5%          1%       5%

DGP: N(0,1) ($\sqrt{\beta_1} = 0$, $\beta_2 = 3.0$)
 50                 2.12     9.92        0.18     3.66
100                 2.00     9.90        0.58     4.58
200                 1.70    10.14        0.50     4.66
250                 2.16    10.38        0.70     4.74

DGP: $t_7$ ($\sqrt{\beta_1} = 0$, $\beta_2 = 5.0$)
 50                16.18    31.18        0.26     3.42
100                20.28    35.70        0.42     3.42
200                25.86    41.70        0.44     3.64
250                27.06    42.80        0.38     4.10

DGP: Beta(2,2) ($\sqrt{\beta_1} = 0$, $\beta_2 = 2.14$)
 50                 0.22     3.30        0.42     3.94
100                 0.16     2.72        0.50     4.38
200                 0.04     2.26        0.74     4.28
250                 0.14     2.26        0.86     4.86

DGP: Generated Data ($\sqrt{\beta_1} = 0$, $\beta_2 = 7.0$)
 50                26.2     42.78        0.32     3.22
100                34.2     50.82        0.24     3.28
200                43.2     57.80        0.36     3.18
250                43.5     58.74        0.30     3.66

DGP: Laplace(2) ($\sqrt{\beta_1} = 0$, $\beta_2 = 6.0$)
 50                25.72    43.54        0.38     3.82
100                31.40    48.78        0.50     3.68
200                36.26    52.00        0.38     3.94
250                38.60    53.92        0.28     3.62

DGP: Logistic(1,2) ($\sqrt{\beta_1} = 0$, $\beta_2 = 4.2$)
 50                12.02    27.14        0.32     3.40
100                14.88    30.14        0.36     4.00
200                17.92    33.54        0.42     4.30
250                19.70    36.72        0.50     3.94

Table 2 provides some simulation results for the power of the tests. The first two DGPs are positively skewed distributions ($\chi^2_4$ and Beta(1,2)), and the estimated probabilities are the estimated powers. Both the standard and the modified tests have very good power. For the $\chi^2$ data, the standard test has higher estimated powers, whereas for the Beta(1,2) data, the modified test performs better. For the negatively skewed distribution Beta(2,1), the results are very similar to those for Beta(1,2), although their kurtosis structures are quite different.

Table 2: Estimated Powers of Skewness Tests with 5000 Replications

DGP and             Standard Test        Modified Test
Sample Sizes        1%       5%          1%       5%

DGP: $\chi^2_4$ ($\sqrt{\beta_1} = 1.41$, $\beta_2 = 6$)
 50                71.02    90.38       13.66    50.02
100                98.80    99.90       46.80    78.90
200               100.00   100.00       74.92    91.08
250               100.00   100.00       81.44    92.70

DGP: Beta(1,2) ($\sqrt{\beta_1} = 0.56$, $\beta_2 = 3.27$)
 50                13.62    51.72       21.82    58.82
100                46.88    85.66       74.60    93.44
200                92.00    99.36       99.04    99.96
250                98.08    99.94       99.94   100.00

DGP: Beta(2,1) ($\sqrt{\beta_1} = -0.56$, $\beta_2 = 0.67$)
 50                14.52    51.78       21.74    59.34
100                45.62    85.10       73.74    93.40
200                92.10    99.48       99.12    99.94
250                98.32    99.92       99.90   100.00

These results reinforce our assertion that the standard $\sqrt{b_1}$ test is not a reliable test for asymmetry in the presence of excess kurtosis. We will wrongly reject the true null of symmetry too often. On the other hand, our simple variance-adjusted test works remarkably well. It has very good finite sample size and power properties.

5 An Adjusted Test for Excess Kurtosis

Here we start with

$$\frac{d \ln f(y)}{dy} = \frac{-y}{c_0 + c_2 y^2}, \tag{46}$$

which is the Pearson density [equation (21)] with $c_1 = 0$. By integrating (46), we get

$$\ln f(y) = \text{const} - \frac{1}{2 c_2} \ln(c_0 + c_2 y^2),$$

i.e.,

$$f(y) = \text{const} \cdot (c_0 + c_2 y^2)^{-\frac{1}{2 c_2}},$$

which is a generalization of Student's $t$-density. The normal distribution (no excess kurtosis, with $\beta_2 = 3$) is a special case of this, with $c_2 = 0$. Let us first derive the standard RS test for $H_0: c_2 = 0$. We have

$$f(y_i) = \frac{\exp(\psi(\theta, y_i))}{\int_{-\infty}^{\infty} \exp(\psi(\theta, y))\, dy}, \tag{47}$$

where we now define $\psi(\theta, y) = \int \frac{-y}{c_0 + c_2 y^2}\, dy$ with $\theta = (c_0, c_2)'$. The log-likelihood function can be written as

$$l(c_0, c_2) = -n \ln\left[\int_{-\infty}^{\infty} \exp(\psi(\theta, y))\, dy\right] + \sum_{i=1}^{n} \psi(\theta, y_i) = l_1(\theta) + l_2(\theta) \quad \text{(say)}. \tag{48}$$

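Under $c_2 = 0$,

$$\frac{\partial l_1(\theta)}{\partial c_2} = -n\, \frac{\int_{-\infty}^{\infty} e^{-\frac{y^2}{2 c_0}} \frac{y^4}{4 c_0^2}\, dy}{\int_{-\infty}^{\infty} e^{-\frac{y^2}{2 c_0}}\, dy} = -\frac{n}{4 c_0^2} E(y^4) = -\frac{n \mu_4}{4 c_0^2},$$

$$\frac{\partial l_2(\theta)}{\partial c_2} = \sum_{i=1}^{n} \int \frac{y_i^3}{c_0^2}\, dy_i = \frac{\sum_{i=1}^{n} y_i^4}{4 c_0^2} = \frac{n m_4}{4 c_0^2}.$$

Hence, using $\mu_4 = 3 c_0^2$ under the null and $\tilde c_0 = m_2$,

$$\frac{\partial l(\tilde\theta)}{\partial c_2} = \frac{n}{4 m_2^2}\left(m_4 - 3 m_2^2\right) = \frac{n}{4}\left(\frac{m_4}{m_2^2} - 3\right) = \frac{n}{4}(b_2 - 3). \tag{49}$$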

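To obtain the information matrix let us now derive

$$J_g = E_g \begin{pmatrix} \left(\dfrac{\partial \ln f}{\partial c_0}\right)^2 & \dfrac{\partial \ln f}{\partial c_0} \dfrac{\partial \ln f}{\partial c_2} \\ * & \left(\dfrac{\partial \ln f}{\partial c_2}\right)^2 \end{pmatrix}. \tag{50}$$

From (47),

$$\ln f(y) = \psi(\theta, y) - \ln \int_{-\infty}^{\infty} \exp(\psi(\theta, y))\, dy, \tag{51}$$

and hence

$$\left.\frac{\partial \ln f(y)}{\partial c_0}\right|_{c_2 = 0} = \int \frac{y}{c_0^2}\, dy - \frac{\int_{-\infty}^{\infty} \exp\left(-\frac{y^2}{2 c_0}\right) \frac{y^2}{2 c_0^2}\, dy}{\int_{-\infty}^{\infty} \exp\left(-\frac{y^2}{2 c_0}\right) dy} = \frac{y^2}{2 c_0^2} - \frac{\mu_2}{2 c_0^2}, \tag{52}$$

and

$$\left.\frac{\partial \ln f(y)}{\partial c_2}\right|_{c_2 = 0} = \int \frac{y^3}{c_0^2}\, dy - \frac{\int_{-\infty}^{\infty} \exp\left(-\frac{y^2}{2 c_0}\right) \frac{y^4}{4 c_0^2}\, dy}{\int_{-\infty}^{\infty} \exp\left(-\frac{y^2}{2 c_0}\right) dy} = \frac{y^4}{4 c_0^2} - \frac{\mu_4}{4 c_0^2}. \tag{53}$$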

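Now

$$E_g\left(\frac{\partial \ln f(y)}{\partial c_0}\right)^2 = E_g\left(\frac{y^2}{2 c_0^2} - \frac{\mu_2}{2 c_0^2}\right)^2 = \frac{1}{4 c_0^4} E_g\left(y^4 + \mu_2^2 - 2 \mu_2 y^2\right) = \frac{1}{4 \mu_2^4}\left(\mu_4 + \mu_2^2 - 2 \mu_2^2\right) = \frac{1}{4 \mu_2^4}\left(\mu_4 - \mu_2^2\right), \tag{54}$$

$$E_g\left(\frac{\partial \ln f(y)}{\partial c_2}\right)^2 = \frac{1}{16 c_0^4} E_g\left(y^8 + \mu_4^2 - 2 \mu_4 y^4\right) = \frac{1}{16 \mu_2^4}\left(\mu_8 - \mu_4^2\right), \tag{55}$$

$$E_g\left[\frac{\partial \ln f(y)}{\partial c_0} \cdot \frac{\partial \ln f(y)}{\partial c_2}\right] = \frac{1}{8 c_0^4} E_g\left[(y^2 - \mu_2)(y^4 - \mu_4)\right] = \frac{1}{8 c_0^4} E_g\left[y^6 - \mu_4 y^2 - \mu_2 y^4 + \mu_2 \mu_4\right] = \frac{1}{8 \mu_2^4}\left[\mu_6 - \mu_4 \mu_2\right]. \tag{56}$$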

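Therefore,

$$J_g = \begin{bmatrix} \dfrac{1}{4 \mu_2^4}\left(\mu_4 - \mu_2^2\right) & \dfrac{1}{8 \mu_2^4}\left(\mu_6 - \mu_4 \mu_2\right) \\ * & \dfrac{1}{16 \mu_2^4}\left(\mu_8 - \mu_4^2\right) \end{bmatrix}. \tag{57}$$

Evaluating $J$ under the density $f(y) \equiv N(0, c_0)$, we have

$$J_f = \begin{bmatrix} \dfrac{1}{2 \mu_2^2} & \dfrac{1}{8 \mu_2^4}(15 - 3) \mu_2^3 \\ * & \dfrac{1}{16 \mu_2^4}(105 - 9) \mu_2^4 \end{bmatrix} = \begin{bmatrix} \dfrac{1}{2 \mu_2^2} & \dfrac{3}{2 \mu_2} \\ \dfrac{3}{2 \mu_2} & 6 \end{bmatrix}, \tag{58}$$

and

$$J_f^{22} = \frac{\dfrac{1}{2 \mu_2^2}}{\dfrac{3}{\mu_2^2} - \dfrac{9}{4 \mu_2^2}} = \frac{1}{2 \mu_2^2} \cdot \frac{4 \mu_2^2}{3} = \frac{2}{3}. \tag{59}$$

Combining (49) and (59), the standard RS statistic is

$$RS_{c_2} = \frac{n}{16}(b_2 - 3)^2 \cdot \frac{2}{3} = n\, \frac{(b_2 - 3)^2}{24}. \tag{60}$$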

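To derive the adjusted RS statistic, we need the $K$ matrix,

$$K_g = -E_g \begin{bmatrix} \dfrac{\partial^2 \ln f}{\partial c_0^2} & \dfrac{\partial^2 \ln f}{\partial c_0\, \partial c_2} \\ * & \dfrac{\partial^2 \ln f}{\partial c_2^2} \end{bmatrix}. \tag{61}$$

After some algebra, it can be shown that

$$E_g\left[-\frac{\partial^2 \ln f}{\partial c_0^2}\right] = \frac{2 \mu_2}{2 \mu_2^3} + \frac{\mu_4}{4 \mu_2^4} - \frac{\mu_2^2}{4 \mu_2^4} - \frac{2 \mu_2}{2 \mu_2^3} = \frac{\mu_4 - \mu_2^2}{4 \mu_2^4}. \tag{62}$$

Note that this $K_{11}$ term is the same as $J_{11}$ in (57). Under $H_0$, we have $\mu_4 = 3 \mu_2^2$, and this quantity reduces to

$$E_g\left[-\frac{\partial^2 \ln f}{\partial c_0^2}\right] = \frac{3}{4 \mu_2^2} - \frac{1}{4 \mu_2^2} = \frac{1}{2 \mu_2^2}. \tag{63}$$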

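Since

$$\left.\frac{\partial^2 \ln f}{\partial c_2^2}\right|_{c_2 = 0} = -\frac{y^6}{3 c_0^3} - \frac{\mu_8}{16 c_0^4} + \frac{2 \mu_6}{6 c_0^3} + \frac{\mu_4}{4 c_0^2} \cdot \frac{\mu_4}{4 c_0^2}, \tag{64}$$

we have

$$K_{22} = E_g\left[-\frac{\partial^2 \ln f}{\partial c_2^2}\right] = \frac{\mu_6}{3 \mu_2^3} + \frac{\mu_8}{16 \mu_2^4} - \frac{\mu_6}{3 \mu_2^3} - \frac{\mu_4^2}{16 \mu_2^4} = \frac{1}{16 \mu_2^4}\left(\mu_8 - \mu_4^2\right) = J_{22}, \tag{65}$$

as seen from (57). Finally, let us find $K_{12}$. We have

$$\left.\frac{\partial^2 \ln f}{\partial c_0\, \partial c_2}\right|_{c_2 = 0} = -\frac{y^4}{2 c_0^3} - \frac{\mu_6}{8 c_0^4} + \frac{2 \mu_4}{4 c_0^3} + \frac{\mu_4}{4 c_0^2} \cdot \frac{\mu_2}{2 c_0^2}, \tag{66}$$

and hence,

$$K_{12} = E_g\left[-\frac{\partial^2 \ln f}{\partial c_0\, \partial c_2}\right] = \frac{\mu_4}{2 \mu_2^3} + \frac{\mu_6}{8 \mu_2^4} - \frac{\mu_4}{2 \mu_2^3} - \frac{\mu_4}{8 \mu_2^3} = \frac{1}{8 \mu_2^4}\left(\mu_6 - \mu_4 \mu_2\right) = J_{12},$$

as in (57). Thus, we have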

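$$J_g = K_g = \begin{bmatrix} \dfrac{\mu_4 - \mu_2^2}{4 \mu_2^4} & \dfrac{1}{8 \mu_2^4}\left(\mu_6 - \mu_4 \mu_2\right) \\ * & \dfrac{1}{16 \mu_2^4}\left(\mu_8 - \mu_4^2\right) \end{bmatrix}. \tag{67}$$

Under the null hypothesis of no excess kurtosis, i.e., under $\mu_4 = 3 \mu_2^2$, the above matrix reduces to

$$\begin{bmatrix} \dfrac{1}{2 \mu_2^2} & \dfrac{1}{8 \mu_2^4}\left(\mu_6 - 3 \mu_2^3\right) \\ * & \dfrac{1}{16 \mu_2^4}\left(\mu_8 - 9 \mu_2^4\right) \end{bmatrix}. \tag{68}$$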

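Now we use expression (17) to obtain $RS^*_{c_2}$. First, note that since $J_g = K_g$, we have $K^{22} B_{22}^{-1} K^{22} = K^{22}$. Therefore, (17) can be written as

$$RS^*_{c_2} = \frac{1}{n}\, \tilde s_2' \tilde K^{22} \tilde s_2, \tag{69}$$

where $K^{22} = (K_{22} - K_{21} K_{11}^{-1} K_{12})^{-1}$. We already had $\tilde s_2 = \frac{n}{4}(b_2 - 3)$ in (49). Now

$$K_{22} - K_{21} K_{11}^{-1} K_{12} = \frac{1}{16 \mu_2^4}\left(\mu_8 - 9 \mu_2^4\right) - \frac{2 \mu_2^2}{64 \mu_2^8}\left(\mu_6 - 3 \mu_2^3\right)^2 = \frac{1}{32 \mu_2^6}\left[2 \mu_8 \mu_2^2 - 27 \mu_2^6 - \mu_6^2 + 6 \mu_6 \mu_2^3\right]. \tag{70}$$

Let us simplify (70) further by using $\beta_{2r} = \mu_{2r+2} / \mu_2^{r+1}$, $r = 1, 2, \ldots$, i.e., $\beta_4 = \mu_6 / \mu_2^3$ and $\beta_6 = \mu_8 / \mu_2^4$. Thus, we have

$$K_{22} - K_{21} K_{11}^{-1} K_{12} = \frac{1}{32}\left[\frac{2 \mu_8}{\mu_2^4} - 27 - \left(\frac{\mu_6}{\mu_2^3}\right)^2 + \frac{6 \mu_6}{\mu_2^3}\right] = \frac{1}{32}\left[2 \beta_6 - 27 - \beta_4^2 + 6 \beta_4\right]. \tag{71}$$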

Therefore, from (69), the adjusted RS statistic is given by

$$RS^*_{c_2} = \frac{n}{16}(b_2 - 3)^2 \cdot \frac{32}{2 b_6 - 27 - b_4^2 + 6 b_4} = \frac{2 n (b_2 - 3)^2}{2 b_6 - 27 - b_4^2 + 6 b_4}. \tag{72}$$

If we impose normality, then $\beta_6 = 105$ and $\beta_4 = 15$, and we can replace $2 b_6 - 27 - b_4^2 + 6 b_4$ by $(2 \times 105 - 27 - 15 \times 15 + 6 \times 15) = 48$. Then $RS^*_{c_2}$ reduces to $n(b_2 - 3)^2/24$, the same as our unadjusted RS test for kurtosis in (60). There is one problem with this adjusted form. The variance formula (71) does not explicitly take into account the possible presence of asymmetry through $\beta_1$, although it does allow for some other higher moment (sixth and eighth) violations. Therefore, we try our previous general formula (44) to obtain an alternative expression for $V(b_2)$. After some derivations, we have [see, for example, Stuart and Ord (1994, p. 349)]

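$$V(b_2) = \frac{1}{n}\left[\frac{\mu_8}{\mu_2^4} + 64 \beta_1 - \frac{8 \mu_5 \mu_3}{\mu_2^4} - \frac{12 \mu_6}{\mu_2^3} + 99\right]. \tag{73}$$

Utilizing $\mu_8 = 105 \mu_2^4$ and $\mu_6 = 15 \mu_2^3$ but still allowing for asymmetry through $\mu_3$ and $\mu_5$, we can write

$$V(b_2) = \frac{1}{n}\left[24 + 64 \beta_1 - \frac{8 \mu_5 \mu_3}{\mu_2^4}\right]. \tag{74}$$

Let us define $\beta_{2r+1} = \mu_3 \mu_{2r+3} / \mu_2^{r+3}$, i.e., $\beta_1 = \mu_3^2 / \mu_2^3$ and $\beta_3 = \mu_3 \mu_5 / \mu_2^4$. Then we have a simple form for $V(b_2)$,

$$V(b_2) = \frac{1}{n}\left[24 + 64 \beta_1 - 8 \beta_3\right].$$

Therefore, the extra quantity due to asymmetry is $n^{-1}(64 \beta_1 - 8 \beta_3)$. Hence, the RS statistic for testing excess kurtosis that takes into account the possible presence of asymmetry can be written as

$$RS^{**}_{c_2} = \frac{n (b_2 - 3)^2}{24 + 64 b_1 - 8 b_3}. \tag{75}$$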

We will use this form of the adjusted test rather than that given in (72) in our simulation study. We should, however, note that although we do not study $RS^*_{c_2}$ any further, since it does not serve our special purpose of taking account of asymmetry, this statistic might be better than the unadjusted test $RS_{c_2}$ in certain circumstances.
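The three kurtosis statistics (60), (72) and (75) can be computed side by side as sketched below. The function names are hypothetical; the sample quantities $b_1 = m_3^2/m_2^3$, $b_3 = m_3 m_5/m_2^4$, $b_4 = m_6/m_2^3$ and $b_6 = m_8/m_2^4$ are taken to be the sample counterparts of the corresponding $\beta$'s. As a precaution that is an assumption of this sketch rather than part of the derivation, a statistic is returned as NaN when its estimated denominator is not positive, which can happen in a given sample.

```python
import math


def adjusted_denominators(b1, b3, b4, b6):
    """Denominators of (72), on the 2n(b2-3)^2 scale, and of (75)."""
    return 2.0 * b6 - 27.0 - b4 ** 2 + 6.0 * b4, 24.0 + 64.0 * b1 - 8.0 * b3


def kurtosis_tests(y):
    """Return (RS_c2, RS*_c2, RS**_c2) from equations (60), (72) and (75)."""
    n = len(y)
    ybar = sum(y) / n
    d = [v - ybar for v in y]
    m = {j: sum(v ** j for v in d) / n for j in (2, 3, 4, 5, 6, 8)}
    b2 = m[4] / m[2] ** 2
    b1 = m[3] ** 2 / m[2] ** 3
    b3 = m[3] * m[5] / m[2] ** 4
    b4 = m[6] / m[2] ** 3
    b6 = m[8] / m[2] ** 4
    d_star, d_2star = adjusted_denominators(b1, b3, b4, b6)
    num = n * (b2 - 3.0) ** 2
    rs_std = num / 24.0
    rs_star = 2.0 * num / d_star if d_star > 0 else math.nan
    rs_2star = num / d_2star if d_2star > 0 else math.nan
    return rs_std, rs_star, rs_2star
```

With the normal-theory values $\beta_1 = \beta_3 = 0$, $\beta_4 = 15$ and $\beta_6 = 105$, `adjusted_denominators` gives 48 and 24, so both adjusted statistics reduce to $n(b_2 - 3)^2/24$.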

6 Comparison of the Standard and Adjusted Tests for Excess Kurtosis: Some Simulation Results

In Table 3 we present the simulation results with 5000 replications. The design of our simulation study is very similar to that in Table 1, except that here we study tests for excess kurtosis instead of asymmetry. Again, for the standard $b_2$ test [see equation (60)], we use the tabulated values from Pearson and Hartley (1976, p. 184), and for the adjusted test [see equation (75)], we use asymptotic $\chi^2_1$ values. When the data are generated under the normal distribution, the standard test has rejection probabilities nearly twice the nominal sizes of 1% and 5%. The modified test is also oversized at the 1% level but is undersized when the nominal level is 5%. Under the $t$-distribution, when the standard test should have been ideal (since there is no asymmetry), as expected, the test has reasonable power. The adjusted test uses some redundant adjustments, but its estimated powers are still very close to those of the standard test and, in some cases, are even higher. Therefore, we do not lose much power in using the adjusted test when the standard test would have been good enough. The next data set is generated according to Ramberg et al. (1979). This data set is from a non-normal distribution but with the same structure as the normal, i.e., symmetric and with no excess kurtosis. It is not surprising that the results are very similar to those for the normal distribution. We then used generated data with no excess kurtosis but with a modest amount of asymmetry ($\sqrt{\beta_1} = 0.85$). The results reveal the drawback of the standard test: it rejects the true null too often, whereas the adjusted test also rejects more than 1% or 5%, but not excessively. In the last two data sets, we have the same amount of excess kurtosis ($\beta_2 - 3 = 4$) but different skewness structures. The estimated powers of the adjusted test seem to be slightly higher than those of the standard test. Also, the adjusted version appears to be less influenced by the presence of asymmetry ($\sqrt{\beta_1} = 0.5$), in terms of having smaller fluctuations in the estimated powers as we move from the symmetric to the asymmetric data. Although the finite sample performance of our adjusted test is very good in terms of both size and power, we should note that it requires the existence of higher moments.

7 An Empirical Illustration

Our empirical illustration uses financial data. Considerable attention has been paid to the asymmetry and excess kurtosis of return distributions. Beedles and Simkowitz (1980), Singleton and Wingender (1986), and Alles and King (1994) have all studied the symmetry of returns using the $\sqrt{b_1}$ test. DeFusco, Karels, and Muralidhar (1996) correctly argued that the standard $\sqrt{b_1}$ test is not accurate enough to provide correct information on the skewness of the distribution. However, they did not suggest any modification to the $\sqrt{b_1}$ test that could be used in the presence of excess kurtosis. Mandelbrot (1963) was probably the first to systematically study the excess kurtosis in financial data, as he noted that price changes were usually too peaked and thick tailed compared to samples from the normal distribution. To take into account this excess kurtosis, Mandelbrot (1963, 1967), Fama (1965), and others suggested the use of stable distributions. Blattberg and Gonedes (1974) and Tucker (1992) examined the use of Student's $t$-distribution. More recently, Premaratne and Bera (2000) advocated the use of the Pearson type IV distribution, which is a special case of (21) that takes care of both asymmetry and excess kurtosis in a simple way. However, there still remains the question of testing for the presence of asymmetry and excess kurtosis separately while allowing for the presence of the other effect. In our empirical illustration we use the tests discussed in the earlier sections.

We considered daily S & P 500 returns from the Center for Research in Security Prices

(CRSP) database from August 27, 1990. We tried two sample sizes, 300 and 500. The results


Table 3: Estimated Sizes and Powers of the Two Tests for Excess Kurtosis with 5000 Replications

DGP: N(0,1) ($\sqrt{\beta_1} = 0$, $\beta_2 = 3.0$)

50     1.46    9.46    1.92    3.16
100    1.80    9.66    2.18    3.60
200    1.68    9.58    1.72    3.90
250    2.12   10.48    1.70    4.52

DGP: $t_{10}$ ($\sqrt{\beta_1} = 0$, $\beta_2 = 4.0$)

50    10.72   24.76    7.20   10.92
100   16.76   34.56   14.90   21.74
200   30.56   51.36   29.76   39.42
250   36.52   57.92   35.62   47.12

DGP: Generated Data ($\sqrt{\beta_1} = 0$, $\beta_2 = 3$)

50     1.44    8.16    1.40    2.40
100    1.54    9.08    1.62    3.32
200    1.26    8.26    1.22    3.32
250    1.12    7.84    0.86    3.08

DGP: Generated Data ($\sqrt{\beta_1} = 0.85$, $\beta_2 = 3.0$)

50     3.98   15.46    2.98    4.42
100    3.36   11.86    3.34    5.66
200    3.38   14.80    2.46    5.58
250    3.34   14.40    2.30    5.40

DGP: Generated Data ($\sqrt{\beta_1} = 0$, $\beta_2 = 7.0$)

50    11.06   36.70   20.38   29.06
100   38.60   65.30   48.70   58.96
200   72.47   77.06   91.44   81.70
250   88.20   96.18   90.84   94.88

DGP: Generated Data ($\sqrt{\beta_1} = 0.5$, $\beta_2 = 7.0$)

50     9.52   22.56   18.18   26.34
100   36.22   52.14   46.14   56.56
200   74.66   80.98   90.64   87.98
250   84.62   95.00   87.98   93.26


are given in Table 4. For the standard $\sqrt{b_1}$ test, the critical values at 1% were 0.329 and 0.255, respectively, for n = 300 and 500 [see Pearson and Hartley (1976, p. 183)]. The sample values of $\sqrt{b_1}$ in Table 4 exceed these critical values; therefore, the standard test rejected symmetry. However, our adjusted $\sqrt{b_1}$ test supported the null hypothesis of symmetry using $\chi^2_1$ critical values. As we noted in our simulation study, the standard $\sqrt{b_1}$ test over-rejects the null of symmetry in the presence of excess kurtosis. The two-sided critical values for $b_2$ at 1% are (2.46, 3.79) and (2.57, 3.60), respectively, for n = 300 and 500 [see Pearson and Hartley (1976, p. 184)]. The two kurtosis tests gave similar results, implying excess kurtosis in the data for both sample sizes. We get similar results possibly because of either the absence of asymmetry (as revealed by our adjusted $\sqrt{b_1}$ test) or the strong presence of excess kurtosis.
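The decisions described in this paragraph can be retraced with a short script. This is only a sketch: it uses the statistics and critical values quoted in the text and in Table 4, and it assumes that both adjusted statistics are referred to a $\chi^2_1$ distribution. The text states this for the adjusted $\sqrt{b_1}$ test; applying it to the adjusted $b_2$ statistic is an assumption made for this sketch. All variable and function names are illustrative.

```python
from math import erfc, sqrt

ALPHA = 0.01  # significance level used in Table 4


def chi2_1_pvalue(stat):
    """Upper-tail p-value of chi-square(1): P(Z^2 > stat) = erfc(sqrt(stat / 2))."""
    return erfc(sqrt(stat / 2.0))


# Statistics as reported in Table 4, keyed by sample size.
table4 = {
    300: {"sqrt_b1": 0.376, "adj_sqrt_b1": 3.10, "b2": 3.86, "adj_b2": 12.08},
    500: {"sqrt_b1": 0.264, "adj_sqrt_b1": 1.36, "b2": 4.62, "adj_b2": 57.87},
}
# 1% critical values quoted in the text (Pearson and Hartley tables).
crit_sqrt_b1 = {300: 0.329, 500: 0.255}
crit_b2 = {300: (2.46, 3.79), 500: (2.57, 3.60)}


def decisions(n):
    """Return reject / do-not-reject flags for the four tests at sample size n."""
    row = table4[n]
    lower, upper = crit_b2[n]
    return {
        "standard_skewness_rejects": abs(row["sqrt_b1"]) > crit_sqrt_b1[n],
        # Assumption: adjusted statistics are compared with chi-square(1).
        "adjusted_skewness_rejects": chi2_1_pvalue(row["adj_sqrt_b1"]) < ALPHA,
        "standard_kurtosis_rejects": not (lower <= row["b2"] <= upper),
        "adjusted_kurtosis_rejects": chi2_1_pvalue(row["adj_b2"]) < ALPHA,
    }


if __name__ == "__main__":
    for n in (300, 500):
        print(n, decisions(n))
```

Under these assumptions the flags agree with the asterisks in Table 4: only the adjusted skewness statistic fails to reject at the 1% level.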

Table 4: Use of Tests for Daily Returns of the S & P 500 Index

Observations    Test for Asymmetry                       Test for Kurtosis
                $\sqrt{b_1}$    Adjusted $\sqrt{b_1}$    $b_2$     Adjusted $b_2$
300             0.376*          3.10                     3.86*     12.08*
500             0.264*          1.36                     4.62*     57.87*

*Significant at the 1% level.

8 Conclusion

In this paper we suggested some adjustments to the standard $\sqrt{b_1}$ and $b_2$ tests, allowing for excess kurtosis and asymmetry, respectively. In our simulation results, the adjusted tests performed very well compared to their unadjusted counterparts. The adjusted tests were quite immune to the misspecifications that can arise through thick tails and the lack of symmetry. Of course, more simulation work and empirical applications are needed to further confirm the good properties of our suggested tests.
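As an illustration of the kind of additional simulation work mentioned above, the sketch below estimates the empirical size of two skewness statistics on symmetric but thick-tailed data. It is not the simulation design used in this paper. The "standard" statistic is $n b_1 / 6$ compared with the asymptotic 5% point of $\chi^2_1$ rather than with the Pearson and Hartley finite-sample tables. The "robust" statistic replaces 6 by the moment-based estimate $9 + m_6/m_2^3 - 6 m_4/m_2^2$, which is the asymptotic variance of $\sqrt{n}\sqrt{b_1}$ under symmetry when the sixth moment exists; whether it matches the adjusted test term by term should be checked against the formulas in the earlier sections. The $t_{10}$ design, the sample size, the number of replications, the seed, and all function names are assumptions made for illustration.

```python
import random
from math import sqrt

CHI2_1_CRIT_5PCT = 3.841  # asymptotic 5% critical value of chi-square(1)


def skewness_statistics(x):
    """Return (standard, robust) chi-square(1)-type statistics for symmetry."""
    n = len(x)
    mean = sum(x) / n
    d = [xi - mean for xi in x]
    m2 = sum(v ** 2 for v in d) / n
    m3 = sum(v ** 3 for v in d) / n
    m4 = sum(v ** 4 for v in d) / n
    m6 = sum(v ** 6 for v in d) / n
    b1 = m3 ** 2 / m2 ** 3
    standard = n * b1 / 6.0  # variance 6 is valid only without excess kurtosis
    # Assumed robust variance of sqrt(n) * sqrt(b1) under symmetry.
    robust_variance = 9.0 + m6 / m2 ** 3 - 6.0 * m4 / m2 ** 2
    robust = n * b1 / robust_variance
    return standard, robust


def draw_student_t(rng, df):
    """One Student's t draw: standard normal over sqrt(chi-square / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return z / sqrt(chi2 / df)


def rejection_rates(n=200, df=10, reps=500, seed=12345):
    """Share of replications in which each statistic rejects symmetry."""
    rng = random.Random(seed)
    reject_standard = 0
    reject_robust = 0
    for _ in range(reps):
        sample = [draw_student_t(rng, df) for _ in range(n)]
        standard, robust = skewness_statistics(sample)
        reject_standard += standard > CHI2_1_CRIT_5PCT
        reject_robust += robust > CHI2_1_CRIT_5PCT
    return reject_standard / reps, reject_robust / reps


if __name__ == "__main__":
    std_rate, rob_rate = rejection_rates()
    print(f"standard: {std_rate:.3f}, robust: {rob_rate:.3f}")
```

Because the data are symmetric by construction, both rates estimate size, so a rate well above the nominal 5% indicates over-rejection.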


References

Affleck-Graves, J. and B. McDonald (1989), Nonnormalities and tests of asset pricing theories, Journal of Finance, 44, 889-907.

Alles, L.A. and K.L. King (1994), Regularities in the variation of skewness in asset returns, Journal of Financial Research, 17, 427-438.

Bera, A.K. and C.M. Jarque (1981), An efficient large-sample test for normality of observations and regression residuals, Working Paper in Economics and Econometrics, Number 40, The Australian National University, Canberra.

Bera, A.K. and S. John (1983), Test for multivariate normality with Pearson alternatives, Communications in Statistics - Theory and Methods, 12, 103-117.

Blattberg, R. and N. Gonedes (1974), A comparison of stable and Student's t distributions as statistical models for stock prices, Journal of Business, 47, 244-280.

Bowman, K.O. and L.R. Shenton (1975), Omnibus contours for departures from normality based on $\sqrt{b_1}$ and $b_2$, Biometrika, 62, 243-250.

Christie-David, R. and M. Chaudhry (2001), Coskewness and cokurtosis in futures markets, Journal of Empirical Finance, 8, 55-81.

Cox, D.R. and D.V. Hinkley (1974), Theoretical Statistics, Chapman and Hall, London.

D'Agostino, R.B. and E.S. Pearson (1973), Tests for departure from normality: Empirical results for the distributions of $b_2$ and $\sqrt{b_1}$, Biometrika, 60, 613-622.

DeFusco, R.A., G.V. Karels and K. Muralidhar (1996), Skewness persistence in U.S. common stock returns: Results from bootstrapping tests, Journal of Business Finance and Accounting, 23, 1183-1195.

Fama, E.F. (1965), The behavior of stock market prices, Journal of Business, 38, 34-105.

Fisher, R.A. (1930), The moments of the distribution for normal samples of measures of departure from normality, Proceedings of the Royal Society, A, 130, 16-28.

Godfrey, L.G. and C.D. Orme (1991), Testing for skewness of regression disturbances, Economics Letters, 37, 31-34.

Jarque, C.M. and A.K. Bera (1987), Test for normality of observations and regression residuals, International Statistical Review, 55, 163-172.


Kent, J.T. (1982), Robust properties of likelihood ratio tests, Biometrika, 69, 19-27.

Mandelbrot, B. (1963), The variation of certain speculative prices, Journal of Business, 36, 394-419.

Mandelbrot, B. (1967), The variation of some other speculative prices, Journal of Business, 40, 393-413.

Pearson, E.S. (1930), A further development of tests for normality, Biometrika, 22, 239-249.

Pearson, E.S. (1963), Some problems arising in approximating to probability distributions, using moments, Biometrika, 50, 95-112.

Pearson, E.S. and H.O. Hartley (1976), Biometrika Tables for Statisticians, Biometrika Trust, London.

Pearson, K. (1895), Contribution to the mathematical theory of evolution-II: Skewed variation in homogeneous material, Philosophical Transactions of the Royal Society of London, A 186, 343-414.

Premaratne, G. and A.K. Bera (2000), Modeling asymmetry and excess kurtosis in stock return data, Office of Research Working Paper Number 00-0123, University of Illinois.

Ramberg, J.S., E.J. Dudewicz, P.R. Tadikamalla and E.F. Mykytka (1979), A probability distribution and its uses in fit, Technometrics, 21, 201-213.

Rao, C.R. (1948), Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation, Proceedings of the Cambridge Philosophical Society, 44, 50-57.

Rao, C.R. (1973), Linear Statistical Inference and Its Applications, John Wiley & Sons, New York.

Richardson, M. and T. Smith (1993), A test for multivariate normality in stock returns, Journal of Business, 66, 295-321.

Singleton, J.C. and J. Wingender (1986), Skewness persistence in common stock returns, Journal of Financial and Quantitative Analysis, 21, 336-341.

Beedles, W.L. and M.A. Simkowitz (1980), Morphology of asset asymmetry, Journal of Business Research, 8, 457-468.


Serfling, R.J. (1980), Approximation Theorems of Mathematical Statistics, John Wiley & Sons, New York.

Stuart, A. and J.K. Ord (1994), Kendall's Advanced Theory of Statistics, Vol. 1: Distribution Theory, Edward Arnold, London.

Tucker, A.L. (1992), A reexamination of finite and infinite-variance distributions as models of daily stock returns, Journal of Business and Economic Statistics, 10, 73-81.

White, H. (1982), Maximum likelihood estimation of misspecified models, Econometrica, 50, 1-25.

