8/7/2019 Sample size determination in estimating a covariance matrix
Computational Statistics & Data Analysis 5 (1987) 185-192
North-Holland
Sample size determination in estimating a covariance matrix
Pushpa L. GUPTA *
Department of Mathematics, University of Maine, Orono, ME 04469, USA
R.D. GUPTA **
Division of Mathematics, Engineering, Computer Science, University of New Brunswick, Saint John, N.B., Canada E2L 4L5
Received 22 July 1986
Revised 17 January 1987
Abstract: The sample size requirements for estimating a covariance matrix with a desired precision in a multivariate normal population are investigated. Explicit formulas for the sample size are provided in the univariate case and in the multivariate case when the covariance matrix is diagonal. In these cases tables are also provided for specific values of $\varepsilon$ and the joint confidence coefficient $1 - \alpha$. For the general case, a method to compute the sample size is developed, resulting in an integral equation involving the covariance matrix. In case a prior estimate of the covariance matrix is available, the integral equation can be solved by using the algorithm given by Russell et al. (1985). Examples are used to illustrate the effects of the dimension and of the quality of prior estimates of the covariance matrix on the sample size.
Keywords: Sample size, Covariance matrix, Multivariate normal distribution.
* Supported by a Faculty Summer Research Grant from the University of Maine.
** Supported by NSERC Research Grant # A-4850.
1. Introduction
This paper deals with determining the sample size for estimating a covariance matrix in a multivariate normal population with a specified joint confidence level and precision. The problem originated when the first author was involved in a project at the USAF School of Aerospace Medicine (USAFSAM). The USAFSAM at Brooks AFB has been interested for several years in the use of statistical methods to develop a computerized system to assist the cardiologists, who must examine a large number of EKG's in a single day, in the screening, diagnosis and serial comparison of vector cardiograms. Past efforts at USAFSAM in the diagnosis of vector cardiograms have relied on a Karhunen-Loève approximation of the signal (750 dimensional in a 3-lead system) together with linear and quadratic discrimination in the transformed space, which is 60 dimensional. The crux of this approach is, therefore, the estimate of the $60 \times 60$ covariance matrix of the Karhunen-Loève coefficients. Its quality can, therefore, be a source of concern for the efficacy of the entire procedure. The quality or accuracy of the covariance matrix estimate is a function of the sample size and the unknown entries. It was suggested that a sample of 750 is sufficient to estimate a $60 \times 60$ covariance matrix with reasonable accuracy. This figure is apparently not based on any theoretical considerations and seems to be low, as is evident from the sample size requirement for the sixty dimensional independent case (see Table 2).
The problem of estimating the variance ($\sigma^2$) of a normal density arises in many experimental situations. As an example (Greenwood and Sandomire [4]), a series of radar pulses is to be sent out to a target and the strength of the return signal measured. How many readings under identical conditions shall be taken so that the standard deviation of the return signal strengths shall, with 80% confidence, be within 10% of the true value?
Greenwood and Sandomire [4] presented a graphical approach for obtaining the sample size required to estimate the variance of a normal density within a given percent of its true value. Graybill and Connell [2] have instead given a two step sampling procedure to estimate the variance within a given number of units. The number of units and the confidence level are specified in advance. Thompson and Endriss [10] have also given a method for estimating the sample size in the univariate case. Their method depends on the large sample distribution of the estimator. Other work dealing with estimating variance includes Graybill and Morrison [3], Leone, Rutenberg and Topp [5], Tate and Klett [9] and Graybill [1].
For the sake of completeness, Section 2 gives a brief discussion of how to find the sample size $n$ in the univariate case for a given $\varepsilon$ (the relative error) and a given $\alpha$ (where $1 - \alpha$ is the confidence coefficient).
In Section 3, we develop the procedures for determining the sample size in the multivariate situation, where two cases are studied. In case 1, the covariance matrix $\Sigma$ is taken to be diagonal, while in case 2 it is any general matrix. Table 2 is prepared for case 1 when $p = 2, 5, 10, 20, 40, 60$. For case 2 tables cannot be prepared, as the result is in the form of an integral equation involving $\Sigma$. However, if a prior estimate of $\Sigma$ is available, one can use the algorithm given by Russell et al. [7] to solve the integral equation. The quality of the prior estimate has a marked effect on the sample size, which is illustrated by some examples.
Throughout the paper $p$ denotes the dimension, $\varepsilon$ the relative error and $1 - \alpha$ the joint confidence coefficient when $p \geq 2$.
2. Univariate case
Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Let
$$S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1).$$
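As a rough illustration of the univariate problem, the following Python sketch approximates the required sample size from the large-sample normal distribution of $S^2$. It assumes that the precision requirement is $P(|S^2 - \sigma^2| < \varepsilon \sigma^2) = 1 - \alpha$ and that $\sqrt{n-1}\,(S^2 - \sigma^2)$ is approximately $N(0, 2\sigma^4)$, which leads to $n = 1 + 2 (z_{\alpha/2} / \varepsilon)^2$. The function name and the use of a normal approximation in place of the exact chi-square distribution are assumptions of this sketch; it is not necessarily the explicit formula that the paper tabulates.

```python
import math
from statistics import NormalDist


def univariate_sample_size(eps, alpha):
    """Approximate n so that S^2 is within a relative error eps of sigma^2
    with confidence 1 - alpha (large-sample normal approximation, assumed)."""
    if eps <= 0 or not 0 < alpha < 1:
        raise ValueError("need eps > 0 and 0 < alpha < 1")
    # Two-sided standard normal critical value z_{alpha/2}.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # eps * sqrt((n - 1) / 2) = z  =>  n = 1 + 2 * (z / eps)^2, rounded up.
    return math.ceil(1.0 + 2.0 * (z / eps) ** 2)


if __name__ == "__main__":
    print(univariate_sample_size(0.1, 0.05))
```

Under this approximation, a relative error of 10% with 95% confidence calls for 770 observations; halving the relative error roughly quadruples the requirement.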
where
$$e = \varepsilon \sqrt{n-1}\, (|\sigma_{11}|, |\sigma_{12}|, \ldots, |\sigma_{pp}|)^t.$$
Rewriting in integral form, we have
$$\int_{-e < y < e} \frac{|V|^{-1/2}}{(2\pi)^{p(p+1)/4}}\, e^{-y'V^{-1}y/2}\, dy = (1 - \alpha). \tag{3.7}$$
If a prior estimate of $\Sigma$ is available, the evaluation of the integral in (3.7) can be achieved by an algorithm recently given by Russell, Farrier and Howell [7].
Remark. In case some of the $\sigma_{ij}$'s are zero, we will remove those $\sigma_{ij}$'s from vec $\Sigma$ and the corresponding $S_{ij}$'s from vec $S$ and carry out the calculation as before.
Since (3.7) depends on $\Sigma$, a table for the $n$ values cannot be prepared. The situation here is quite similar to the sample size determination in estimating the proportion of a binomial population. The quality of the prior estimate and the dimension of $\Sigma$ have a profound effect on the sample size. The effect of the dimension of $\Sigma$ can be seen from the fact that the dimension of $V$ increases sharply, resulting in a sharp increase in the sample size. The effect of the quality of the estimate of $\Sigma$ can be seen from the following examples.
Examples. Let us suppose a prior estimate of the variance-covariance matrix $\Sigma$ of a bivariate normal distribution is given as
$$\begin{pmatrix} 4 & 5 \\ 5 & 9 \end{pmatrix}.$$
Then
$$V = \begin{pmatrix} 32 & 40 & 50 \\ 40 & 61 & 90 \\ 50 & 90 & 162 \end{pmatrix}, \qquad e = \varepsilon \sqrt{n-1}\, (4, 5, 9)^t.$$
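The matrix $V$ of this example can be reproduced numerically. The sketch below assumes the standard large-sample result for normal samples, in which the covariance between $\sqrt{n-1}\,S_{ij}$ and $\sqrt{n-1}\,S_{kl}$ tends to $\sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk}$, and it orders the distinct elements as $(\sigma_{11}, \sigma_{12}, \sigma_{22})$. The function name `vech_covariance` is hypothetical and chosen only for illustration.

```python
from itertools import combinations_with_replacement


def vech_covariance(sigma):
    """Asymptotic covariance matrix V of the distinct elements of S,
    assuming Cov(S_ij, S_kl) -> sigma_ik*sigma_jl + sigma_il*sigma_jk."""
    p = len(sigma)
    # Index pairs (i, j) with i <= j, e.g. (0,0), (0,1), (1,1) for p = 2.
    pairs = list(combinations_with_replacement(range(p), 2))
    return [
        [sigma[i][k] * sigma[j][l] + sigma[i][l] * sigma[j][k] for (k, l) in pairs]
        for (i, j) in pairs
    ]


if __name__ == "__main__":
    prior = [[4, 5], [5, 9]]
    for row in vech_covariance(prior):
        print(row)
```

For the prior estimate above this returns the rows (32, 40, 50), (40, 61, 90) and (50, 90, 162). The result has $p(p+1)/2$ rows, so a $60 \times 60$ covariance matrix would give a $V$ of order 1830, which is one way to see the sharp growth with dimension.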
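Equation (3.7) can be written as
$$\int_{-9\varepsilon\sqrt{n-1}}^{9\varepsilon\sqrt{n-1}} \int_{-5\varepsilon\sqrt{n-1}}^{5\varepsilon\sqrt{n-1}} \int_{-4\varepsilon\sqrt{n-1}}^{4\varepsilon\sqrt{n-1}} \frac{|V|^{-1/2}}{(2\pi)^{3/2}}\, e^{-y'V^{-1}y/2}\, dy_1\, dy_2\, dy_3 = 1 - \alpha \tag{3.8}$$
or
$$\int_{-h_3}^{h_3} \int_{-h_2}^{h_2} \int_{-h_1}^{h_1} \frac{|R|^{-1/2}}{(2\pi)^{3/2}}\, e^{-y'R^{-1}y/2}\, dy_1\, dy_2\, dy_3 = 1 - \alpha, \tag{3.9}$$
where
$$R = \begin{pmatrix} 1 & 0.905357 & 0.694444 \\ 0.905357 & 1 & 0.905357 \\ 0.694444 & 0.905357 & 1 \end{pmatrix},$$
$$h_1 = \varepsilon \sqrt{\frac{n-1}{2}}, \quad h_2 = 5\varepsilon \sqrt{\frac{n-1}{61}}, \quad h_3 = \varepsilon \sqrt{\frac{n-1}{2}}.$$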
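Solving (3.9) for $n$ requires a trivariate normal rectangle probability. The sketch below does not implement the algorithm of Russell, Farrier and Howell [7]; as a stand-in it uses plain Monte Carlo simulation. It relies on the observation that $|y_i| < h_i$ for all $i$ is equivalent to $\max_i |y_i| / c_i < \varepsilon \sqrt{n-1}$, where $c_i = |\sigma_i| / \sqrt{V_{ii}}$ and $\sigma_i$ is the $i$-th entry of $(4, 5, 9)$, so that $n$ follows from the $(1 - \alpha)$ quantile of that maximum. The function names, the number of draws and the seed are assumptions made for illustration.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    m = len(a)
    low = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_size_mc(v, sigma_vec, eps, alpha, draws=20000, seed=0):
    """Monte Carlo approximation (assumed stand-in) to the n solving (3.9)."""
    m = len(v)
    sd = [math.sqrt(v[i][i]) for i in range(m)]
    # Correlation matrix R and the constants c_i with h_i = c_i * eps * sqrt(n - 1).
    r = [[v[i][j] / (sd[i] * sd[j]) for j in range(m)] for i in range(m)]
    c = [abs(sigma_vec[i]) / sd[i] for i in range(m)]
    low = cholesky(r)
    rng = random.Random(seed)
    stats = []
    for _ in range(draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]
        # y = L z has mean zero and correlation matrix R.
        y = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(m)]
        stats.append(max(abs(y[i]) / c[i] for i in range(m)))
    stats.sort()
    # Empirical (1 - alpha) quantile of max_i |y_i| / c_i.
    q = stats[math.ceil((1.0 - alpha) * draws) - 1]
    return math.ceil(1.0 + (q / eps) ** 2)


if __name__ == "__main__":
    V = [[32, 40, 50], [40, 61, 90], [50, 90, 162]]
    print(sample_size_mc(V, [4, 5, 9], eps=0.1, alpha=0.05))
```

Because the answer rests on a simulated quantile, it varies slightly with the seed and the number of draws, and it should be read as an approximation to the solution of (3.9) rather than an exact value.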