NPS55-79-020
NAVAL POSTGRADUATE SCHOOL
Monterey, California
STATISTICAL METHODS OF PROBABLE USE
FOR UNDERSTANDING REMOTE SENSING DATA
by
Donald P. Gaver
October 1979
Approved for public release; distribution unlimited
Prepared for: Naval Postgraduate School, Monterey, Ca. 93940
FEDDOCS D208.14/2:NPS-55-79-020
Rear Admiral T. F. Dedman, Superintendent
Jack R. Borsting, Provost
This report was prepared by:
UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)
REPORT DOCUMENTATION PAGE (READ INSTRUCTIONS BEFORE COMPLETING FORM)
1. REPORT NUMBER: NPS55-79-020
2. GOVT ACCESSION NO.:
3. RECIPIENT'S CATALOG NUMBER:
4. TITLE (and Subtitle): Statistical Methods of Probable Use for Understanding Remote Sensing Data
5. TYPE OF REPORT & PERIOD COVERED: Technical
6. PERFORMING ORG. REPORT NUMBER:
7. AUTHOR(s): D. P. Gaver
8. CONTRACT OR GRANT NUMBER(s):
9. PERFORMING ORGANIZATION NAME AND ADDRESS: Naval Postgraduate School, Monterey, Ca. 93940
10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS:
11. CONTROLLING OFFICE NAME AND ADDRESS: Naval Postgraduate School, Monterey, Ca. 93940
12. REPORT DATE: October 1979
13. NUMBER OF PAGES: 38
14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office):
15. SECURITY CLASS. (of this report): Unclassified
15a. DECLASSIFICATION/DOWNGRADING SCHEDULE:
16. DISTRIBUTION STATEMENT (of this Report): Approved for public release; distribution unlimited.
17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report):
18. SUPPLEMENTARY NOTES:
19. KEY WORDS (Continue on reverse side if necessary and identify by block number):
Statistics, Robustness, Remote Sensing, Regression, Censoring, Smoothing, Extreme Values
20. ABSTRACT (Continue on reverse side if necessary and identify by block number)
This report outlines several new statistical approaches to data problems likely to be encountered when remote sensing methods are used. The methods described are robust regression, smoothing, and modeling and estimation of ice pressure ridge characteristics.
STATISTICAL METHODS OF PROBABLE USE
FOR UNDERSTANDING REMOTE SENSING DATA
Donald P. Gaver
Naval Postgraduate School
Monterey, CA 93940
1. INTRODUCTION
Statistical methodology has long been a familiar tool for
use in understanding our natural environment. Classical examples
of applications of statistics are seen in weather forecasting,
in evaluation of attempts at weather modification by cloud seed-
ing, and in descriptions of the fluctuations in the sea surface.
Now the accessibility of new and extensive data from a variety
of remote sensing sources, such as earth orbiting and geostationary
satellites, again calls for the development and application of
appropriate statistical methodology. Classical methods of statis-
tics and of probability modeling frequently must be adapted to
the new needs. The process of adaptation will proceed most
efficiently if statisticians work cooperatively with the scientists
actually obtaining data and studying the associated natural
phenomena. Conferences such as PRIMARS I are of great value in
promoting the necessary interchange of information and the stimulus
to approach novel and difficult problems in a realistic manner.
This paper describes new approaches to the analysis of
data, in particular to quite "noisy" data of the sort that is
likely to be encountered when observing the natural environment.
The descriptions given will necessarily be brief, but an attempt
will be made to show how the methods and viewpoints presented may
be applied to problems arising in remote sensing.
2. ROBUST METHODOLOGY: REQUIREMENTS AND POSSIBILITIES
Many scientists who have closely examined real data
have encountered occasional, or even frequent, anomalous behavior.
Apparent anomalies in data may be with respect to either
(a) preconceptions as to "proper" data behavior, these perhaps
being buttressed by (physical) theory, or
(b) the nature of the general pattern of the data, especially
those data points in the immediate neighborhood, e.g.,
in time or space.
2.1. Plots
In simple circumstances graphical plots will quickly reveal
those points that are blatant anomalies. For instance suppose that
one wishes to investigate data concerning the relationship between
wind velocity and whitecap cover in the ocean. Theory may suggest
a specific relationship, e.g. that white-cap cover, C, be nearly
a cubic function of wind velocity v, so that it will be tempting
to plot C vs v and note an appearance as shown on Fig. 1; there
solid black dots represent (simulated) raw data. Since the eye
finds it difficult to distinguish curves of the form $C = av^2$,
$C = av^3$, $C = av^{7/2}$, etc., from one another, and yet is sensitive to
departures from linearity, a graph of $C$ vs $v^3$ suggests itself, but
is not included here. A plot on log-log paper may be still better.
As presented, the data conforms in general to the theorized relation-
ship or scaling, with the obvious exception of the circled point to
the right. Such an anomalous point, or points, represents a challenge
both to statistical technology and to the ultimate user of the data.
Statistical technology assumes the responsibility for revealing
the presence of such points, and, if possible, for providing a
meaningful and useful summary of the remaining points. It falls
to the consumer or ultimate user of the data, preferably with
the help of a subject-matter specialist (physicist or oceanographer)
to interpret the apparently anomalous maverick—or exotic, or
outlying—data point: is it
(i) evidence of the failure of the relation $C = av^3$, say
for large velocities,
or is it
(ii) an outright error in data recording, and to be disregarded?
being just two possible options.
Note that simple graphs are invaluable for pointing out
extreme outliers in simple, one explanatory variable, situations.
If more variables are required, informative plots are more diffi-
cult without the use of more statistical technology. We next
show that classical, least-squares, technology may be quite mis-
leading, but that replacements are available. See Mosteller and
Tukey (1977), abbreviated MT hereafter.
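The log-log suggestion of Section 2.1 can be sketched numerically: on logarithmic axes a power law $C = av^p$ becomes a straight line whose slope is the exponent $p$. The sketch below uses noise-free values with an assumed coefficient of 0.0008 (chosen purely for illustration) and a hypothetical helper name, `loglog_exponent`; it is not a procedure given in the report. Since the slope is found by ordinary least squares, it shares the sensitivity to outliers discussed in the following sections.

```python
import numpy as np

def loglog_exponent(v, cover):
    """Fit log(cover) = log(a) + p*log(v) by least squares; return (p, a)."""
    slope, intercept = np.polyfit(np.log(v), np.log(cover), 1)
    return slope, np.exp(intercept)

# Noise-free cubic relation, for illustration only (0.0008 is an assumed value).
v = np.array([2.0, 7.0, 10.0, 15.0, 18.0, 21.0, 24.0])
cover = 0.0008 * v**3

p, a = loglog_exponent(v, cover)
print(f"estimated exponent p = {p:.3f}, coefficient a = {a:.5f}")
```

With real, noisy data the fitted slope indicates which power of $v$ is worth plotting against, and a straight-line appearance on such a plot supports the power-law scaling.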
2.2. Fits and Residual Plots
Suppose that one wishes to summarize data such as that
in Fig. 1 by fitting the relationship $C = av^3$, i.e., determining
the parameter $a$ from the data. The classical and automatic way
of doing so is to apply least squares; computer programs are
universally available, even for handheld calculators (the TI 59, or
HP 67). What are we likely to find? A least-squares line (treating
$w = v^3$ as the independent variable gives $C = aw$; one
can also plot and fit $C^{1/3} = a^{1/3} v$, and there may be reasons
for this choice) is quite apt to misrepresent the situation badly,
responding much too sensitively to the single (here encircled)
outlying value, and straying systematically away from the main
body of the data; see the points represented by o in Fig. 1.
An alternative method for fitting, described in MT, is
less susceptible to outlier influence (that is, far more robust to
departures from basic assumptions) than is the ordinary least
squares (OLS) method. This new method, termed biweight fitting,
is carried out by a procedure that uses the OLS computation
iteratively. In the course of the computations weights are
automatically developed that reduce the influence of the encircled
value of Fig. 1, permitting the fit to more closely approximate
the main body of the data. We now describe and illustrate the
biweight fitting procedure as it is adapted to the problem of
determining the parameter $a$ in the relation $y_i$ vs $ax_i$.
Biweight Fitting Calculation
(1) Compute the kth (k = 1,2,3,...) iterative estimate of $a$,
denoted by $a^{(k)}$, by solving
$$\sum_{i=1}^{n} \left(y_i - a^{(k)} x_i\right) x_i \, w_i^{(k-1)} = 0,$$
to obtain
$$a^{(k)} = \frac{\sum_{i=1}^{n} y_i x_i w_i^{(k-1)}}{\sum_{i=1}^{n} x_i^2 w_i^{(k-1)}};$$
(2) the weights, $w_i^{(k-1)}$, are of this form:
$$w_i^{(k-1)} = \begin{cases} \left[1 - \left(\dfrac{y_i - a^{(k-1)} x_i}{c\,S^{(k-1)}}\right)^2\right]^2 & \text{if } (\cdot) < 1, \\ 0 & \text{if } (\cdot) > 1, \end{cases}$$
where $(\cdot)$ refers to the term $[(y_i - a^{(k-1)} x_i)/(c\,S^{(k-1)})]^2$;
$S^{(k-1)}$ is a scale factor (robust replacement for the
standard deviation) that may be computed in the following manner;
(3) the (k-1)st iterated value of the scale factor is
$$S^{(k-1)} = \operatorname{median}_i \left\{ \left| y_i - a^{(k-1)} x_i \right| \right\},$$
$c$ being a constant of value 6, or 9;
(4) the first value, $a^{(0)}$, of the iterative sequence can be
obtained by equalizing all weights ($w_i = 1$), which is
equivalent to OLS; alternatively, one can utilize a "robust
start," suggestions for which can be found in MT.
The iteration is carried on until the difference between successive
values is small; usually 4 to 8 iterations are sufficient. The
resulting a-estimate can be denoted by $\hat{a}$.
Following the fitting it is informative to plot the
residual values:
$$r_i = y_i - \hat{a} x_i = y_i - \hat{y}_i, \quad i = 1, 2, \ldots, n,$$
$\hat{y}_i$ being shorthand for the predicted $y$ value. In case there
is a single outlier, as in Fig. 1, the fitted line will tend to
hug the major point cloud, and a histogram of the residuals will
dramatically reveal the presence of the outlier, suggesting
further investigation. A plot of $r_i$ vs $\hat{y}_i$ is also useful. See
MT for further suggestions.
2.3. Numerical Illustration
The following are a set of (simulated) whitecap percentages
and corresponding wind velocities. Alongside are values for
whitecap coverage estimated by OLS and by the biweight procedure.

Velocity   Cover ("Actual")   Cover (OLS Estimate)   Cover (Robust Estimate)
    2          0.011               0.011                  0.0067
    7          0.63                0.49                   0.29
   10          1.30                1.43                   0.84
   15          3.89                4.82                   2.83
   18          2.89                8.33                   4.88
   21          8.16               13.2                    7.76
   24         25.7                19.7                   11.6
It is clear from the above table, and perhaps clearer from Fig. 1,
that the OLS solution, in its attempt to fit the point C(24) = 25,
systematically and considerably over-estimates the points at v = 15
and above. The biweight estimator performs much better, allowing
a closer fit to all data other than C(24) . A residual plot brings
attention to bear on that point.
Since the values of "Actual Cover" were actually constructed
by forming $0.0008v^3$ and adding Gaussian random noise with value
proportional to C(v), and since the sequence of values of $0.0008v^3$
was 0.0064, 0.27, 0.80, 2.7, 4.67, 7.41, 11.06, we cannot fault
the manner in which the biweight procedure functioned in this
example and are encouraged to use it more widely.
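The calculation of Section 2.2 can be sketched in a few lines and applied to the tabulated data, taking $x = v^3$ and $y$ = cover. This is a minimal sketch, not the program used for the report: the function name `biweight_slope`, the stopping tolerance, the iteration cap, and the choice c = 9 (one of the two values mentioned in the text) are assumptions made here. The result can depend on the choice of c and on the starting value.

```python
import numpy as np

def biweight_slope(x, y, c=9.0, max_iter=50, tol=1e-10):
    """Biweight estimate of a in the relation y vs a*x, started from OLS."""
    a = np.sum(x * y) / np.sum(x * x)            # step (4): all weights equal to 1
    for _ in range(max_iter):
        r = y - a * x                            # current residuals
        s = np.median(np.abs(r))                 # step (3): robust scale
        if s == 0.0:
            break                                # at least half the points fit exactly
        u2 = (r / (c * s)) ** 2
        w = np.where(u2 < 1.0, (1.0 - u2) ** 2, 0.0)    # step (2): biweights
        a_new = np.sum(w * x * y) / np.sum(w * x * x)   # step (1): weighted solution
        converged = abs(a_new - a) <= tol * abs(a)
        a = a_new
        if converged:
            break
    return a

# Data from the table above.
v = np.array([2.0, 7.0, 10.0, 15.0, 18.0, 21.0, 24.0])
cover = np.array([0.011, 0.63, 1.30, 3.89, 2.89, 8.16, 25.7])
w3 = v ** 3                                      # treat w = v^3 as the regressor

a_ols = np.sum(w3 * cover) / np.sum(w3 * w3)
a_rob = biweight_slope(w3, cover, c=9.0)
print(f"OLS a = {a_ols:.6f}, biweight a = {a_rob:.6f}")
print("residuals from biweight fit:", np.round(cover - a_rob * w3, 2))
```

Printing the residuals alongside the estimates mirrors the recommendation above: the point at v = 24 stands far from the rest once the fit is no longer dragged toward it.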
2.4. Possible Application to Remote Sensing Data
In a paper in this conference proceedings by Depriest
(1979), and in Fleming (1979), a problem arising from partial
cloud cover contamination of remote sensing data is described
and addressed. This problem has the following origin. A series
of measurements are made on a physical quantity (sea surface
temperatures) but are contaminated. That is, in the case of
sea surface temperatures, if no clouds are present the
measurements are approximately normally distributed around $\mu$ (the true
temperature). However, if clouds are present a fraction of the
measurements are made artificially smaller, cloud temperatures
being lower than those at earth surface. The problem is to
estimate $\mu$. Techniques for doing so are described by Depriest (1979)
and by Fleming (1979). We describe a possible alternative approach
that uses robust regression. Operational characteristics of
the two procedures have not yet been compared.
(1) Arrange the measurements in order: $y_1 < y_2 < y_3 < \cdots < y_{n-1} < y_n$.
The largest observations may well appear
similar to the largest order statistics of a normal
distribution with (unknown) mean $\mu$ and standard deviation $\sigma$
(sometimes assumed known, although caution is in order),
while the smaller ones are likely to depart systematically.
(2) Carry out a preliminary plot of
$$y_k \ \text{vs} \ \Phi^{-1}\!\left(\frac{k}{n+1}\right), \quad k = n, n-1, n-2, \ldots,$$
where $\Phi^{-1}(p)$ is the inverse function of the unit normal:
recall that if
$$\Phi(y) = \int_{-\infty}^{y} \exp\!\left(-\tfrac{1}{2} z^2\right) \frac{dz}{\sqrt{2\pi}}$$
is the unit normal distribution, then the solution of the
equation $\Phi(y) = p$ gives
$$y(p) = \Phi^{-1}(p);$$
$\Phi$, and hence $\Phi^{-1}$, are widely tabulated. Alternatively,
use Arithmetic Probability Paper. If $y_k$ is an ordered
observation from a normal population, then the plot should
appear straight, while a systematic departure from linearity
indicates a departure from normality. Suppose departures
begin to occur at k = D; sometimes D may be greater than
n/2. One may first eye-fit a straight line to the points
k = n, n-1, ... , D. Then $\hat{y}_{n/2} = \hat{\mu}$ (estimated temperature),
i.e. the value of the fitted line at n/2 should give a
reasonable value for $\mu$.
(3) Going further in a formal direction, one may wish to fit a
line to the data points. Here a biweight fit should behave
well, tending to be oblivious to spurious (cloud
contamination) points. One can proceed to fit the relation
$$y_k \ \text{vs} \ \mu + \sigma x_k$$
with
$$x_k = \Phi^{-1}\!\left(\frac{k}{n+1}\right), \quad k = n, n-1, n-2, \ldots;$$
a start using the eye-fit to points k = n, n-1, ... , D
may be worthwhile. Finally, quote the estimate
$$\hat{\mu} = \operatorname{med} \hat{y}_k = \begin{cases} \tfrac{1}{2}\left(\hat{y}_{n/2} + \hat{y}_{(n/2)+1}\right), & n \text{ even}; \\ \hat{y}_{(n+1)/2}, & n \text{ odd}, \end{cases}$$
where
$$\hat{y}_k = \hat{\mu} + \hat{\sigma} x_k.$$
The above procedure seems worth further investigation and refine-
ment. One important step may be to adjust for the effect of
correlation between order statistics when carrying out the
regression.
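Steps (1) to (3) can be sketched as follows. This is an idealized illustration under stated assumptions, not an evaluation of the procedure: the "clean" readings are exact normal quantiles with assumed $\mu = 20$ and $\sigma = 0.5$, the ten lowest readings are lowered by 5 units to mimic cloud contamination, c = 6 is used, and the helper names `normal_scores` and `biweight_line` are hypothetical. No adjustment is made for correlation between order statistics.

```python
import numpy as np
from statistics import NormalDist

def normal_scores(n):
    """x_k = Phi^{-1}(k / (n + 1)) for k = 1, ..., n."""
    nd = NormalDist()
    return np.array([nd.inv_cdf(k / (n + 1)) for k in range(1, n + 1)])

def biweight_line(x, y, c=6.0, max_iter=50):
    """Biweight fit of y vs mu + sigma*x by iteratively reweighted least squares."""
    X = np.column_stack([np.ones_like(x), x])
    w = np.ones_like(y)                          # equal weights: OLS start
    beta = np.zeros(2)
    for _ in range(max_iter):
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        r = y - X @ beta
        s = np.median(np.abs(r))                 # robust scale of the residuals
        if s < 1e-12:
            break                                # essentially exact fit to the bulk
        u2 = (r / (c * s)) ** 2
        w = np.where(u2 < 1.0, (1.0 - u2) ** 2, 0.0)
    return beta[0], beta[1]

n = 99
x = normal_scores(n)
y = 20.0 + 0.5 * x           # assumed clean readings: mu = 20, sigma = 0.5
y[:10] -= 5.0                # ten lowest readings lowered, mimicking cloud
y = np.sort(y)               # step (1): order the measurements

sigma_ols, mu_ols = np.polyfit(x, y, 1)          # ordinary least-squares line
mu_hat, sigma_hat = biweight_line(x, y)          # step (3): biweight line
mu_quote = np.median(mu_hat + sigma_hat * x)     # median of fitted values
print(f"OLS mu = {mu_ols:.3f}, biweight mu = {mu_hat:.3f}, quoted = {mu_quote:.3f}")
```

In this constructed case the contaminated points leave the upper part of the plot exactly straight. In a real mixture the contaminated readings also displace the ranks of the clean ones, which is one reason the procedure needs the further investigation mentioned above.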
3. SMOOTHING DATA
If one plots certain environmental data, e.g. monthly
total rainfall, or perhaps daily maximum temperature, at a
particular location, systematic regularities seem to appear,
but may be masked by noise. Often there is a seasonal pattern,
i.e. one that is roughly cyclic in nature. Attempts to fit such
a pattern with polynomials are doomed to failure, and selection
of a set of sines and cosines that does well (Fourier series)
may lead to many terms. Some method of smoothing the original
series that lays bare the regularities is to be desired. After
such is made available, one can study the residuals around it.
Spectral analysis or some such formal procedure may then be of
use.
Classical smoothing procedures involve some form of moving
average, and are susceptible to the python-swallowing-the-pig
difficulty: imagine using the linear smoothing operation
1) Along a sampling line (e.g. airplane flight path, or straight
submarine track) ice ridges seem to appear in accordance
with a stationary Poisson process, so if R(x) is the number
of such ridges encountered over a distance x, then approximately
$$P\{R(x) = n\} = e^{-\lambda x} \frac{(\lambda x)^n}{n!}, \quad n = 0, 1, 2, \ldots,$$
where $\lambda > 0$ is the density of ice ridges.
2) The probability distribution of ridge "sail heights" (or
"keel depths") may be approximated by the forms $F(y) = 1 - e^{-\nu y^2}$
or $1 - e^{-\mu y}$; the best-fitting distribution may well depend
upon the method of observation (averaging properties).
For further details see work referenced in Weeks et al. (1979).
Now it may be of interest to compute the distribution
of the maximum sail height, or keel depth, that one is to encounter
over a course of length x. This is very simple, given the
particular distributions of sail number and size and furthermore
assuming independence between ridge heights. Let H(x) be the
maximum sail height; then
$$P\{H(x) \le y\} = \sum_{n=0}^{\infty} e^{-\lambda x} \frac{(\lambda x)^n}{n!} [F(y)]^n,$$
since all of the heights, Poisson-distributed in number, must be less than y
in order for the maximum to be below y.
Sum out to obtain
$$P\{H(x) \le y\} = \exp\{-\lambda x [1 - F(y)]\}.$$
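The summation uses only the exponential series. Written out, with $q = F(y)$ as a shorthand introduced here for brevity:

```latex
\begin{aligned}
\sum_{n=0}^{\infty} e^{-\lambda x}\,\frac{(\lambda x)^n}{n!}\,q^n
  &= e^{-\lambda x} \sum_{n=0}^{\infty} \frac{(\lambda x q)^n}{n!} \\
  &= e^{-\lambda x}\, e^{\lambda x q} \\
  &= \exp\{-\lambda x\,[1 - q]\}.
\end{aligned}
```

The n = 0 term, $e^{-\lambda x}$, is the probability of meeting no ridge at all, in which case the maximum is taken to lie below any y.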
Depending upon which distribution is picked for ridge heights,
we get
a) $P\{H(x) \le y\} = \exp(-\lambda x e^{-\mu y})$
b) $P\{H(x) \le y\} = \exp(-\lambda x e^{-\nu y^2})$
These closely resemble classical extreme value distributions.
Note that if logs are taken the expressions simplify:
a') $\ln P\{H(x) \le y\} = -\lambda x e^{-\mu y}$;
    $\ln(-\ln P\{H(x) \le y\}) = \ln(\lambda x) - \mu y$
b') $\ln P\{H(x) \le y\} = -\lambda x e^{-\nu y^2}$;
    $\ln(-\ln P\{H(x) \le y\}) = \ln(\lambda x) - \nu y^2$
If either of these formulas is to be used for practical purposes,
values of the parameters must be obtained. In order to estimate
the parameters $\lambda$, $\mu$, $\nu$ in the above models from data one naturally
thinks of the method of maximum likelihood. Suppose that we
have observed R(x) = n ridges of heights $y_1, y_2, \ldots, y_n$.
Then the maximum likelihood estimates are
$$\hat{\lambda} = \frac{n}{x}, \qquad \hat{\mu} = \frac{1}{\bar{y}}, \qquad \hat{\nu} = \frac{1}{\overline{y^2}},$$
where as usual we have put
$$\overline{y^k} = \frac{1}{n} \sum_{i=1}^{n} y_i^k.$$
Hence our estimates are of the form
a'') est $\ln(-\ln P\{H(x) \le y\}) = \ln \hat{\lambda} + \ln x - \hat{\mu} y$
b'') est $\ln(-\ln P\{H(x) \le y\}) = \ln \hat{\lambda} + \ln x - \hat{\nu} y^2$
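The estimates and the resulting probability statements can be sketched numerically. The ridge heights and track length below are invented purely for illustration, and the function names `ridge_mle` and `prob_max_below` are hypothetical. The only formulas used are the maximum likelihood estimates above and the relation P{H(x) <= y} = exp{-lambda x [1 - F(y)]}.

```python
import math

def ridge_mle(heights, x):
    """Maximum likelihood estimates (lambda, mu, nu) from n heights seen over length x."""
    n = len(heights)
    lam = n / x                                  # ridge density
    mu = n / sum(heights)                        # 1 / mean height, model a)
    nu = n / sum(h * h for h in heights)         # 1 / mean squared height, model b)
    return lam, mu, nu

def prob_max_below(y, x, lam, rate, model):
    """P{H(x) <= y} under model 'a' (exp(-mu*y) tail) or 'b' (exp(-nu*y^2) tail)."""
    if model == "a":
        tail = math.exp(-rate * y)
    elif model == "b":
        tail = math.exp(-rate * y * y)
    else:
        raise ValueError("model must be 'a' or 'b'")
    return math.exp(-lam * x * tail)

heights = [1.0, 2.0, 3.0]    # invented sail heights, illustration only
x = 10.0                     # invented track length
y = 4.0                      # height threshold of interest

lam, mu, nu = ridge_mle(heights, x)
p_a = prob_max_below(y, x, lam, mu, "a")
p_b = prob_max_below(y, x, lam, nu, "b")
print(f"lambda = {lam:.3f}, mu = {mu:.3f}, nu = {nu:.4f}")
print(f"P(max <= {y}) under a): {p_a:.3f}, under b): {p_b:.3f}")
```

Taking ln(-ln p) of either probability reproduces the linear forms a'') and b''), which gives a quick check on any implementation.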
If rather large samples are available and if distributional
assumptions are well satisfied one may feel comfortable with conventional
standard errors based on Fisher information and normality; see
Cramer (1946). On the other hand, it is of interest to apply
the jackknife technique (see R. G. Miller (1974) for a review) to
obtain estimates of the variance of estimate due particularly to
the ridge heights. To carry out the calculation, (i) compute
$\hat{\nu}_{\text{all}} = n / \sum_{i=1}^{n} y_i^2$; then (ii) compute
$$\hat{\nu}_{(-j)} = \frac{n-1}{y_1^2 + y_2^2 + \cdots + y_{j-1}^2 + 0 + y_{j+1}^2 + \cdots + y_n^2}$$
for j = 1,2,..., n; then (iii) compute the pseudovalues
$\nu_j = n \hat{\nu}_{\text{all}} - (n-1) \hat{\nu}_{(-j)}$, and (iv) average to obtain a jackknifed
point estimate $\hat{\nu}_{JK} = (1/n) \sum_{j=1}^{n} \nu_j$, and its variance.
Then we can estimate the standard error of the probability prediction,
e.g. b''), by computing
$$\text{S.E.} = \left( \operatorname{Var}[\text{est } \ln(-\ln P\{H(x) \le y\})] \right)^{1/2}.$$
A similar calculation is easily performed for model a) ; details
are omitted. From the above results, approximate confidence inter-
vals may be constructed for the probability of encountering a
(maximum) ridge sail height less than y in magnitude.
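Steps (i) to (iv) for model b) can be sketched as below. Two assumptions are made here that the text does not spell out: the variance of the jackknifed estimate is taken to be the usual sample variance of the pseudovalues divided by n, and the standard error of b'') counts only the ridge-height contribution, $y^2$ times the standard error of the jackknifed $\nu$, ignoring uncertainty in the estimate of $\lambda$. The heights are invented and the function name `jackknife_nu` is hypothetical.

```python
import math

def jackknife_nu(heights):
    """Jackknifed estimate of nu and its estimated variance, via pseudovalues."""
    n = len(heights)
    total = sum(h * h for h in heights)
    nu_all = n / total                                   # step (i)
    pseudo = []
    for h in heights:
        nu_minus_j = (n - 1) / (total - h * h)           # step (ii): drop one height
        pseudo.append(n * nu_all - (n - 1) * nu_minus_j) # step (iii): pseudovalue
    nu_jk = sum(pseudo) / n                              # step (iv): average
    # Assumed here: variance of the mean of the pseudovalues.
    var_jk = sum((p - nu_jk) ** 2 for p in pseudo) / (n * (n - 1))
    return nu_jk, var_jk

heights = [1.0, 2.0, 3.0]    # invented sail heights, illustration only
y = 4.0                      # height threshold of interest

nu_jk, var_jk = jackknife_nu(heights)
se_b = y * y * math.sqrt(var_jk)   # ridge-height part of the S.E. of b'') only
print(f"nu_JK = {nu_jk:.4f}, Var = {var_jk:.5f}, S.E. of b'') = {se_b:.3f}")
```

With so few heights the interval is of course very wide; the point of the sketch is only the order of operations.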
Fairly recent theoretical results of Efron and Hinkley
(1978) suggest that if a traditional maximum likelihood approach
is taken, one is better off using observed Fisher information
rather than expected Fisher information in order to establish an
approximate standard error in either case a) or b) . However,
work of Reeds (1978) suggests that use of the jackknife in
conjunction with maximum likelihood yields results that tend
to be rather independent of the basic model chosen. Both of
these suggestions must be validated by further work, a good
deal of which will necessarily involve Monte Carlo simulation.
Such work should be of great importance and interest to those
who must assess the probabilities of extreme, rare, events, and
who furthermore wish to provide some reasonably valid estimates
of the error of their estimates.
Acknowledgment. The writer is much indebted to LCDR C. F.
Taylor, Jr., for his assistance in example robust regression
computations. He is also indebted to the Office of Naval
Research for support of this research.
REFERENCES
Cramer, H. (1946). Mathematical Methods of Statistics. Princeton Univ. Press, Princeton, N.J.
Depriest, D. (1979). "Consideration using a truncated normal distribution for remote sensing data." Paper presented at PRIMARS-1 Conference.
Efron, B., and Hinkley, D. (1978). "Assessing the accuracy of the maximum likelihood estimator: observed versus expected Fisher information." Biometrika 65, No. 3, pp. 457-488.
Fleming, H. E. (1979). "Application of the truncated normal distribution technique to the derivation of sea surface temperatures," to appear in Remote Sensing of Atmospheres and Oceans (1980), ed. by A. Deepak; Academic Press.
Miller, R. G. (1974). "The Jackknife - a review." Biometrika 61, No. 1, pp. 1-16.
Mosteller, F., and Tukey, J. W. (1977). Data Analysis and Regression. Addison-Wesley Publishing Co., Reading, Mass.
Reeds, J. A. (1978). "Jackknifing maximum likelihood estimates." Annals of Statistics 6, No. 4, pp. 727-739.
Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley Publishing Co., Reading, Mass.
Weeks, N. F., Tucker, W. B., Frank, M., and Fungcharoen, S. (1979). "Characterization of the surface roughness and floe geometry of the sea ice over the continental shelves of the Beaufort and Chukchi seas." In "Sea Ice Processes and Models, Proc. AIDJEX/ICSI Sympos." (R. S. Pritchard, ed.), University of Washington Press (in press).
Fig. 1. Whitecap coverage vs. wind speed (simulated data). Legend as printed: solid dots, data; open circles, ordinary least-squares fit (a = 0.00114); squares, robust (biweight) fit (a = 0.00837); the outlying data point is encircled. Horizontal axis: velocity; vertical axis: whitecap coverage.
INITIAL DISTRIBUTION LIST
(number of copies follows each entry)
Defense Documentation Center, Cameron Station, Alexandria, VA 22314: 2
Library, Code 0142, Naval Postgraduate School, Monterey, CA 93940: 2
Library, Code 55, Naval Postgraduate School, Monterey, CA 93940: 1
Dean of Research, Code 012A, Naval Postgraduate School, Monterey, CA 93940: 1
Naval Postgraduate School, Monterey, CA 93940
  Attn: A. Andrus, Code 55: 1
  D. Barr, Code 55: 1
  D. P. Gaver, Code 55: 25
  P. A. Jacobs, Code 55: 1
  P. A. W. Lewis, Code 55: 1
  P. Milch, Code 55: 1
  R. Richards, Code 55: 1
  M. G. Sovereign, Code 55: 1
  R. J. Stampfel, Code 55: 1
  R. R. Read, Code 55: 1
Mr. Peter Badgley, ONR Headquarters, Code 102B, 800 N. Quincy Street, Arlington, VA 22217: 1
Dr. James S. Bailey, Director, Geography Programs, Department of the Navy, ONR, Arlington, VA 22217: 1
DISTRIBUTION LIST
(number of copies follows an entry where it is legible)
Statistics and Probability Program, Office of Naval Research, Code 426, Arlington, VA 22217: 1
Office of Naval Research, New York Area Office, 115 Broadway - 5th Floor, Attn: Dr. Robert Grafton, New York, NY 10003
Director, Office of Naval Research Branch Office, 536 South Clark Street, Attn: Deputy and Chief Scientist, Chicago, IL 60605
Library, Naval Ocean Systems Center, San Diego, CA 92152
Navy Library, National Space Technology Lab, Attn: Navy Librarian, Bay St. Louis, MS 29522
Naval Electronic Systems Command, NAVELEX 22C, National Center No. 1, Arlington, VA 20360: 1
Director, Naval Research Laboratory, Attn: Library (ONRL), Code 202 c, Washington, D.C. 20275
Technical Information Division, Naval Research Laboratory, Washington, D.C. 20375
Office of Naval Research, San Francisco Area Office, 760 Market Street, San Francisco, California 94102: 1