www.StratAG.ie Outlier Detection and the Estimation of Missing Values Martin Charlton and Paul Harris National Centre for Geocomputation National University of Ireland Maynooth Maynooth, Co Kildare, IRELAND ESPON 2013 Programme Workshop Managing Time Series and Estimating Missing Values 6 May 2010 Luxembourg
www.StratAG.ie
Outlier Detection and the Estimation of Missing Values
Martin Charlton and Paul Harris
National Centre for Geocomputation
National University of Ireland Maynooth
Maynooth, Co Kildare, IRELAND
ESPON 2013 Programme Workshop
Managing Time Series and Estimating Missing Values
6 May 2010
Luxembourg
Outline
• Time Series
• ESPON DB data issues
• Detecting exceptional values
• Estimation of missing values
• Case study
1: Time Series
What is a time series?
• A variable which is measured sequentially in time at fixed sampling intervals is known as a time series
• The behaviour of such series can be modelled
• The main features of time series are trend and (sometimes) seasonal variation
• Observations which are close together in time tend to be correlated
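The last point, that nearby observations tend to be correlated, can be checked with the sample autocorrelation. The following is a minimal sketch; the function name `autocorrelation` and the short series are made up for illustration and are not taken from the slides.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    # Sum of squared deviations from the mean (n times the lag-0 autocovariance).
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Cross-products of deviations for pairs of observations `lag` steps apart.
    numer = sum((series[t] - mean) * (series[t + lag] - mean)
                for t in range(n - lag))
    return numer / denom


# Made-up illustrative series: a steady rise with a small wiggle.
trend_series = [10, 12, 11, 14, 15, 17, 16, 19, 21, 20]
r1 = autocorrelation(trend_series, 1)  # neighbours in time
r5 = autocorrelation(trend_series, 5)  # observations five steps apart
```

For this series the lag-1 value is clearly positive while the lag-5 value is negative, which is the usual picture for a trending series: the closer two observations are in time, the more alike they are.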
[Figure: Air Passengers 1949-1960. Passengers (1000's) against Time, 1950 to 1960.]
A time plot of the number of air passengers per month between January 1949 and December 1960 in the USA reveals a rising trend.
There is also a seasonal pattern of travel within each year. More people travel in the summer than the winter.
[Figure: aggregate(AP) against Time, 1950 to 1960, and a boxplot of passengers by month, 1 to 12.]
Aggregating the series annually reveals the rising trend, and the boxplot shows that more people travel in the summer months.
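The two summaries described here, annual totals and a grouping by month, can be sketched as follows. The helper names `annual_totals` and `by_month` are hypothetical, and the series is synthetic (a linear rise plus a summer bump), not the air passenger data.

```python
from collections import defaultdict


def annual_totals(values, start_year, period=12):
    """Sum consecutive blocks of `period` observations into yearly totals.

    A trailing incomplete year is dropped.
    """
    totals = {}
    for i in range(0, len(values) - period + 1, period):
        totals[start_year + i // period] = sum(values[i:i + period])
    return totals


def by_month(values, period=12):
    """Group observations by position in the year (1 = first month).

    Assumes the series starts in the first month of a year.
    """
    groups = defaultdict(list)
    for i, v in enumerate(values):
        groups[i % period + 1].append(v)
    return dict(groups)


# Synthetic two-year monthly series, NOT the air passenger data.
seasonal_bump = [0, 0, 0, 0, 0, 5, 10, 10, 5, 0, 0, 0]
series = [100 + t + seasonal_bump[t % 12] for t in range(24)]

totals = annual_totals(series, start_year=1949)  # shows the trend
months = by_month(series)                        # input for a monthly boxplot
```

The yearly totals remove the within-year pattern and leave the trend; the per-month groups are what a boxplot by month would summarise.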
Forecasting: 1
[Figure: Holt-Winters filtering. Observed / Fitted against Time, 1950 to 1960.]
There are many modelling and forecasting techniques.
Here we use the Holt-Winters procedure to model the behaviour of the series…
The fit is quite promising.
Forecasting: 2
[Figure: forecast plot against Time, 1950 to 1965; the vertical axis runs from 100 to 800.]
And if the growth of US air traffic during the first 4 years of the 1960s follows the pattern of the previous 12…
the forecast is for some 800 thousand passengers per month by 1965
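The Holt-Winters procedure used in the two forecasting slides can be sketched in its additive form as below. This is a minimal illustration under stated assumptions, not the implementation behind the plots: the function name `holt_winters_additive` is hypothetical, the start values are a simple choice among several possible schemes, the smoothing parameters are fixed by hand rather than estimated, and the series is synthetic.

```python
def holt_winters_additive(y, period, alpha, beta, gamma, horizon):
    """Additive Holt-Winters smoothing.

    Returns (fitted, forecasts): one-step-ahead fitted values for
    y[period:], and `horizon` forecasts beyond the end of the series.
    """
    if len(y) < 2 * period:
        raise ValueError("need at least two full seasons of data")

    # Simple start values (an assumption; other schemes exist).
    first, second = y[:period], y[period:2 * period]
    level = sum(first) / period
    trend = (sum(second) / period - level) / period
    seasonal = [v - level for v in first]  # seasonal[t] is indexed by time t

    fitted = []
    for t in range(period, len(y)):
        s_prev = seasonal[t - period]
        # One-step-ahead prediction made before y[t] is seen.
        fitted.append(level + trend + s_prev)
        new_level = alpha * (y[t] - s_prev) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        seasonal.append(gamma * (y[t] - new_level) + (1 - gamma) * s_prev)
        level = new_level

    n = len(y)
    # Extrapolate the last level and trend, reusing the last full season.
    forecasts = [
        level + h * trend + seasonal[n - period + (h - 1) % period]
        for h in range(1, horizon + 1)
    ]
    return fitted, forecasts


# Synthetic quarterly series, NOT the air passenger data.
pattern = [-10, 0, 10, 0]
demo = [100 + 2 * t + pattern[t % 4] for t in range(16)]
fitted, forecasts = holt_winters_additive(
    demo, period=4, alpha=0.3, beta=0.1, gamma=0.2, horizon=8
)
```

The forecast simply extends the most recent level and trend and adds back the most recent seasonal pattern, which is why it is only as good as the assumption that the next few years behave like the previous ones.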
Models
• There is a wide variety of different models, including
– Basic stochastic models (like Holt-Winters)
– Stationary models (AR, MA, ARMA)
– Non-stationary models (ARIMA, ARCH)
– Spectral analysis (based on the Fourier transform)
– Multivariate models (two or more series are involved)
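As an illustration of the simplest stationary model in the list, a first-order autoregression AR(1) has the form x_t = phi * x_{t-1} + e_t, and is stationary when |phi| < 1. The sketch below simulates such a series from a given list of shocks and recovers phi by least squares. The function names are hypothetical, and a zero-mean series is assumed.

```python
def simulate_ar1(phi, shocks, x0=0.0):
    """Generate an AR(1) path x_t = phi * x_{t-1} + e_t from given shocks."""
    path, prev = [], x0
    for e in shocks:
        prev = phi * prev + e
        path.append(prev)
    return path


def fit_ar1(series):
    """Least-squares estimate of phi, assuming a zero-mean series."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    if den == 0:
        raise ValueError("cannot estimate phi from an all-zero history")
    return num / den


# A single unit shock followed by no further shocks: the effect of the
# shock decays geometrically at rate phi.
decay = simulate_ar1(0.5, [1.0, 0.0, 0.0, 0.0, 0.0])
phi_hat = fit_ar1(decay)
```

With no noise after the first shock the estimate recovers phi exactly; with real, noisy data the estimate is only approximate, and the short series discussed later in the talk make it less reliable still.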
2: ESPON DB Data Issues
Some typical data… household income
The NUTS2 regions in Austria are the Länder – here we have short time series of the disposable income of private households from 1995 to 2007. Each series has only 13 elements.
We might normalise these by population to obtain a comparable ‘per capita’ figure.
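The normalisation is a year-by-year division of the regional total by the regional population. A minimal sketch follows; the function name `per_capita` is hypothetical and the figures are invented for one imaginary region, not real ESPON or Austrian data.

```python
def per_capita(totals, population):
    """Divide a regional total by population, year by year."""
    if len(totals) != len(population):
        raise ValueError("both series must cover the same years")
    return [total / pop for total, pop in zip(totals, population)]


# Hypothetical figures for one region over three years (NOT real data).
income_eur = [30_000_000_000, 31_500_000_000, 33_000_000_000]
population = [1_500_000, 1_500_000, 1_600_000]

income_per_head = per_capita(income_eur, population)
```

In this invented example total income rises every year, but income per head falls in the third year because the population grew faster, which is why the per capita figure is the comparable one.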
Short series…
• We should be aware that there is an interaction between the amount of data available and what can be done with it
• Paas, Kuusk, Schlitte and Võrk’s 2007 analysis of income convergence in selected countries of the EU using NUTS3 data had this to say:
George Box, 1976, Science and Statistics
• Models include not just the analytical tools that others might use, but also those which we use to examine the data for outliers and to estimate missing values
• ‘Wrong’ for Box includes models that fail to encapsulate the process under investigation
ESPON Tigers
• Long time series tend to be for large areal units, such as countries, or major administrative regions – the MAUP (modifiable areal unit problem) may well also be a tiger
• Smaller regions…
– shorter series
– incomplete series
– a long time period between elements (decennial censuses) in the case of very small units
3: Detecting Exceptional Values
Exceptional values
• Two types:
1. Logical errors (e.g. negative unemployment rate)
2. Statistical outlier (e.g. unusually high