Methods of interpolating data to create long-run time series Ian Gregory (University of Portsmouth) & Paul Ell (Queen’s University, Belfast)

Mar 28, 2015


Dylan Wiley
Transcript
Page 1:

Methods of interpolating data to create long-run time series

Ian Gregory (University of Portsmouth)

&

Paul Ell (Queen’s University, Belfast)

Page 2:

Administrative Units in England and Wales from 1801

[Figure: number of units (vertical axis) against time (horizontal axis: 1801, 1841, 1881, 1921, 1961). Approximate number of units at each level:]

• 50: Ancient Counties; Union/Registration Counties; Admin. Counties; S/M C

• 330: Districts

• 650: PL Unions/Registration Districts

• 1,500: Local Govt. Districts

• 15,000: Parishes/Wards

• 100,000: EDs

“Minor” changes:

• Registration Districts (1840-1910): 400

• Local Govt. Districts (1890s-1972): 4,000

• Parishes (1876-1972): 20,000

Page 3:

The Newport area, 1911

Page 4:

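Creating a standard geography

• Areal Weighting:

– Assumption: variable y is homogeneously distributed across the source zones

– Using this:

$$\hat{y}_t = \sum_s \frac{A_{st}}{A_s} \, y_s$$

(y_s is the value of y in source zone s, A_s is the area of source zone s, and A_st is the area of the intersection of source zone s with target zone t)

– BUT: Very unrealistic assumption.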
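The areal weighting estimate can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name, the dictionary layout and the example figures are all assumptions made for the sketch, and the intersection areas are taken as already computed (for instance by a GIS overlay).

```python
def areal_weighting(source_values, source_areas, overlap_areas):
    """Estimate target-zone values by areal weighting.

    source_values: {source_id: y_s}
    source_areas:  {source_id: A_s}
    overlap_areas: {(source_id, target_id): A_st}
    (Hypothetical data layout, chosen for this sketch.)
    """
    estimates = {}
    for (s, t), a_st in overlap_areas.items():
        # Share of source zone s that falls inside target zone t.
        share = a_st / source_areas[s]
        estimates[t] = estimates.get(t, 0.0) + share * source_values[s]
    return estimates


# Illustrative numbers only (not taken from the slides).
source_values = {"s1": 1000.0, "s2": 400.0}
source_areas = {"s1": 10.0, "s2": 20.0}
overlap_areas = {("s1", "t1"): 6.0, ("s1", "t2"): 4.0, ("s2", "t2"): 20.0}

estimates = areal_weighting(source_values, source_areas, overlap_areas)
print(estimates)
```

Because each source zone's value is split purely by area, the estimates always sum to the source total, but a zone with a dense town in one corner would be badly misallocated. That is the unrealistic assumption noted above.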

Page 5:

Other sources of information (1)

• 1. Dasymetric technique:

– There were 15,000 parishes as opposed to 600/1,500 districts

– Total population is available at this scale

– Assumptions:

• The distribution of y follows the distribution of the total population

• Parish-level population is homogeneously distributed

– Problem:

• Most districts in towns and cities consist of only one parish.

• In 1911, 30% of the population lived in districts that consisted of only one parish
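A rough Python sketch of the two dasymetric assumptions follows. The slides do not say how parish-level estimates are carried over to the target districts, so this sketch assumes that each district's y is first shared among its parishes in proportion to parish total population, and that each parish's share is then areal-weighted into the targets. All names and numbers are hypothetical.

```python
def dasymetric(district_values, parish_district, parish_pop,
               parish_area, overlap_areas):
    """Dasymetric estimate using parish total population as ancillary data.

    district_values: {district_id: y for the source district}
    parish_district: {parish_id: source district containing the parish}
    parish_pop:      {parish_id: total population}
    parish_area:     {parish_id: area}
    overlap_areas:   {(parish_id, target_id): intersection area}
    (Hypothetical data layout, chosen for this sketch.)
    """
    # Total population of each source district, summed from its parishes.
    district_pop = {}
    for p, d in parish_district.items():
        district_pop[d] = district_pop.get(d, 0.0) + parish_pop[p]

    estimates = {}
    for (p, t), a_pt in overlap_areas.items():
        d = parish_district[p]
        # Assumption 1: y follows the distribution of total population.
        parish_y = district_values[d] * parish_pop[p] / district_pop[d]
        # Assumption 2: population is homogeneous within the parish.
        estimates[t] = estimates.get(t, 0.0) + parish_y * a_pt / parish_area[p]
    return estimates


# Illustrative numbers only (not taken from the slides).
district_values = {"d1": 300.0}
parish_district = {"p1": "d1", "p2": "d1"}
parish_pop = {"p1": 900.0, "p2": 100.0}
parish_area = {"p1": 5.0, "p2": 15.0}
overlap_areas = {("p1", "t1"): 5.0, ("p2", "t1"): 5.0, ("p2", "t2"): 10.0}

estimates = dasymetric(district_values, parish_district, parish_pop,
                       parish_area, overlap_areas)
print(estimates)
```

When a district consists of a single parish, the first step has nothing to redistribute and the sketch reduces to plain areal weighting, which is the problem the slide raises for towns and cities.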

Page 6:

Other sources of information (2)

• 2. Data from target districts as ancillary information:

– Can provide information on the distribution of source zone data

– EM algorithm is used

– E.g.

• 1. Sub-divide target zones into rural and urban

• 2. Assume that rural and urban targets have the same population densities

• 3. Allocate y to targets using this assumption

• 4. Find the average population density of rural and urban target districts

• 5. Go back to stage three using the new population densities and repeat until the algorithm converges

– Can use y for the target districts or total population at parish level as ancillary information

– Relies on having relevant information for target districts
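The five-step rural/urban example can be sketched as a small iterative loop. This is a simplified illustration under stated assumptions, not the authors' code: the "average density" of a class is taken as total estimated y divided by total class area, target areas are derived from the intersections (so the source zones are assumed to cover the targets completely), and every name and number is hypothetical.

```python
def em_interpolate(source_values, overlap_areas, target_class,
                   max_iter=200, tol=1e-9):
    """Iteratively allocate source values to targets by class density.

    source_values: {source_id: y_s}
    overlap_areas: {(source_id, target_id): intersection area}
    target_class:  {target_id: "urban" or "rural"}  (step 1)
    """
    classes = set(target_class.values())
    density = {c: 1.0 for c in classes}  # step 2: equal starting densities

    # Area of each target and of each class, from the intersections.
    target_area = {}
    for (s, t), a in overlap_areas.items():
        target_area[t] = target_area.get(t, 0.0) + a
    class_area = {c: 0.0 for c in classes}
    for t, a in target_area.items():
        class_area[target_class[t]] += a

    estimates = {}
    for _ in range(max_iter):
        # Step 3: split each source value across its intersections in
        # proportion to area times the current density of the target class.
        weight_sum = {}
        for (s, t), a in overlap_areas.items():
            w = a * density[target_class[t]]
            weight_sum[s] = weight_sum.get(s, 0.0) + w
        estimates = {t: 0.0 for t in target_area}
        for (s, t), a in overlap_areas.items():
            if weight_sum[s] > 0.0:  # guard against all-zero weights
                w = a * density[target_class[t]]
                estimates[t] += source_values[s] * w / weight_sum[s]

        # Step 4: re-estimate the density of each class.
        class_total = {c: 0.0 for c in classes}
        for t, y in estimates.items():
            class_total[target_class[t]] += y
        new_density = {c: class_total[c] / class_area[c] for c in classes}

        # Step 5: repeat until the densities stop changing.
        change = max(abs(new_density[c] - density[c]) for c in classes)
        density = new_density
        if change < tol:
            break
    return estimates, density


# Illustrative numbers only (not taken from the slides).
source_values = {"s1": 1100.0, "s2": 100.0}
overlap_areas = {("s1", "u1"): 2.0, ("s1", "r1"): 8.0, ("s2", "r2"): 10.0}
target_class = {"u1": "urban", "r1": "rural", "r2": "rural"}

estimates, density = em_interpolate(source_values, overlap_areas, target_class)
print(estimates, density)
```

In this toy case the wholly rural source zone s2 pins down the rural density, which in turn lets the loop push most of s1's value into the small urban target instead of spreading it evenly by area.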

Page 7:

Other sources of information (3)

• 3. Combined technique

– Brings together the dasymetric technique and the EM algorithm

– Makes use of all available information

– Tests all the assumptions

Page 8:

Choice of technique

Technique     Total Pop.      Tot. Female     Males 15-24     No Car          Farmers
              RMS     Max     RMS     Max     RMS     Max     RMS     Max     RMS     Max
A Weight      26.3    143.3   27.2    149.1   26.3    130.6   52.3    359.4    4.4    28.5
Dasymetric    13.5     83.2   13.8     88.0   13.7     74.9   35.2    280.2   14.0    63.6
EM-District    7.5     28.8    6.4     28.7    6.8     37.8   11.0     55.7    5.4    28.5
EM-Parish      5.6     28.6    5.8     31.0    5.1     19.8   11.4     79.5   14.4    80.2
Comb-Dist      3.6     22.9    4.1     23.5    2.8     14.0    9.9     48.5   13.9    62.8
Comb-Par       3.4     17.8    3.4     16.3    3.9     23.8    6.5     38.3   16.9    73.7

Based on aggregating 1991 EDs to form pseudo-parishes and districts
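When the true target-zone values are known, as in a test built by aggregating EDs, a summary of this kind can be computed as sketched below. The slides do not state how the RMS and Max figures are defined, so the percentage-error definition used here is an assumption, and the function name and numbers are hypothetical.

```python
import math


def error_summary(estimated, actual):
    """Return (RMS error, maximum absolute error), both in percent.

    Assumes error is measured as a percentage of the actual value;
    the slides do not confirm this definition.
    """
    errors = [100.0 * (estimated[z] - actual[z]) / actual[z] for z in actual]
    rms = math.sqrt(sum(e * e for e in errors) / len(errors))
    max_abs = max(abs(e) for e in errors)
    return rms, max_abs


# Illustrative numbers only (not taken from the slides).
actual = {"a": 100.0, "b": 200.0}
estimated = {"a": 110.0, "b": 180.0}

rms, max_abs = error_summary(estimated, actual)
print(rms, max_abs)
```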

Conclusions:

• No one technique suits all variables

• Careful choice of technique reduces error significantly

• Using regression techniques can help determine which is most appropriate

• Error will still appear in the interpolated data

Page 9:

Predicting error

• Possible techniques:

1. Space – where target zones consist of many large fragments of source zones they are error prone

2. Attribute – error is most prevalent when data have been allocated from urban zones to rural ones

3. Time – error will cause “unrealistic” changes in population

Page 10:

Page 11:

Using population change to locate error

[Chart: Total Population of Water Orton, 1851-1951. Vertical axis: population, 0 to 2,000. Horizontal axis: census years 1851 to 1931, and 1951.]

[Chart: Pop. Change in Water Orton, 1851-1951. Vertical axis: Pop. Change, -0.100 to 0.400. Horizontal axis: decades, 1850s to 1930/40s.]

Water Orton – parish on the edge of Birmingham

• 1901-1951, Water Orton (1951: Pop. 1,841, area 2.3km², pop. den. 796p/km²)

• 1861-1891, part of Aston (1891: Pop. 250,000, area 57km², pop. den. 4,300p/km²)

• 1851, Water Orton (1851: Pop. 190, area 2.6km², pop. den. 73p/km²)

Pop. Change = (y2-y1)/(y2+y1)

1851: Est. Pop: 182, Actual Pop: 190
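The change measure is simple to compute over an interpolated series. The sketch below flags intervals where the change is larger than a chosen cut-off; the threshold, the function names and the example series are hypothetical, and choosing a sensible cut-off is left open by the slides.

```python
def pop_change(y1, y2):
    """Normalised change between two dates: (y2 - y1) / (y2 + y1)."""
    return (y2 - y1) / (y2 + y1)


def flag_unrealistic(series, threshold):
    """Return (start_year, end_year, change) for suspicious intervals.

    series: list of (year, population) pairs in date order.
    threshold: hypothetical cut-off on the absolute change.
    """
    flagged = []
    for (year1, y1), (year2, y2) in zip(series, series[1:]):
        change = pop_change(y1, y2)
        if abs(change) > threshold:
            flagged.append((year1, year2, change))
    return flagged


# Illustrative series only (not the Water Orton data).
series = [(1851, 200.0), (1861, 210.0), (1871, 600.0), (1881, 220.0)]
flagged = flag_unrealistic(series, threshold=0.3)
print(flagged)
```

For positive populations the measure always lies between -1 and 1, so a sharp rise followed by an equally sharp fall, as in the example series, shows up as a pair of large values of opposite sign.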

Page 12:

Using population change to locate error

Birmingham

• 1951: Pop. 1,100,000, area 210km², pop. den. 5,235p/km²

• 1931: Pop. 1,000,000, area 187km², pop. den. 5,367p/km²

• 1891: Pop. 246,000, area 12.2km², pop. den. 20,123p/km²

• 1851: Pop. 919, area 0.94km², pop. den. 977p/km²

[Chart: Total Population of Birmingham, 1851-1951. Vertical axis: population, 0 to 1,200,000. Horizontal axis: census years, labelled 1851 to 1931.]

[Chart: Pop. Change in Birmingham, 1851-1951. Vertical axis: Pop. Change, 0.000 to 0.600.]

Page 13:

Using population change to locate error

Castle Bromwich – parish on the edge of Birmingham

• 1951, Castle Bromwich (1951: Pop. 4,356, area 4.7km², pop. den. 927p/km²)

• 1921-1931, part of Birmingham (1931: Pop. 1,000,000, area 187km², pop. den. 5,367p/km²)

• 1861-1911, part of Aston (1891: Pop. 250,000, area 57km², pop. den. 4,300p/km²)

• 1851, Castle Bromwich (1851: Pop. 6,426, area 18.7km², pop. den. 344p/km²)

[Chart: Total Population of Castle Bromwich, 1851-1951. Vertical axis: population, 0 to 4,500. Horizontal axis: census years, labelled 1851 to 1931.]

[Chart: Pop. Change in Castle Bromwich, 1851-1951. Vertical axis: Pop. Change, -1.000 to 1.000.]

Page 14:

Conclusions

• Can interpolate data to create long-run time-series

• Choice of best technique will depend on nature of the variable

– No “one size fits all” technique

• All techniques will create some error

• What to do about error:

– Attempt to smooth it out

– Explicitly incorporate it into an analysis