Complementing Conventional with Innovative Data Sources
Arturo Martinez Jr.
Statistician
Asian Development Bank
International Workshop on Sustainable Development Goals Indicators
26 to 29 June 2018
Outline of Discussion
Conventional Data and SDGs
Overview of SAE methodologies
Complementing with Innovative Data
Data for Development TA
SDG Monitoring
• Many countries in Asia and the Pacific have examined the link between their respective development targets and SDGs → strategic priorities
• Countries have also done data availability assessment → (i) immediately available, (ii) can be made available using conventional data sources, (iii) needs more effort
Conventional Data Sources for SDGs
• Administrative Reporting Systems
– Education
– Health
– Trade and Industry (e.g., business permits, letters of credit)
– Civil Registration
• Censuses
– Population and Housing
– Agriculture
– Enterprise
• Surveys
– Households
– Enterprises
– Farm Holdings
Uses of Conventional Data Sources
• About 20-50% of SDG indicators can be derived from household surveys and censuses
[Figure: Proportion of SDG Indicators derived from HH Surveys. Bar chart by goal, SDG 01 to SDG 17; axis from 0% to 80%.]
Limitations of Conventional Data Sources
[Figure: Sample size of latest household surveys vs. total number of households of select ADB DMCs. Two series: Population - HH (Total Number of Households, '000) and Sample Size - HH.]
Economies and survey years shown: BGD (2010-11), PHL (2015), VNM (2010), THA (2015), MYS (2012), NPL (2010-11), LKA (2012-13), AFG (2013-14), KHM (2016), MMR (2009-10), GEO (2016), PNG (2009-10), LAO (2007-08), TKM (2003), ARM (2016), MNG (2011), TLS (2011), FJI (2008-09), BTN (2012), SLB (2012-13), VUT (2010), MDV (2009-10), WSM (2013-14), TON (2009), FSM (2005), KIR (2006), PLW (2006), COK (2005-06), TUV (2010)
Source: Compiled data from ADB Portal of Statistics Resources and International Household Survey Network
Availability of Disaggregated Data
Source: Serrao (2017) – Presentation on SDG Data Compilation
Situation in Asia and the Pacific
• In many countries, major surveys and censuses are conducted only if donor funds are available
– donor dependence: 70-80% of the budget in some countries
• Poor coverage and quality of administrative reporting systems, both economic and social, increases the dependence on surveys
• For disaggregated data, surveys alone may not be sufficient
• Administrative data, such as from civil registration and administrative registries, need strengthening for long-term sustainability
Small Area Estimation
• Let’s focus on indicators that are compiled through surveys, but surveys are unable to provide reliable estimates at a very granular level!
• A small area is a small geographical area or a population segment for which reliable statistics of interest cannot be produced due to the survey’s limitations
• Small area estimation techniques borrow strength from other ‘auxiliary information’
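As a rough illustration of "borrowing strength", the sketch below shrinks each area's direct survey estimate toward a regression prediction built from one auxiliary covariate, in the spirit of an area-level (Fay-Herriot type) model. The function names, the single covariate, and the treatment of the model variance as a known input are simplifications introduced here and are not taken from the presentation; in practice the model variance is estimated from the data.

```python
def wls_line(x, y, w):
    """Weighted least-squares fit of y = a + b * x with weights w."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


def area_level_estimates(direct, sampling_var, covariate, model_var):
    """Shrink direct estimates toward a regression ("synthetic") prediction.

    direct       : direct survey estimate per small area
    sampling_var : sampling variance of each direct estimate
    covariate    : one auxiliary variable per area (e.g. from a census)
    model_var    : between-area model variance (ASSUMED known in this sketch)
    """
    # Areas with noisier direct estimates get less weight in the regression.
    weights = [1.0 / (model_var + d) for d in sampling_var]
    a, b = wls_line(covariate, direct, weights)

    estimates = []
    for y, d, x in zip(direct, sampling_var, covariate):
        synthetic = a + b * x
        # gamma near 1: trust the survey; gamma near 0: trust the model.
        gamma = model_var / (model_var + d)
        estimates.append(gamma * y + (1.0 - gamma) * synthetic)
    return estimates


if __name__ == "__main__":
    # Made-up numbers for illustration only (not from any survey).
    direct = [30.0, 22.0, 41.0, 15.0]
    sampling_var = [4.0, 25.0, 9.0, 16.0]
    covariate = [0.40, 0.55, 0.20, 0.70]
    for est in area_level_estimates(direct, sampling_var, covariate, 10.0):
        print(round(est, 2))
```

Areas with a large sampling variance are pulled more strongly toward the regression line, which is where the auxiliary information does its work.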
Limitations of SAE methods
• Good auxiliary data usually required
• Gap between census and survey years can increase model error
• Only time invariant variables can be added in the model if the gap between census and survey periods is too wide
Big data can help address the limitations of SAE
Variety
Velocity
Volume
Big data can be much faster in providing granular auxiliary data than conventional data sources.
How can big data be useful for data disaggregation?
Source: Marchetti et al. (2015) – Small Area Model-Based Estimators Using Big Data Sources
• To create proxy indicators
• To generate new covariates for small area models
• To validate small area estimates
Create Proxy Indicators
Video source: https://www.youtube.com/watch?v=rXFVejLDGAA
[Figures: Relationship between Poverty and Nighttime Lights, PHL. Scatter plots of ln(Poverty Rate (%)) against ln(nighttime lights).]
• PHL 2009, ln(sum of nighttime lights): R² = 0.3667
• PHL 2009, ln(average of nighttime lights): R² = 0.4445
• PHL 2012, ln(sum of nighttime lights): R² = 0.3578
• PHL 2012, ln(average of nighttime lights): R² = 0.4816
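The R² values reported for these plots come from regressing the log of the poverty rate on the log of a nighttime-lights measure. A minimal sketch of that kind of fit is given below; the function name and the input numbers are hypothetical and are not the PHL data behind the slides.

```python
import math


def loglog_fit(lights, poverty_rate):
    """Ordinary least squares of ln(poverty_rate) on ln(lights).

    Returns (slope, intercept, r_squared). All inputs must be positive,
    because logarithms are taken on both sides.
    """
    x = [math.log(v) for v in lights]
    y = [math.log(v) for v in poverty_rate]
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up area-level values for illustration only.
    lights = [12.0, 85.0, 300.0, 40.0, 950.0, 5.0]
    poverty = [48.0, 30.0, 12.0, 35.0, 6.0, 52.0]
    slope, intercept, r2 = loglog_fit(lights, poverty)
    print("slope:", round(slope, 3), "R2:", round(r2, 3))
```

A negative slope would mean that brighter areas tend to have lower poverty rates; R² shows how much of the variation in log poverty the lights measure accounts for.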
[Figures: Difference - 2012 and 2009 Poverty and Nighttime Lights, PHL. Scatter plots of Diff, Poverty Rate (%) against Diff, sum of nighttime lights.]
• First panel (x-axis from -2000 to 8000): R² = 0.0004
• Second panel (x-axis from -10 to 25): R² = 0.0002
Generate new covariates
New RAI Estimates at the Subnational Level
SDG 9.1.1. Proportion of the rural population who live within 2 km of an all-season road
• Indicator can be generated using satellite images
• Method to calculate rural access index (RAI) requires the following data:
– Population distribution (e.g. Landscan and WorldPop gridded population maps)
– Road network (e.g. OpenStreetMap)
– Road condition (e.g. satellite image, user reports from apps)
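The three inputs listed above combine into a simple ratio: rural population within 2 km of an all-season road, divided by total rural population. The sketch below assumes the gridded population has already been reduced to rural cell centroids in a projected coordinate system (metres) and that the road network has been filtered to all-season segments; the function names and data layout are hypothetical, and a real workflow would normally use GIS tooling rather than plain Python.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b (same units)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def rural_access_index(cells, road_segments, buffer_m=2000.0):
    """Share of rural population within buffer_m of any road segment.

    cells         : list of (x_m, y_m, rural_population) grid-cell centroids
    road_segments : list of ((x1, y1), (x2, y2)) all-season road segments
    """
    total = sum(pop for _, _, pop in cells)
    if total == 0:
        raise ValueError("total rural population is zero")
    served = sum(
        pop
        for x, y, pop in cells
        if any(
            point_segment_distance((x, y), a, b) <= buffer_m
            for a, b in road_segments
        )
    )
    return served / total


if __name__ == "__main__":
    # Toy layout: one 10 km road along the x-axis, three population cells.
    roads = [((0.0, 0.0), (10000.0, 0.0))]
    cells = [(5000.0, 1000.0, 300), (5000.0, 3000.0, 100), (11000.0, 0.0, 100)]
    print(rural_access_index(cells, roads))
```

Checking every cell against every segment is fine for a toy example but scales poorly; a spatial index or a buffered road layer is the usual choice for national datasets.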
Validate Small Area Estimates
Source: Marchetti et al. (2015) – Small Area Model-Based Estimators Using Big Data Sources
• Socioeconomic indicators derived from big data can be compared to similar measures obtained from survey data
• E.g. poverty estimates based on satellite images vs. small area estimates based on household expenditure surveys
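One simple way to make such a comparison is to check whether the two sources rank areas in a similar order. The sketch below computes a Spearman rank correlation between two sets of area-level estimates; the choice of this statistic, the function names, and the numbers are illustrative assumptions, not the method used in the cited source.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


if __name__ == "__main__":
    # Made-up poverty rates (%) for the same five areas from two sources.
    satellite_based = [35.0, 12.0, 48.0, 20.0, 27.0]
    survey_sae = [31.0, 15.0, 44.0, 26.0, 22.0]
    print(round(spearman(satellite_based, survey_sae), 3))
```

A value close to 1 suggests the two sources agree on which areas are poorer, even if the levels differ; large disagreements flag areas worth a closer look.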
Other Uses of Big Data: Now-casting Food Prices in Indonesia Using Social Media Signals
Source: UN Global Pulse
Other Uses of Big Data: Algal Bloom Early Warning Alert System
Source: Group on Earth Observations, 2017
Key Considerations
• Some types of big data may not be representative of the whole population of interest (self-selection bias)
• Other types of big data are held by the private sector
• Needs a different technological infrastructure
Data for Development Technical Assistance
Aims to build the capacity of DMCs to compile disaggregated data for select SDG indicators using a combination of traditional and innovative forms of data, in line with the granular data requirements of the SDGs’ “leave no one behind” principle.
ADB is collaborating with UNESCAP, PARIS21 and other development partners
Country-Specific Case Studies on Data Disaggregation and Big Data Analytics
• Issue: What is the benefit of complementing conventional with innovative data sources?
• Technical Manual on Disaggregation of Official Statistics and SDGs
• Strategically-designed training workshops targeted to NSO staff
• Online Course Modules on SAE and Big Data Analytics
Thank you very much!
email: [email protected]
ADB's Statistics Capacity Building Efforts and Some Lessons
First statistics capacity building project in the 1970s (for Singapore on national accounts)
Approximately 100 technical assistance projects on various topics since then
Statistics management and strengthening of national statistical systems
Development of statistics master plan
Strengthening of selected areas in statistics (national accounts, financial statistics, social statistics, etc.)
Improving data collection strategies (household surveys, administrative reporting system, dissemination practices)
Established partnerships with other development agencies in the region.
International Comparison Programme for Asia and the Pacific
Updating and Constructing the Supply and Use Tables for Selected Developing Member Economies
Statistical Business Registers (SBR) for Improved Information on Small, Medium-Sized, and Large Enterprises
Evidence and Data for Gender Equality (EDGE)
Innovative data collection methods for agricultural and rural statistics
Implementing Information and Communication Technology Tools to Improve Data Collection and Management of National Surveys in Support of the Sustainable Development Goals