The Big Data Lab for Interdisciplinary Spatially-Enabled Science (BLISS)
Goal
Set up a large database of time series of land-use change covering most of the planet's agricultural areas
Agenda
● The problem
● The challenge
● The Solution
The problem
Image sequence, Mato Grosso, Brazil (acquisition window per year):
- 1984: May 8 – Jun 9
- 1985: Jun 10 – Jul 12
- 1986: Jul 12 – Aug 13
- 1988: May 8 – Jun 9
- 1989: Aug 13 – Sep 14
- 1990: Jul 12 – Aug 13
- 1991: Jul 12 – Aug 13
- 1992: Aug 12 – Sep 13
- 1993: Jun 10 – Jul 12
- 1994: Jul 12 – Aug 13
- 1995: Jul 12 – Aug 13
- 1996: Jun 9 – Jul 11
- 1997: Jun 10 – Jul 12
- 1998: Jun 10 – Jul 12
- 1999: Jun 10 – Jul 12
- 2000: Jun 9 – Jul 11
- 2001: Jul 12 – Aug 13
- 2003: Jul 12 – Aug 13
- 2004: Jun 9 – Jul 11
- 2005: Jul 12 – Aug 13
- 2006: May 9 – Jun 10
- 2007: Jun 10 – Jul 12
- 2008: Jun 9 – Jul 11
- 2009: Jul 12 – Aug 13
- 2010: Jun 10 – Jul 12
“Remote sensing images describe landscape dynamics”
What's in an image?
Images: 2010 and 2011
Deforestation event detection: images and time series
Vegetation index time series for three areas (Area 1, Area 2, Area 3)
Source: Victor Maus (INPE)
Time series analysis of land change
Land cover classes shown: Forest, Pasture, Agriculture
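To make the time series idea concrete, here is a minimal sketch (not the authors' method) that flags the most likely abrupt drop in a vegetation index series. It tries every split point and keeps the one with the largest decrease between the mean before and the mean after. The series values, the `min_drop` threshold and the `min_segment` length are hypothetical.

```python
from statistics import mean

def detect_drop(series, min_drop=0.2, min_segment=3):
    """Return the index where the series mean drops most, or None.

    Single-break search (illustrative only): for each candidate split t,
    compare mean(series[:t]) with mean(series[t:]). The split with the
    largest decrease is reported only if it reaches min_drop, a
    hypothetical threshold in vegetation-index units.
    """
    best_t, best_drop = None, 0.0
    for t in range(min_segment, len(series) - min_segment + 1):
        drop = mean(series[:t]) - mean(series[t:])
        if drop > best_drop:
            best_t, best_drop = t, drop
    return best_t if best_drop >= min_drop else None

# Hypothetical NDVI-like series: forest cleared at index 6, and a stable forest.
forest_to_pasture = [0.82, 0.85, 0.80, 0.83, 0.84, 0.81, 0.45, 0.40, 0.42, 0.44]
stable_forest = [0.82, 0.85, 0.80, 0.83, 0.84, 0.81, 0.83, 0.82, 0.84, 0.80]

print(detect_drop(forest_to_pasture))  # 6
print(detect_drop(stable_forest))      # None
```

Real deforestation detection must also handle clouds, seasonality and gradual degradation, which this single-break test ignores.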
The data
Earth observation satellites and geosensor webs provide key information about global change…
…but that information needs to be modelled and extracted
EO data is now free… and big
Image source: NASA
Sentinels: 3 Tb/day
Is free data download our answer?
Currently, users download one snapshot at a time
Data Access Hitting a Wall
How do you download a petabyte? You don't! Move the software to the archive.
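A quick back-of-the-envelope calculation shows why downloading does not scale. The sketch assumes an ideal link with no protocol overhead and a decimal petabyte (10^15 bytes); the link speeds are illustrative.

```python
def transfer_days(n_bytes, link_bits_per_s):
    """Days to move n_bytes over an ideal link (no overhead, no interruptions)."""
    seconds = n_bytes * 8 / link_bits_per_s
    return seconds / 86_400

PETABYTE = 10**15  # decimal petabyte, in bytes (assumption)

for label, speed in [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)]:
    print(f"1 PB over {label}: {transfer_days(PETABYTE, speed):.1f} days")
```

Even at a sustained 1 Gbit/s, one petabyte takes about 92.6 days, which is the argument for running the analysis next to the archive.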
Landsat/TM (August 2007)
MODIS (November 2007)
How hard is it to use MODIS?
Detecting deforestation and degradation in MODIS imagery requires considerable expertise (low-resolution artifacts)
The challenge
Daily warnings of newly deforested large areas
Real-time Deforestation Monitoring
Automated methods are evaluated on a single image only!
Real-time Deforestation Monitoring: how to make progress?
The practices of the research community do not match the needs of the end-users!
Where we want to get to
Remote visualization and method development
Big data EO management and analysis
40 years of Earth observation data on land change, accessible for analysis and modelling.
● 30 years of EO experience
● Powerful analysis engine (R)
● EO database technology (TerraLib)
● Time series EO analysis
SciDB: innovative DBMS for big arrays
INPE + IFGI
What we know
What we know we don’t know 1: Data
How to put all EO data together?
How to work with different space-time (ST) resolutions?
Different satellites have different calibrations.
Geometric and radiometric problems.
What we know we don’t know 2: databases
How to organize scientific data in array databases?
How to match data semantics to arrays?
What is the equivalent of a transaction? What about concurrency control?
How to support worldwide users?
What we know we don’t know 3: methods
What are good tools for space-time modelling of EO data?
How to combine time series with spatial statistics?
How to do space-time object and event detection?
How to develop a library of methods for the SciDB-R environment?
What we know we don’t know 4: applications
How best to use ST EO data for global forest studies?
How best to use ST EO data for global food studies?
The technology
Nature
“A few satellites can cover the entire globe, but there needs to be a system in place to ensure their images are readily available to everyone who needs them. Brazil has set an important precedent by making its Earth-observation data available, and the rest of the world should follow suit.”
R: The lingua franca for data analysis
Database
Array databases: all data from a sensor put together in a single array
Figure: space-time array with axes x, y (space) and t (time)
result = analysis_function(points in space-time)
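The idea of putting all data from a sensor into one array can be sketched with a tiny space-time cube indexed as `cube[x][y][t]`. Applying an analysis function to every pixel's time series yields a 2D result. The helper name `apply_over_time` and the values are hypothetical; in the proposed system this work would run inside the array database, not in plain Python lists.

```python
from statistics import mean

def apply_over_time(cube, analysis_function):
    """Apply analysis_function to each pixel's time series.

    cube is indexed as cube[x][y][t]; the result is a grid result[x][y].
    """
    return [[analysis_function(cube[x][y]) for y in range(len(cube[x]))]
            for x in range(len(cube))]

# Hypothetical 2 x 2 pixel cube with 4 time steps per pixel.
cube = [
    [[0.8, 0.8, 0.8, 0.8], [0.8, 0.7, 0.3, 0.3]],
    [[0.5, 0.6, 0.5, 0.6], [0.2, 0.2, 0.2, 0.2]],
]

# Two example analysis functions: temporal mean, and first-minus-last change.
means = apply_over_time(cube, mean)
change = apply_over_time(cube, lambda ts: round(ts[0] - ts[-1], 2))
print(change)  # [[0.0, 0.5], [-0.1, 0.0]]
```

Swapping the analysis function (mean, change, a break detector) changes the question without changing how the data are stored.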
SciDB Architecture: “shared nothing”
Large arrays are broken into chunks; distributed servers process the chunks in parallel.
Chunks (example: a 4 × 4 array split into four 2 × 2 chunks)
Array:
0    1    1    2
3    5    8    13
21   34   55   89
144  233  377  610
Chunks:
[0 1 / 3 5]          [1 2 / 8 13]
[21 34 / 144 233]    [55 89 / 377 610]
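The chunking example can be reproduced as a minimal sketch: split the 4 × 4 array into 2 × 2 chunks, let independent workers process one chunk each, then combine the partial results. Threads from `concurrent.futures` stand in for SciDB's distributed server processes here; this is not SciDB's API, and `split_into_chunks`/`chunk_sum` are hypothetical helpers.

```python
from concurrent.futures import ThreadPoolExecutor

# The 4 x 4 example array from the slide, row by row.
ARRAY = [
    [0, 1, 1, 2],
    [3, 5, 8, 13],
    [21, 34, 55, 89],
    [144, 233, 377, 610],
]

def split_into_chunks(array, chunk_rows, chunk_cols):
    """Split a 2D list into rectangular chunks keyed by chunk position."""
    chunks = {}
    for r0 in range(0, len(array), chunk_rows):
        for c0 in range(0, len(array[0]), chunk_cols):
            chunks[(r0 // chunk_rows, c0 // chunk_cols)] = [
                row[c0:c0 + chunk_cols] for row in array[r0:r0 + chunk_rows]
            ]
    return chunks

def chunk_sum(chunk):
    """Work done on one chunk with no shared state (here: sum of its cells)."""
    return sum(sum(row) for row in chunk)

chunks = split_into_chunks(ARRAY, 2, 2)
# Each worker handles one chunk independently ("shared nothing" in spirit).
with ThreadPoolExecutor(max_workers=4) as pool:
    partial = dict(zip(chunks, pool.map(chunk_sum, chunks.values())))

print(partial)                # {(0, 0): 9, (0, 1): 24, (1, 0): 432, (1, 1): 1131}
print(sum(partial.values()))  # 1596
```

Only the small per-chunk results travel to the combining step, which is what makes the pattern scale.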
The Proposed Solution
Software goes where the data is!
SciDB: array database for big scientific data
Free satellite images
R: Powerful data analysis methods
Global Land Observatory: describing change in a connected world
Unique repository of knowledge and data about global land change
40 years of Landsat + 12 years of MODIS + Sentinels + CBERS
Methods for land change analysis in forestry and agriculture
Space-time arrays allow new questions: Are biofuels replacing food production in Brazil?
source: B. Rudorff, INPE
RFields
Current GIS architecture
Single data source, single data schema, layer-oriented view
Distributed architecture for GIS
Consumer
Broker
Provider(s)
- Catalog of available data sets
- Location and access information
- Data set metadata
I need: rainfall of the Brazilian Amazon from 1999 to 2005
Data set 1: WGS84 (SciDB)
Data set 2: SAD69 (GeoTIFF)
Get Data Set
Data Set
SOA
Huge diversity of Geospatial Data...
What is the data about?
What is the data format?
Where to find the data?
Data Discovery
Consumer: R package (RGIS?)
- Creates an abstraction layer between users and data sources
- Provides direct access to geospatial data types (coverages, time series, trajectories)
Broker (RDF triple store)
- Manages available data sources and their data sets
- Stores data set metadata
- Provides thematic, temporal and geospatial filters
- Links data sets to other repositories (metadata enhancement)
- Provides credentials for accessing data sources
Main challenge:
- A generic vocabulary for describing data sets and data sources
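The broker's thematic, temporal and geospatial filters can be illustrated with a minimal sketch. The slides describe the broker as an RDF triple store; this sketch skips RDF entirely and uses plain Python objects, only to show what the three filters do. Data sets 1 and 2 follow the diagram; all themes, years and bounding boxes (approximate, in degrees) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DataSet:
    name: str
    theme: str
    start_year: int
    end_year: int
    bbox: tuple  # (min_lon, min_lat, max_lon, max_lat), degrees; hypothetical
    crs: str
    access: str  # location/access information handed back to the consumer

def overlaps(a, b):
    """True if two (min_lon, min_lat, max_lon, max_lat) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def find(catalog, theme, start_year, end_year, bbox):
    """Thematic, temporal and geospatial filter over the broker's catalog."""
    return [
        ds for ds in catalog
        if ds.theme == theme
        and ds.start_year <= end_year and start_year <= ds.end_year
        and overlaps(ds.bbox, bbox)
    ]

AMAZON = (-74.0, -18.0, -44.0, 5.0)  # rough box, for illustration only
catalog = [
    DataSet("Data set 1", "rainfall", 1998, 2010, (-75.0, -20.0, -45.0, 5.0), "WGS84", "SciDB array"),
    DataSet("Data set 2", "rainfall", 2003, 2012, (-75.0, -20.0, -45.0, 5.0), "SAD69", "GeoTIFF files"),
    DataSet("Data set 3", "rainfall", 1990, 2010, (-10.0, 35.0, 30.0, 60.0), "WGS84", "GeoTIFF files"),
    DataSet("Data set 4", "land cover", 2000, 2005, (-75.0, -20.0, -45.0, 5.0), "WGS84", "SciDB array"),
]

# "I need: rainfall of the Brazilian Amazon from 1999 to 2005"
hits = find(catalog, "rainfall", 1999, 2005, AMAZON)
print([(ds.name, ds.crs) for ds in hits])  # [('Data set 1', 'WGS84'), ('Data set 2', 'SAD69')]
```

The two hits use different reference systems and formats, which is exactly the heterogeneity the consumer-side abstraction layer has to hide.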
Generic Fields data type: new, add obs, domain, extent, value, combine, neigh, apply, select, filter, reduce
Reference systems
Generic Fields data type for Big Spatial Data
● GI representation as arrays
● Use of array DBMS (ADBMS) routines on GI (server-side processing)
● Keep existing interfaces to data (R and TerraLib, TerraView)
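A minimal sketch of what a generic Fields data type could look like: a field maps space-time locations (x, y, t) to values. The operation names come from the slide, but this sketch implements only a subset (new, add obs, domain, extent, value, apply, filter, reduce), with invented signatures, on an in-memory dictionary rather than an array DBMS. It is not the authors' implementation.

```python
from dataclasses import dataclass, field
from functools import reduce as fold

@dataclass
class Field:
    """Hypothetical minimal Field: observations indexed by (x, y, t).

    'new' is simply the constructor, Field().
    """
    obs: dict = field(default_factory=dict)

    def add_obs(self, x, y, t, v):
        self.obs[(x, y, t)] = v
        return self  # allows chaining

    def domain(self):
        return sorted(self.obs)

    def extent(self):
        # Assumes a non-empty field: ((xmin, xmax), (ymin, ymax), (tmin, tmax)).
        xs, ys, ts = zip(*self.obs)
        return (min(xs), max(xs)), (min(ys), max(ys)), (min(ts), max(ts))

    def value(self, x, y, t):
        return self.obs.get((x, y, t))  # None outside the domain

    def apply(self, fn):
        return Field({k: fn(v) for k, v in self.obs.items()})

    def filter(self, pred):
        return Field({k: v for k, v in self.obs.items() if pred(k, v)})

    def reduce(self, fn, initial):
        return fold(fn, self.obs.values(), initial)

f = Field().add_obs(0, 0, 2000, 0.8).add_obs(0, 0, 2001, 0.3).add_obs(1, 0, 2000, 0.7)
print(f.extent())                                        # ((0, 1), (0, 0), (2000, 2001))
print(f.filter(lambda k, v: k[2] == 2001).domain())      # [(0, 0, 2001)]
print(f.reduce(max, float("-inf")))                      # 0.8
```

Because every operation returns a new Field or a plain value, the same interface could in principle be backed by server-side array operations instead of a dictionary.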
MOD09Q1
● 250 m spatial resolution
● 8-day temporal resolution
● 4800 × 4800 pixels per tile
● 3 bands (red, NIR, QC)
● 13 years of data (since 2000)
● HDF format tiles
Figure: MODIS tiles (HDFs)
HDF count = tiles × 46 eight-day composites per year × 13 years

REGION                            HDFs     HDF size (TB)   Binary size (TB)   Binary size ijt (TB)
Amazon (8 × 46 × 13)              4784     0.30            2.41               4.81
South America (24 × 46 × 13)      14352    0.91            7.22               14.44
World land area (225 × 46 × 13)   134550   8.53            67.67              135.33
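The HDF counts in the table follow directly from tiles × composites per year × years. A minimal check, assuming 46 composites per year because an 8-day product needs ceil(365 / 8) = 46 periods to cover a year:

```python
import math

COMPOSITES_PER_YEAR = math.ceil(365 / 8)  # 8-day product -> 46 composites per year
YEARS = 13                                # 2000 onwards, as on the slide

def n_hdfs(tiles, composites=COMPOSITES_PER_YEAR, years=YEARS):
    """Number of HDF files: one per tile, per 8-day composite, per year."""
    return tiles * composites * years

for region, tiles in [("Amazon", 8), ("South America", 24), ("World land area", 225)]:
    print(region, n_hdfs(tiles))
# Amazon 4784
# South America 14352
# World land area 134550
```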
Data loading
MODIS HDF → export → SciDB binary → load → 1D array → insert / redimension / apply → 3D array
Measured times: 1.3 min/HDF and 0.8 min/HDF
Test server:
● Ubuntu Server 12 LTS
● Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
● 24 cores
● 125 GB RAM
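The pipeline first lands each HDF as a flat 1D array and then redimensions it into a 3D (i, j, t) array. The sketch below mimics that redimension step, plus an apply step that derives NDVI = (nir - red) / (nir + red), in plain Python. It does not use SciDB's own operators, and the record layout (i, j, t, red, nir, qc) is an assumption based on the three MOD09Q1 bands; the reflectance values are hypothetical.

```python
def redimension(records, ni, nj, nt):
    """Place flat (i, j, t, red, nir, qc) records into cube[i][j][t].

    Cells with no record stay None (empty cells in a sparse array).
    """
    cube = [[[None] * nt for _ in range(nj)] for _ in range(ni)]
    for i, j, t, red, nir, qc in records:
        cube[i][j][t] = (red, nir, qc)
    return cube

def apply_ndvi(cube):
    """Derive NDVI per cell; assumes red + nir > 0 wherever a cell exists."""
    return [[[None if cell is None else (cell[1] - cell[0]) / (cell[1] + cell[0])
              for cell in series] for series in row] for row in cube]

# Hypothetical 1D load result, in arbitrary order as it might arrive from a file.
records = [
    (0, 0, 0, 1000, 3000, 0),
    (0, 1, 0, 2000, 2000, 0),
    (0, 0, 1, 1000, 4000, 0),
]

cube = redimension(records, ni=1, nj=2, nt=2)
print(apply_ndvi(cube))  # [[[0.5, 0.6], [0.0, None]]]
```

Once the data sit in (i, j, t) order, each pixel's time series is contiguous along t, which is what time series methods need.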
Estimated loading time
HDF count = tiles × 46 eight-day composites per year × 13 years

REGION                            HDFs     Time
Amazon (8 × 46 × 13)              4784     2.6 days
South America (24 × 46 × 13)      14352    7.9 days
World land area (225 × 46 × 13)   134550   74.2 days
SciDB performance