Spatio-temporal series, NDVI, MODIS, HMM, Quasi-periodic components, Phenology, Burnt area detection

Miguel Antonio García Ferrández

Department of Applied Mathematics

Analysis of remote sensing data series of vegetation indices

Dissertation submitted by Miguel Antonio García Ferrández, Licenciado in Mathematical Sciences, for the degree of Doctor at the Universidad de Alicante

Alicante, 2015

Francisco Rodríguez Mateo, Profesor Titular at the Universidad de Alicante,

CERTIFIES:

That the work described in this dissertation, entitled “Analysis of remote sensing data series of vegetation indices”, has been carried out under his supervision by Miguel Antonio García Ferrández at the Universidad de Alicante and meets all the requirements for its approval as a Doctoral Thesis.

Alicante, December 2014

Dr. Francisco Rodríguez Mateo

It’s not the same to solve for a hundred, as to solve for a thousand

E.W. Dijkstra, in Structured Programming (O.-J. Dahl, E.W. Dijkstra, C.A.R. Hoare)

90% of the scheduled time solves 90% of the problem;

the remaining 10% of the problem takes another 90% of the time

Murphy's Laws of Computer Science

The difference between a delicacy and slop is three minutes of cooking

García's First Law of Pasta Cooking

Contents

Acknowledgements

Chapter 1
Objectives and structure of the thesis

Chapter 2
Hidden Markov models for the determination of phenological parameters

Chapter 3
Fitting piecewise linear functions to vegetation index time series

Chapter 4
Time-frequency analysis of vegetation index data series using quasi-periodic components

Chapter 5
Detection of areas affected by forest fires

General conclusions

Basic concepts of hidden Markov models

Acknowledgements

The work developed in this thesis began within the framework of the project “Spatio-temporal analysis of vegetation index data derived from the MODIS remote sensing system. Application to the determination of phenological parameters and to the monitoring of post-fire regeneration in the Valencia region” (GVPRE/2008/310), funded by the Generalitat Valenciana.

The research group “Data analysis and process modelling in Biology and Geosciences”, within which this work was carried out, has received funding from the Universidad de Alicante through its successive annual calls for research group grants (VIGROB-162).

Some of the work presented in this dissertation was carried out in collaboration with researchers from other research groups, both at the Universidad de Alicante and at other universities and research centres. The Acknowledgements sections of the corresponding chapters specify the projects within which these collaborations took place and the corresponding funding bodies.

Chapter 1:

Objectives and structure of the thesis

The analysis of vegetation dynamics from remote sensing data is currently an essential tool in agriculture, ecology and other environmental sciences. Since 2000, satellite observations of various terrestrial ecosystem variables have been available from the MODIS (Moderate Resolution Imaging Spectroradiometer) sensors. The derived products include vegetation indices such as the NDVI (Normalized Difference Vegetation Index) and the EVI (Enhanced Vegetation Index), available every 16 days at a resolution of 250 m × 250 m, which can be used, among other applications, to analyse crop development, to map vegetation functional types or to study various phenological parameters.
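For reference, both indices are computed from surface reflectances in the red, near-infrared (NIR) and, for EVI, blue bands: NDVI = (NIR − Red)/(NIR + Red) and EVI = 2.5 (NIR − Red)/(NIR + 6 Red − 7.5 Blue + 1). The MOD13Q1 product stores them as integers scaled by 10^4, which is why NDVI values and changes are reported "×10^4" in later chapters. The short Python sketch below only illustrates these definitions; the reflectance values are made up, the function names are hypothetical, and in this work the indices are taken directly from the MODIS product rather than recomputed.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)

def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, canopy_adj=1.0):
    """Enhanced Vegetation Index with the standard MODIS coefficients (G, C1, C2, L)."""
    nir, red, blue = (np.asarray(b, dtype=float) for b in (nir, red, blue))
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + canopy_adj)

def to_scaled_int(index):
    """Store an index as int16 scaled by 10**4, as in the MOD13Q1 NDVI/EVI layers."""
    return np.round(np.asarray(index, dtype=float) * 10_000).astype(np.int16)

# Illustrative (made-up) reflectances for a vegetated pixel
print(ndvi(0.5, 0.1), evi(0.5, 0.1, 0.05), to_scaled_int(ndvi(0.5, 0.1)))
```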

The development of effective methods for analysing this type of spatio-temporal data series is a key issue in all these applications and is the general objective of this thesis. Among the possible applications, the methods developed are applied to two specific topics of environmental interest in the Valencia region: the determination of phenological parameters of different vegetation types, and the analysis of the dynamics of vegetation regeneration after forest fires.

Hidden Markov models (HMMs), which had previously been used by the “Data analysis and process modelling in Biology and Geosciences” group of the Universidad de Alicante to analyse the spatial pattern of vegetation, make it possible to incorporate into the model structure prior information on the dynamics of the vegetation type being modelled. This allows the development of specific models for different vegetation types, adapted to local environmental conditions.

Chapter 2 explores the use of HMMs for determining phenological parameters, through their application in two contrasting areas of the Valencia region. The aim is to define phenological indicators, such as the start and end dates of the periods of increase and decrease in photosynthetic biomass, more precisely than with the usual general polynomial fits. This would allow this type of analysis to be used to study possible changes in these parameters in relation to covariates of interest (e.g., precipitation, temperature or altitude data) and possible trends that may be related to global change.


Fitting HMMs to a vegetation index data series such as NDVI assumes that the vegetation passes through different states in different periods of the series, each characterized by a probability distribution that governs the variation of the observed NDVI values over time. A simple reference model, which would essentially amount to considering the mean rates of increase or decrease of NDVI, is a continuous piecewise linear model: in each vegetation state the NDVI changes at a constant mean rate, and the slope changes between contiguous segments when the vegetation passes from one state to the next.

Chapter 3 considers the problem of fitting piecewise linear models to data series with a large number of points and a possibly large, a priori unknown, number of change points. The problem is addressed with an iterative algorithm that can be adapted to the prior information available about the system being analysed or to the complexity of the model to be fitted.

In the study of vegetation dynamics, possible trends, due for instance to natural regeneration after a forest fire or after a period of water stress, must be separated from seasonal oscillations, which may not be precisely defined. A characteristic of remote sensing data such as those supplied by MODIS that must be taken into account is the huge volume of data available for analysis: the time series are repeated for each pixel, generally with a high correlation between neighbouring pixels. This spatial redundancy can only be exploited with particularly efficient methods for processing large amounts of data jointly; otherwise, even for the theoretically simplest analyses, one ends up restricted to analysing purely temporal series of spatially averaged values.

Chapter 4 presents a model for data series that include secular components and non-constant cyclic components, called quasi-periodic components. The aim is to obtain estimates closer to reality than those provided by spectral analysis models with constant periodic components, thereby allowing the analysis of the relationships between variations in the parameters that define the seasonal variations and covariates or ecological factors of interest.


Forest fires are one of the common natural disturbances in Mediterranean ecosystems and one of the main environmental problems in areas such as south-eastern Spain, where fire frequency has increased markedly over the last half century. This increase is mainly due to factors related to human activity, such as the accumulation of fuel following the abandonment of agricultural land, or the increase in ignitions, whether deliberate or caused by negligence or by agricultural or recreational activities.

Methods for analysing vegetation dynamics, such as those mentioned above, can be applied to burnt areas, making it possible to study the dynamics of post-fire regeneration in relation to different environmental factors. Although official records of burnt areas have existed for some years in places such as the Valencia region, an extensive analysis of regeneration conditions requires burnt areas to be identified from the remote sensing data themselves, and various types of algorithms and methods have been developed for this purpose.

Chapter 5 presents a two-phase method for detecting burnt areas that can be applied efficiently over large areas. The method is first explained with a detailed example; its properties are then analysed by applying it to a large area of the Valencia region and comparing the burnt areas it detects with those recorded in the fire database of the Dirección General de Prevención, Extinción de Incendios y Emergencias (Directorate-General for Fire Prevention, Firefighting and Emergencies) of the Generalitat Valenciana. The aim is to have an efficient method that can be used automatically or semi-automatically over large areas with different vegetation types, so as to facilitate the study of the environmental factors that affect vegetation regeneration after forest fires.

The final chapter presents a summary of general conclusions, recapitulating the results obtained in the different chapters. Finally, an appendix sets out the basic concepts of HMMs, to make the work presented in Chapter 2 easier to follow.

The work in Chapters 2 to 5 has been presented at several international conferences and has resulted in the following publications:


García, M.A.; Moutahir, H.; Bautista, S. and Rodríguez, F. Determination of phenological parameters from MODIS derived NDVI data using hidden Markov models. In: Proceedings of SPIE, Volume 9229. D.G. Hadjimitsis, K. Themistocleous, S. Michaelides, G. Papadavid, eds. SPIE Press - The International Society for Optical Engineering, 2014, Article number 92291K.

García, M.A. and Rodríguez, F. An Iterative Algorithm for Automatic Fitting of Continuous Piecewise Linear Models. WSEAS Transactions on Signal Processing, 4(8): 474-483, 2008.

García, M.A. and Rodríguez, F. HANDFIT: An Algorithm for Automatic Fitting of Continuous Piecewise Regression, with Application to Feature Extraction from Remote Sensing Time Series Data. In: New Aspects of Signal Processing, Computational Geometry and Artificial Vision. Mastorakis, N.E.; Demiralp, M.; Mladenov, V.; Bojkovic, Z. (eds.), WSEAS Press, 2008, pp. 28-33.

García, M.A. and Rodríguez, F. Analysis of MODIS NDVI time series using quasi-periodic components. In: Proceedings of SPIE, Volume 8795. D.G. Hadjimitsis, K. Themistocleous, S. Michaelides, G. Papadavid, eds. SPIE Press - The International Society for Optical Engineering, 2013, pp. 879523-1 - 879523-8.

García, M.A.; Alloza, J.A.; Bautista, S. and Rodríguez, F. Detection and analysis of burnt areas from MODIS derived NDVI time series data. In: Proceedings of SPIE, Volume 8795. D.G. Hadjimitsis, K. Themistocleous, S. Michaelides, G. Papadavid, eds. SPIE Press - The International Society for Optical Engineering, 2013, pp. 879521-1 - 879521-9.

García, M.A.; Alloza, J.A.; Mayor, A.G.; Bautista, S. and Rodríguez, F. Detection and mapping of burnt areas from time series of MODIS derived NDVI data in a Mediterranean region. Central European Journal of Geosciences, 6(1): 112-120, 2014.

Chapter 2:

Hidden Markov models for the determination of phenological parameters

OBJECTIVES

This chapter explores the use of HMMs for determining phenological parameters, through their application in two contrasting areas of the Valencia region. The aim is to define phenological indicators, such as the start and end dates of the periods of increase and decrease in photosynthetic biomass, more precisely than with the usual general polynomial fits, thereby allowing this type of analysis to be used to study possible changes in these parameters in relation to covariates of interest (e.g., precipitation, temperature or altitude data) and possible trends that may be related to global change.

SUMMARY

The phenological characteristics of vegetation are fundamental to understanding its response under different climate change scenarios, and they are also indicators of active aridity processes. Determining phenological parameters for different vegetation types over large areas can help assess the current or future impacts of climate change on ecosystems, especially on the most vulnerable ones. Remote sensing data, such as those provided by MODIS, have been used to extract phenological characteristics from vegetation index time series, usually by means of smoothing techniques and polynomial model fitting. The work presented in this chapter uses hidden Markov models (HMMs) to determine phenological parameters from MODIS NDVI time series in a semi-arid Mediterranean region. Different types of HMMs are applied in selected areas with well-defined plant communities, and the potential of HMMs for automatic large-scale phenological analysis is discussed.


INTRODUCTION

The determination of the phenological parameters of vegetation communities, characterising the vegetation dynamics, at different scales, from local and regional to global scopes, constitutes a key topic in understanding the functioning of terrestrial ecosystems. Describing and analysing phenological characteristics for different types of vegetation in large areas help evaluate current impacts of climate change and other perturbations in ecosystems, as well as predict vegetation responses in different climate change scenarios [1,2].

Remote sensing from different space-borne sensors such as AVHRR, Landsat or MODIS has been used to describe land cover phenology, by extracting phenological characteristics from time series data of vegetation indices such as the normalized difference vegetation index (NDVI) [3-8].

A wide variety of methods have been used to analyze remotely sensed NDVI time series data in order to extract phenological metrics that describe the phenological traits of the vegetation dynamics. Most frequently, some type of smoothing method is first applied to reduce the usually high level of noise present in the raw data, and then different strategies based on polynomial fitting or spectral analysis are used to characterize seasonal changes, thus providing phenological parameters such as the onset of the growing season [9-13].

Hidden Markov models (HMMs) are a modelling technique for sequential data analysis, originally developed in the field of automatic speech recognition, where they constitute a basic and widespread tool [14]. These models have pervaded other scientific and technical fields, such as climatology [15] or bioinformatics [16,17]. The application of HMMs to the phenological analysis of remotely sensed vegetation indices was proposed two decades ago [18], but alternative methods of analysis were preferred, and there were almost no other examples of application until very recently [19].

The objective of this work was to apply different types of HMMs to determine phenological parameters from MODIS derived NDVI time series data in a semiarid Mediterranean region, analysing their potential for automatic phenological analysis in different types of vegetation.


METHODS AND RESULTS

The tutorial by Rabiner [14] presents a general introduction to HMMs and the algorithms to analyse them. Further details can be found in the monographs by Cappé et al. [20] and MacDonald and Zucchini [21]. In the next subsection we briefly present the main aspects of the particular HMMs used in this work.

Elements and structure of the HMMs applied to NDVI data for phenological analysis

Given a sequence of observations, $Y_t$, which may consist of a time series of NDVI values for a given pixel, we assume the existence of a corresponding sequence of hidden states, $S_t$, which may represent the different phenological states of the vegetation (Fig. 1).

Figure 1. Scheme of the structure of a first-order HMM: temporal sequence of hidden states ($S_t$) and corresponding observations ($Y_t$). The lack of a direct arrow connecting two states implies their conditional independence.

For instance, consider a simple phenological model with three states (Fig. 2, left and centre), representing growing ($S_+$), declining ($S_-$) or stationary ($S_0$) periods, with, respectively, generally positive, negative or null changes in NDVI values. There is no perfect correspondence between the unobserved states of the vegetation and the NDVI values observed at particular moments. Thus, the sequence of hidden states has to be inferred from the sequence of observations, or emissions, as they are usually called.

Figure 2. Topologies of the HMMs used in this work. Three completely

connected states (left), three states with forbidden transitions or blocking

(centre), and four states with blocking (right).

To carry out this inference, a plausible model is needed that describes the transitions between (hidden) states and the probabilities of the emissions given the states. We assumed first-order HMMs, which means that the transitions between states depend only on the current state, and not on the past history. In the case of continuous emissions, as with NDVI values or changes, appropriate density functions have to be selected; we assumed them to be normal. The so-called topology of the HMM is then completed by specifying the number of states and the forbidden transitions, if any, between them.
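Under these assumptions (a first-order Markov chain and one normal density per state), the likelihood of an observed sequence can be evaluated with the standard forward recursion, which is also the building block of the EM fitting step. The sketch below is a minimal NumPy/SciPy illustration, not the implementation used in this work; the function name and argument layout are hypothetical, and it accepts any numeric sequence, for example a series of NDVI changes.

```python
import numpy as np
from scipy.stats import norm

def forward_loglik(y, pi, A, mu, sigma):
    """Log-likelihood of a sequence y under a first-order HMM with normal emissions.

    pi    : initial state probabilities, shape (K,)
    A     : transition matrix, A[i, j] = P(S_{t+1} = j | S_t = i), shape (K, K)
    mu    : emission means per state, shape (K,)
    sigma : emission standard deviations per state, shape (K,)
    """
    y = np.asarray(y, dtype=float)
    pi, A = np.asarray(pi, dtype=float), np.asarray(A, dtype=float)
    mu, sigma = np.asarray(mu, dtype=float), np.asarray(sigma, dtype=float)
    # b[t, k]: normal density of observation y_t given hidden state k
    b = norm.pdf(y[:, None], loc=mu[None, :], scale=sigma[None, :])
    alpha = pi * b[0]
    loglik = 0.0
    for t in range(len(y)):
        if t > 0:
            # Propagate one step through the Markov chain, then weight by the emission
            alpha = (alpha @ A) * b[t]
        c = alpha.sum()          # rescale at every step to avoid numerical underflow
        alpha = alpha / c
        loglik += np.log(c)
    return loglik
```

The scaling constants `c` multiply to the sequence likelihood, so accumulating their logarithms gives the log-likelihood without underflow on long MODIS series.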

The different HMM topologies used in this work are presented in Fig. 2. The first model, denoted 3S (Fig. 2, left), allows all possible transitions between its three states, while the second one, 3SB (Fig. 2, centre), only allows transitions between different states from stationary, or baseline, to upward, then to downward, and then back to baseline. The last model, 4SB (Fig. 2, right), includes two different stationary states, connecting the upward and downward states.
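Blocking is expressed simply as zeros in the transition matrix. The following sketch encodes the three topologies of Fig. 2 as 0/1 masks, using the state orders of Tables 1 and 2, and builds a row-stochastic starting matrix for EM that respects them. The names and the starting value `stay=0.8` are illustrative assumptions. Because each EM (Baum-Welch) re-estimate of $a_{ij}$ is proportional to its current value, transitions initialised at zero remain zero throughout fitting, so the chosen topology is preserved.

```python
import numpy as np

# 0/1 masks of allowed transitions (rows: from, columns: to)
MASKS = {
    # 3S, states (S0, S+, S-): fully connected
    "3S": np.ones((3, 3)),
    # 3SB, states (S0, S+, S-): baseline -> upward -> downward -> baseline
    "3SB": np.array([[1, 1, 0],
                     [0, 1, 1],
                     [1, 0, 1]]),
    # 4SB, states (S0-, S+, S0+, S-): S0- -> S+ -> S0+ -> S- -> S0-
    "4SB": np.array([[1, 1, 0, 0],
                     [0, 1, 1, 0],
                     [0, 0, 1, 1],
                     [1, 0, 0, 1]]),
}

def initial_transitions(mask, stay=0.8):
    """Row-stochastic starting matrix for EM with zeros at forbidden transitions.

    Each state keeps probability `stay` of remaining where it is (an arbitrary,
    illustrative choice) and shares 1 - stay equally among its allowed moves.
    """
    A = np.array(mask, dtype=float)      # np.array copies, so MASKS is not modified
    np.fill_diagonal(A, 0.0)
    A = (1.0 - stay) * A / A.sum(axis=1, keepdims=True)
    np.fill_diagonal(A, stay)
    return A

for name, mask in MASKS.items():
    print(name)
    print(initial_transitions(mask))
```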

Once the topology has been selected, the parameters of the model can be fitted using an expectation-maximization (EM) algorithm, known for HMMs as the Baum-Welch algorithm, and then the sequence of hidden states can be inferred using a dynamic programming algorithm known as the Viterbi algorithm [14,20,21].
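The Viterbi step can be sketched as follows, working with log-probabilities so that blocked transitions become $-\infty$ and long series do not underflow. This is a generic textbook version under the normal-emission assumption above; the function name is hypothetical and the code is not the software used to produce the results of this chapter.

```python
import numpy as np
from scipy.stats import norm

def viterbi(y, pi, A, mu, sigma):
    """Most probable hidden-state path for a first-order HMM with normal emissions."""
    y = np.asarray(y, dtype=float)
    mu, sigma = np.asarray(mu, dtype=float), np.asarray(sigma, dtype=float)
    with np.errstate(divide="ignore"):   # log(0) = -inf encodes forbidden transitions
        log_pi = np.log(np.asarray(pi, dtype=float))
        log_A = np.log(np.asarray(A, dtype=float))
    log_b = norm.logpdf(y[:, None], loc=mu[None, :], scale=sigma[None, :])
    T, K = log_b.shape
    delta = log_pi + log_b[0]            # best log-score of paths ending in each state
    psi = np.zeros((T, K), dtype=int)    # back-pointers
    for t in range(1, T):
        scores = delta[:, None] + log_A  # scores[i, j]: best path ending with i -> j
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_b[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):        # backtrack from the last time step
        path[t - 1] = psi[t, path[t]]
    return path
```

For example, with the 3SB parameters of Table 1 (states ordered $S_0$, $S_+$, $S_-$), uniform initial probabilities and the synthetic NDVI changes (×10^4) 0, 0, 500, 600, −400, −450, 0, 0, the decoded path moves from baseline to the upward state, then to the downward state and back to baseline.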

NDVI data and vegetation classes

MODIS data were downloaded from the NASA website (currently accessed through the Reverb data gateway, http://reverb.echo.nasa.gov/reverb/). We used the NDVI 16-day composite band from a time series of the MOD13Q1 MODIS/Terra product at 250 m resolution (tile h17v05), from February 2000 to December 2012.

We selected a large area in the centre of the Valencia province, in the Valencia region in eastern Spain, and automatically classified the pixels in the area into clusters according to the similarities in their distributions of NDVI changes (Fig. 3).

Figure 3. Cluster analysis based on similarities of NDVI changes in an area in

the centre of the Valencia province, Eastern Spain. Different colours represent

different clusters. Dark blue at right corresponds to the Mediterranean Sea.

Phenological analysis of NDVI time series data using HMMs

For the analysis carried out in this work, two different zones were selected. The first one, composed of three clusters with rather similar dynamics, comprised a zone of rice crops located in the Albufera area. Two of these clusters are displayed in Fig. 4; the third one corresponds to the border of the zone, with slightly less well-defined dynamics. The second zone in this study corresponds to natural vegetation with much more diffuse dynamics and much larger variations (Fig. 5).

The zone with rice crops depicts a highly coherent vegetation dynamics (Fig. 4), and

hence it is expected that any reasonable method should provide sufficiently good

results.

Figure 4. Top: two selected clusters (white) from Fig. 3 corresponding to rice crops in the Albufera area. Bottom: for each of the selected clusters, time series of mean and percentile NDVI (×10^4) values, reconstituted from their distributions of NDVI changes.

Figure 5. Selected clusters (white) corresponding to natural vegetation in the Valencia province (left), and time series of mean NDVI (×10^4) values (red) and of the values reconstituted from the fitted four-state HMM (blue) (right).


Table 1. Estimated values of the parameters for HMM models 3S and 3SB fitted to changes in NDVI (×10^4) for rice crops in the Albufera area. Transition probabilities are given from the state in each row to the state in each column; emissions are normal with mean µ̂ and standard deviation σ̂.

Model   From    Transition probabilities to:        Emissions
                S0        S+        S−              µ̂            σ̂
3S      S0      0.8809    0.0545    0.0646          1.2388       127.2428
        S+      0.1618    0.8363    0.0019          576.5619     345.7381
        S−      0.1387    0.0063    0.8550          -447.7662    231.3246
3SB     S0      0.9021    0.0979    0               -4.9107      117.6908
        S+      0         0.8420    0.1580          439.5495     383.9313
        S−      0.1360    0         0.8640          -368.2144    271.1641

However, this zone also provides the opportunity to compare different methods, such as the different HMM models considered here, which may shed light on fine differences between them. We will focus the analysis on the onset of the growing season, but other phenological parameters are also readily obtained from the sequence of inferred hidden states.

The fitted values of the parameters for the different models considered, 3S, 3SB and 4SB, are presented in Tables 1 and 2. For each hidden state, the emissions, i.e., the changes in NDVI values, are modelled as a normal distribution with the corresponding mean and standard deviation.
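To see how the emission parameters separate the states, one can compare the normal densities of Table 1 (model 3S) for a given NDVI change. The sketch below does only that: it ignores the transition probabilities, which Viterbi decoding also takes into account, so it illustrates the emission model rather than performing state inference. The helper name is hypothetical.

```python
import numpy as np
from scipy.stats import norm

# Emission parameters of model 3S (Table 1); NDVI changes scaled by 10^4
STATES = ("S0", "S+", "S-")
MU = np.array([1.2388, 576.5619, -447.7662])
SIGMA = np.array([127.2428, 345.7381, 231.3246])

def most_likely_emitting_state(delta_ndvi):
    """State whose normal emission density is highest for each NDVI change.

    Transitions are deliberately ignored: this is not a substitute for Viterbi decoding.
    """
    d = np.atleast_1d(np.asarray(delta_ndvi, dtype=float))
    logdens = norm.logpdf(d[:, None], loc=MU, scale=SIGMA)   # shape (n, 3)
    return [STATES[k] for k in logdens.argmax(axis=1)]

print(most_likely_emitting_state([-300, 0, 300]))
```

With these parameters, changes of −300, 0 and +300 (×10^4) are most compatible with $S_-$, $S_0$ and $S_+$, respectively.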

Table 2. Estimated values of the parameters for HMM model 4SB fitted to changes in NDVI (×10^4) for rice crops in the Albufera area (RC) and a zone with natural vegetation in the Valencia province (NV). Transition probabilities are given from the state in each row to the state in each column; emissions are normal with mean µ̂ and standard deviation σ̂.

Zone    From    Transition probabilities to:                  Emissions
                S0−       S+        S0+       S−              µ̂            σ̂
RC      S0−     0.8948    0.1052    0         0               -16.0189     118.8479
        S+      0         0.8168    0.1832    0               562.3302     352.8487
        S0+     0         0         0.8023    0.1977          42.3842      133.4238
        S−      0.1365    0         0         0.8635          -424.4757    240.7890
NV      S0−     0.7323    0.2677    0         0               -6.1832      51.0317
        S+      0         0.7039    0.2961    0               143.9279     99.8844
        S0+     0         0         0.1310    0.8690          3.1719       51.6534
        S−      0.3190    0         0         0.6810          -143.8033    99.4484

The values for models 3S and 3SB for the zone of rice crops are presented in Table 1. The values of the fitted parameters for model 4SB for both zones, rice crops and natural vegetation, are presented in Table 2. Compared with the corresponding values for the rice crops, the parameters in the zone with natural vegetation reflect much less defined states, with higher probabilities of changes between different states, and much smaller absolute values of changes in the upward and downward states.
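One way to read the diagonal of the transition matrices is through the expected time spent in each state. In a first-order HMM the number of consecutive steps in state $i$ is geometric, $P(d) = a_{ii}^{\,d-1}(1-a_{ii})$, with mean $1/(1-a_{ii})$; multiplying by the 16-day compositing period gives a duration in days. The sketch below applies this to the self-transition probabilities of Table 2. The geometric sojourn is a built-in assumption of first-order HMMs, and it is what the semi-Markov models mentioned in the conclusions would relax; the names are hypothetical.

```python
import numpy as np

COMPOSITE_DAYS = 16  # MOD13Q1 compositing period: one HMM step per 16 days

# Self-transition probabilities a_ii from Table 2, states (S0-, S+, S0+, S-)
A_DIAG = {
    "RC": np.array([0.8948, 0.8168, 0.8023, 0.8635]),
    "NV": np.array([0.7323, 0.7039, 0.1310, 0.6810]),
}

def expected_sojourn_days(a_ii, step_days=COMPOSITE_DAYS):
    """Mean time spent in a state: step_days / (1 - a_ii), from the geometric law."""
    a_ii = np.asarray(a_ii, dtype=float)
    return step_days / (1.0 - a_ii)

for zone, diag in A_DIAG.items():
    print(zone, np.round(expected_sojourn_days(diag), 1))
```

Because every self-transition probability is lower for NV than for RC, the expected stays are shorter in every state, which is another way of expressing the less defined states noted above.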

Figure 6. Start of season (SOS), expressed in MODIS 16-day composite periods, estimated using HMM models 3S (green) and 4SB (blue), for rice crops in the Albufera area for the years 2000-2012.

The estimated dates for the onset of the growing period obtained by applying models 3S and 4SB are compared in Fig. 6. Although the pixel-level estimates from both models are highly correlated, with a global correlation coefficient of $\rho = 0.89$, the more complex model 4SB consistently and significantly provides earlier detections of the start of the growing season.
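The exact rule used to read the SOS date from the decoded states is not spelled out here. One plausible rule, assumed purely for illustration, is to take, for each year, the first composite at which the decoded path enters the upward state $S_+$. The sketch below implements that rule, plus a Pearson correlation between two sets of paired estimates analogous to the comparison of models 3S and 4SB. The function names and the rule itself are hypothetical.

```python
import numpy as np

def start_of_season(path, years, growing_state):
    """Hypothetical SOS rule: first step in each year where the path enters growing_state.

    path  : decoded hidden states, one per time step
    years : calendar year of each time step
    Returns {year: position relative to that year's first time step in the series}.
    """
    path = np.asarray(path)
    years = np.asarray(years)
    entered = (path == growing_state) & (np.roll(path, 1) != growing_state)
    entered[0] = False  # the state before the first observation is unknown
    sos = {}
    for year in np.unique(years):
        idx = np.flatnonzero(years == year)
        hits = idx[entered[idx]]
        if hits.size:  # years without an entry into growing_state are skipped
            sos[int(year)] = int(hits[0] - idx[0])
    return sos

def paired_correlation(sos_a, sos_b):
    """Pearson correlation between two SOS estimates over their common keys."""
    keys = sorted(set(sos_a) & set(sos_b))
    a = np.array([sos_a[k] for k in keys], dtype=float)
    b = np.array([sos_b[k] for k in keys], dtype=float)
    return float(np.corrcoef(a, b)[0, 1])
```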

CONCLUSIONS

In this work, different HMM models were applied to two types of vegetation in a Mediterranean area to define phenological characteristics from remotely sensed MODIS NDVI time series data.

HMMs can be efficiently applied to large data sets, and they can provide consistent results when sets of pixels that are homogeneous in terms of their vegetation dynamics are modelled. This condition can be fulfilled either by a previous classification from land cover and vegetation mapping or by automatic clustering of pixels based on the distributions of NDVI changes.

The estimated parameters of the HMMs, transition probabilities and means and

standard deviations of emissions, reflect the strengths of the different phenological

states, and could be used to compare the dynamics of vegetation communities

affected by perturbations or experimental treatments.

The use of HMMs allows the incorporation of previous knowledge of the system in its

modelling, by appropriately selecting the number of states or the blocked transitions,

thus providing more specific models for different vegetation communities, that may

result in a better determination of the parameters of interest.

Once a suitable HMM model is selected, the two steps involved in the estimation process, estimation of parameters and inference of the sequence of hidden states, are carried out at different levels of data processing. For the estimation of the parameters, a whole set of pixels, together with their complete time series, can be used, thus providing highly consistent estimates. However, the inference of the hidden states proceeds at the pixel level, and it can also be carried out on a yearly basis, thus allowing individual pixel analysis in relation to particular environmental factors.

More complex types of HMMs than those used in this work could also be applied, for

instance to include specific modelling of the duration of the different phases of the

seasons by using higher order or semi-Markov models, and they could also result in

more accurate estimations. Work in progress includes the comparison of the

estimations provided by different HMMs with the phenological parameters obtained

with other well-established smoothing and curve fitting methods.

ACKNOWLEDGEMENTS

This work was supported by the research projects FEEDBACK (CGL2011-30515-C02-01), funded by the Spanish Ministry of Innovation and Science, and CASCADE (GA283068), funded by the EC 7FP.

REFERENCES

[1] Cleland, E. E., Chuine, I., Menzel, A., Mooney, H. A. and Schwartz, M. D., “Shifting plant phenology in response to global change,” Trends in Ecology & Evolution 22, 357-365 (2007).

[2] Galford, G. L., Mustard, J. F., Melillo, J., Gendrin, A., Cerri, C. C. and Cerri, C. E. P., “Wavelet analysis of MODIS time series to detect expansion and intensification of row-crop agriculture in Brazil,” Remote Sensing of Environment 112, 576-587 (2008).

[3] Moulin, S., Kergoat, L., Viovy, N. and Dedieu, G., “Global-scale assessment of vegetation phenology using NOAA/AVHRR satellite measurements,” Journal of Climate 10, 1154-1170 (1997).

[4] Zhang, X., Friedl, M. A., Schaaf, C. B., Strahler, A. H., Hodges, J. C. F., Gao, F., Reed, B. C. and Huete, A., “Monitoring vegetation phenology using MODIS,” Remote Sensing of Environment 84, 471-475 (2003).

[5] Fontana, F., Rixen, C., Jonas, T., Aberegg, G. and Wunderle, S., “Alpine grassland phenology as seen in AVHRR, VEGETATION, and MODIS NDVI time series - a comparison with in situ measurements,” Sensors 8, 2833-2853 (2008).

[6] Fisher, J. I. and Mustard, J. F., “Cross-scalar satellite phenology from ground, Landsat, and MODIS data,” Remote Sensing of Environment 109, 261-273 (2007).

[7] Ganguly, S., Friedl, M. A., Tan, B., Zhang, X. and Verma, M., “Land surface phenology from MODIS: Characterization of the Collection 5 global land cover dynamics product,” Remote Sensing of Environment 114, 1805-1816 (2010).

[8] Hmimina, G., Dufrêne, E., Pontailler, J.-Y., Delpierre, N., Aubinet, M., Caquet, B., de Grandcourt, A., Burban, B., Flechard, C., Granier, A., Gross, P., Heinesch, B., Longdoz, B., Moureaux, C., Ourcival, J.-M., Rambal, S., Saint André, L. and Soudani, K., “Evaluation of the potential of MODIS satellite data to predict vegetation phenology in different biomes: An investigation using ground-based NDVI measurements,” Remote Sensing of Environment 132, 145-158 (2013).

[9] Jönsson, P. and Eklundh, L., “Seasonality extraction and noise removal by function fitting to time-series of satellite sensor data,” IEEE Transactions on Geoscience and Remote Sensing 40, 1824-1832 (2002).

[10] Jönsson, P. and Eklundh, L., “TIMESAT - a program for analyzing time-series of satellite sensor data,” Computers & Geosciences 30, 833-845 (2004).

[11] Bradley, B. A., Jacob, R. W., Hermance, J. F. and Mustard, J. F., “A curve fitting procedure to derive inter-annual phenologies from time series of noisy satellite NDVI data,” Remote Sensing of Environment 106, 137-145 (2007).

[12] Moody, A. and Johnson, D., “Land-surface phenologies from AVHRR using the discrete Fourier transform,” Remote Sensing of Environment 75, 305-323 (2001).

[13] Wagenseil, H. and Samimi, C., “Assessing spatio-temporal variations in plant phenology using Fourier analysis on NDVI time series: results from a dry savannah environment in Namibia,” International Journal of Remote Sensing 27(16), 3455-3471 (2006).

[14] Rabiner, L. R., “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE 77(2), 257-286 (1989).

[15] Bellone, E., Hughes, J. P. and Guttorp, P., “A hidden Markov model for downscaling synoptic atmospheric patterns to precipitation amounts,” Climate Research 15, 1-12 (2000).

[16] Baldi, P., Chauvin, Y., Hunkapiller, T. and McClure, M. A., “Hidden Markov models of biological primary sequence information,” Proceedings of the National Academy of Sciences USA 91, 1059-1063 (1994).

[17] Winters-Hilt, S., “Hidden Markov model variants and their application,” BMC Bioinformatics 7(Suppl. 2), S14 (2006).

[18] Viovy, N. and Saint, G., “Hidden Markov Models Applied to Vegetation Dynamics Analysis Using Satellite Remote Sensing,” IEEE Transactions on Geoscience and Remote Sensing 32(4), 906-917 (1994).

[19] Shen, Y., Wu, L., Di, L., Yu, G., Tang, H., Yu, G. and Shao, Y., “Hidden Markov Models for Real-Time Estimation of Corn Progress Stages Using MODIS and Meteorological Data,” Remote Sensing 5, 1734-1753 (2013).

[20] Cappé, O., Moulines, E. and Rydén, T., [Inference in Hidden Markov Models], Springer (2005).

[21] MacDonald, I. L. and Zucchini, W., [Hidden Markov and Other Models for Discrete-Valued Time Series], Chapman and Hall, London (1997).

Chapter 3:

Fitting piecewise linear functions to vegetation index time series

OBJECTIVES

This chapter considers the problem of fitting piecewise linear models to data series with a large number of points and a possibly large, a priori unknown, number of change-points. The problem is addressed by means of an iterative algorithm, which can be adapted according to the prior information available about the system under analysis or to the complexity of the model to be fitted.

SUMMARY

Continuous piecewise linear models are useful tools for extracting the basic features of the patterns of variation in complex data series. The work presented in this chapter develops an iterative algorithm for fitting continuous piecewise linear regression models with automatic estimation of the change-points. The algorithm starts from initial values for the number and positions of the change-points, which can be obtained by different methods, and then proceeds by an iterative adjustment similar to the displacements of Newton's method for finding roots of functions. The algorithm can be applied to large volumes of data, with very fast convergence in most cases, and it allows the model to be simplified by reducing the number of change-points when sufficiently close points are identified. Examples of applications to the extraction of phenological features from remotely sensed vegetation index data series are presented.

Fitting piecewise linear functions

INTRODUCTION

Remote sensing of vegetation dynamics, soil properties, and other ecosystem variables and indicators constitutes a key tool in ecology, agriculture and environmental studies at several temporal and spatial scales [1,2]. Many different techniques can be used to analyze this kind of data [3-5], and the development of efficient methods to identify patterns and extract features from spatio-temporal data series derived from remote sensing is a key point in these applications [6].

Time series of vegetation indices, such as the normalized difference vegetation index (NDVI), are products derived from data of Earth observing systems like the Moderate Resolution Imaging Spectroradiometer (MODIS) [7] on the Terra platform. MODIS-derived NDVI data are available from the year 2000, every 16 days, for a global grid of pixels with a maximum resolution of 250 m. They are just one example, albeit a paradigmatic one, of the many fields where huge amounts of time series data are produced and need to be analyzed with efficient methods capable of extracting their main features, some of which may be readily noticeable to a human observer.

Figure 1. NDVI time series values (×10^4) for an area of semiarid vegetation in southeast Spain. Time values (abscissas) are days elapsed since 01/01/2000.

A common characteristic of NDVI time series is the presence of different regions with increasing and decreasing trends, which correspond to periods of vegetation growth and decline. This pattern is well defined in Figure 1, which shows data for a four-year period for a set of contiguous pixels in a semiarid area of the Valencia region (southeast Spain), although it may not be so clear in the individual curves, i.e., in the data series for each pixel.

A simplified model of the functional dependence suggested by this type of data is shown in Figure 2: a continuous polygonal line that characterizes the sequence of changes in the growth rate, providing the positions of the change-points and the slopes of the linear segments.

Figure 2. NDVI time series values from Figure 1 with a continuous piecewise linear

model fitted to the data.

The problem of fitting a continuous piecewise linear model to a series of data is referred to as piecewise [8] or segmented [9] regression, linear regression with multiple structural changes [10] or regimes, or, in the case of two segments, as broken-line or two-phase [11,12] regression. In the classical statistical framework, this problem has been tackled as a particular case of nonlinear regression [13], or with specific approaches aimed at minimizing the sum of squared errors, yielding least squares estimates of the parameters, or maximum likelihood estimates in the case of independent identically distributed normal errors [14-16] or under particular hypotheses on the error structure [9,17].

When the number and positions of the change-points are known, the estimation of the model is straightforward. Segmented linear models with change-point estimation but without the continuity requirement are special cases of model trees [18,19], for which induction methods such as Quinlan's M5 [20] are well designed for predictive performance with many regressors. The restrictions imposed by the continuity condition, and the discontinuities in the derivative implied by a polygonal model, make the estimation of the model by minimizing some form of risk function much more difficult. Some authors use approximate smooth models to avoid these problems [21,22], while the more direct algorithms are mainly based on grid search [16] or on some form of greedy exploration of the possible change-points [15]. Other computationally intensive approaches include Bayesian [23] and fuzzy [24] methods.
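When the hinges are fixed, continuity can be imposed simply through the choice of regressors. The following minimal sketch (NumPy; the function name `fit_continuous_pwl` is hypothetical) uses the truncated-power or "hinge" basis, one standard parametrization that is not necessarily the one used in this work, so that the fit becomes a single ordinary least squares problem:

```python
import numpy as np


def fit_continuous_pwl(x, y, hinges):
    """Least-squares continuous piecewise linear fit with fixed hinge abscissas.

    Design matrix columns: 1, x, (x - c_1)_+, ..., (x - c_m)_+.  Each hinge term
    changes the slope by its coefficient without creating a jump, so the fitted
    polygonal line is continuous by construction.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    columns = [np.ones_like(x), x] + [np.maximum(x - c, 0.0) for c in hinges]
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, design @ coef


# Noise-free check: slope +1 up to x = 5, slope -1 afterwards.
x = np.linspace(0.0, 10.0, 21)
y = np.where(x < 5.0, x, 10.0 - x)
coef, fitted = fit_continuous_pwl(x, y, hinges=[5.0])
# coef is approximately [0, 1, -2]: intercept, initial slope, slope change at x = 5
```

The slope of the first segment is `coef[1]`, and each later slope adds the coefficient of the hinge that opens it, so the segment slopes are `np.cumsum(coef[1:])`.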

Piecewise linear models are usually approximations to complex real phenomena that allow the basic features of the data to be extracted, and so they find application in many different fields, such as economics [10], ecology [25] or cancer research [26]. The objective of this work is to efficiently fit a continuous polygonal model to large datasets, with a computational approach that does not aim to yield a global optimum of some goodness-of-fit measure, but to capture objectively the main trends of the data, providing estimates of the relevant parameters of the problem considered.

DESCRIPTION OF THE METHOD

The proposed method, which we denote by the acronym HANDFIT (Hinges Adjustment by Newton-like Displacements FIT), consists of two phases. First, an initial guess of the number and positions of the change-points, or hinges, is made, for which several alternatives suited to different particular problems are considered. Then, an iterative process analogous to Newton's method for root finding is applied to displace these hinges. Several parameters of the algorithm can be adjusted so that the types of features of main interest are extracted, although completely automated operation is also possible.

Iterative Adjustment of the Hinges

Assume we have an initial estimate of the abscissas of the increasing sequence of change-points, $\{x_i\}_{1 \le i \le n}$, in a fixed domain with endpoints $a = x_0$ and $b = x_{n+1}$. The iterative algorithm proceeds in two steps. First, for each interval $[x_i, x_{i+1}]$, $0 \le i \le n$, a line segment is fitted to the data by ordinary least squares, although any other fitting method could be used as well (Figure 3).


Figure 3. A section of the NDVI data from Figure 1, showing a step of the fitting

process, where for the current selection of change-points (indicated by vertical lines) a

line is fitted in each segment (bars) and their intersections obtained as new change-

points (crossed circles).

Secondly, the intersection points between consecutive segments are computed, and their abscissas define the new change-points; the process is repeated until convergence is achieved (Figure 4). The stopping criterion is simply that no hinge is displaced by more than a fixed threshold, and for sound choices of the threshold this is usually reached after only a few iterations.
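The two steps can be sketched as follows. This is a bare-bones illustration under stated assumptions, not the HANDFIT implementation: it uses NumPy, OLS via `np.polyfit` on the points of each closed interval, and a plain threshold `tol` on the hinge displacements. The function names are hypothetical, and the sketch omits the safeguards discussed below: it does not stop hinges from crossing, and it would divide by zero if two neighbouring lines were parallel.

```python
import numpy as np


def segment_lines(x, y, knots):
    """OLS (slope, intercept) of the points in each interval [knots[j], knots[j+1]]."""
    lines = []
    for lo, hi in zip(knots[:-1], knots[1:]):
        inside = (x >= lo) & (x <= hi)
        slope, intercept = np.polyfit(x[inside], y[inside], 1)
        lines.append((slope, intercept))
    return lines


def iterate_hinges(x, y, hinges, a, b, tol=1e-6, max_iter=100):
    """Repeat: fit one line per segment, move each hinge to the intersection
    of its two neighbouring lines; stop when no hinge moves more than tol."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    hinges = np.asarray(hinges, dtype=float)
    for _ in range(max_iter):
        knots = np.concatenate(([a], hinges, [b]))
        lines = segment_lines(x, y, knots)
        # m1 * t + c1 = m2 * t + c2  ->  t = (c2 - c1) / (m1 - m2)
        new = np.array([(c2 - c1) / (m1 - m2)
                        for (m1, c1), (m2, c2) in zip(lines[:-1], lines[1:])])
        if np.max(np.abs(new - hinges)) < tol:
            return new
        hinges = new
    return hinges


# Noise-free V shape with a true hinge at 5, started from a guess at 4.
x = np.arange(0.0, 10.5, 0.5)
y = np.where(x < 5.0, x, 10.0 - x)
result = iterate_hinges(x, y, hinges=[4.0], a=0.0, b=10.0)
```

On this noise-free example the hinge moves from 4 to about 4.67 and then to 5, where both fitted lines reproduce their data exactly, so 5 is a fixed point of the iteration.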

Figure 4. Continuous piecewise linear model fitted to the data from Figure 3, given by

the final convergent solution of the algorithm.

The second step of the algorithm is essentially a Newton iteration, as the new points are the intersections of affine varieties, which in the simplest Newton method, the Newton-Raphson algorithm for computing zeros of real functions, are the tangent line and the x-axis. Newton-like algorithms are in most cases very fast, but it is well known that this type of method may produce abrupt jumps, which in our problem could yield non-admissible values when the ordering of the hinges is not preserved (Figure 5).

Figure 5. Example of data where non-admissible values would be produced at an

iteration step.

To tackle this eventuality, if $\hat{x}_i$ is to be the abscissa of the new $i$-th hinge, the relative increments

$$\frac{\hat{x}_i - x_i}{x_{i+1} - x_i} \quad (\hat{x}_i > x_i), \qquad \frac{x_i - \hat{x}_i}{x_i - x_{i-1}} \quad (\hat{x}_i < x_i),$$

are always transformed using a sigmoid function (Figure 6), with a damping coefficient that can be adjusted to avoid such problems. This safeguard has the cost of increasing the number of iterations, but it results in a more robust algorithm that still converges quickly for most data. It should be clear, however, that convergence cannot be guaranteed for an arbitrary dataset, as no sound continuous piecewise linear model can be expected to fit a data series resulting from a process with intrinsic discontinuities.

Figure 6. Example of sigmoid function used to correct for possible jumps during the

iteration process.
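The text does not specify the sigmoid, so the sketch below assumes the map `c * tanh(r / c)` applied to the relative increment `r` defined above, with the damping coefficient `c` as its only parameter. The function name and this particular sigmoid are illustrative choices (NumPy):

```python
import numpy as np


def damped_update(hinges, proposed, a, b, c=0.5):
    """Move each hinge towards its proposed abscissa without passing a neighbour.

    r is the relative increment: the proposed displacement divided by the gap to
    the neighbour (or domain endpoint) in the direction of motion.  It is mapped
    through c * tanh(r / c), which is ~r for small r and always below c.  With
    c <= 0.5, two neighbours moving towards each other each cover less than half
    of their common gap, so the ordering of the hinges is preserved.
    """
    hinges = np.asarray(hinges, dtype=float)
    proposed = np.asarray(proposed, dtype=float)
    knots = np.concatenate(([a], hinges, [b]))
    updated = hinges.copy()
    for i, (old, new) in enumerate(zip(hinges, proposed)):
        left, right = knots[i], knots[i + 2]
        if new > old:
            gap = right - old
            r = (new - old) / gap
            updated[i] = old + c * np.tanh(r / c) * gap
        elif new < old:
            gap = old - left
            r = (old - new) / gap
            updated[i] = old - c * np.tanh(r / c) * gap
    return updated


# A Newton-like step that would swap the two hinges...
jumpy = damped_update([2.0, 5.0], proposed=[9.0, 1.0], a=0.0, b=10.0)
# ...and a small step, which is left almost unchanged.
small = damped_update([5.0], proposed=[5.01], a=0.0, b=10.0)
```

Smaller values of `c` make each step more cautious and therefore increase the number of iterations, which is the trade-off described in the text.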

Depending on the data and on the number and positions of the starting points, two consecutive hinges may get close enough that they should be identified, i.e., treated as a single hinge. The algorithm does this when a proximity threshold is crossed; this threshold can also be defined automatically in terms of the minimum number of distinct data points that a segment must contain for its least squares fit to be considered sound. Thus, the algorithm can, to a certain extent, automatically correct an excess of hinges in the initial set, and hence yields models whose complexity is not much higher than that suggested by the data.
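One possible reading of this rule is sketched below with NumPy and a hypothetical function name. The threshold is expressed as a minimum number of observations between consecutive hinges, and an offending pair is replaced by its midpoint; the midpoint is an assumption, since the text only states that the two hinges are identified.

```python
import numpy as np


def merge_close_hinges(x, hinges, min_points=4):
    """Identify consecutive hinges whose segment holds fewer than min_points
    data points, replacing each such pair by its midpoint, until no pair is left.
    Only segments between two hinges are checked in this sketch."""
    x = np.asarray(x, dtype=float)
    h = sorted(float(v) for v in hinges)
    changed = True
    while changed and len(h) > 1:
        changed = False
        for i in range(len(h) - 1):
            n_inside = np.count_nonzero((x >= h[i]) & (x <= h[i + 1]))
            if n_inside < min_points:
                h[i:i + 2] = [0.5 * (h[i] + h[i + 1])]
                changed = True
                break
    return np.array(h)


# 16-day sampling; the hinges at 20 and 30 enclose no observation at all.
days = np.arange(0.0, 100.0, 16.0)
merged = merge_close_hinges(days, [20.0, 30.0, 70.0], min_points=2)
```

In this example the first two hinges collapse to 25, while the segment between 25 and 70 contains three observations and is kept.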

Initial Selection of the Change-Points

Different strategies can be applied to select the initial set of change-points, according to the type of data and the particular features of interest, although the iterative step leads from many different reasonable choices of the starting points to the same final convergent solution, as discussed below in the subsection on sensitivity to initial conditions.

For the data in Figure 1, where the pattern is essentially a sequence of periods of alternating growth and decay, a plausible choice would be the points where the mean slope of the curves changes sign, i.e., where it passes through zero.

Figure 7. Top: Local slopes for the NDVI data series in Figure 3, showing their medians and the Gaussian kernel used to filter them. Bottom: Smoothed curve of the medians. The abscissas corresponding to zeros of the filtered medians are displayed, and the change-points of the final solution are indicated by vertical lines.

In Figure 7 (top), the cloud of slopes for 120 similar curves is displayed, together with their medians as robust estimates of the slopes. Although there are many points where the curve of the medians changes sign, in the three sections marked in the graph, which correspond to the segments of a final convergent fitted model, the values in the second section are essentially positive, whereas in the other two the medians are mostly negative. Considering all the zeros of this curve as initial change-points would produce a model of very high complexity, given the limited reduction in the number of hinges that the iteration step of the algorithm is capable of performing. A more reasonable choice is obtained by filtering the medians with a Gaussian kernel, such as the one shown in the same figure, and working with the curve of smoothed medians (Figure 7, bottom). The zeros of the filtered medians give values close to the final iterated solution obtained from a subjective selection of the starting points, and very similar results are obtained if the data are averaged and the smoothed slope of the mean curve is used instead.
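The slope-based initialization can be sketched as below (NumPy; the names, the use of forward-difference slopes, and the kernel width `sigma` measured in samples are assumptions). Local slopes are computed per curve, their medians are taken across curves and smoothed with a Gaussian kernel, and the zero crossings are located by linear interpolation.

```python
import numpy as np


def gaussian_smooth(values, sigma):
    """Gaussian-kernel smoothing (sigma in samples), renormalised at the edges.
    Assumes the series is longer than the kernel (6 * sigma + 1 samples)."""
    half = int(np.ceil(3 * sigma))
    kernel = np.exp(-0.5 * (np.arange(-half, half + 1) / sigma) ** 2)
    num = np.convolve(values, kernel, mode="same")
    den = np.convolve(np.ones_like(values), kernel, mode="same")
    return num / den


def slope_sign_changes(t, curves, sigma=2.0):
    """Abscissas where the smoothed median slope of the curves crosses zero.

    t: (n,) common sampling times; curves: (m, n), one row per pixel.
    """
    t = np.asarray(t, dtype=float)
    slopes = np.diff(curves, axis=1) / np.diff(t)      # (m, n - 1) local slopes
    mid = 0.5 * (t[:-1] + t[1:])                       # abscissa of each slope
    med = gaussian_smooth(np.median(slopes, axis=0), sigma)
    j = np.nonzero(med[:-1] * med[1:] < 0)[0]          # sign change between j, j+1
    # Linear interpolation of the zero crossing inside [mid[j], mid[j+1]].
    return mid[j] - med[j] * (mid[j + 1] - mid[j]) / (med[j + 1] - med[j])


# Synthetic "pixels": one annual cycle with different amplitudes and offsets.
t = np.arange(0.0, 4 * 365.0, 16.0)
amps = np.linspace(0.2, 0.4, 20)[:, None]
curves = 0.3 + amps * np.sin(2 * np.pi * t / 365.0)
starts = slope_sign_changes(t, curves)
```

For these synthetic curves the crossings fall close to the maxima and minima of the annual sine, at days 91.25 + 182.5 k.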

Consider, however, the NDVI data presented in Figure 8, corresponding to rice crops, also in the Valencia region. Here the dynamics of the vegetation are more complex: besides the clearly defined evolution of the crop, there are other periods that correspond to harvesting, growth of natural vegetation and preparation of the fields for the new season. All the pixels show similar and synchronized behaviors, since they are subjected to the same farming operations at specific moments in time, and so the curves are much better defined than those in Figure 1, allowing for a more detailed description than just the increasing/decreasing pattern.

Figure 8. NDVI time series values (×10^4) for an area of rice crops in southeast Spain. Time values (abscissas) are days elapsed since 01/01/2000.

In Figure 9, a simple model that represents fairly well the periods of growth and decay of the crops is fitted to the data. In Figure 10, a different model with a higher level of complexity, in terms of the number of change-points, is presented. This last model reflects the transitions between the periods of growth and decay, as well as the phases between consecutive cropping seasons, better than the previous one. Both models are final equilibrium solutions of the iterative algorithm for different choices of the number and positions of the initial change-points. The rationale for choosing between these two models depends on the kind of features we are interested in: either the basic characterization of the crop dynamics, as in Figure 9, or a more detailed description of the whole vegetation dynamics, as in Figure 10. Hence, this decision must be made by the analyst according to the objectives of the study, although it can be incorporated into the algorithm, explicitly or implicitly, through the method used to select the initial points and through the settings of the parameters that modulate the outcome of the algorithm, such as the window sizes of the smoothing filters or the threshold levels.

Figure 9. NDVI time series values from Figure 8, with a continuous piecewise linear model fitted to the data. Note the variations in the slopes of contiguous segments, which are not restricted to a positive/negative sequence.


Figure 10. NDVI time series values from Figure 8, with a continuous piecewise linear model of higher complexity than that of Figure 9 fitted to the data.

In any case, in this more general context the previously discussed method, relying on the change in sign of the slopes, is clearly inappropriate, and the selection of the starting points could instead be based on the distribution of the curvatures (Figure 11). In Figure 11 (top), the cloud of curvatures for the data in Figure 8 is shown; it was obtained using a moving window of suitable width, computing the curvature for each curve and abscissa by fitting a second-degree polynomial to the points inside the window. The medians of the curvatures have been smoothed with a Gaussian kernel, as shown in Figure 11 (bottom), and the zeros of the derivative of the filtered curvature function, i.e., its local extrema, corresponding to the most extreme values above some threshold, have been selected (Figure 12).
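A sketch of the curvature-based selection (NumPy; function names hypothetical). It assumes that "curvature" means the second derivative 2a of the quadratic a s^2 + b s + c fitted in each window, and it keeps the k local extrema of the smoothed median curve with the largest absolute value, in the spirit of the ten most extreme values of Figure 12, rather than applying an explicit threshold:

```python
import numpy as np


def gaussian_smooth(values, sigma):
    """Gaussian-kernel smoothing (sigma in samples), renormalised at the edges."""
    half = int(np.ceil(3 * sigma))
    kernel = np.exp(-0.5 * (np.arange(-half, half + 1) / sigma) ** 2)
    num = np.convolve(values, kernel, mode="same")
    den = np.convolve(np.ones_like(values), kernel, mode="same")
    return num / den


def local_curvature(t, y, half_width):
    """Second derivative (2 * a2) of a quadratic fitted in a centred moving
    window of 2 * half_width + 1 points; returned for interior points only."""
    n = len(t)
    out = np.empty(n - 2 * half_width)
    for i in range(half_width, n - half_width):
        window = slice(i - half_width, i + half_width + 1)
        a2, _, _ = np.polyfit(t[window] - t[i], y[window], 2)
        out[i - half_width] = 2.0 * a2
    return out


def curvature_extremes(t, curves, half_width=3, sigma=2.0, k=10):
    """Abscissas of the k most extreme local extrema of the smoothed median curvature."""
    t = np.asarray(t, dtype=float)
    curv = np.array([local_curvature(t, y, half_width) for y in curves])
    tc = t[half_width:len(t) - half_width]
    smooth = gaussian_smooth(np.median(curv, axis=0), sigma)
    d = np.diff(smooth)
    idx = np.nonzero(d[:-1] * d[1:] < 0)[0] + 1      # zeros of the derivative
    top = idx[np.argsort(-np.abs(smooth[idx]))[:k]]
    return np.sort(tc[top])


# Synthetic curves with slope changes at days 196 and 500.
t = np.arange(0.0, 800.0, 16.0)
base = np.interp(t, [0.0, 196.0, 500.0, 784.0], [0.2, 0.6, 0.3, 0.45])
curves = np.array([base * s for s in (0.9, 1.0, 1.1)])
hinges = curvature_extremes(t, curves, k=2)
```

The selected abscissas are only starting points: their precision is limited by the sampling step and the window width, and the iterative step refines them afterwards.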

Figure 11. Top: Local curvatures for the data in Figure 8. Bottom: Medians of the distribution of local curvatures (dots), and continuous smoothed median after filtering with a Gaussian kernel.


Figure 12. Positions of the ten most extreme values of the local curvatures for the NDVI

data in Figure 8.

For equally spaced data, curvatures can be computed very efficiently using a Savitzky-Golay type method [27] to fit the quadratic polynomials for each position of the moving window. If, for the specific data, quadratic polynomials were not flexible enough to detect the zones of interest, polynomials of higher degree could be used, and the computational effort would be comparable if a Savitzky-Golay strategy could be employed, i.e., if the abscissas were uniformly spaced.
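For equally spaced data the window fits need not be repeated, because the least-squares polynomial coefficients are fixed linear combinations of the window values. The sketch below computes these Savitzky-Golay weights for the second derivative with NumPy's pseudo-inverse and applies them as a correlation; the function name is hypothetical. SciPy users may prefer `scipy.signal.savgol_filter(y, window_length, polyorder, deriv=2, delta=dt)`, which gives the same values in the interior of the series.

```python
import numpy as np


def savgol_second_derivative(y, half_width, degree, dt):
    """Second derivative at each interior sample of an equally spaced series,
    from a least-squares polynomial of the given degree (>= 2) fitted in a
    centred window of 2 * half_width + 1 samples spaced dt apart."""
    offsets = np.arange(-half_width, half_width + 1, dtype=float)
    vander = np.vander(offsets, degree + 1, increasing=True)   # columns j**p
    # Row p of the pseudo-inverse gives the weights producing coefficient c_p.
    weights = 2.0 * np.linalg.pinv(vander)[2] / dt**2          # y'' = 2 c_2 / dt^2
    return np.correlate(np.asarray(y, dtype=float), weights, mode="valid")


dt = 16.0
t = np.arange(0.0, 480.0, dt)
d2_quad = savgol_second_derivative(3 * t**2 + 2 * t + 1, 3, 2, dt)   # exact: 6
d2_cubic = savgol_second_derivative(t**3, 3, 3, dt)                  # exact: 6 t
```

Since the weights are computed once, the cost per curve is a single correlation whatever the polynomial degree, which is why higher degrees remain affordable on a uniform grid.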

A robust and computationally efficient alternative, which avoids computing curvatures by fitting second- or higher-degree polynomials, is to consider the angles between lines fitted to contiguous sets of points (Figure 13). Using a double-sized moving window, two lines are fitted to the points to the left and to the right of each abscissa, and the angle between these lines is computed (Figure 13, top). The angles can then be post-processed as in the previous method (Figure 13, bottom), and the most extreme values selected with some threshold criterion (Figure 14).
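A sketch of the angle-based alternative (NumPy; the function name is hypothetical). It assumes that the two half-windows share the central point and that the angle is the signed difference of the inclinations arctan(m) of the two lines. These angles depend on the units of the axes (for instance NDVI×10^4 against days), so in practice some rescaling of the axes may be needed.

```python
import numpy as np


def local_angles(t, y, w):
    """Signed angle (degrees) between the OLS lines fitted to the w + 1 points
    ending at each abscissa and to the w + 1 points starting at it."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    angles = []
    for i in range(w, len(t) - w):
        m_left = np.polyfit(t[i - w:i + 1], y[i - w:i + 1], 1)[0]
        m_right = np.polyfit(t[i:i + w + 1], y[i:i + w + 1], 1)[0]
        angles.append(np.degrees(np.arctan(m_right) - np.arctan(m_left)))
    return t[w:len(t) - w], np.array(angles)


t = np.arange(0.0, 10.5, 0.5)
y = np.abs(t - 5.0)              # slope -1, then +1: a 90 degree turn at t = 5
tc, ang = local_angles(t, y, w=3)
```

The resulting curves of angles can then be smoothed and their extremes selected exactly as for the curvatures.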

Figure 13. Top: Local angles for the data in Figure 8. Bottom: Medians of the distribution of local angles (dots), and continuous smoothed median after filtering with a Gaussian kernel.


Figure 14. Positions of the ten most extreme values of the local angles for the NDVI

data in Figure 8.

Sensitivity to Initial Conditions

Although a variety of methods can be used to determine the starting set of hinges, we note that many different selections of the initial points lead to the same final result, which belongs to a very restricted set: that of the fixed points of the Newton-like iteration. To illustrate this behavior, an exhaustive search over the initial positions of the two change-points was performed for the data presented in Figure 3, where two change-points seem to provide a sound model.

For each position of the two change-points, when a simple piecewise linear model is fitted, i.e., when each segment is optimally fitted to its data by ordinary least squares without imposing the continuity condition, a measure of the magnitude of the jumps between contiguous segments gives an idea of the regions that can sustain a continuous model (Figure 15), the zeros of this function being the points sought by the algorithm. If we apply the iterative algorithm from any pair of these initial points, a convergent solution is eventually reached, and the set of positions of the change-points in the possible final solutions is shown in Figure 16. The regions defined by the sets of initial points that result in the same final solution are presented in Figure 17. The larger central region in this figure, comprising more than half of the possible choices of the initial change-points, results in the solution shown in Figure 4, which intuitively provides a sound model for the data. In fact, this is also the two-change-point model with the minimum global error of fit (Figure 18).
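The exhaustive search behind Figure 15 can be sketched as follows (NumPy; names hypothetical). Candidate change-points are restricted to sample positions, every segment keeps at least three points, and the index is the root mean square of the jumps between independently fitted segments, evaluated at the change-points, which is our reading of the caption of Figure 15:

```python
import numpy as np


def discontinuity_index(t, y, c1, c2):
    """RMS of the jumps at c1 and c2 when each of the three segments is fitted
    independently by OLS (no continuity constraint)."""
    knots = [t[0], c1, c2, t[-1]]
    lines = []
    for lo, hi in zip(knots[:-1], knots[1:]):
        inside = (t >= lo) & (t <= hi)
        lines.append(np.polyfit(t[inside], y[inside], 1))
    jumps = [np.polyval(lines[j + 1], c) - np.polyval(lines[j], c)
             for j, c in enumerate((c1, c2))]
    return np.sqrt(np.mean(np.square(jumps)))


def exhaustive_search(t, y, min_gap=2):
    """Evaluate the index for every admissible pair of sample positions,
    keeping at least min_gap + 1 points in every segment."""
    cand = range(min_gap, len(t) - min_gap)
    grid = {}
    for i in cand:
        for j in cand:
            if j - i >= min_gap:
                grid[(t[i], t[j])] = discontinuity_index(t, y, t[i], t[j])
    return grid


t = np.arange(0.0, 1000.0, 20.0)
y = np.interp(t, [0.0, 300.0, 600.0, 980.0], [0.0, 0.6, 0.3, 0.87])
grid = exhaustive_search(t, y)
best = min(grid, key=grid.get)
```

For these exact piecewise linear data the index vanishes only at the true pair (300, 600); with real data, the pairs of small index outline the regions shown in Figure 15.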

Figure 15. Index of discontinuity (root mean square of the jumps between contiguous segments) as a function of the positions of the two change-points when fitting a piecewise linear model to the data in Figure 3.


Figure 16. Abscissas of the two change-points in the set of the final iterated solutions

for the data in Figure 3, resulting from an exhaustive search of the initial positions for

the change-points.

Figure 17. For the data in Figure 3, regions of initial positions for the change-points

that result in the same final solution.

Figure 18. Global errors of fit for the models corresponding to the different final

solutions in Figure 17. The values displayed correspond to the models with the two

lowest global errors.

A different question is the sensitivity of the algorithm to small variations in the original data. In real applications, data are measured with a certain degree of error, and a feature extraction algorithm should not give very different outcomes for close data inputs. To explore the robustness of the algorithm to random perturbations of the data, we selected sections of two curves from the data in Figure 8 and added Gaussian noise to the positions of the data points. The final positions of the change-points given by the algorithm were consistent, determining models that exhibit similar behaviors (Figure 19).


Figure 19. Sensitivity of the final positions of the hinges to random perturbation of the data. Two sections of data series from Figure 8 were perturbed by adding independent Gaussian noise to the x and y coordinates of the points, with standard deviations as indicated in the figure, producing 50 replications. For each cluster of final change-points, ellipses determined by three standard deviations of their distributions are shown.

DISCUSSION

The algorithms presented in this work provide a computationally efficient method to

fit continuous piecewise linear models to data series when the number of points is

high and many change-points have to be considered, and can be an alternative to

methods based on exhaustive or grid searches aimed at minimizing some global risk

function.

Our objective in fitting these kinds of models is to extract the main features present in the data, in terms of different growth regimes, and in this context it is clear that some kind of a priori information, either explicit or implicit, has to be used to define what a trait of interest is. Consider, for instance, the data presented in Figure 8. As shown in Figure 9 and Figure 10, models of different levels of complexity can be fitted, depending on whether the interest lies essentially in the succession of growth/decline seasons or a more detailed description is sought.

Our primary envisaged application for this type of model is the analysis of remotely sensed vegetation data, as exemplified throughout this work, although there are many other fields where continuous piecewise linear models are sound models that can provide a basic description of the patterns of growth exhibited by the data, and where efficient algorithms are needed to cope with high volumes of data. However, it should be kept in mind that no algorithm for continuous piecewise regression can be successful when the data do not reflect the continuity properties required by these models (Figure 20).

Figure 20. Example of simulated data produced by a discontinuous model.

There are several options and parameters in the algorithms that can be set to fine-tune the method and obtain the type of model most adequate for the data under consideration. Besides the different options for the selection of the initial points, the size of the moving windows used to compute local curvatures or angles, the shape of the smoothing kernels and the values of the thresholds used to select the most extreme values determine the number and positions of the initial change-points. Nevertheless, any sound choice of these parameters would lead to very similar sets of starting points, as exemplified by comparing Figure 12 and Figure 14. Moreover, as discussed in the previous section, the final change-points are the equilibrium points of the iteration process, and so there is no need for an intensive effort to optimally determine the positions of the initial points, as most of them will usually lead to the same final solution.

Finally, it should be clear that the method can be run in a completely automated way. The choice between different models with the same number of change-points can be based on the global error of fit, while model selection criteria [28] that take into account the number of parameters of the model, such as AIC [29] or BIC [30], can be employed to select between models with different numbers of change-points.
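A minimal sketch of this selection step with NumPy (names hypothetical). It uses the Gaussian least-squares forms AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln n, up to an additive constant, and assumes k = 2m + 2 parameters for a model with m hinges (the m hinge positions plus the m + 2 coefficients of the polygonal line); other parameter counts, for example including the error variance, are also used in practice.

```python
import numpy as np


def continuous_pwl_rss(x, y, hinges):
    """Residual sum of squares of the least-squares continuous piecewise
    linear fit with the given hinges (truncated-power basis)."""
    cols = [np.ones_like(x), x] + [np.maximum(x - c, 0.0) for c in hinges]
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def information_criteria(x, y, hinges):
    """Gaussian (AIC, BIC) up to an additive constant, assuming
    k = 2 m + 2 parameters for m hinges."""
    n, m = len(x), len(hinges)
    k = 2 * m + 2
    fit_term = n * np.log(continuous_pwl_rss(x, y, hinges) / n)
    return fit_term + 2 * k, fit_term + k * np.log(n)


rng = np.random.default_rng(0)
x = np.arange(0.0, 1000.0, 20.0)
y = np.interp(x, [0.0, 300.0, 600.0, 980.0], [0.2, 0.6, 0.3, 0.7])
y = y + rng.normal(0.0, 0.01, x.size)
candidates = {1: [500.0], 2: [300.0, 600.0]}
bic = {m: information_criteria(x, y, h)[1] for m, h in candidates.items()}
chosen = min(bic, key=bic.get)
```

In an automated run the candidate hinge sets would be the final solutions of the iterative algorithm started from different numbers of initial change-points.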

ACKNOWLEDGEMENTS

This work has been partly funded by grants from the Valencia Regional Government (GVPRE/2008/310, Conselleria de Educación, Generalitat Valenciana) and the University of Alicante (VIGROB-162).

REFERENCES

[1] M.H. Ismail and K. Jusoff, Estimating Forest Area using Remote Sensing and Regression Estimator, WSEAS Trans. Signal Processing 3(1), 2007, pp. 88-94.

[2] H. Diofantos, T. Leonidas, H. Marinos, V. Photos, P. Papadopoulos, C. Chris and L. Athina, Satellite Remote Sensing for Water Quality Assessment and Monitoring: an Overview on Current Concepts, Deficits and Future Tasks, WSEAS Trans. Signal Processing 3(1), 2007, pp. 67-73.

[3] A. Kulkarni and S. McCaslin, Knowledge Discovery from Satellite Images, WSEAS Trans. Signal Processing 2(11), 2006, pp. 1523-1530.

[4] V. Barrile and G. Bilotta, Object-oriented analysis applied to high resolution satellite data, WSEAS Trans. Signal Processing 4(3), 2008, pp. 68-75.

[5] D. Hadjimitsis, I. Evangelou, A. Retalis, A. Lazakidou and C. Clayton, Classification of Satellite Images for Land-Cover Changes using an Unsupervised Neural Network Algorithm, WSEAS Trans. Signal Processing 1(2), 2005, pp. 155-162.

[6] L. Bruzzone, P.C. Smits and J.C. Tilton, Foreword special issue on analysis of multitemporal remote sensing images, IEEE Trans. Geosci. Remote Sens. 41, 2003, pp. 2419-2420.

[7] V.V. Salomonson, W.L. Barnes, P.W. Maymon, H. Montgomery and H. Ostrow, MODIS: Advanced facility instrument for studies of the Earth as a system, IEEE Trans. Geosci. Remote Sens. 27, 1989, pp. 145-153.

[8] V.E. McGee and W.T. Carleton, Piecewise Regression, J. Amer. Statist. Assoc. 65, 1970, pp. 1109-1124.

[9] H.P. Piepho and J.O. Ogutu, Inference for the Break Point in Segmented Regression with Application to Longitudinal Data, Biom. J. 45, 2003, pp. 591-601.

[10] J. Bai and P. Perron, Estimating and testing linear models with multiple structural changes, Econometrica 66, 1998, pp. 47-78.

[11] D.V. Hinkley, Inference about the intersection in two-phase regression, Biometrika 56, 1969, pp. 495-504.

[12] D.V. Hinkley, Inference in Two-Phase Regression, J. Amer. Statist. Assoc. 66, 1971, pp. 736-743.

[13] G.A.F. Seber and C.J. Wild, Nonlinear Regression, Wiley, New York, 1989.

[14] D.M. Hawkins, Point estimation of the parameters of piecewise regression models, Appl. Statist. 25, 1976, pp. 51-57.

[15] D.J. Hudson, Fitting segmented curves whose join points have to be estimated, J. Amer. Statist. Assoc. 61, 1966, pp. 1097-1129.

[16] P.M. Lerman, Fitting Segmented Regression Models by Grid Search, Appl. Statist. 29, 1980, pp. 77-84.

[17] T.-S. Lee, Estimating coefficients of two-phase linear regression model with autocorrelated errors, Statist. Probab. Lett. 18, 1993, pp. 113-120.

[18] A. Ciampi, Generalized regression trees, Comput. Statist. Data Anal. 12, 1991, pp. 57-78.

[19] I.H. Witten and E. Frank, Data Mining, Elsevier, San Francisco, 2005.

[20] J.R. Quinlan, in Proceedings AI'92, p. 343, World Scientific, Singapore, 1992.

[21] A. Tishler and I. Zang, A New Maximum Likelihood Algorithm for Piecewise Regression, J. Amer. Statist. Assoc. 76, 1981, pp. 980-987.

[22] G. Chiu, R. Lockhart and R. Routledge, Bent-Cable Regression Theory and Applications, J. Amer. Statist. Assoc. 101, 2006, pp. 542-553.

[23] B.P. Carlin, A.E. Gelfand and A.F.M. Smith, Hierarchical Bayesian analysis of change point problems, Appl. Statist. 41, 1992, pp. 389-405.

[24] J.-R. Yu, G.-H. Tzeng and H.-L. Li, General fuzzy piecewise regression analysis with automatic change-point detection, Fuzzy Sets and Systems 119, 2001, pp. 247-

[25] J.D. Toms and M.L. Lesperance, Piecewise regression: A tool for identifying ecological thresholds, Ecology 84, 2003, pp. 2034-2041.

[26] B. Yu, M.J. Barrett, H.-J. Kim and E.J. Feuer, Estimating joinpoints in continuous time scale for multiple change-point models, Comput. Statist. Data Anal. 51, 2007, pp. 2420-2427.

[27] A. Savitzky and M.J.E. Golay, Smoothing and Differentiation of Data by Simplified Least Squares Procedures, Anal. Chem. 36, 1964, pp. 1627-1639.

[28] K.P. Burnham and D.R. Anderson, Model Selection and Inference: A Practical Information-Theoretic Approach, Springer-Verlag, New York, 1998.

[29] H. Akaike, Information theory and an extension of the maximum likelihood principle, in: B.N. Petrov and F. Csáki (eds.), 2nd International Symposium on Information Theory, Akadémiai Kiadó, Budapest, 1973, pp. 267-281.

[30] G. Schwarz, Estimating the dimension of a model, Ann. Statist. 6, 1978, pp. 461-

Chapter 4:

Time-frequency analysis of vegetation index data series using quasi-periodic components

OBJECTIVES

This chapter presents a model for data series that include secular components and non-constant cyclic components, called quasi-periodic components. The aim is to obtain estimates closer to reality than those provided by spectral analysis models with constant periodic components, thus enabling the analysis of the relations between the variations in the parameters that define the seasonal variations and the covariates or ecological factors of interest.

SUMMARY

NDVI time series usually show cyclic behaviors due to the phenological characteristics of the vegetation. Different forms of Fourier analysis, which allow secular and cyclic components to be fitted simultaneously, have been used effectively to describe remotely sensed NDVI data. However, there may be variations in the frequencies and/or phases of the cyclic components that are not captured by typical spectral analyses. In the work presented in this chapter, MODIS NDVI time series are analyzed considering secular and quasi-periodic components, which are fitted to the data by a smoothing algorithm based on time-frequency spectral analysis. The algorithm is particularly suited to equally spaced data, and it allows the analysis of temporal variations in the frequencies and phases of the cyclic components. Examples of applications to different vegetation types and conditions in the Comunidad Valenciana (Valencia region) are presented.

Time-frequency analysis with quasi-periodic components

INTRODUCTION

The description and analysis of vegetation dynamics at regional and global scales is a topic of major importance for understanding the functioning of terrestrial ecosystems and their interactions with factors such as climate change [1] or the human-driven degradation of natural ecosystems [2]. Remote sensing has been used for the past three decades to extract phenological characteristics by analyzing time series of vegetation indices such as the normalized difference vegetation index (NDVI), derived from several space-borne sensors such as AVHRR, Landsat or MODIS [3-6]. The capacity of these indices to properly characterize the phenology of the vegetation has been assessed with ground data [7].

The use of remotely sensed time series of vegetation indices has produced global descriptions of land cover dynamics [8], and has provided a valuable tool for landscape analysis, as in the classification of vegetation types [9], for vegetation ecology, as in the analysis of the relations between vegetation dynamics and climatic factors [10,11], and for the study of vegetation recovery after disturbances such as wildfires [12,13].

Different methods have been used to analyze NDVI time series data: to reduce the noise present in the original data by applying some smoothing method, to extract phenological metrics that describe the phenological traits of the vegetation, or to best characterize the temporal behavior of the NDVI data corresponding to the underlying vegetation dynamics [14-17]. Seasonal characteristics of the vegetation produce oscillations with approximately annual and semiannual periods in NDVI data, and hence different types of harmonic, Fourier or spectral analysis have been used effectively to extract the cyclic components of NDVI time series [18-22].

In Mediterranean landscapes, such as the Valencia region in southeast Spain, it is common to find a high interspersion of different vegetation types, even at small scales, showing a wide variability in their phenological dynamics (Figure 1). Also, strong regional and local variations in climatic factors, including droughts or extended dry seasons, may alter the typical seasonal oscillations, affecting not only their amplitudes but also the onset and duration of the growing season.

The objective of this work was to devise, and efficiently implement, a method to analyze MODIS-derived biweekly NDVI time series that explicitly considers variations in the parameters defining the cyclic components that model the oscillations of the data. The aim is to account for variations in the data that are not the result of inter-year trends, and hence to facilitate the analysis of the relations between variations in the seasonal characteristics of the vegetation and different environmental factors.

Figure 1. Cluster analysis of the annual dynamics of the vegetation, as derived from similarities between NDVI time series, for an area in the Valencia region. Map of dynamics classes (left) and characteristic NDVI annual curves for the six classes considered (right, colors refer to the corresponding class in the map).

METHODS AND RESULTS

NDVI time series usually exhibit behaviors that may result from the combination of trends, only apparent when several years are analyzed, and intra-year oscillations with some variability in their amplitudes and phases. To better study this type of temporal variation, a specific model is proposed first, and then algorithms to analyze large volumes of data efficiently are developed.

Formulation of the model with quasi-periodic components

Consider as a model for NDVI time series a time-dependent function $Y(t)$ expressed in the form

$$Y(t) = s(t) + \sum_{k=1}^{n} A_k(t)\,\sin\big(\omega_k(t)\,t - \theta_k(t)\big) + \varepsilon(t), \tag{1}$$

that is, the superposition of a slowly varying function $s(t)$, which will be called the trend or secular term, and a sum of almost periodic functions, the terms $A_k(t)\,\sin\big(\omega_k(t)\,t - \theta_k(t)\big)$, which will also be called cyclic, periodic, oscillating or sinusoidal components.

Here, $\omega_k(t)$ and $\theta_k(t)$ are almost constant functions, called, respectively, the quasi-frequencies and quasi-phases of the periodic signal, or simply frequencies and phases when there is no risk of confusion. The frequencies $\omega_k(t)$ are ordered from lower to higher values, so that, for NDVI data from areas with vegetation types having the usual annual and semiannual phenological cycles, and with time measured in years, the first two frequencies will be $\omega_1(t) \approx 2\pi$ and $\omega_2(t) \approx 4\pi$; higher frequencies can also be considered to better fit some data.

The functions $A_k(t)$ are the amplitudes of the corresponding cyclic components, and they are assumed to vary much more slowly than the lowest-frequency oscillation, i.e., on time scales much longer than its mean period $2\pi/\omega_1$; the same condition is assumed to be satisfied by $\omega_k(t)$, $\theta_k(t)$ and $s(t)$.

The model includes the error term $\varepsilon(t)$, which incorporates measurement errors, random variations, or any other source of deviation from the deterministic part of the model, which is the part to be fitted to the data. Once fitted, the model can replace the original data values for different purposes, producing a smoothed NDVI time series.
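To make model (1) concrete, the following minimal Python/NumPy sketch evaluates a synthetic NDVI-like series from given trend, amplitude, frequency and phase functions. The function `quasi_periodic_series` and its arguments are hypothetical names chosen for illustration; time is assumed to be measured in years, 23 samples per year (16-day composites) is an assumption for the example, and the error term is taken as Gaussian noise only for the sake of the example.

```python
import numpy as np

def quasi_periodic_series(t, trend, amplitudes, frequencies, phases,
                          noise_sd=0.0, seed=0):
    """Evaluate model (1) at times t (in years).

    trend is a callable s(t); amplitudes, frequencies and phases are lists of
    callables A_k(t), omega_k(t) and theta_k(t), one entry per component.
    Gaussian noise with standard deviation noise_sd stands in for eps(t).
    """
    t = np.asarray(t, dtype=float)
    y = np.zeros_like(t) + trend(t)                      # secular term s(t)
    for A, omega, theta in zip(amplitudes, frequencies, phases):
        y = y + A(t) * np.sin(omega(t) * t - theta(t))   # quasi-periodic term
    rng = np.random.default_rng(seed)
    return y + noise_sd * rng.standard_normal(t.shape)

# Example: four years at 23 samples per year, with a slowly modulated
# annual amplitude and a slowly drifting phase.
t = np.arange(4 * 23) / 23
y = quasi_periodic_series(
    t,
    trend=lambda t: 0.35 + 0.01 * t,
    amplitudes=[lambda t: 0.15 + 0.03 * np.sin(0.5 * np.pi * t)],
    frequencies=[lambda t: 2 * np.pi + 0 * t],
    phases=[lambda t: 0.2 * t],
    noise_sd=0.02,
)
```

With constant amplitude, frequency and phase functions the sketch reduces to a classical harmonic model with a trend.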

Consider the NDVI time series shown in Figure 2, corresponding to four years of data for an individual pixel. A classical linear harmonic analysis with polynomial trend, i.e., a model of the type

$$H(t) = \sum_{j=0}^{g} c_j\,t^{j} + \sum_{k=1}^{n} a_k\,\sin\!\left(2\pi k\,t/T - \varphi_k\right) + \varepsilon(t), \tag{2}$$

is not able to capture variations in frequencies or phases, so that the shape of the

estimated cyclic component is necessarily constant, as shown in Figure 2, where a

model with two periodic terms, corresponding to the annual and semiannual

frequencies, has been fitted (Fig. 2, left), and the resulting cyclic component extracted

(Fig. 2, right).
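Because each term $a_k\sin(2\pi k t/T - \varphi_k)$ can be rewritten as $a_k\cos\varphi_k\,\sin(2\pi k t/T) - a_k\sin\varphi_k\,\cos(2\pi k t/T)$, model (2) is linear in its coefficients and can be fitted by ordinary least squares. A minimal NumPy sketch, with a hypothetical function name and time assumed to be in years (so that $T = 1$):

```python
import numpy as np

def fit_classical_harmonic(t, y, degree=2, n_harmonics=2, period=1.0):
    """OLS fit of model (2): polynomial trend of the given degree plus
    n_harmonics fixed harmonics of the basic period T.

    Each a_k*sin(2*pi*k*t/T - phi_k) is fitted as b_k*sin(.) + c_k*cos(.),
    with b_k = a_k*cos(phi_k) and c_k = -a_k*sin(phi_k).
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    cols = [t ** j for j in range(degree + 1)]           # trend columns t^j
    for k in range(1, n_harmonics + 1):
        w = 2 * np.pi * k / period
        cols += [np.sin(w * t), np.cos(w * t)]           # harmonic k columns
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    trend = X[:, :degree + 1] @ coef[:degree + 1]
    b, c = coef[degree + 1::2], coef[degree + 2::2]      # sine / cosine coefs
    amplitudes = np.hypot(b, c)                          # a_k
    phases = np.arctan2(-c, b)                           # phi_k
    rms = np.sqrt(np.mean((y - fitted) ** 2))
    return fitted, trend, amplitudes, phases, rms
```

Since $a_k$ and $\varphi_k$ are single numbers, the shape of the fitted cyclic part repeats identically every period, which is exactly the limitation discussed above.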

Figure 2. Left: NDVI (×10⁴) time series (time in days from January 2000) for an area in Millares (Valencia province) and a model with a quadratic trend and two harmonic terms fitted to the data. A linear trend is also shown for reference. Right: cyclic component resulting from adding the two harmonic terms with 1-year and ½-year periods.

Fitting a model with quasi-periodic components produces the results shown in Figure

3, where, as in the previous model (Figure 2), two basic frequencies have been

considered in the fitted model (Fig. 3, left), resulting in a cyclic component exhibiting

variations similar to those present in the data (Fig. 3, right).

Figure 3. NDVI (×10⁴) data of Figure 2 and a fitted model with a quadratic trend and two quasi-periodic components (left). The cyclic part of the fitted model is shown in the right subfigure.

Figure 4. Harmonic terms with approximate annual (left) and semiannual

(right) periods corresponding to the fitted model presented in Figure 3.

The individual quasi-periodic components of the cyclic part of the fitted model (Fig. 3, right) are shown in Figure 4, where clear variations in both the annual (Fig. 4, left) and semiannual (Fig. 4, right) oscillations are present. These variations can be decomposed into changes in the amplitudes of the quasi-periodic components (Fig. 5, left) and changes in their corresponding phases (Fig. 5, right).

Figure 5. For the model fitted in Figure 3, with the two cyclic components

shown in Figure 4, amplitudes (left) and phases (right) of the approximate

annual (blue) and semiannual (green) quasi-periodic terms. Phases are

periodic, and the origin of phases in the right subfigure is arbitrary.

The root mean square (RMS) errors of the fitted models, in the NDVI ×10⁴ units of the figures, were 325.7 for the classical harmonic model and 270.9 for the quasi-periodic model, showing, as expected, a better fit for the more flexible model. Also, since variations in the cyclic part may affect the estimation of the trend, the secular component of the classical harmonic model showed a stronger quadratic effect.

To better compare the fitting strength of the models considered above, a model with only one annual quasi-periodic component was also fitted to the data, still obtaining a good fit, with an RMS error of 292.0 (Fig. 6), lower than that of the classical harmonic model with two terms. As explained next, the greater flexibility and fitting capacity of the method can be achieved at a computational cost not much higher than that of classical harmonic analysis.

Figure 6. Comparison between the fitted model of Figure 3 (top), which

included two variable harmonic components, and a model fitted to the same

data with only one annual quasi-periodic term (bottom).

Basic algorithm

The method is based on time-frequency, or local Fourier, analysis. First, to extract the cyclic component, for each data point $Y(t)$ a local Fourier analysis is carried out in a window centered at $t$. A few periodic components with frequencies that are multiples of a basic frequency $\omega_1$ are considered, the size of the moving window being one or two periods of the basic wave with frequency $\omega_1$. The Fourier components, with frequencies $\omega_1$, $2\omega_1$, and so on, are obtained using appropriate envelopes for spectral analysis, although any other method to obtain the spectral components, such as the FFT, can also be used. Then the cyclic part of the signal $Y(t)$ is estimated, as represented in Figure 7, and only the values of these cyclic components at time $t$ are saved (red dots in Fig. 7, bottom).

Figure 7. Top: data included in the window around the point to be analyzed (red dot, abscissa marked by a vertical line). Bottom: first two harmonic terms in the local Fourier analysis and their values at the abscissa of the point shown in the top subfigure (red dots). The composite wave (red line) is shown for comparison with the raw data in the top subfigure.
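The first step of the basic algorithm can be sketched directly, at a cost of $O(N \times W)$. As a stand-in for the envelope-based spectral estimate mentioned above, whose details are not given here, this sketch fits a constant plus the harmonics $k\omega_1$ by least squares inside a window centered at each point, and keeps only the value of each harmonic at that point. The function name and the default window width of one year are assumptions for illustration.

```python
import numpy as np

def local_fourier_components(t, y, omega1, n_harmonics=2, window=1.0):
    """Direct O(N x W) extraction of the cyclic components V_k(t).

    For each time t_i, a constant plus harmonics k*omega1 (k = 1..n_harmonics)
    is fitted by least squares to the points within +/- window/2 of t_i, and
    each harmonic is evaluated at t_i itself. Returns an (n_harmonics, N) array.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    V = np.empty((n_harmonics, t.size))
    for i, ti in enumerate(t):
        mask = np.abs(t - ti) <= window / 2
        tw = t[mask] - ti                      # local time, zero at t_i
        cols = [np.ones_like(tw)]
        for k in range(1, n_harmonics + 1):
            cols += [np.sin(k * omega1 * tw), np.cos(k * omega1 * tw)]
        coef, *_ = np.linalg.lstsq(np.column_stack(cols), y[mask], rcond=None)
        # At local time 0 the sine terms vanish and the cosines equal 1,
        # so the value of harmonic k at t_i is its cosine coefficient.
        V[:, i] = coef[2::2]
    return V
```

The inner least-squares problem is solved once per data point over all points in its window, which is the source of the $O(N \times W)$ cost discussed below.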

As a result of the first part of the algorithm, quasi-periodic functions $V_k(t)$, with approximate frequencies that are multiples of $\omega_1$, are obtained. Next, a basis of functions including the $V_k(t)$ is completed with an appropriate basis for the secular component, e.g., powers of $t$ if a polynomial trend is to be fitted, obtaining a matrix of basis functions

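$$\mathbf{X} = \left(\begin{array}{ccc|ccc}
\vdots & \vdots & & \vdots & \vdots & \\
V_1(t) & V_2(t) & \cdots & 1 & t & \cdots \\
\vdots & \vdots & & \vdots & \vdots &
\end{array}\right), \tag{3}$$

where the columns to the left of the bar form the cyclic basis, those to the right form the secular basis, and each row corresponds to one observation time $t$.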

Finally, a simple linear model $\mathbf{Y} = \mathbf{X}\cdot\mathbf{b}$ is fitted to the data vector $\mathbf{Y}$ by ordinary least squares. For sound models, i.e., if the selected frequencies are appropriate for the data being analyzed, the coefficients of the cyclic terms in this linear model are close to unity. In any case, the final cyclic components are those obtained in this last step, which differ from the previously obtained $V_k(t)$, if at all, only in their amplitudes.
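The second step, building the matrix $\mathbf{X}$ of equation (3) and fitting $\mathbf{Y} = \mathbf{X}\cdot\mathbf{b}$ by ordinary least squares, could look as follows. This is a sketch assuming a polynomial secular basis; `fit_secular_and_cyclic` is a hypothetical name, and `V` holds the previously extracted $V_k(t)$ as rows.

```python
import numpy as np

def fit_secular_and_cyclic(t, y, V, degree=2):
    """Fit Y = X b by OLS, with X = [V_1(t) ... V_m(t) | 1 t ... t^degree].

    V is an (m, N) array whose rows are the quasi-periodic functions V_k(t).
    Returns the coefficients b, the final cyclic components b_k * V_k(t)
    (one per row) and the fitted secular term.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    V = np.atleast_2d(V)
    secular = np.column_stack([t ** j for j in range(degree + 1)])
    X = np.hstack([V.T, secular])                 # matrix X of equation (3)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    m = V.shape[0]
    cyclic = V * b[:m, None]                      # only amplitudes change
    trend = secular @ b[m:]
    return b, cyclic, trend
```

For a well-chosen set of frequencies, the first `m` entries of `b` should be close to 1, as stated above.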

An efficient algorithm for equispaced data

The first step of the basic algorithm described above is in general computationally very costly, since a Fourier analysis has to be carried out for each data point, including all surrounding points inside the moving window, resulting in an algorithm of complexity $O(N \times W)$, where $N$ is the number of data points and $W$ the number of points in the window. However, for equispaced data, as is the case for the nominal dates of biweekly NDVI time series, an efficient algorithm of complexity $O(N)$ can be devised.

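Let $Y(t)$ be measured at the equispaced points $t \in \{t_0,\, t_0 + h,\, t_0 + 2h,\, \ldots\}$. For each frequency $\omega_k$, create the complex vector of unit-modulus entries

$$\tau = \big(e^{-i\omega_k t_0},\; e^{-i\omega_k (t_0 + h)},\; e^{-i\omega_k (t_0 + 2h)},\; \ldots\big)$$

and compute the pointwise product

$$\tau \cdot Y = \big[\tau_0\,Y(t_0),\; \tau_1\,Y(t_0 + h),\; \tau_2\,Y(t_0 + 2h),\; \ldots\big].$$

Then apply a moving window average to $\tau \cdot Y$, with an appropriate window size, and repeat it an adequate number of times. As a result, a smoothed vector $\alpha_k = w(\tau \cdot Y)$ is obtained, where $w$ denotes the smoothing operator on the points inside the window. Finally, the values of the cyclic component associated with the frequency $\omega_k$ are obtained from the real part of the pointwise product of $\alpha_k$ and the complex conjugate $\overline{\tau}$ of $\tau$,

$$V_k(t) = 2\,\mathrm{Re}\big(\alpha_k \cdot \overline{\tau}\big).$$

For a component $A\sin(\omega_k t - \theta)$ with constant parameters, $\tau \cdot Y$ contains the constant term $A e^{-i\theta}/(2i)$ plus a term oscillating at frequency $2\omega_k$; the moving averages remove the latter, which is why the factor 2 and the conjugate appear in the reconstruction.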

Also, the time-varying amplitudes and phases of the quasi-periodic component are obtained, respectively, as the absolute value of $2\alpha_k$ and the argument of $\alpha_k$. The key step of this algorithm is the moving average, which for equispaced data can be computed in $O(N)$ time, for instance by updating a running sum as the window slides. Thus, the algorithm can be efficiently applied to large data sets (Fig. 8).
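A minimal NumPy sketch of the $O(N)$ procedure, under stated assumptions: the smoothing operator $w$ is taken to be a centered moving average over `2*half + 1` points, applied `passes` times, with windows truncated at the series ends (the text does not specify the boundary treatment); time is in years and the function names are hypothetical. With 23 samples per year (16-day composites, an assumption for the example) and `half = 11`, each window spans exactly one year, so away from the ends the average removes any demodulated term oscillating at a nonzero integer multiple of the annual frequency (up to the Nyquist limit).

```python
import numpy as np

def moving_average(x, half):
    """Centred moving average over 2*half + 1 points, O(N) via cumulative sums.
    Near the ends the window is truncated to the available points."""
    n = x.size
    cs = np.concatenate(([0], np.cumsum(x)))
    idx = np.arange(n)
    lo = np.maximum(idx - half, 0)
    hi = np.minimum(idx + half + 1, n)
    return (cs[hi] - cs[lo]) / (hi - lo)

def quasi_periodic_component(y, t0, h, omega, half, passes=2):
    """Cyclic component V_k(t), amplitude |2 alpha_k| and argument of alpha_k
    for the frequency omega, from samples y at t0, t0 + h, t0 + 2h, ..."""
    y = np.asarray(y, dtype=float)
    t = t0 + h * np.arange(y.size)
    tau = np.exp(-1j * omega * t)                  # unit-modulus complex vector
    alpha = tau * y                                # pointwise product tau . Y
    for _ in range(passes):                        # smoothing operator w
        alpha = moving_average(alpha, half)
    V = 2 * np.real(alpha * np.conj(tau))          # back to the time domain
    return V, np.abs(2 * alpha), np.angle(alpha)
```

Calling the function with `omega = 2*np.pi` and `omega = 4*np.pi` gives the annual and semiannual components; each call costs $O(N)$ regardless of the window size, and the amplitudes can then be adjusted by the least-squares step of equation (3).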

Figure 8. Average values (×10⁴) of the amplitudes of the annual (left) and semiannual (right) quasi-periodic components for models fitted to NDVI time series in an area comprising most of the Valencia province. The area (center-right) with very high values of both amplitudes corresponds to a zone of rice crops.

CONCLUSIONS

NDVI time series data, as exemplified by maps of annual dynamics in the Valencia

region (Fig. 1), show complex behaviors, including oscillations that may not be properly

analyzed by using classical spectral analysis with constant values of the parameters

defining the fitted models.

A general model, including secular and cyclic components, was proposed to take into

account temporal variations in the oscillations of NDVI time series, by considering time

dependent parameters defining the characteristics of the cyclic part of the models.

Thus, frequencies, phases and amplitudes of the periodic functions included in the

model were allowed to slowly vary in time, producing quasi-periodic components that

better represented the behavior of the real data.

Models with quasi-periodic components were fitted to some examples of NDVI time series using a time-frequency analysis approach in which, to extract the cyclic component of the model, a local Fourier analysis has to be carried out for each point in the series, including all points in a moving window of appropriate size.

The type of time-frequency analysis needed to fit models with quasi-periodic components would in general imply a high computational cost. For equispaced time series, as when nominal dates are considered for biweekly NDVI data, an efficient algorithm was proposed, facilitating the application of the method to large volumes of data.

The values of the parameters defining the quasi-periodic components, and their variations in time, could provide a useful tool to investigate relations with different vegetation types and with external factors such as climate or land use changes, as well as their interactions. Work in progress includes the application of the method to different vegetation types and conditions in extensive areas of the Valencia region, to assess its potential to help analyze different problems in landscape analysis and vegetation ecology.

ACKNOWLEDGEMENTS

01), funded by the Spanish Ministry of Innovation and Science, and GVPRE/2008/310,

funded by the Valencia Regional Government (Generalitat Valenciana).

REFERENCES

[1] Cleland, E. E., Chuine, I., Menzel, A., Mooney, H. A. and Schwartz, M. D.,

“Shifting plant phenology in response to global change,” Trends in Ecology &

Evolution 22, 357-365 (2007).

[2] Galford, G.L., Mustard, J.F., Melillo, J., Gendrin, A., Cerri, C.C. and Cerri, C.E.P.,

“Wavelet analysis of MODIS time series to detect expansion and intensification

of row-crop agriculture in Brazil,” Remote Sensing of Environment 112, 576-587 (2008).


[3] Moulin, S., Kergoat, L., Viovy, N. and Dedieu, G., “Global-scale assessment of

vegetation phenology using NOAA/AVHRR satellite measurements,” Journal of

Climate 10, 1154-1170 (1997).

[4] Zhang, X., Friedl, M.A., Schaaf, C.B., Strahler, A.H., Hodges, J.C.F., Gao, F., Reed,

B.C. and Huete, A., “Monitoring vegetation phenology using MODIS,” Remote

Sensing of Environment 84, 471-475 (2003).

[5] Fontana, F., Rixen, C., Jonas, T., Aberegg, G. and Wunderle, S., “Alpine

grassland phenology as seen in AVHRR, VEGETATION, and MODIS NDVI time

series - a comparison with in situ measurements,” Sensors 8, 2833-2853 (2008).

[6] Fisher, J. I. and Mustard, J. F., “Cross-scalar satellite phenology from ground,

Landsat, and MODIS data,” Remote Sensing of Environment 109, 261-273

(2007).

[7] Hmimina, G., Dufrêne, E., Pontailler, J.-Y., Delpierre, N., Aubinet, M., Caquet, B.,

de Grandcourt, A., Burban, B., Flechard, C., Granier, A., Gross, P., Heinesch, B.,

Longdoz, B., Moureaux, C., Ourcival, J.-M., Rambal, S., Saint André, L. and

Soudani, K., “Evaluation of the potential of MODIS satellite data to predict

vegetation phenology in different biomes: An investigation using ground-based

NDVI measurements,” Remote Sensing of Environment 132, 145-158 (2013).

[8] Ganguly, S., Friedl, M. A., Tan, B., Zhang, X. and Verma, M., “Land surface

phenology from MODIS: Characterization of the Collection 5 global land cover

dynamics product,” Remote Sensing of Environment 114, 1805-1816 (2010).

[9] Geerken, R., Zaitchik, B. and Evans, J. P., “Classifying rangeland vegetation type

and fractional cover of semi-arid and arid vegetation covers from NDVI time-

series,” International Journal of Remote Sensing 26, 5535-5554 (2005).

[10] Li, Z. and Kafatos, M., “Interannual variability of vegetation in the United States

and its relation to El Niño/Southern Oscillation,” Remote Sensing of Environment

71, 239-247 (2000).

[11] Sarkar, S. and Kafatos, M., “Interannual variability of vegetation over the Indian

sub-continent and its relation to the different meteorological parameters,”

Remote Sensing of Environment 90, 268-280 (2004).

[12] van Leeuwen, W. J. D., “Monitoring the effects of forest restoration treatments

on post-fire vegetation recovery with MODIS multitemporal data,” Sensors 8,

2017-2042 (2008).

[13] van Leeuwen, W.J.D., Casady, G., Neary, D., Bautista, S., Alloza, J.A., Carmel, Y.,

Wittenberg, L., Malkinson, D. and Orr, B., “Monitoring post-wildfire vegetation

response with remotely sensed time-series data in Spain, USA and Israel,”

International Journal of Wildland Fire 19, 75-93 (2010).

[14] Chen, J., Jönsson, P., Tamura, M., Gu, Z., Matsushita, B. and Eklundh, L., “A

simple method for reconstructing a high-quality NDVI time series data set based

on the Savitzky–Golay filter,” Remote Sensing of Environment 91, 332-344

(2004).

[15] Jönsson, P. and Eklundh, L., “Seasonality extraction and noise removal by

function fitting to time-series of satellite sensor data,” IEEE Transactions on

Geoscience and Remote Sensing 40, 1824-1832 (2002).

[16] Jönsson, P. and Eklundh, L., “TIMESAT– a program for analyzing time-series of

satellite sensor data,” Computers & Geosciences 30, 833-845 (2004).

[17] Bradley, B. A., Jacob, R. W., Hermance, J. F. and Mustard, J. F., “A curve fitting

procedure to derive inter-annual phenologies from time series of noisy satellite

NDVI data,” Remote Sensing of Environment 106, 137-145 (2007).

[18] Moody, A. and Johnson, D., “Land-surface phenologies from AVHRR using the

discrete Fourier transform,” Remote Sensing of Environment 75, 305-323 (2001).

[19] Jakubauskas, M. E., Legates, D. R. and Kastens, J. H., “Harmonic analysis of time

series AVHRR NDVI data,” Photogrammetric Engineering and Remote Sensing 67,

461-470 (2001).

[20] Wagenseil, H. and Samimi, C., “Assessing spatio-temporal variations in plant

phenology using Fourier analysis on NDVI time series: results from a dry savannah

environment in Namibia,” International Journal of Remote Sensing 27(16), 3455-3471 (2006).

[21] Verbesselt, J., Hyndman, R., Newnham, G. and Culvenor, D., “Detecting trend and

seasonal changes in satellite image time series,” Remote Sensing of Environment

114, 106-115 (2010).


[22] Verbesselt, J., Hyndman, R., Zeileis, A. and Culvenor, D., “Phenological change

detection while accounting for abrupt and gradual trends in satellite image time

series,” Remote Sensing of Environment 114, 2970-2980 (2010).

Chapter 5:

Detection of areas affected by forest fires

OBJECTIVES

This chapter presents a two-phase method for detecting burnt areas that can be applied efficiently over extensive zones. In the first part of the chapter, the operation of the method is explained through a detailed example; in the second part, its properties are analyzed by applying it to a large area of the Valencian Community and comparing the burnt zones detected by the method with those recorded in the wildfire database of the Dirección General de Prevención, Extinción de Incendios y Emergencias of the Generalitat Valenciana. The aim is to obtain an efficient method that can be used automatically or semi-automatically over large areas with different vegetation types, so as to facilitate the study of the environmental factors that affect vegetation regeneration after forest fires.

SUMMARY

In the first part of this chapter, a two-phase method is presented for the automatic detection of burnt areas from biweekly series of MODIS vegetation index data. For each pixel in the area, models of varying complexity are fitted to the data subseries before and after each point considered. Discrepancies, or jumps, between the two models that exceed a certain threshold are used to define seed pixels, from which the groups of pixels corresponding to the potential burnt area are extended and delimited. A computationally efficient algorithm is built using a digital filter approach.

Burnt area detection

INTRODUCTION

Fire constitutes a key factor in the functioning and shaping of Mediterranean ecosystems [1], with fire regimes affected by anthropogenic and climatic factors [2]. Accurate maps of areas burnt by wildfires are essential to analyze the spatiotemporal distribution of wildfires and its relation with different environmental factors, as well as to monitor the recovery of vegetation and the effect of restoration treatments [3-8].

Fire scars can be confidently mapped by visual comparison of pre- and post-fire high-resolution aerial photographs or satellite imagery of the zone that includes the wildfire. Remote sensing data from MODIS and other space-borne sensors have been increasingly used in recent decades for the automatic or semiautomatic detection of active or past wildfires, usually from daily records of a suitable combination of reflectance bands, allowing the analysis of wildfire distribution at regional and global scales [9-18].

Some methods to detect fire scars from remotely sensed time series incorporate two different phases: first detecting possible abrupt changes in the temporal data for some pixels, usually by jointly considering several spectral indices, and then using a region-growing algorithm to delimit the perimeter of the fire scar [19,20]. The objective of the present work was to test some simple two-phase algorithms, and variations of them, for the automatic or semiautomatic detection of burnt areas from MODIS 250 m biweekly NDVI time series for the Valencia region, Southeast Spain.

METHODS AND RESULTS

The first step in detecting fire scars from NDVI time series poses the general problem of identifying change points, or jumps, in a noisy time series, corresponding to the times, between pre- and post-fire observations, at which the NDVI value drops as a consequence of vegetation burning (Fig. 1). This apparently simple problem must be tackled taking into account the high noise-to-signal ratio characteristic of NDVI series for individual pixels, and in a computationally efficient way that allows its practical application to a very large number of pixels. Once this problem is solved, an initial set of pixels exhibiting sufficiently large jumps can be obtained, acting as seeds for the second phase of the algorithm, where the potential burnt area is delimited.

Figure 1. NDVI (×10⁴) time series for an area in Bixquert Valley (Valencia province) where a wildfire occurred in June 2005 (vertical line), and different models fitted to pre- and post-fire data.

Change-point detection in NDVI time series

Consider as an illustration the data shown in Fig. 1, corresponding to NDVI values for an area in the Valencia region burned in 2005. Different models, such as polynomials of different degrees, or models including constant or variable cyclic terms, can be fitted to the pre- and post-fire data, and discrepancies between the parameters of these models can provide a measure of the magnitude of the jump that is expected to be more robust than the mere change between the last pre-fire and the first post-fire values.

For an individual pixel in a region where the presence of a wildfire is unknown, all points in the series, except those in some minimal border zones, are considered potential change points and are analyzed consecutively. Separate models are then fitted to the data to the left and to the right of each potential change point, and the discrepancies between the models are computed. When the maximum of these discrepancies exceeds an appropriate threshold, a potential fire affecting that pixel is assumed at the date of the maximum discrepancy.
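The per-pixel scan just described can be written directly, at a cost proportional to the series length times the window length. This sketch uses the simplest fitting model, a constant (the mean) on n points at each side of the candidate point, and takes the discrepancy to be the drop in mean level; the function name and this particular discrepancy measure are assumptions for illustration. With 16-day composites (assumed), n = 23 corresponds to one year.

```python
import numpy as np

def scan_change_point(y, n, threshold):
    """Brute-force change-point scan for one pixel.

    For each candidate index j with n points on each side, a constant model
    (the mean) is fitted to y[j-n:j] and to y[j:j+n]; the discrepancy is the
    drop, mean before minus mean after. Returns (j, drop) for the largest
    drop if it exceeds threshold, otherwise None.
    """
    y = np.asarray(y, dtype=float)
    best_j, best_drop = None, -np.inf
    for j in range(n, y.size - n + 1):         # skip the border zones
        drop = y[j - n:j].mean() - y[j:j + n].mean()
        if drop > best_drop:
            best_j, best_drop = j, drop
    if best_j is not None and best_drop > threshold:
        return best_j, best_drop
    return None
```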

Applying this procedure to a very large number of pixels would be computationally very costly, but an efficient algorithm can be devised for equispaced data, as is the case for the nominal dates of biweekly NDVI series, and for a wide family of fitting models, including combinations of polynomial and trigonometric terms. In this setting, since the models are linear in their parameters and fitted by least squares, the discrepancies between the parameters of the fitted models, or between their predicted values, are linear functions of the data, and they can be computed with an appropriate digital filter, as the convolution of the data vector with a suitable vector of weights, in a similar way to the classical Savitzky-Golay smoothing method [21].
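The digital filter view can be sketched as follows, assuming polynomial models of a given degree fitted by OLS to n points on each side, and taking as discrepancy the difference between the two fitted models evaluated at the boundary between the subseries (one of several possible discrepancy measures). Since OLS predictions are linear in the data, the weights are computed once from the pseudo-inverses of the two design matrices, and the filter is applied to the whole series with a single correlation. For degree 0 the weights reduce to a Haar wavelet. The function names are hypothetical.

```python
import numpy as np

def discrepancy_weights(n, degree):
    """Weights w such that w @ window = (left-model prediction) minus
    (right-model prediction) at the boundary, for polynomials of the given
    degree fitted by OLS to the n points on each side of the boundary."""
    # Local times: left points at -n+0.5..-0.5, right points at 0.5..n-0.5,
    # so both models are evaluated at local time 0 (their intercepts).
    t_left = np.arange(-n, 0) + 0.5
    t_right = np.arange(0, n) + 0.5
    X_left = np.vander(t_left, degree + 1, increasing=True)
    X_right = np.vander(t_right, degree + 1, increasing=True)
    # Row 0 of each pseudo-inverse maps the data to the fitted intercept.
    return np.concatenate([np.linalg.pinv(X_left)[0], -np.linalg.pinv(X_right)[0]])

def discrepancy_filter(y, n, degree=0):
    """Discrepancy for every candidate index j = n..N-n (NaN elsewhere);
    the left subseries is y[j-n:j] and the right one is y[j:j+n]."""
    y = np.asarray(y, dtype=float)
    out = np.full(y.size, np.nan)
    out[n:y.size - n + 1] = np.correlate(y, discrepancy_weights(n, degree), mode="valid")
    return out
```

The weights depend only on n and the model, not on the data, so for millions of pixels the cost per series is a single linear filter pass.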

This process is related to wavelet analysis, and is equivalent to it for certain fitting models and particular wavelet basis functions. For example, if a constant model is fitted to n points on each side of the change point, the difference between the two models is obtained by applying a Haar wavelet of width n+n points (Fig. 2).

Figure 2. Haar filter output (green) and traveling Haar pulse (red, see text) for

the data in Figure 1.

The detection accuracy of the method, using fitting models with polynomials of varying degrees, with and without harmonic terms, was evaluated for the period 2000-2012 in most of the Valencia region, where a record of wildfires with approximate perimeters from field surveys was available. In most cases, the best results were obtained with a simple constant model fitted over a 1-year window on each side, reflecting the dominance of an annual phenological cycle in the vegetation; more complex models were affected by the high noise level of MODIS data at the individual pixel level.

In some areas where a clear biannual cycle is present, this simple approach may produce false positives, but this effect can be corrected by considering both the difference in 1-year NDVI averages and the difference between consecutive points, simply multiplying them, which we call a traveling Haar pulse (Fig. 2).
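A sketch of the traveling Haar pulse, assuming the 1-year averages are taken over n points on each side (e.g. n = 23 for 16-day composites) and computed in O(N) with cumulative sums. The text only states that the two differences are multiplied; keeping the product only where both differences are positive, so that pairs of increases do not also yield large positive values, is an assumption of this sketch, as are the function names.

```python
import numpy as np

def haar_drop(y, n):
    """Mean of the n points before index j minus mean of the n points from j
    on, for all valid j, in O(N) with cumulative sums (NaN elsewhere)."""
    y = np.asarray(y, dtype=float)
    cs = np.concatenate(([0.0], np.cumsum(y)))
    out = np.full(y.size, np.nan)
    j = np.arange(n, y.size - n + 1)
    out[j] = (cs[j] - cs[j - n]) / n - (cs[j + n] - cs[j]) / n
    return out

def traveling_haar_pulse(y, n):
    """Product of the 1-year Haar drop and the drop between consecutive points.

    Assumption (not stated in the text): only points where both factors are
    positive, i.e. genuine drops, are kept; elsewhere the pulse is 0.
    """
    y = np.asarray(y, dtype=float)
    haar = haar_drop(y, n)
    step = np.full(y.size, np.nan)
    step[1:] = y[:-1] - y[1:]                  # y[j-1] - y[j]
    keep = (haar > 0) & (step > 0)             # comparisons with NaN are False
    return np.where(keep, haar * step, 0.0)
```

A slow oscillation that shifts the 1-year averages without a sharp drop between consecutive dates yields a small product, whereas a fire produces both a large step and a large shift in level.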

Burnt areas delimitation

The steps of the algorithm to map fire scars will be illustrated with its application to an

area in Simat de la Valldigna (Valencia province) affected by a wildfire in September

2000, close to a zone that was burnt in July 2005. For each pixel in the area, the value

of the detected maximum drop in the series of NDVI values is shown in Fig. 3, and the

map of nominal dates corresponding to those drops is presented in Fig. 4.

Figure 3. Maximum drops in NDVI (×10⁴) for each pixel in an area near Simat de la Valldigna (Valencia province), and approximate perimeters of the September 2000 (center) and July 2005 (lower-right corner) wildfires.

Figure 4. Dates for the maximum drops shown in Figure 3.

Figure 5. Smoothing of the maximum drops shown in Figure 3.


Next, to reduce the noise in the data, a smoothed version of the map of maximum drops is computed (Fig. 5) using a Gaussian kernel of small width (Fig. 6).

Figure 6. Kernel applied to the data in Figure 3 to obtain the smoothed map of

maximum drops shown in Figure 5.
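A minimal NumPy sketch of the smoothing step, assuming a separable Gaussian kernel truncated at three standard deviations, with `sigma` in pixels (the text only says the kernel is short); near the map borders the kernel is renormalized by smoothing a map of ones. The function names are hypothetical and the map is assumed to contain no missing values.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def smooth_drop_map(drops, sigma=1.0):
    """Separable Gaussian smoothing of a 2-D map of maximum NDVI drops.

    Dividing by the smoothed map of ones renormalizes the kernel near the
    borders. Both map dimensions must be at least the kernel length.
    """
    k = gaussian_kernel_1d(sigma)

    def smooth(a):
        rows = np.apply_along_axis(np.convolve, 1, a, k, mode="same")   # along rows
        return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")  # along columns

    drops = np.asarray(drops, dtype=float)
    return smooth(drops) / smooth(np.ones_like(drops))
```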

From the map of smoothed maximum drops (Fig. 5), a few NDVI curves of individual pixels near the local maxima of the map are selected (Fig. 7): within a very small neighborhood of each local maximum, the curve with the largest actual (non-smoothed) change in NDVI values is chosen. Then, for each curve, the data on both sides of the putative change point are modeled using polynomial and harmonic terms fitted by ordinary least squares. As a heuristic rule, derived from extensive previous experience, the degree of the polynomial in time is increased by one unit for every two years of data in the partial series.
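The degree rule and the fit of each partial series can be sketched as follows. The base case (a constant for less than two years of data), the 23 samples per year and the single annual harmonic are assumptions for the example, not values stated in the text; the function names are hypothetical.

```python
import numpy as np

def heuristic_degree(n_points, points_per_year=23):
    """Polynomial degree grows by one for every two full years of data.
    Assumption: degree 0 (a constant) for series shorter than two years."""
    return int((n_points / points_per_year) // 2)

def fit_partial_series(y, points_per_year=23, n_harmonics=1):
    """OLS fit of a polynomial trend plus annual harmonics to a partial series
    (pre- or post-change data). Returns the fitted values."""
    y = np.asarray(y, dtype=float)
    t = np.arange(y.size) / points_per_year       # time in years
    tc = t - t.mean()                             # centring improves conditioning
    degree = heuristic_degree(y.size, points_per_year)
    cols = [tc ** d for d in range(degree + 1)]
    for k in range(1, n_harmonics + 1):
        cols += [np.sin(2 * np.pi * k * t), np.cos(2 * np.pi * k * t)]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef
```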

Figure 7. Selected NDVI (×10⁴) data series corresponding to pixels with detected maximum drops in the proximities of the local maxima in the map of smoothed maximum drops shown in Figure 5, and models fitted to the partial series of pre- and post-change data.

Next, the selected curves are ranked by the magnitude of their maximum NDVI change (Fig. 8), and only those with drops exceeding a certain threshold are retained. For the spatial and temporal data set analyzed, a threshold of 0.13 in the detected maximum NDVI drop was selected (1300 in the ×10⁴ units of Fig. 8). The threshold value can be chosen based on the distribution of detected maximum drops, or derived by cross-validation against a set of independently mapped fire scars. The locations of the four finally selected curves on the contour map of maximum drops are shown in Fig. 9.
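The selection of seed pixels can be sketched as below: local maxima of the smoothed map are located in a 3×3 neighborhood, the pixel with the largest raw drop within a small radius of each maximum is retained, and the candidates are ranked and thresholded. The neighborhood sizes and the function name are assumptions; `threshold = 0.13` matches the NDVI value given in the text.

```python
import numpy as np

def select_seeds(smoothed, raw, threshold=0.13, radius=1):
    """Seed pixels from local maxima of the smoothed drop map.

    For each local maximum of `smoothed` (3x3 neighbourhood), the pixel with
    the largest raw (non-smoothed) drop within `radius` pixels is taken as a
    candidate; candidates are ranked by raw drop and those above `threshold`
    are kept. Returns a list of ((row, col), drop) pairs, largest first.
    """
    rows, cols = smoothed.shape
    candidates = {}
    for i in range(rows):
        for j in range(cols):
            nb = smoothed[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
            if smoothed[i, j] < nb.max():
                continue                                  # not a local maximum
            r0, c0 = max(i - radius, 0), max(j - radius, 0)
            block = raw[r0:i + radius + 1, c0:j + radius + 1]
            bi, bj = np.unravel_index(np.argmax(block), block.shape)
            pos = (int(r0 + bi), int(c0 + bj))
            candidates[pos] = raw[pos]                    # dict removes duplicates
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    return [(pos, drop) for pos, drop in ranked if drop > threshold]
```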


Figure 8. Maximum drops in NDVI (×10⁴) values for the set of curves in Figure 7, ranked by their magnitude. Drops in actual values (blue) and in the smoothed series (green). A threshold with a value of 1300 is marked (horizontal line).

Figure 9. Contour plot for the map of smoothed maximum drops shown in

Figure 5, with the positions of the selected curves (1 to 4) presented in Figure 7

whose maximum changes in NDVI exceed the threshold shown in Figure 8.

The objective of the selection process is to obtain a small set of pixels that act as seeds in an extension clustering algorithm, which defines spatially contiguous clusters through some measure of similarity between each pixel in the region and the seeds. A simple and effective approach consists of clustering spatially contiguous pixels whose detected maximum drops in NDVI values correspond to dates similar to those of the seed curves (Fig. 10).

Figure 10. Clusters of contiguous pixels defined by similarity of their dates of

maximum drops in NDVI with those of the pixels used as seeds (see Figure 9).
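A minimal region-growing sketch of this clustering step: starting from each seed, 8-connected neighbors are added while the date of their maximum drop lies within a tolerance `tol` (in nominal dates) of the seed's date. The connectivity, the tolerance and the function name are assumptions for illustration; the text does not specify them.

```python
from collections import deque

import numpy as np

def grow_clusters(dates, seeds, tol=1):
    """Region growing from seed pixels on a map of drop dates.

    dates: 2-D integer array with the index of the nominal date of the
    maximum NDVI drop for each pixel. seeds: list of (row, col) positions.
    Returns a label map: 0 = unassigned, k = cluster grown from seeds[k-1].
    """
    dates = np.asarray(dates)
    rows, cols = dates.shape
    labels = np.zeros((rows, cols), dtype=int)
    for label, (si, sj) in enumerate(seeds, start=1):
        if labels[si, sj]:
            continue                          # seed already inside another cluster
        seed_date = dates[si, sj]
        labels[si, sj] = label
        queue = deque([(si, sj)])
        while queue:                          # breadth-first traversal
            i, j = queue.popleft()
            for ni in range(max(i - 1, 0), min(i + 2, rows)):
                for nj in range(max(j - 1, 0), min(j + 2, cols)):
                    if labels[ni, nj] == 0 and abs(dates[ni, nj] - seed_date) <= tol:
                        labels[ni, nj] = label
                        queue.append((ni, nj))
    return labels
```

Each pixel enters the queue at most once, so the cost is linear in the number of pixels of the map.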

Pixels belonging to the same cluster defined by the date of the putative wildfire show,

as expected, similar behaviors in their temporal series, or curves, of NDVI values, with

a high correlation between the NDVI curve of each pixel and the curve of the seed pixel

used to obtain the cluster (Fig. 11).
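The correlations of Fig. 11 can be computed per cluster as Pearson correlations between each pixel's NDVI series and the series of its seed. In this sketch (hypothetical name), `cube` is assumed to hold the series as an array of shape (rows, cols, time), `labels` numbers the clusters from 1 in the order of `seeds` (0 meaning unassigned), and no series is constant.

```python
import numpy as np

def cluster_correlations(cube, labels, seeds):
    """Pearson correlation between each clustered pixel's series and the
    series of the seed that generated its cluster (NaN outside clusters)."""
    cube = np.asarray(cube, dtype=float)
    corr = np.full(labels.shape, np.nan)
    for label, (si, sj) in enumerate(seeds, start=1):
        mask = labels == label
        if not mask.any():
            continue
        s = cube[si, sj] - cube[si, sj].mean()                    # centred seed series
        X = cube[mask] - cube[mask].mean(axis=1, keepdims=True)   # centred pixel series
        corr[mask] = (X @ s) / (np.linalg.norm(X, axis=1) * np.linalg.norm(s))
    return corr
```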


Figure 11. Correlations between NDVI data series of pixels in each cluster

shown in Figure 10 and NDVI series of the pixel used as seed to define the

cluster. Different colors identify different clusters and magnitudes of the

correlations are given by color intensities (grey scale).

CONCLUSIONS

In the Mediterranean Valencia region, for the period 2000-2012 analyzed, burnt areas resulting from wildfires were efficiently detected from 250 m resolution MODIS-derived biweekly NDVI data series. A two-phase algorithm, first identifying seed pixels with maximum NDVI drops above some threshold and then clustering contiguous pixels with an extension algorithm, provided good results.

Detection of abrupt changes in NDVI as a consequence of vegetation burning was best

achieved by considering jumps between 1-year averages for partial series before and

after each possible change-point, or by using the product of the jumps in 1-year

averages and the drops in NDVI values of two consecutive points. No improvement

was achieved by fitting more complex models to the partial series, as the high

variability of MODIS data for individual pixels degraded the accuracy of the method.

Delimitation of fire scars was based on propagating clusters from the seed pixels to

spatially connected pixels with similarities between their dates of detected maximum

drops, which would correspond to the date of the wildfire. Pixels in the same cluster

presented high correlation between their NDVI data series, and the maps of the

detected burnt areas showed a good agreement with the approximate perimeters

resulting from field surveys.

The algorithms of the method can be efficiently implemented, allowing its application to large regions (Fig. 12). Work in progress includes carrying out detailed comparisons of the accuracy of the method with a curated database of wildfire perimeters and with other remote sensing fire mapping products and algorithms, exploring variations in the automatic selection of NDVI jump thresholds to facilitate its use in regions with different vegetation types, and optimizing the modifications needed to best map areas with fire recurrence.

Figure 12. Map of detected maximum drops in NDVI values for most of the Valencia province.


ACKNOWLEDGEMENTS

This work was supported by the research projects FEEDBACK (CGL2011-30515-C02-01),

funded by the Spanish Ministry of Innovation and Science, and GVPRE/2008/310,

funded by the Valencia Regional Government (Generalitat Valenciana).

REFERENCES

[1] Pausas, J. G. and Vallejo, V. R., “The role of fire in European Mediterranean

ecosystems,” in E. Chuvieco (Ed.), [Remote sensing of large wildfires in the

European Mediterranean basin], Springer-Verlag, 3-16 (1999).

[2] Pausas, J. G. and Fernandez-Muñoz, S., “Fire regime changes in the Western

Mediterranean Basin: From fuel-limited to drought-driven fire regime,” Climatic

Change 110, 215-226 (2012).

[3] Levin, N. and Heimowitz, A., “Mapping spatial and temporal patterns of

Mediterranean wildfires from MODIS,” Remote Sensing of Environment 126, 12-

26 (2012).

[4] Díaz-Delgado, R. and Pons, X., “Spatial patterns of forest fires in Catalonia (NE of

Spain) along the period 1975-1995. Analysis of vegetation recovery after fire,”

Forest Ecology and Management 147, 67-74 (2001).

[5] Wittenberg, L., Malkinson, D., Beeri, O., Halutzy, A. and Tesler, N., “Spatial and

temporal patterns of vegetation recovery following sequences of forest fire in a

Mediterranean landscape, Mt. Carmel Israel,” Catena 71, 76-83 (2007).

[6] Gouveia, C., DaCamara, C. C. and Trigo, R. M., “Post-fire vegetation recovery in Portugal based on SPOT/VEGETATION data,” Natural Hazards and Earth System Sciences 10, 673-684 (2010).

[7] van Leeuwen, W.J.D., Casady, G., Neary, D., Bautista, S., Alloza, J.A., Carmel, Y.,

Wittenberg, L., Malkinson, D. and Orr, B., “Monitoring post-wildfire vegetation

response with remotely sensed time-series data in Spain, USA and Israel,”

International Journal of Wildland Fire 19, 75-93 (2010).

[8] van Leeuwen, W. J. D., “Monitoring the effects of forest restoration treatments

on post-fire vegetation recovery with MODIS multitemporal data,” Sensors 8,

2017-2042 (2008).

[9] Barbosa, P. M., Grégoire, J. -M. and Pereira, J. M. C., “An algorithm for extracting

burned areas from time series of AVHRR GAC data applied at a continental

scale,” Remote Sensing of Environment 69, 253-263 (1999).

[10] Chuvieco, E., Ventura, G., Martín, M. P. and Gómez, I., “Assessment of

multitemporal compositing techniques of MODIS and AVHRR images for burned

land mapping,” Remote Sensing of Environment 94, 450-462 (2005).

[11] Chuvieco, E., Martín, M. P. and Palacios, A., “Assessment of different spectral

indices in the red—Near-infrared spectral domain for burned land

discrimination,” International Journal of Remote Sensing 23, 5103-5110 (2002).

[12] Fraser, R. H., Li, Z. and Cihlar, J., “Hotspot and NDVI differencing synergy

(HANDS): A new technique for burned area mapping,” Remote Sensing of

Environment 74, 362-376 (2000).

[13] Giglio, L., Descloitres, J., Justice, C. O. and Kaufman, Y. J., “An enhanced contextual fire detection algorithm for MODIS,” Remote Sensing of Environment 87, 273-283 (2003).

[14] Giglio, L., van der Werf, G. R., Randerson, J. T., Collatz, G. J. and Kasibhatla, P.,

“Global estimation of burned area using MODIS active fire observations,”

Atmospheric Chemistry and Physics 6, 957-974 (2006).

[15] Roy, D. P., Jin, Y., Lewis, P. E. and Justice, C. O., “Prototyping a global algorithm

for systematic fire-affected area mapping using MODIS time series data,”

Remote Sensing of Environment 97, 137-162 (2005).

[16] Davies, D. K., Ilavajhala, S., Wong, M. M. and Justice, C. O., “Fire information for

resource management system: Archiving and distributing MODIS active fire

data,” IEEE Transactions on Geoscience and Remote Sensing 47, 72-79 (2009).

[17] Roy, D. P., Boschetti, L., Justice, C. O. and Ju, J., “The Collection 5 MODIS burned

area product—global evaluation by comparison with the MODIS active fire

product,” Remote Sensing of Environment 112, 3690-3707 (2008).

[18] Boschetti, L., Roy, D., Barbosa, P., Boca, R. and Justice, C., “A MODIS assessment

of the summer 2007 extent burned in Greece,” International Journal of Remote

Sensing 29, 2433-2436 (2008).


[19] Bastarrika, A., Chuvieco, E. and Martín, M.P., “Mapping burned areas from

Landsat TM/ETM+ data with a two-phase algorithm: Balancing omission and

commission errors,” Remote Sensing of Environment 115, 1003-1012 (2011).

[20] Stroppiana, D., Bordogna, G., Carrara, P., Boschetti, M., Boschetti, L. and Brivio,

P. A., “A method for extracting burned areas from Landsat TM/ETM+ images by

soft aggregation of multiple spectral indices and a region growing algorithm,”

ISPRS Journal of Photogrammetry and Remote Sensing 69, 88-102 (2012).

[21] Savitzky, A. and Golay, M.J.E., “Smoothing and differentiation of data by

simplified least squares procedures,” Analytical Chemistry 36, 1627-1639 (1964).

SUMMARY

In this second part of the chapter, the two-phase method for the detection of burnt areas presented in the first part is applied to MODIS NDVI data over a large area of the Valencia region. The data have a spatial resolution of 250 m and a biweekly temporal resolution. The results are compared with the fire records in the wildfire database of the Dirección General de Prevención, Extinción de Incendios y Emergencias of the Generalitat Valenciana. The maps of detected fires agreed closely with the perimeters registered in the reference database, as shown by several accuracy measures, agreement indices and error percentages. Performance was similar to or better than that of other methods available in the literature, including methods that are more complex and harder to apply over large areas.

INTRODUCTION

Wildfires are common, natural disturbances in Mediterranean regions worldwide, where they contribute substantially to shaping the structure and functioning of flammable ecosystems [1]. Accordingly, many Mediterranean species have acquired adaptive mechanisms to persist and regenerate after wildfires. These mechanisms facilitate the autosuccession of the plant community and help accelerate the recovery of the vegetation cover [1, 2]. However, depending on fire attributes such as severity, frequency or size, wildfires may promote critical changes in Mediterranean ecosystems and landscapes. For example, in the Mediterranean Basin, frequent wildfires help maintain the dominance of shrubs in areas where climate and soil conditions would be suitable for a forest ecotype [3, 4, 5]. Severe and/or frequent wildfires may also promote structural changes towards more fire-prone vegetation [6, 7]. Furthermore, wildfires combined with post-fire extreme climatic events, such as droughts or torrential rainfall, may produce long windows of disturbance and challenge the resilience of Mediterranean ecosystems [8, 9].

During the twentieth century, the fire regime in the Mediterranean Basin shifted towards larger and more frequent wildfires [6, 10]. For example, in eastern Spain there was a major shift around the early 1970s. After it, annual fire frequency doubled and average fire size was four times larger than in the previous 100 years [11]. The main driving factor for this change was the rapid increase in fuel amount and continuity, caused by the combined effect of agricultural land abandonment and extensive reforestation programs [6, 11]. The rapid pace of these changes explains the growing concern about the future spatiotemporal dynamics of wildfires in the Mediterranean Basin and their ecological and socio-economic impacts. This concern is heightened by climate change, which could further increase fire risk and frequency [12, 13].

Accurate maps of areas burnt by wildfires are essential to analyze the spatiotemporal distribution of wildfires and its relation to different environmental factors. They are also needed to monitor the recovery of vegetation and the effect of management and restoration treatments. Fire scars can be mapped with confidence by visually comparing pre- and post-fire high-resolution aerial photographs or satellite images of the area that includes the wildfire. The resulting burnt area maps are widely used to investigate the spatiotemporal patterns of wildfire impacts [14-19]. In recent decades, remote sensing from MODIS and other space-borne sensors has been increasingly used for the automatic or semiautomatic detection of active or past wildfires. These methods usually rely on daily records of a suitable combination of reflectance bands, with varying results in terms of accuracy and omission and commission errors [14, 20-30].

The objective of the present work was to develop and test some simple algorithms, and variations of them, for the automatic or semiautomatic detection of burnt areas from MODIS 250 m biweekly NDVI time series data in a Mediterranean region. We evaluated the detection and mapping method on a target area in the Valencia region, eastern Spain, which is a good model for most of the western Mediterranean Basin [11].

METHODS

MODIS data were downloaded from the NASA website (currently accessed through the Reverb data gateway, http://reverb.echo.nasa.gov/reverb/). We used the 16-day composite NDVI band from a time series of the MOD13Q1 MODIS/Terra product at 250 m resolution (tile h17v05), starting in February 2000.
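The following is a minimal sketch of this preprocessing step. It converts raw MOD13Q1 NDVI values to NDVI and builds the nominal composite dates. It relies on assumptions about the 250 m NDVI layer, taken from our reading of the product documentation rather than from the text:

- values are 16-bit integers with scale factor 0.0001;
- the valid range is -2000 to 10000 and the fill value is -3000;
- composites start on days of year 1, 17, ..., 353, giving 23 per year.

Function names are hypothetical, and reading the HDF files is left out.

```python
import numpy as np
from datetime import date, timedelta

# Assumed MOD13Q1 conventions for the 250 m 16-day NDVI layer (int16)
SCALE = 0.0001                     # raw value * SCALE = NDVI
FILL = -3000                       # no-data value
VALID_MIN, VALID_MAX = -2000, 10000


def raw_to_ndvi(raw):
    """Convert raw NDVI integers to NDVI, with NaN for fill or out-of-range values."""
    raw = np.asarray(raw)
    ndvi = raw.astype(float) * SCALE
    bad = (raw == FILL) | (raw < VALID_MIN) | (raw > VALID_MAX)
    ndvi[bad] = np.nan
    return ndvi


def nominal_dates(first_year, last_year, start=None):
    """Nominal start dates of the 16-day composites (day of year 1, 17, ..., 353),
    i.e. 23 per year, optionally dropping those before `start`."""
    dates = [date(y, 1, 1) + timedelta(days=doy - 1)
             for y in range(first_year, last_year + 1)
             for doy in range(1, 366, 16)]
    return [d for d in dates if start is None or d >= start]
```

Under these conventions, a 1-year window spans 23 consecutive composites. For example, `nominal_dates(2000, 2005, start=date(2000, 2, 18))` gives the 2000-2005 series, assuming it starts with the day-49 composite.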

A database of fire event records for the period 2000-2005 was provided by the Fire Prevention, Extinction and Emergencies Office of the Valencia Regional Government (Dirección General de Prevención, Extinción de Incendios y Emergencias, Conselleria de Gobernación y Justicia, Generalitat Valenciana). For each fire event, the database fields included:

- the date;
- the municipality where the wildfire started, which gives the fire its name;
- the forest and total affected areas;
- the cartography of the fire perimeter, obtained from IRS or SPOT images and field surveys.

Burnt areas detection and mapping

We used a two-phase algorithm to detect and map burnt areas. First, we detected a subset of seed pixels showing significant drops in their NDVI series, as described below. Then, we delimited the fire scar of each potential wildfire using an extension algorithm starting from the seed pixels. The main steps of the detection and mapping method are as follows (see [31] for a description of the steps of the algorithm, illustrated with examples):

For each individual pixel in the area, all the points in the NDVI time series were considered as potential change points, i.e., potential dates of wildfire occurrence. We fitted separate models to the data before and after the change point (hereafter pre and post models) and computed the discrepancies between them. We tested different types of models, such as polynomials of different degrees and models including cyclic terms. Discrepancies between the parameters of the pre and post models were used as a measure of the magnitude of the jump in NDVI values at the potential change point.

We devised an efficient algorithm for equispaced data, as is the case for the nominal dates in biweekly NDVI series. The algorithm is valid for a wide family of fitting models for which the discrepancies between the fitted parameters are linear functions of the data. These discrepancies can therefore be computed with an appropriate digital filter, i.e., as the convolution of the data vector with a suitable vector of weights, in a similar way to the classical Savitzky-Golay smoothing method [32].

After a wide set of trials, we discarded complex models and selected a combination of two simple ones:

- a constant model over a 1-year window, to account for the dominance of an annual phenological cycle of the vegetation;
- the difference between two consecutive points.

Discrepancies between pre and post models were calculated as the product of the outputs of two Haar wavelet filters, spanning n+n and 1+1 points, which we called a travelling Haar pulse.
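The linear-filter idea can be sketched in Python as follows. This is not the authors' implementation, and all function names are hypothetical. The sketch rests on our own assumptions:

- the candidate change point k is the first point of the post series;
- each model is a least-squares polynomial fitted to n points on each side;
- the discrepancy is the difference between the two fitted values at k;
- a 1-year window has n = 23 biweekly composites;
- the product is kept only where both jumps are negative, so that only coincident drops score.

Because a least-squares fit is linear in the data, the discrepancy is a fixed weight vector applied to the window (rows of a pseudoinverse, as in Savitzky-Golay filtering). For the constant model, this vector reduces to the Haar weights -1/n, ..., +1/n.

```python
import numpy as np


def jump_weights(n, degree=0):
    """Weights w (length 2n) such that w @ x[k-n:k+n] is the post-model value minus
    the pre-model value at the change point k, each model being a least-squares
    polynomial of the given degree (degree < n) fitted to n points."""
    t_pre = np.arange(-n, 0, dtype=float)   # pre points sit at t = -n, ..., -1
    t_post = np.arange(0, n, dtype=float)   # post points sit at t = 0, ..., n-1
    # Row 0 of each pseudoinverse maps the data to the fitted intercept (value at t = 0)
    w_pre = np.linalg.pinv(np.vander(t_pre, degree + 1, increasing=True))[0]
    w_post = np.linalg.pinv(np.vander(t_post, degree + 1, increasing=True))[0]
    return np.concatenate([-w_pre, w_post])


def jump_series(x, w):
    """Apply the jump weights at every candidate change point k; positions without
    a full window x[k-n:k+n] are set to 0 (no measurable jump)."""
    x = np.asarray(x, dtype=float)
    n = w.size // 2
    out = np.zeros(x.size)
    if x.size >= 2 * n:
        # np.correlate(x, w, "valid")[i] = w @ x[i:i+2n], whose change point is k = i + n
        out[n:x.size - n + 1] = np.correlate(x, w, mode="valid")
    return out


def travelling_haar_pulse(x, n=23):
    """Score each k by the product of the n+n jump (1-year means) and the 1+1 jump
    (consecutive points), only where both are drops; return (best score, k)."""
    year = jump_series(x, jump_weights(n, 0))    # mean of x[k:k+n] minus mean of x[k-n:k]
    local = jump_series(x, jump_weights(1, 0))   # x[k] - x[k-1]
    score = np.where((year < 0) & (local < 0), year * local, 0.0)
    k = int(np.argmax(score))
    return float(score[k]), k
```

`np.correlate` is used instead of `np.convolve` so that the weights are not reversed; the two are equivalent up to that reversal. For instance, a series that stays at 0.6 for 40 composites and then drops to 0.3 scores 0.3 × 0.3 = 0.09 at k = 40. A linear trend without a jump gives zero discrepancy once `degree=1` is used.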

For each pixel in the area, we computed the value of the detected maximum drop in its NDVI series. We then built a map of smoothed maximum drops using a Gaussian kernel of short amplitude. In a very small neighbourhood of each local maximum of this map, we selected the NDVI curve of the individual pixel with the largest real (unsmoothed) change in NDVI values. The selected curves were ranked by the magnitude of their maximum NDVI change. Only those with drops exceeding a threshold, set in this work to 0.13, were retained as seed pixels for the second phase of the method.
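A NumPy-only sketch of the seed selection step follows. It assumes that `max_drop` is a 2-D array holding, for each pixel, the magnitude of its detected maximum NDVI drop as a positive number. The following are illustrative choices, not taken from the text:

- the Gaussian parameters (sigma = 1 pixel, radius = 2 pixels);
- the 3 x 3 neighbourhood used both for local maxima and for picking the unsmoothed maximum;
- all function names.

Only the threshold of 0.13 comes from the text.

```python
import numpy as np


def gaussian_smooth(img, sigma=1.0, radius=2):
    """Separable Gaussian smoothing with a short kernel; edges are padded by replication."""
    t = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-0.5 * (t / sigma) ** 2)
    g /= g.sum()
    p = np.pad(np.asarray(img, dtype=float), radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, p, g, mode="valid")     # smooth each row
    return np.apply_along_axis(np.convolve, 0, rows, g, mode="valid")  # then each column


def local_maxima(img):
    """Boolean mask of pixels that are not smaller than any of their 8 neighbours."""
    h, w = img.shape
    p = np.pad(img, 1, mode="constant", constant_values=-np.inf)
    mask = np.ones((h, w), dtype=bool)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di or dj:
                mask &= img >= p[1 + di:1 + di + h, 1 + dj:1 + dj + w]
    return mask


def select_seeds(max_drop, threshold=0.13, sigma=1.0, half_window=1):
    """Near each local maximum of the smoothed map, take the pixel with the largest
    unsmoothed drop; keep it if it exceeds the threshold; rank seeds by that drop."""
    max_drop = np.asarray(max_drop, dtype=float)
    h, w = max_drop.shape
    seeds = set()
    for i, j in zip(*np.nonzero(local_maxima(gaussian_smooth(max_drop, sigma)))):
        i0, i1 = max(i - half_window, 0), min(i + half_window + 1, h)
        j0, j1 = max(j - half_window, 0), min(j + half_window + 1, w)
        block = max_drop[i0:i1, j0:j1]
        bi, bj = np.unravel_index(np.argmax(block), block.shape)
        if block[bi, bj] > threshold:
            seeds.add((int(i0 + bi), int(j0 + bj)))
    return sorted(seeds, key=lambda s: -max_drop[s])
```

Flat background areas also produce (zero-valued) local maxima, but the threshold discards them.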

Starting from the set of seed pixels, we applied an extension clustering algorithm. It defines spatially contiguous clusters through some measure of similarity between each pixel in the region and the seeds. One simple and effective approach clustered spatially contiguous pixels whose detected maximum NDVI drops occurred on dates similar to those of the seed curves. Another combined date closeness with curve similarity, measured as the correlation between the pixels' NDVI time series (curves).
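The extension phase can be sketched as a breadth-first region growing from each seed. The sketch assumes that `drop_date` holds, per pixel, the index of the composite with the detected maximum drop, and that `ndvi` holds the full series with shape (rows, cols, time). The following are hypothetical choices for illustration:

- the date tolerance (in composites);
- the optional minimum correlation with the seed curve;
- 4-connectivity;
- comparing each candidate pixel with its seed rather than with its neighbour.

```python
from collections import deque

import numpy as np


def grow_scars(drop_date, ndvi, seeds, date_tol=1, min_corr=None):
    """Label map (0 = unburnt) grown from each seed over 4-connected pixels whose
    maximum-drop date is within date_tol composites of the seed's date and, if
    min_corr is given, whose NDVI curve correlates with the seed's curve."""
    h, w = drop_date.shape
    labels = np.zeros((h, w), dtype=int)
    for label, (si, sj) in enumerate(seeds, start=1):
        if labels[si, sj]:
            continue                      # seed already absorbed by an earlier scar
        seed_date, seed_curve = int(drop_date[si, sj]), ndvi[si, sj]
        labels[si, sj] = label
        queue = deque([(si, sj)])
        while queue:
            i, j = queue.popleft()
            for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                if not (0 <= a < h and 0 <= b < w) or labels[a, b]:
                    continue
                if abs(int(drop_date[a, b]) - seed_date) > date_tol:
                    continue
                if min_corr is not None:
                    r = np.corrcoef(seed_curve, ndvi[a, b])[0, 1]
                    if not r >= min_corr:     # also rejects NaN from constant curves
                        continue
                labels[a, b] = label
                queue.append((a, b))
    return labels
```

Weighting date closeness against curve correlation, as suggested in the Discussion, would replace the two separate tests with a single combined score.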


Evaluation of the method

We evaluated the detection and mapping method against the Regional database of fire

event records of the Valencia region. We compared the burnt areas detected by our

method with the registered wildfire perimeters and burnt area, for the time period

2000-2005, in a region of interest (ROI) in the Valencia region, located on the

Mediterranean east coast of the Iberian Peninsula (Figure 1).

Figure 1: Location of the Region of Interest (ROI) in the Valencia region, East

Spain.

The ROI is a large area of 10,178 km², covering the central part of the Valencia region and 44% of its total area. For the period 2000-2005, with a minimum fire size of either 15 ha or 90 ha, the total burnt area in the ROI was about 62% of the total burnt area for the whole Valencia region (Table 1).

We compared the maps of detected and registered burnt areas using the GIS software ESRI© ArcGIS 9.0 (Environmental Systems Research Institute Inc., California). From this comparison, we estimated the confusion matrices for the sets of detected and registered fires with areas larger than 15 ha and 90 ha. We then computed overall accuracies, omission and commission errors, and several indices of agreement [33, 34].
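For a binary burnt/unburnt comparison, these measures reduce to a few proportions of the confusion matrix. The sketch below assumes a pixel-wise comparison of two boolean maps. It uses the two-category forms of quantity and allocation disagreement from [34], and a kappa for no information equal to (OA - 1/2)/(1 - 1/2); kallocation and kquantity are omitted. These formulas are consistent with the values reported in Table 2:

- Q + A = 1 - OA up to rounding (0.23% + 0.25% = 0.48% = 100% - 99.52%);
- (OA - 0.5)/0.5 gives the reported kno values of 0.990 and 0.995.

```python
import numpy as np


def accuracy_measures(detected, reference):
    """Accuracy measures for a burnt (True) / unburnt (False) pixel comparison."""
    d = np.asarray(detected, dtype=bool).ravel()
    r = np.asarray(reference, dtype=bool).ravel()
    n = d.size
    tp = np.sum(d & r) / n      # burnt in both maps
    fp = np.sum(d & ~r) / n     # detected but not registered (commission)
    fn = np.sum(~d & r) / n     # registered but not detected (omission)
    tn = np.sum(~d & ~r) / n
    oa = tp + tn
    p_det = np.array([tp + fp, fn + tn])   # detected burnt / unburnt proportions
    p_ref = np.array([tp + fn, fp + tn])   # reference burnt / unburnt proportions
    p_e = float(p_det @ p_ref)             # expected chance agreement
    return {
        "overall_accuracy": oa,
        "omission_error": fn / (tp + fn),
        "commission_error": fp / (tp + fp),
        "quantity_disagreement": abs(fp - fn),
        "allocation_disagreement": 2 * min(fp, fn),
        "kappa_standard": (oa - p_e) / (1 - p_e),
        "kappa_no_information": (oa - 0.5) / 0.5,   # 1/J with J = 2 categories
    }
```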

Table 1: Number of fires and total and forest burnt areas in the ROI and the whole Valencia region, for the period 2000-2005, for wildfires with affected areas exceeding 15 ha and 90 ha.

                             ROI            Valencia region
Burnt area              > 15 ha   > 90 ha     > 15 ha   > 90 ha
Number of fires              52        18          77        31
Total burnt area (ha)     11130     10011       17862     16189
Forest area (%)              21        20          26        25

RESULTS

The application of the method to the ROI, using the combination of local and 1-year

average drops to define seed pixels, yielded the results shown in Figure 2 (only

detected fire scars with areas larger than 15ha are mapped).

Table 2 shows a set of accuracy measures estimated from the confusion matrices obtained by comparing the detected fire scars with the reference perimeters in the official regional database. Overall accuracy and the various agreement indices were very high. They were particularly high for the subset of medium-to-large (> 90 ha) detected burnt areas, compared with the whole set (> 15 ha) of detected scars. Omission errors were low: 11.55% for all detected scars and less than 4% for fire scars larger than 90 ha. Commission errors were moderately low, at 26.93% and 18.76%, respectively.

The accuracy of the method for the assessment of burnt area is illustrated in Figure 3. For both large wildfires, such as the Chiva wildfire, and small ones, such as the Vall de Ebo wildfire, the agreement between registered perimeters and detected fire scars is very high.


Figure 2. Detected fire scars (> 15 ha) from MODIS in the ROI for the period 2000-2005.

Table 2: Accuracy measures for detected burnt areas larger than 15 ha and larger than 90 ha, compared with the Regional database of fire event records used as reference.

                                                  Detected burnt   Detected burnt
                                                   areas > 15 ha    areas > 90 ha
Overall accuracy (%)                                       99.52            99.75
Quantity disagreement (%)                                   0.23             0.19
Allocation disagreement (%)                                 0.25             0.07
Omission error (%)                                         11.55             3.43
Commission error (%)                                       26.93            18.76
Standard Kappa index of agreement (kstandard)              0.798            0.881
Kappa for no information (kno)                             0.990            0.995
Kappa for allocation (kallocation)                         0.883            0.965
Kappa for quantity (kquantity)                             0.995            0.996


Figure 3. Registered perimeters (Regional database of fire event records) and detected

burnt areas from MODIS (this study) for the largest (Chiva wildfire, upper panel) and

smallest (Vall de Ebo wildfire, bottom panel) burnt area (> 90 ha) recorded in the ROI

for the period 2000-2005.

Table 3: Main characteristics of the registered (Regional database of fire event records) and detected (this study) burnt areas for wildfires with affected area larger than 90 ha recorded in the ROI for the period 2000-2005.

                             Registered burnt area (R)   Detected burnt area (D)
Date        Wildfire           Total (ha)  Forest (%)   Total (ha) D-R overlap (%)†
16/09/2000  Chiva                 2164.46        18.4      2508.41            99.09
28/08/2003  Buñol                 1701.64         8.0      1974.55            98.37
03/09/2000  Simat de Valldigna    1259.13        25.5      1381.84            97.59
12/07/2005  Simat de Valldigna     641.95        12.1       792.55            97.92
12/08/2004  Serra                  624.39        44.7       674.61            95.92
27/08/2000  Planes                 558.42        16.2       639.17            97.74
09/08/2001  Vall de Gallinera      457.11         4.5       586.60            95.41
31/07/2000  Alcalalí               413.45        44.9       444.31            96.55
22/06/2005  Xativa                 400.00        33.3       418.88            92.85
31/01/2003  Eslida                 391.37        56.7       614.89            91.69
25/06/2001  Villalonga             275.05         0.4       293.44            96.30
24/01/2005  Vall de Almonacid      249.92        30.0       264.01            93.81
21/08/2000  Jérica                 208.71        60.0       196.30            78.89
03/08/2000  Requena                190.76        50.0       214.87            91.86
29/08/2001  Chiva                  174.30         3.4       199.84            95.34
11/10/2002  Beniganim              105.32        10.3       121.15            90.81
15/02/2005  Llaurí                  99.13         9.1       115.15            90.13
27/08/2000  Vall de Ebo             96.16         8.4        98.29            89.40

†Overlap of registered and detected burnt areas, as a percentage of the registered burnt area.

Table 3 compares the registered and detected burnt areas for each individual wildfire (> 90 ha) in the ROI for the period 2000-2005. Overlapping areas ranged from 78.89% to 99.09% of the reference registered area (average value: 93.9 ± 1.1%), with larger overlapping fractions for larger wildfires. In every case except one, the Jérica wildfire, the detected fire scar was larger than the reference registered area.
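The summary figures can be recomputed from Table 3. The sketch below rests on two assumptions:

- the ± values are standard errors of the mean over the 18 wildfires (sample standard deviation divided by the square root of 18);
- overestimation is (D - R)/R, in percent.

Under these assumptions, it reproduces the 93.9 ± 1.1% overlap and the 14.0 ± 3.1% overestimation quoted in the Discussion.

```python
import numpy as np

# Registered (R) and detected (D) totals (ha) and D-R overlap (%), in the row order of Table 3
R = np.array([2164.46, 1701.64, 1259.13, 641.95, 624.39, 558.42, 457.11, 413.45, 400.00,
              391.37, 275.05, 249.92, 208.71, 190.76, 174.30, 105.32, 99.13, 96.16])
D = np.array([2508.41, 1974.55, 1381.84, 792.55, 674.61, 639.17, 586.60, 444.31, 418.88,
              614.89, 293.44, 264.01, 196.30, 214.87, 199.84, 121.15, 115.15, 98.29])
overlap = np.array([99.09, 98.37, 97.59, 97.92, 95.92, 97.74, 95.41, 96.55, 92.85,
                    91.69, 96.30, 93.81, 78.89, 91.86, 95.34, 90.81, 90.13, 89.40])


def mean_se(v):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return v.mean(), v.std(ddof=1) / np.sqrt(v.size)


overestimation = 100.0 * (D - R) / R   # % by which the detected scar exceeds the registered area
print("overlap:        %.1f +/- %.1f %%" % mean_se(overlap))          # 93.9 +/- 1.1 %
print("overestimation: %.1f +/- %.1f %%" % mean_se(overestimation))   # 14.0 +/- 3.1 %
```

The per-fire values also show that the overestimation is uneven: it ranges from about -6% for Jérica to about 57% for Eslida.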

DISCUSSION

A simple two-phase algorithm for the automatic detection of burnt areas from MODIS 250 m biweekly NDVI time series data can effectively detect and map a wide range of Mediterranean burnt areas. These ranged in size from 15 ha to 2500 ha and included contrasting fractions of forest and non-forest cover (shrublands, grasslands, croplands). In terms of overall accuracies, omission and commission errors, and indices of agreement, our results were at least as good as those of previous studies. Those studies used a variety of automatic or semiautomatic fire scar detection methods from remote sensing, in the Mediterranean Basin and other regions (see, e.g., [14, 29, 35, 36, 37, 38] and references therein).

Some methods to detect fire scars from remotely sensed time series data incorporate, as ours does, two different phases. They first detect possible abrupt changes in the temporal data of some pixels, usually by jointly considering several spectral indices. They then use a region-growing algorithm to delimit the perimeter of the fire scar. This two-phase approach balances omission and commission errors [35, 36, 37]. Strict thresholds or conditions for selecting a pixel as a seed in the first phase lower commission errors. Relaxed requirements, explicit or implicit, for incorporating additional pixels in the extension phase then reduce omission errors.

In our method, abrupt NDVI changes caused by vegetation burning were best detected using the product of the jumps in 1-year averages and the drops in NDVI between two consecutive points. Fitting more complex models to the pre- and post-change partial data series gave no clear improvement. Local drops between two consecutive values are commonly used in detection methods, although 1-year averages have also been incorporated in some algorithms [39]. We found a clear improvement when combining both differences, compared with using either of them separately. Both local NDVI drops and substantial drops in pre- and post-change 1-year averages can be produced by many factors other than fire. These include local fluctuations in NDVI measurements, droughts and other climatic variations, and agricultural and forest management. However, we think their combination is likely to be the result of a wildfire.

In the second phase, fire scars were delimited by propagating clusters from the seed pixels to spatially connected pixels with similar dates of detected maximum drops and correlated NDVI series. The maps of detected burnt areas agreed well with the perimeters recorded in the reference database of fire event records, at least for medium-to-large fires exceeding 90 ha. For this set of fires, the average percentage of burnt area detected by our method was very high (≈ 94%). The average overestimation was 14.0 ± 3.1%, mainly due to the low spatial resolution of the MODIS data.

We think that our method can equally be applied to higher-resolution data, such as Landsat images, for which we would expect the results to improve. The method could also be used semiautomatically to analyse a window area where a wildfire is known to have occurred, in order to delimit its perimeter automatically. It could also be fine-tuned to the local characteristics of the area by adjusting:

- the threshold used to select the seeds;
- the amplitude of the filter used to compute the map of smoothed maximum drops;
- the relative weights that the clustering algorithm used to extend the seeds gives to similarity in nominal fire dates and to correlation between NDVI curves.

Although the algorithm was evaluated over a relatively large area, it was applied with a fixed threshold value, which should be adapted to each region to be analysed. As in other methods, the threshold can be selected from the distribution of detected maximum drops, or derived by cross-validation against a set of independently mapped fire scars.
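One possible distribution-based rule treats seed candidates as outliers of the map of maximum drops, placing the threshold k robust standard deviations above the median. This is only a hypothetical illustration, not the procedure used in this work, which fixed the threshold at 0.13. The factor k = 3 and the 1.4826 scaling of the median absolute deviation (which makes it consistent with the standard deviation of normal data) are our own choices.

```python
import numpy as np


def threshold_from_distribution(max_drop, k=3.0):
    """Hypothetical rule: median + k * scaled MAD of the per-pixel maximum drops."""
    v = np.asarray(max_drop, dtype=float).ravel()
    v = v[np.isfinite(v)]                          # ignore missing pixels
    med = np.median(v)
    mad = 1.4826 * np.median(np.abs(v - med))      # robust estimate of the spread
    return med + k * mad
```

A rule of this kind adapts to the overall level of NDVI variability in each region. It would still need to be checked against independently mapped fire scars, as with the cross-validation option.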

The objective of the work was to test whether MODIS-derived NDVI and simple methods can detect and map burnt areas. Therefore, no other spectral bands or indices were used, the method's parameters were not systematically optimized, and the evaluated area was not pre-filtered to discard agricultural or non-forest areas. This leaves room for further improvements in the accuracy of the method.


CONCLUSIONS

In a large area of the Mediterranean Valencia region, and for the period analyzed, wildfire burnt areas were efficiently detected from 250 m resolution MODIS-derived 16-day NDVI time series data. A two-phase algorithm provided good results. It first identified seed pixels with maximum NDVI drops above a threshold, combining local and 1-year average drops, and then clustered contiguous pixels with an extension algorithm. The maps of detected burnt areas larger than 90 ha agreed well with the perimeters in the reference database of fire records.

The algorithms of the method can be efficiently implemented, allowing its application

to large regions, thus providing a useful tool for assessing ecological and

environmental factors of wildfire patterns and impacts. Further work could extend the

applicability of the method, by including the automatic selection of thresholds in NDVI

jumps to facilitate its use in regions with different types of vegetation, and optimizing

the modifications needed to best map areas with fire recurrence.

ACKNOWLEDGEMENTS

01), funded by the Spanish Ministry of Innovation and Science, CASCADE (GA283068),

funded by European Commission under the Seventh Framework Program, and

GVPRE/2008/310, funded by the Valencia Regional Government (Generalitat

Valenciana).

REFERENCES

[1] Keeley J.E., Bond W. J., Bradstock R.A., Pausas J.G., Rundel, P.W., Fire in

Mediterranean Ecosystems. Ecology, Evolution and Management. Cambridge

University Press, Cambridge, 2011.

[2] Pausas J.G., Verdu M., Plant persistence traits in fire-prone ecosystems of the

Mediterranean basin: a phylogenetic approach. Oikos, 2005, 109, 196-202.

[3] Bond W.J., Woodward F.I., Midgley G.F., The global distribution of ecosystems in a world without fire. New Phytol., 2005, 165, 525-538.

[4] Santana V.M., Baeza M.J., Marrs R.H., Vallejo V.R., Old-field secondary succession in SE Spain: can fire divert it? Plant Ecol., 2010, 211, 337-349.

[5] Pausas J.G., The response of plant functional types to changes in the fire regime in

Mediterranean ecosystems. A simulation approach. J. Veg. Sci., 1999, 10, 717-722.

[6] Moreira F., Viedma O., Arianoutsou M., Curt T., Koutsias N., Rigolot E. et al., Landscape-wildfire interactions in southern Europe: implications for landscape management. J. Environ. Manage., 2011, 92, 2389-2402.

[7] López-Poma R., Orr B.J., Bautista S., Effect of pre-fire land use on vegetation recovery after fire in a Mediterranean mosaic landscape. Int. J. Wildland Fire, (in press).

[8] Mayor A.G., Bautista S., Llovet J., and Bellot J., Post-fire hydrological and erosional

responses of a Mediterranean landscape: Seven years of catchment-scale

dynamics. Catena, 2007, 71, 68-75.

[9] De Luís M., Raventós J., González-Hidalgo J.C., Fire and torrential rainfall: effects on

seedling establishment in Mediterranean gorse shrublands. Int. J. Wildland Fire,

2005, 14, 413-422.

[10] Pausas J. G., Vallejo V. R., The role of fire in European Mediterranean ecosystems.

In: Chuvieco E. (Ed.), Remote sensing of large wildfires in the European

Mediterranean basin. Springer-Verlag, 1999, 3-16.

[11] Pausas J. G., Fernandez-Muñoz S., Fire regime changes in the Western

Mediterranean Basin: From fuel-limited to drought-driven fire regime. Climatic

Change, 2012, 110, 215-226.

[12] Mouillot F., Rambal S., Joffre R., Simulating climate change impacts on fire frequency and vegetation dynamics in a Mediterranean-type ecosystem. Glob. Change Biol., 2002, 8, 423-437.

[13] Moriondo M., Good P., Durao R., Bindi M., Giannakopoulos C., Corte-Real J.,

Potential impact of climate change on fire risk in the Mediterranean area. Clim.

Res., 2006, 31, 85-95.

[14] Levin, N., Heimowitz, A., Mapping spatial and temporal patterns of Mediterranean

wildfires from MODIS. Remote Sens. Environ., 2012, 126, 12-26.


[15] Díaz-Delgado R., Pons X., Spatial patterns of forest fires in Catalonia (NE of Spain)

along the period 1975-1995. Analysis of vegetation recovery after fire. Forest Ecol.

Manag., 2001, 147, 67-74.

[16] Wittenberg L., Malkinson D., Beeri O., Halutzy A., Tesler N., Spatial and temporal

patterns of vegetation recovery following sequences of forest fire in a

Mediterranean landscape, Mt. Carmel Israel. Catena, 2007, 71, 76-83.

[17] Gouveia C., DaCamara C.C., Trigo R.M., Post-fire vegetation recovery in Portugal based on SPOT/VEGETATION data. Nat. Hazards Earth Syst. Sci., 2010, 10, 673-684.

[18] van Leeuwen W.J.D., Casady G., Neary D., Bautista S., Alloza J.A., Carmel Y. et al.,

Monitoring post-wildfire vegetation response with remotely sensed time-series

data in Spain, USA and Israel. Int. J. Wildland Fire, 2010, 19, 75-93.

[19] van Leeuwen, W.J.D., Monitoring the effects of forest restoration treatments on

post-fire vegetation recovery with MODIS multitemporal data. Sensors, 2008, 8,

2017-2042.

[20] Barbosa P.M., Grégoire J.-M., Pereira J.M.C., An algorithm for extracting burned

areas from time series of AVHRR GAC data applied at a continental scale. Remote

Sens. Environ., 1999, 69, 253-263.

[21] Chuvieco E., Ventura G., Martín M.P., Gómez I., Assessment of multitemporal

compositing techniques of MODIS and AVHRR images for burned land mapping.

Remote Sens. Environ., 2005, 94, 450-462.

[22] Chuvieco E., Martín M.P., Palacios A., Assessment of different spectral indices in

the red—Near-infrared spectral domain for burned land discrimination. Int. J.

Remote Sens., 2002, 23, 5103-5110.

[23] Fraser R.H., Li Z., Cihlar J., Hotspot and NDVI differencing synergy (HANDS): A new

technique for burned area mapping. Remote Sens. Environ., 2000, 74, 362-376.

[24] Giglio L., Descloitres J., Justice C.O., Kaufman Y.J., An enhanced contextual fire detection algorithm for MODIS. Remote Sens. Environ., 2003, 87, 273-283.

[25] Giglio L., van der Werf G.R., Randerson J.T., Collatz G.J., Kasibhatla P., Global

estimation of burned area using MODIS active fire observations. Atmos. Chem.

Phys., 2006, 6, 957-974.

[26] Roy D.P., Jin Y., Lewis P.E., Justice C.O., Prototyping a global algorithm for systematic fire-affected area mapping using MODIS time series data. Remote Sens. Environ., 2005, 97, 137-162.

[27] Davies D.K., Ilavajhala S., Wong M.M., Justice C.O., Fire information for resource management system: Archiving and distributing MODIS active fire data. IEEE T. Geosci. Remote, 2009, 47, 72-79.

[28] Roy D.P., Boschetti L., Justice C.O., Ju J., The Collection 5 MODIS burned area

product—global evaluation by comparison with the MODIS active fire product.

Remote Sens. Environ., 2008, 112, 3690-3707.

[29] Boschetti L., Roy D., Barbosa P., Boca R., Justice C., A MODIS assessment of the

summer 2007 extent burned in Greece. Int. J. Remote Sens., 2008, 29, 2433-2436.

[30] Loepfe L., Lloret F., Román-Cuesta R.M., Comparison of burnt area estimates derived from satellite products and national statistics in Europe. Int. J. Remote Sens., 2012, 33, 3653-3671.

[31] García M.A., Alloza J.A., Bautista S., Rodríguez F., Detection and analysis of burnt areas from MODIS derived NDVI time series data. Proceedings of the SPIE, (submitted for publication).

[32] Savitzky A., Golay M.J.E., Smoothing and differentiation of data by simplified least

squares procedures. Anal. Chem., 1964, 36, 1627-1639.

[33] Jensen J. R., Introductory digital image processing: A remote sensing perspective.

Prentice Hall, New Jersey, 2005.

[34] Pontius R.G., Millones M., Death to Kappa: birth of quantity disagreement and

allocation disagreement for accuracy assessment. Int. J. Remote Sens., 2011, 32,

4407-4429.

[35] Bastarrika A., Chuvieco E., Martín M.P., Mapping burned areas from Landsat

TM/ETM+ data with a two-phase algorithm: Balancing omission and commission

errors. Remote Sens. Environ., 2011, 115, 1003-1012.

[36] Stroppiana D., Bordogna G., Carrara P., Boschetti M., Boschetti L., Brivio P.A., A method for extracting burned areas from Landsat TM/ETM+ images by soft aggregation of multiple spectral indices and a region growing algorithm. ISPRS J. Photogramm., 2012, 69, 88-102.


[37] Gómez I., Martín M.P., Prototyping an artificial neural network for burned area mapping on a regional scale in Mediterranean areas using MODIS images. Int. J. Appl. Earth Obs. Geoinf., 2011, 13, 741-752.

[38] Koutsias N., Pleniou M., Mallinis G., Nioti F., Sifakis N.I., A rule-based semi-

automatic method to map burned areas: exploring the USGS historical Landsat

archives to reconstruct recent fire history. Int. J. Remote Sens., 2013, 34, 7049-

[39] Moreno Ruiz J.A., Riaño D., Arbelo M., French N.H.F., Ustin S.L., Whiting M.L.,

Burned area mapping time series in Canada (1984–1999) from NOAA-AVHRR

LTDR: A comparison with other remote sensing products and fire perimeters.

Remote Sens. Environ., 2012, 117, 407-414.

General conclusions

As shown in Chapter 2, hidden Markov models (HMMs) can be used to characterize the phenological properties of vegetation from remotely sensed vegetation index data.

HMMs can be applied efficiently to large datasets. They give consistent results when applied to sets of pixels with similar vegetation dynamics. Such sets can be obtained from an external classification or from a prior automatic clustering based on the distributions of changes in NDVI values.

The parameters estimated by fitting HMMs (transition probabilities, and emission means and variances) reflect the properties of the different phenological states of the vegetation. They can therefore be used to compare the dynamics of plant communities affected by natural disturbances or experimental treatments.

HMMs allow prior knowledge of the system to be built into the model through the choice of the number of states or of the allowed transitions. This yields models tailored to specific plant communities, which could allow a better determination of the phenological parameters of interest.

Once the topology of an HMM has been selected, its parameters can be estimated from the whole set of pixels and time points, which produces highly consistent estimates. The hidden states, however, are inferred for individual pixels. This inference can be carried out over different time periods, which makes it possible to analyse relationships with local environmental factors.

Several extensions of the analysis presented in Chapter 2 can be considered. These include more complex types of HMMs and comparisons with other curve-fitting methods commonly used in phenological analysis. In particular, higher-order or semi-Markov HMMs could be used, which would allow the duration of the different seasonal phases to be modelled more specifically.
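
The motivation for semi-Markov models can be made concrete. In a first-order HMM, the number of consecutive steps spent in a state is implicitly geometric. If the self-transition probability is T_ii, a run of exactly d steps has probability T_ii^(d-1)(1 - T_ii), with mean 1/(1 - T_ii). A semi-Markov model replaces this implicit law with an explicit duration distribution. The sketch below assumes a hypothetical self-transition value of 0.9 per time step.

```python
def geometric_duration_pmf(t_ii, max_d):
    """P(duration = d), d = 1..max_d, for a state with self-transition t_ii."""
    if not 0.0 <= t_ii < 1.0:
        raise ValueError("t_ii must be in [0, 1)")
    return [t_ii ** (d - 1) * (1.0 - t_ii) for d in range(1, max_d + 1)]


def expected_duration(t_ii):
    """Mean of the geometric duration distribution, 1 / (1 - t_ii)."""
    return 1.0 / (1.0 - t_ii)


# Hypothetical phenological phase with T_ii = 0.9 per time step.
pmf = geometric_duration_pmf(0.9, 5)
print([round(p, 4) for p in pmf])  # strictly decreasing: the mode is d = 1
print(expected_duration(0.9))
```

Because this distribution always peaks at one step, a single hidden state cannot represent a phase with a typical, non-minimal length. This is one reason to use explicit duration models.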

Piecewise linear models are among the simplest models for extracting general trends and change points from long data series whose behaviour changes over time. The algorithms presented in Chapter 3 provide a computationally efficient method for fitting continuous piecewise linear models to series with many data points and many change points. These algorithms can be an efficient alternative to methods that optimize risk functions through exhaustive search.
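
Once the change points are fixed, fitting a continuous piecewise linear model is an ordinary least-squares problem in the hinge basis 1, t, (t - c_1)_+, ..., (t - c_m)_+. The sketch below illustrates only this basic step, assuming the breakpoints are given. It is not the Chapter 3 algorithm, whose purpose is precisely to place many change points efficiently. The function names and the test series are hypothetical.

```python
def hinge_design(ts, breakpoints):
    """Rows [1, t, max(t - c1, 0), ..., max(t - cm, 0)] for each time t."""
    return [[1.0, float(t)] + [max(t - c, 0.0) for c in breakpoints] for t in ts]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_continuous_piecewise_linear(ts, ys, breakpoints):
    """Least-squares coefficients (normal equations) and fitted values."""
    x = hinge_design(ts, breakpoints)
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(x, ys)) for i in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in x]
    return coef, fitted


# Hypothetical noiseless "tent": slope +1 up to t = 5, then slope -1.
ts = list(range(11))
ys = [t if t <= 5 else 10 - t for t in ts]
coef, fitted = fit_continuous_piecewise_linear(ts, ys, breakpoints=[5.0])
print([round(c, 6) for c in coef])  # intercept, initial slope, change in slope
```

Continuity holds automatically because each hinge term is zero at its own breakpoint. With many breakpoints, a QR-based solver is numerically safer than the normal equations.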

Fitting piecewise functions to estimate trends and periods of growth and decline in data series, such as remotely sensed vegetation index time series, explicitly or implicitly assumes some information about the system. This information guides the choice of the model's level of complexity or of the expected range of its parameters.

The algorithms presented in Chapter 3 include several options and parameters that allow them to be adapted to the type of model best suited to the data at hand. Nevertheless, the results presented show that the initial values and the parameters of the algorithms need not be optimized, since a wide range of reasonable values generally yields similar final fits.

A simple extension of the piecewise linear fitting algorithms presented in Chapter 3 would be a fully automatic implementation. Such an implementation would use global goodness-of-fit measures to compare models with the same number of change points, and a model selection criterion to choose between models with different numbers of parameters.
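
One common choice for such a model selection criterion is the Bayesian information criterion (BIC). For least-squares fits with Gaussian errors it can be written, up to an additive constant, as n ln(RSS/n) + p ln n, where p is the number of free parameters. The sketch below assumes the residual sums of squares for each number of change points have already been computed; the values are hypothetical. Whether change-point locations should count as parameters is a modelling decision, so the parameter count is passed in as a function.

```python
import math


def bic_gaussian(rss, n, p):
    """BIC of a least-squares fit with Gaussian errors, up to an additive constant."""
    return n * math.log(rss / n) + p * math.log(n)


def select_by_bic(rss_by_m, n, params_per_m):
    """Number of change points m whose fit has the smallest BIC."""
    return min(rss_by_m, key=lambda m: bic_gaussian(rss_by_m[m], n, params_per_m(m)))


# Hypothetical RSS for 0..4 change points on a series of n = 230 points.
rss = {0: 12.0, 1: 6.5, 2: 3.1, 3: 3.0, 4: 2.95}
# Assumption: m + 2 line coefficients plus the m change-point locations.
best = select_by_bic(rss, n=230, params_per_m=lambda m: 2 * m + 2)
print(best)
```

The small RSS reductions beyond two change points do not compensate for the ln(n) penalty per added parameter.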

Vegetation index time series can show complex dynamics, as Chapter 4 illustrates with MODIS NDVI data from areas of the Valencian Community. These dynamics include oscillations that cannot always be adequately analysed with classical spectral analysis models, whose parameters are constant.

The quasi-periodic component model presented in Chapter 4 includes a secular component and cyclic components with time-varying parameters, so that the phases, amplitudes and frequencies of its periodic functions can vary slowly over time. This gives greater flexibility and a better representation of the behaviour of real data.

Models with quasi-periodic components can be fitted by a time-frequency analysis, in which a local Fourier analysis is performed at each point of the series using a moving window of suitable width. The high computational cost of this analysis can be greatly reduced for equally spaced data, such as NDVI data at nominal dates, by the efficient algorithm proposed in Chapter 4. This makes it practical to apply to large volumes of data.
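
A standard way to exploit equal spacing is the sliding DFT. When a window of length N advances by one sample, each Fourier coefficient can be updated in O(1) from the previous one instead of being recomputed in O(N). The sketch below illustrates only that general idea and is not necessarily the Chapter 4 algorithm. The window length, the 23 samples per year and the test signal are assumptions.

```python
import cmath
import math


def dft_coefficient(x, start, n_win, k):
    """Direct O(N) evaluation of DFT bin k over x[start:start + n_win]."""
    return sum(x[start + j] * cmath.exp(-2j * math.pi * k * j / n_win) for j in range(n_win))


def sliding_dft(x, n_win, k):
    """DFT bin k for every window position, with an O(1) update per shift:
    X(s + 1) = exp(2*pi*i*k/N) * (X(s) - x[s] + x[s + N])."""
    twiddle = cmath.exp(2j * math.pi * k / n_win)
    coeff = dft_coefficient(x, 0, n_win, k)
    out = [coeff]
    for s in range(len(x) - n_win):
        # Remove the oldest sample, add the newest, and shift the phase reference.
        coeff = twiddle * (coeff - x[s] + x[s + n_win])
        out.append(coeff)
    return out


# Hypothetical NDVI-like series: 23 samples per year (assumed 16-day composites),
# mean 0.5 and an annual cycle of amplitude 0.2, over 5 years.
per_year = 23
x = [0.5 + 0.2 * math.cos(2 * math.pi * t / per_year) for t in range(5 * per_year)]
local = sliding_dft(x, per_year, k=1)
# With a one-year window, bin 1 holds the annual cycle; 2|X|/N is its local amplitude.
amplitudes = [2 * abs(c) / per_year for c in local]
```

Over very long series, rounding errors accumulate in the recursion. Recomputing a coefficient directly from time to time is a simple safeguard.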

The values of the parameters that define the quasi-periodic components, and their variation over time, could be a useful tool for investigating how vegetation dynamics relate to the composition of the plant community, to external factors such as changes in land use or climatic variables, and to the interactions among them. Applying the method extensively in different areas, as planned for areas of the Valencian Community, would make it possible to assess the potential of this type of model as an analytical tool in plant and landscape ecology.

Areas affected by forest fires, where abrupt changes appear in the NDVI time series, can be detected effectively with the two-phase algorithm presented in Chapter 5.

The seed pixels of the algorithm can be identified simply. Local drops in NDVI from one time point to the next are combined with drops between the annual means before and after each point; more complex fitting models did not give better results. The burned areas can then be delimited by a region-growing algorithm that extends from the seed pixels to contiguous pixels with similar NDVI dynamics.
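
The seed criterion can be sketched for a single pixel as follows. The function name, the threshold values, the number of composites per year and the use of plain arithmetic means are all hypothetical choices for illustration. The thresholds actually used in Chapter 5 are not reproduced here.

```python
def candidate_seed_times(ndvi, per_year, local_drop_min, annual_drop_min):
    """Indices t where NDVI falls sharply from t to t + 1 AND the mean of the
    year after t is clearly lower than the mean of the year ending at t."""
    seeds = []
    for t in range(per_year - 1, len(ndvi) - per_year):
        local_drop = ndvi[t] - ndvi[t + 1]
        before = sum(ndvi[t - per_year + 1:t + 1]) / per_year
        after = sum(ndvi[t + 1:t + 1 + per_year]) / per_year
        if local_drop >= local_drop_min and before - after >= annual_drop_min:
            seeds.append(t)
    return seeds


# Hypothetical pixel: stable NDVI of 0.6, a fire after t = 40 drops it to 0.2,
# followed by slow recovery.
per_year = 23
ndvi = [0.6] * 41 + [0.2 + 0.005 * k for k in range(50)]
print(candidate_seed_times(ndvi, per_year, local_drop_min=0.2, annual_drop_min=0.15))
```

Requiring both conditions separates a persistent post-fire drop from short-lived dips, such as a single noisy composite, which lower the local value but barely change the annual mean.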

In Chapter 5, the detected burned areas were compared with the recorded burned areas over a wide area of the Valencian Community. The comparison showed high accuracy for the method, especially for the largest fires.

The algorithms of the burned-area detection method proposed in Chapter 5 can be implemented efficiently, which allows the method to be applied over large areas.
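
The region-growing phase can be sketched as a breadth-first search on the pixel grid with 4-connectivity. Each pixel is visited at most once, so the cost grows linearly with the number of pixels. The similarity test here is a hypothetical threshold on each neighbour's NDVI drop at the event date. It stands in for whatever criterion of "similar NDVI dynamics" is adopted and is not taken from Chapter 5.

```python
from collections import deque


def grow_burned_region(drop, seeds, min_drop):
    """Breadth-first region growing from seed pixels on a 2-D grid.

    drop[r][c] is a per-pixel NDVI drop at the event date; a 4-neighbour joins
    the region if its drop is at least min_drop (hypothetical criterion).
    """
    rows, cols = len(drop), len(drop[0])
    burned = [[False] * cols for _ in range(rows)]
    queue = deque()
    for r, c in seeds:
        if not burned[r][c]:
            burned[r][c] = True
            queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and not burned[nr][nc]
                    and drop[nr][nc] >= min_drop):
                burned[nr][nc] = True
                queue.append((nr, nc))
    return burned


# Hypothetical NDVI drops on a 4 x 4 tile, with one seed at (1, 1).
drop = [
    [0.05, 0.30, 0.35, 0.02],
    [0.04, 0.40, 0.25, 0.03],
    [0.01, 0.02, 0.20, 0.30],
    [0.35, 0.01, 0.02, 0.01],
]
mask = grow_burned_region(drop, seeds=[(1, 1)], min_drop=0.15)
```

Pixel (3, 0) exceeds the threshold but is not reached, because it is not connected to a seed. This connectivity requirement is what distinguishes region growing from per-pixel thresholding.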

Possible extensions that would make the method easier to apply automatically in diverse areas include automatic selection of the NDVI-drop thresholds that qualify a pixel as a potential seed, and optimization of the modifications needed to detect areas with recurrent fires. Applying the method automatically over large regions with different vegetation types could provide a useful tool for analysing how different ecological and environmental factors affect the distribution and impact of forest fires.

The analysis tools developed in this work can be useful in several aspects of landscape analysis and plant ecology. Their application to real problems of ecological interest, in some cases already under way in collaboration with other research groups, will show the potential of the proposed methods.

Spatial analysis with hidden Markov models

Francisco Rodríguez (1), Miguel A. García (1) and Susana Bautista (2)

(1) Departamento de Matemática Aplicada and (2) Departamento de Ecología, Universidad de Alicante, Ctra. San Vicente del Raspeig s/n, 03690 San Vicente del Raspeig, Alicante, Spain.

Excerpt from Chapter 7 of the book: Introducción al Análisis Espacial de Datos en Ecología y Ciencias Ambientales: Métodos y Aplicaciones. Maestre, F.T.; Escudero, A.; Bonet, A. (eds.). Servicio de Publicaciones de la Universidad Rey Juan Carlos, Madrid, 2008.

CONTENTS

Summary
7.1. Introduction
7.2. Basic elements and algorithms of hidden Markov models
7.3. Generalizations and variations of the basic model
7.4. Case studies
7.4.1. Analysis of vegetation transects with presence-absence data
7.4.2. Analysis of vegetation transects with quantitative data
7.5. Final remarks
7.6. Software review
7.6.1. Tools in the MATLAB environment
7.6.1.1. Basic functions in MATLAB
7.6.1.2. HMM Toolbox and the H2M function set
7.6.2. Tools for the R environment
7.6.3. Other programs
7.7. Web pages of interest

7.1. Introduction

A wide variety of methods is available for analysing spatial patterns, as shown in the various chapters of this book and in the specialized literature (e.g. Dale 1999, Fortin and Dale 2005). These methods are related to one another to varying degrees (Dale et al. 2002). Some of the techniques in use were originally developed in other fields. Examples are spectral analysis (Ripley 1978, Renshaw and Ford 1984) and, more recently, wavelets (Dale and Mah 1998); both come from signal theory and are already part of the standard toolkit for spatial analysis in ecology.

Hidden Markov models (hereafter abbreviated HMM, from the English term by which they are usually denoted) are a technique for modelling sequential data. They were originally developed and applied in automatic speech recognition (Rabiner 1989), where they are now almost indispensable, since most applications in that field include some type of HMM in their structure. More recently, HMMs have found applications in very diverse disciplines, such as image analysis (Aas et al. 1999) and psychology (Visser et al. 2002). Their use is growing in the analysis of electroencephalograms and other biological signals (e.g. Penny and Roberts 1998, Novák et al. 2004) and, above all, in bioinformatics. There, HMMs are well established as one of the basic techniques (e.g. Baldi et al. 1994, Baldi and Brunak 1998, Durbin et al. 1998), and several HMM variants and generalizations are in use (e.g. Winters-Hilt 2006).

Publications on HMMs and their applications have increased enormously over the last fifteen years (see Cappé 2001a for a bibliography of the past decade, and Figure 7.1). Despite this, the use of HMMs in areas of ecological interest, although growing, has so far been very limited. Viovy and Saint (1994) apply HMMs to study the temporal dynamics of vegetation from remote sensing data. Tucker and Anand (2005) discuss the usefulness of HMMs, compared with classical Markov chain models, for detecting complex ecological dynamics. Guilford et al. (2004) use HMMs to analyse bird navigation data. Franke et al. (2004) use HMMs to analyse the behavioural states of caribou, while Franke et al. (2006) attempt to predict wolf hunting sites from GPS location data. Ver Hoef and Cressie (1997) use HMMs to model vegetation transects in grasslands, in order to locate the boundaries, or change points, between vegetated and bare zones. Rodríguez and Bautista (2001) carry out a similar application to analyse complex patterns in vegetation transects; that work is partly the basis for the review in Rodríguez and Bautista (2006).

The next section presents the essential aspects of HMMs, explaining the basic concepts without developing the mathematical and computational details in full. For readers who wish to go further, a good starting point is the classic exposition by Rabiner (1989), together with the review by Bengio (1999), which includes several extensions of the basic models. Among books devoted to the subject, MacDonald and Zucchini (1997) is probably the most accessible, while Elliot et al. (1995) and Cappé et al. (2005) present various generalizations and more advanced topics. The basics of HMMs are also covered in machine learning textbooks (e.g. Sierra 2006).

7.2. Basic elements and algorithms of hidden Markov models

To explain the basic concepts of HMMs, it is best to start with the simplest models, the so-called first-order discrete HMMs. It is also useful to recall what a Markov chain, or Markov model, is. The concept is well known in ecology through its applications in population dynamics (e.g. Caswell 2001) and in modelling successional dynamics (e.g. Waggoner and Stephens 1970, Wootton 2001).

A Markov model essentially consists of a set of $K$ states, one of which the system occupies at each moment, and a set of transition probabilities from each state to every state, including the starting state itself. The states can represent any situation of interest in the system being modelled, such as the possible phases in the succession of a forest or the four nucleotides found in a DNA strand. Once the $K$ possible states have been ordered, we can refer to them by their indices and speak of state $i$, where $i$ takes the values $1,\dots,K$. A realization of the model is a chain of $n$ states, $S_1,\dots,S_n$, occupied by the system at $n$ successive moments, that is, a sequence $\{S_t\}$ of length $n$ in which each $S_t$ takes one of the values $1,\dots,K$. The essential property of a Markov chain of order $r$ is that the probability that $S_t$ takes any of its $K$ possible values does not depend on what happened before the $r$ previous moments. In particular, in a first-order Markov chain, the case considered from here on, the future evolution of the system depends only on the current state and not on how that state was reached; that is, it is independent of the history of the system. When the transition probabilities are constant over time, the Markov chain is said to be homogeneous. In that case the transition probabilities can be arranged in a $K\times K$ matrix $T=(T_{ij})$ ($K$ rows and $K$ columns), whose element $T_{ij}$ is the probability of moving from state $i$ to state $j$, that is, the conditional probability $T_{ij}=P(S_{t+1}=j\mid S_t=i)$; each row of $T$ therefore sums to one. If the frequencies with which the different states appear remain constant, the chain is said to be stationary. The state frequencies of a stationary chain are determined by the transition probabilities. Mathematically, the vector $\pi$ of stationary frequencies satisfies $\pi T=\pi$: it is a left eigenvector of $T$ (equivalently, an eigenvector of $T^\top$) for the eigenvalue 1, normalized so that its components sum to one. This vector is unique when every state can be reached from every other. One way of representing a Markov model is to show the possible states and the transition probabilities between them (Fig. 7.2). An alternative scheme, which shows how the system evolves and highlights the conditional independence from previous history, is given in Figure 7.3.
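
The stationary frequencies can be computed numerically from $T$. The minimal sketch below uses power iteration on a row vector, repeatedly applying pi <- pi T. It assumes the chain is irreducible and aperiodic, so that the iteration converges. The function name and the matrix are hypothetical.

```python
def stationary_distribution(t, tol=1e-12, max_iter=10_000):
    """Left eigenvector of the row-stochastic matrix t for eigenvalue 1
    (pi T = pi), normalized to sum to one, by power iteration."""
    k = len(t)
    pi = [1.0 / k] * k
    for _ in range(max_iter):
        new = [sum(pi[i] * t[i][j] for i in range(k)) for j in range(k)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("power iteration did not converge")


# Hypothetical 2-state chain (e.g. vegetation patch / gap).
T = [[0.9, 0.1],
     [0.2, 0.8]]
pi = stationary_distribution(T)
```

For two states there is also a closed form, pi_1 = T_21 / (T_12 + T_21), which here gives 0.2 / 0.3 = 2/3.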

An HMM contains a Markov chain, but that chain describes a hidden, unobservable process. The hidden states correspond to properties of the system, real or idealized, that cannot be observed directly. Through a probabilistic model, however, they are linked to a set of observable manifestations, which in discrete HMMs are called the symbols of the system. For each hidden state there is a certain probability of observing each of these symbols, called the emission probability of that symbol.

More precisely, a first-order discrete HMM is defined by four essential elements:
- a set of $K$ states;
- a set of $D$ symbols;
- a $K\times K$ matrix $T=(T_{ij})$ of transition probabilities;
- a $K\times D$ matrix $E=(E_{ip})$ of emission probabilities.

The observable data consist of a chain of $n$ symbols, that is, a sequence $\{Y_t\}$ of length $n$, where each $Y_t$ can be any of the $D$ symbols, for $t=1,\dots,n$. There is a corresponding hidden, unobservable sequence of states $\{S_t\}$, where each $S_t$ can take any of the $K$ values in the set of states. As with Markov chains, whenever no ambiguity arises we identify the labels of symbols or states with their indices, and say that $S_t$ takes the values $1,\dots,K$ and $Y_t$ the values $1,\dots,D$.

When the system is in state $i$, it moves to state $j$ with probability $T_{ij}$. This includes the possibility of staying in the same state, that is, of the system remaining at time $t+1$ in the state it occupied at time $t$, which happens with probability $T_{ii}$. Likewise, the system emits symbol $p$ with probability $E_{ip}$, which depends only on its current state. These transition probabilities, $T_{ij}$, and emission probabilities, $E_{ip}$, do not depend on the history of the system up to the current state, that is, on the previous sequences of (hidden) states and observations. The sequences of states and observations therefore satisfy two conditional independence relations. Given the state $S_t$, the observation $Y_t$ is independent of all other observations and states. Given $S_{t-1}$, the state $S_t$ is independent of all earlier states $S_1,\dots,S_{t-2}$. This last relation means that the sequence of hidden states is a first-order Markov chain.

A graphical representation of conditional probability relations such as these among a set of random variables is called a Bayesian network (Pearl 1988; Heckerman 1996). Figure 7.4 shows a Bayesian network for a first-order HMM, drawn as a directed acyclic graph. Its nodes represent the random variables for the states of the system at successive moments (circles for hidden states and squares for observations), and the absence of an arrow between two variables indicates a conditional independence relation between them.

Characterizing an HMM fully requires knowing all of its elements. However, its basic structure, called its topology, is defined by the number of hidden states, the number of symbols, and the state transitions and symbol emissions that are not allowed (whose probabilities are assumed to be zero). These are the basic elements to consider when modelling a system. The labels attached to hidden states and symbols do not affect the model, although they are essential for interpreting it. The specific parameter values, that is, the transition and emission probabilities that may be non-zero, are normally estimated from the experimental data. This fits to the data a model whose structure was proposed beforehand.

Knowing the topology of an HMM and the values of its parameters (transition and emission probabilities), we could simulate its behaviour, that is, generate random sequences of hidden states and observations from the model. To do so, however, one last element is needed: the initial frequencies, or probabilities of starting the chain in each of the $K$ possible states. These form a vector with elements $\pi_i$, for $i=1,\dots,K$, where $\pi_i=P(S_1=i)$ is the probability that the state sequence starts in state $i$. If the sequence of hidden states is assumed to be stationary, the probabilities of the different states remain constant over time. The initial frequencies then equal these probabilities, which, as noted above, are determined by the transition matrix. This stationarity assumption is reasonable, for example, when the sequence being modelled is a random segment of a longer, possibly infinite, series. By contrast, consider some of the most widespread applications of HMMs, such as speech recognition or bioinformatics. There, identifying a phrase within a sequence of sounds, or a structure within a protein sequence, requires either a model for the initial state of the sequence to be identified or an estimate of that state from the observations that is as precise as possible.
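
Simulating a discrete HMM once $\pi$, $T$ and $E$ are known takes only a few lines. The sketch below uses a hypothetical two-state, two-symbol model; its values are illustrative and are not those of Figure 7.5, which are not reproduced in the text. States and symbols are labelled from 0 in the code rather than from 1 as in the text.

```python
import random


def sample_index(probs, rng):
    """Draw index i with probability probs[i]."""
    u = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1  # guard against rounding when sum(probs) is ~1


def simulate_hmm(pi, t, e, n, seed=0):
    """Generate hidden states S_1..S_n and symbols Y_1..Y_n (0-based labels)."""
    rng = random.Random(seed)
    states, symbols = [], []
    s = sample_index(pi, rng)          # initial state from pi
    for _ in range(n):
        states.append(s)
        symbols.append(sample_index(e[s], rng))  # emission depends only on S_t
        s = sample_index(t[s], rng)              # transition depends only on S_t
    return states, symbols


pi = [0.5, 0.5]
T = [[0.9, 0.1], [0.1, 0.9]]  # hypothetical: patches (0) and gaps (1) persist
E = [[0.8, 0.2], [0.3, 0.7]]  # columns: symbol 0 = contact, 1 = no contact
states, symbols = simulate_hmm(pi, T, E, n=50)
```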

In any case, the main value of HMMs as analytical tools lies in the possibility of estimating a model from a data series. The series is assumed to be the observable outcome of a sequence of states that are not directly accessible but are of interest, either because they have a specific meaning for the problem at hand or because modelling them yields a model with greater predictive capacity.

Once a topology has been defined, HMM analysis involves two basic problems, known as the learning problem and the inference problem. Given a sequence of observations, the learning problem consists of estimating the model parameters, that is, the transition and emission probabilities. A more general learning problem could also include selecting the topology. Techniques have been proposed for this general problem (Heckerman 1996), but it is a more complex topic, on which we comment further below. Given a sequence of observations and parameters that have been estimated or are known beforehand, the inference problem consists of obtaining the corresponding sequence of hidden states.
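
For a discrete HMM, the learning problem is usually solved with Baum-Welch (EM) iterations. Each iteration is a scaled forward-backward pass followed by re-estimation of $\pi$, $T$ and $E$ from expected counts. The sketch below follows the standard textbook formulation (as in Rabiner 1989) for a single sequence. It is an illustration, not the software used in this work, and the function names, data and starting values are hypothetical. Each EM iteration cannot decrease the log-likelihood, which gives a simple check.

```python
import math


def forward_backward(pi, t, e, obs):
    """Scaled forward-backward pass. c[s] = P(y_s | y_0..y_{s-1}), so
    log P(obs) = sum(log c); alpha[s][i] * beta[s][i] = P(S_s = i | obs)."""
    k, n = len(pi), len(obs)
    alpha, c = [], []
    for s in range(n):
        if s == 0:
            a = [pi[i] * e[i][obs[0]] for i in range(k)]
        else:
            a = [sum(alpha[-1][i] * t[i][j] for i in range(k)) * e[j][obs[s]] for j in range(k)]
        c.append(sum(a))
        alpha.append([v / c[-1] for v in a])
    beta = [[1.0] * k for _ in range(n)]
    for s in range(n - 2, -1, -1):
        y = obs[s + 1]
        beta[s] = [sum(t[i][j] * e[j][y] * beta[s + 1][j] for j in range(k)) / c[s + 1]
                   for i in range(k)]
    return alpha, beta, c


def baum_welch_step(pi, t, e, obs):
    """One EM re-estimation of (pi, T, E); also returns log P(obs) under the
    parameters passed in (i.e. before the update)."""
    k, d, n = len(pi), len(e[0]), len(obs)
    alpha, beta, c = forward_backward(pi, t, e, obs)
    gamma = [[alpha[s][i] * beta[s][i] for i in range(k)] for s in range(n)]
    xi = [[0.0] * k for _ in range(k)]  # expected transition counts i -> j
    for s in range(n - 1):
        y = obs[s + 1]
        for i in range(k):
            for j in range(k):
                xi[i][j] += alpha[s][i] * t[i][j] * e[j][y] * beta[s + 1][j] / c[s + 1]
    new_pi = gamma[0][:]
    new_t = [[xi[i][j] / sum(xi[i]) for j in range(k)] for i in range(k)]
    new_e = []
    for i in range(k):
        occupancy = sum(g[i] for g in gamma)  # expected time spent in state i
        new_e.append([sum(g[i] for g, y in zip(gamma, obs) if y == p) / occupancy
                      for p in range(d)])
    return new_pi, new_t, new_e, sum(math.log(v) for v in c)


# Hypothetical contact (0) / no-contact (1) sequence and starting guess.
obs = [0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0]
pi, T, E = [0.5, 0.5], [[0.7, 0.3], [0.3, 0.7]], [[0.6, 0.4], [0.4, 0.6]]
log_liks = []
for _ in range(20):
    pi, T, E, ll = baum_welch_step(pi, T, E, obs)
    log_liks.append(ll)
```

The starting guess must be asymmetric between states. If both states start identical, EM keeps them identical.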

The learning problem can be solved with the EM (expectation maximization) algorithm (Dempster et al. 1977). EM finds parameter values that maximize, in general locally, the (log-)likelihood of the observations as a function of the parameters. The usual choice is the Baum-Welch algorithm (Baum et al. 1970), a particular, computationally efficient version of EM. The inference problem can be solved by computing the most probable sequence of hidden states with a dynamic programming algorithm known as the Viterbi algorithm (Viterbi 1967). This algorithm is a special case of inference algorithms for more general graphical models developed by Pearl (1988) and other authors (e.g. Smyth 1997, Smyth et al. 1997).
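
The Viterbi recursion can be sketched in log space, which avoids underflow on long sequences. At each position, it keeps the best log-probability of any path ending in each state, together with a back-pointer to that path's previous state. The function name and the example model are hypothetical, and states are labelled from 0.

```python
import math


def viterbi(pi, t, e, obs):
    """Most probable hidden-state path of a discrete HMM and its log-probability."""
    def log(x):
        return math.log(x) if x > 0 else float("-inf")

    k, n = len(pi), len(obs)
    delta = [log(pi[i]) + log(e[i][obs[0]]) for i in range(k)]
    back = []  # back[s - 1][j]: best predecessor of state j at position s
    for s in range(1, n):
        ptr, new = [], []
        for j in range(k):
            best_i = max(range(k), key=lambda i: delta[i] + log(t[i][j]))
            ptr.append(best_i)
            new.append(delta[best_i] + log(t[best_i][j]) + log(e[j][obs[s]]))
        back.append(ptr)
        delta = new
    state = max(range(k), key=lambda i: delta[i])
    best_logp = delta[state]
    path = [state]
    for ptr in reversed(back):  # follow back-pointers from the last position
        state = ptr[state]
        path.append(state)
    return path[::-1], best_logp


pi = [0.5, 0.5]
T = [[0.9, 0.1], [0.1, 0.9]]
E = [[0.8, 0.2], [0.3, 0.7]]
obs = [0, 0, 1, 1, 1, 0, 0]
path, logp = viterbi(pi, T, E, obs)
```

The cost is proportional to $nK^2$, whereas checking all $K^n$ paths directly is only feasible for very short sequences.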

An example illustrates the concepts presented in this section. The structure of the HMM considered is shown in Figure 7.5: the model has two hidden states and two symbols, with the transition and emission probabilities shown there. This model is similar to one of those used later to model transects with vegetation presence-absence data. In that case, the hidden states correspond to dense and sparse vegetation zones (patches and gaps), while the symbols indicate whether or not there was contact with vegetation at the given point of the transect. Using the model in Figure 7.5, a chain of hidden states of length 50 and the corresponding sequence of symbols were simulated (Figure 7.6). The initial state of the hidden sequence was drawn from the stationary frequencies, $\pi=(\pi_1,\pi_2)=(0.5, 0.5)$. These were obtained from the transition (probability) matrix as the left eigenvector associated with the eigenvalue 1, normalized so that its components sum to one.

Suppose now, as is usual in applications, that we know neither the model parameters nor the sequence of hidden states, but only the sequence of symbols. Given certain parameter values, that is, given a model $M$, the probability of any sequence of hidden states, $P(\text{states}\mid M)$, is obtained by multiplying the initial probability by the corresponding transition probabilities. Likewise, given a sequence of hidden states, the probability of emitting a sequence of observations, $P(\text{observations}\mid\text{states})$, is obtained by multiplying the corresponding emission probabilities. The probability that the model emits the observations in Figure 7.6 is therefore the sum, over all possible hidden state sequences, of the product $P(\text{observations}\mid\text{states})\times P(\text{states}\mid M)$. Viewed as a function of the model parameters, this probability is the likelihood of the observed data. Since there are $K^n$ possible hidden sequences ($2^{50}$ in this example), the sum is not evaluated term by term in practice. Instead, it is computed recursively by the forward algorithm, at a cost proportional to $nK^2$ (Rabiner 1989). The parameter values that maximize the probability of the observed data are the maximum likelihood estimates, obtained (in general, as a local maximum) with the Baum-Welch algorithm. Once parameter estimates are available, we can compute the posterior probability of each possible hidden state sequence given the observed data. The Viterbi algorithm returns precisely the sequence that maximizes this probability. A detailed and very clear explanation of these algorithms can be found in Rabiner (1989). Figure 7.6 also shows the estimated parameter values and the estimated hidden state sequence. There are discrepancies between the true and estimated parameter values, in some cases considerable, because the observation sequence used to fit the model is short.
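
The sum over hidden sequences can be checked numerically on a short sequence. The sketch below computes $P(\text{observations}\mid M)$ in two ways: with the forward recursion, and by explicitly enumerating all $K^n$ state paths. Both give the same value. The model and sequence are hypothetical, and states and symbols are labelled from 0.

```python
import itertools


def forward_likelihood(pi, t, e, obs):
    """P(observations | M) by the forward recursion, O(n K^2) operations."""
    k = len(pi)
    alpha = [pi[i] * e[i][obs[0]] for i in range(k)]
    for y in obs[1:]:
        alpha = [sum(alpha[i] * t[i][j] for i in range(k)) * e[j][y] for j in range(k)]
    return sum(alpha)


def brute_force_likelihood(pi, t, e, obs):
    """Same quantity as an explicit sum over all K^n hidden-state paths of
    P(observations | states) * P(states | M)."""
    k, total = len(pi), 0.0
    for path in itertools.product(range(k), repeat=len(obs)):
        p_states = pi[path[0]]
        for a, b in zip(path, path[1:]):
            p_states *= t[a][b]
        p_obs_given_states = 1.0
        for s, y in zip(path, obs):
            p_obs_given_states *= e[s][y]
        total += p_obs_given_states * p_states
    return total


pi = [0.5, 0.5]
T = [[0.9, 0.1], [0.1, 0.9]]
E = [[0.8, 0.2], [0.3, 0.7]]
obs = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1]  # 2**10 = 1024 paths to enumerate
```

For sequences of realistic length, the alpha values underflow. In practice they are rescaled at each step, and the log-likelihood is accumulated from the scaling factors.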

7.3. Generalizations and variations of the basic model

An essential feature of HMMs is that the set of hidden states is discrete, although some authors use the term more loosely to include more general stochastic models (e.g. Elliot et al. 1995). The observable data, however, may be any random variable whose distribution has parameters that depend on the state of the system. The case considered above, with a finite number of symbols emitted with probabilities that depend on the hidden state, is only the simplest one.

Suppose, for example, that we have a transect of sampling units in which the individuals of a species are counted, and we want to identify zones of high and low density. One possible model assumes that the observations follow a Poisson distribution whose parameter, the mean number of individuals per sampling unit, is larger or smaller depending on whether the unit lies in a high- or low-density zone. This gives an HMM with two hidden states (high and low density) and discrete but not finite emissions, determined by a probability mass function whose parameter depends on the hidden state.
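
With Poisson emissions, the forward recursion changes in one place only: the matrix entry $E_{ip}$ is replaced by the Poisson probability of the observed count under the rate of state $i$. The sketch below uses hypothetical rates and counts, and rescales alpha at each step so that the log-likelihood can be computed without underflow.

```python
import math


def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson variable with mean lam > 0."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def poisson_hmm_loglik(pi, t, rates, counts):
    """Scaled forward recursion; rates[i] is the mean count in hidden state i."""
    k = len(pi)
    loglik = 0.0
    alpha = None
    for step, y in enumerate(counts):
        if step == 0:
            a = [pi[i] * poisson_pmf(y, rates[i]) for i in range(k)]
        else:
            a = [sum(alpha[i] * t[i][j] for i in range(k)) * poisson_pmf(y, rates[j])
                 for j in range(k)]
        s = sum(a)            # P(y_step | previous counts)
        loglik += math.log(s)
        alpha = [v / s for v in a]
    return loglik


# Hypothetical transect of counts per sampling unit; state 0 = low, 1 = high density.
counts = [0, 1, 0, 2, 7, 9, 6, 8, 1, 0, 0, 1]
pi, T, rates = [0.5, 0.5], [[0.8, 0.2], [0.2, 0.8]], [0.8, 7.0]
ll = poisson_hmm_loglik(pi, T, rates, counts)
```

When both states share the same rate, the hidden layer plays no role. The result then reduces to the log-likelihood of independent Poisson counts, which is a useful sanity check.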

The observations could also be a continuous random variable, as with transects of vegetation quadrats in which the variable of interest is biomass or cover. In that case, instead of the symbols and emission probabilities of discrete HMMs, each hidden state has a probability density function that determines the distribution of the observations. For example, we could assume that the biomass in each sampling unit follows a normal distribution whose mean and/or variance depend on the hidden state of the system. The next section presents an example of modelling vegetation transects with cover data, showing how HMMs are applied with continuous observations.
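
For Gaussian emissions, the density $N(x;\mu_i,\sigma_i^2)$ takes the place of $E_{ip}$ in the forward recursion. In Baum-Welch, the emission update becomes a posterior-weighted mean and variance for each state, where gamma[t][i] is the posterior probability of state i at position t from the forward-backward pass. The sketch below shows only that update step, with hypothetical cover values. The posteriors are given as hard 0/1 assignments here; in a real fit they would come from the E-step. The variance floor is a common safeguard, not part of the basic model.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of N(mean, var) at x (the emission term of state i)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def gaussian_m_step(obs, gamma, var_floor=1e-6):
    """Posterior-weighted means and variances for each hidden state."""
    k = len(gamma[0])
    means, variances = [], []
    for i in range(k):
        w = [g[i] for g in gamma]
        total = sum(w)
        m = sum(wt * x for wt, x in zip(w, obs)) / total
        v = sum(wt * (x - m) ** 2 for wt, x in zip(w, obs)) / total
        means.append(m)
        variances.append(max(v, var_floor))  # floor prevents a collapsing variance
    return means, variances


cover = [5.0, 8.0, 6.0, 62.0, 70.0, 66.0, 7.0]  # hypothetical % cover per quadrat
gamma = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1], [1, 0]]
means, variances = gaussian_m_step(cover, gamma)
```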

Several extensions of the basic discrete or continuous model add even more flexibility to HMMs. One is the hidden semi-Markov model, in which a state can emit a string of symbols whose length follows a given probability distribution. These models have been used to analyse spatial variation in precipitation data (Sansom 1999, Sansom and Thompson 2003). Hierarchical HMMs (Fine et al. 1998) and factorial HMMs (Ghahramani and Jordan 1997) could also be of interest for spatial pattern analysis. These models assume a certain structure in the set or sequence of hidden states, which makes them suitable for systems with patterns at different scales.

HMMs are tools for modelling sequential data because they assume that the hidden states form a Markov chain. In two or more dimensions, the counterpart of a Markov chain is the Markov random field (MRF), in which a realization of the model is not a sequence of system states at successive times but a set of system states linked by certain contiguity relations, such as the states at different spatial positions. The counterpart of the property that, in a Markov chain of order r, the probability of a state taking a given value depends only on the r preceding states and not on the others, is that in an MRF the probability of a state taking a given value depends only on its neighbouring elements, as defined by the contiguity relation and the type of MRF considered, and, given those neighbours, is independent of all the remaining elements (see, for example, Rue and Held 2005). There are also corresponding models with hidden states, called hidden Markov random fields (HMRF; Kunsch et al. 1995), which make it possible to model, for example, two-dimensional arrays of data with a logic similar to that of HMMs, although with greater complexity than in the one-dimensional case.
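The Markov property of an MRF can be illustrated with the simplest binary case, a two-colour Potts (Ising-type) field on a grid with rook (4-neighbour) contiguity. In this Python sketch the conditional probability that a cell takes the value 1 depends only on its neighbours, through a hypothetical interaction parameter `beta`, and a Gibbs sampler repeatedly redraws each cell from that local conditional. It illustrates the MRF itself, not the estimation of an HMRF.

```python
import math
import random


def neighbours(i, j, rows, cols):
    """Rook (4-neighbour) contiguity on a rectangular grid, no wrap-around."""
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        a, b = i + di, j + dj
        if 0 <= a < rows and 0 <= b < cols:
            yield a, b


def prob_one(field, i, j, beta):
    """P(x_ij = 1 | all other cells) for a two-colour Potts MRF.

    By the Markov property it depends only on how many neighbours are 1
    (n1) versus 0 (n0): exp(beta*n1) / (exp(beta*n1) + exp(beta*n0)).
    """
    rows, cols = len(field), len(field[0])
    nb = [field[a][b] for a, b in neighbours(i, j, rows, cols)]
    n1 = sum(nb)
    n0 = len(nb) - n1
    return 1.0 / (1.0 + math.exp(beta * (n0 - n1)))


def gibbs_sweep(field, beta, rng):
    """One Gibbs-sampling sweep: redraw every cell from its local conditional."""
    for i in range(len(field)):
        for j in range(len(field[0])):
            field[i][j] = 1 if rng.random() < prob_one(field, i, j, beta) else 0
    return field


# Hypothetical example: beta > 0 favours patches of equal values.
rng = random.Random(42)
grid = [[rng.randint(0, 1) for _ in range(20)] for _ in range(20)]
for _ in range(50):
    gibbs_sweep(grid, 0.8, rng)
```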

7.5. Final remarks

In our view, the most notable aspect of applying HMMs to spatial pattern analysis is the use of an explicit model of the pattern structure, through the choice of the HMM topology (the number of hidden states and the transitions allowed between them). In this way, prior knowledge about the system, or knowledge derived from earlier descriptive analyses, can be incorporated into the modelling process, allowing a deeper and more detailed study.
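One practical consequence of choosing the topology is that transitions fixed to zero stay at zero under EM (Baum-Welch) re-estimation, because every expected transition count involving them is multiplied by $a_{ij} = 0$. The Python sketch below performs one Baum-Welch update of the transition matrix of a discrete HMM; the three-state left-to-right topology and all numbers are hypothetical, and the unscaled recursions are only adequate for short sequences.

```python
def reestimate_transitions(obs, init, trans, emis):
    """One Baum-Welch update of the transition matrix of a discrete HMM.

    obs is a list of symbol indices; emis[i][k] = P(symbol k | state i).
    Unscaled forward-backward recursions: short sequences only. Assumes every
    state has positive expected occupancy before the last step.
    """
    n, T = len(init), len(obs)
    # Forward variables: alpha[t][i] = P(o_0..o_t, S_t = i)
    alpha = [[init[i] * emis[i][obs[0]] for i in range(n)]]
    for t in range(1, T):
        alpha.append([sum(alpha[t - 1][i] * trans[i][j] for i in range(n))
                      * emis[j][obs[t]] for j in range(n)])
    # Backward variables: beta[t][i] = P(o_{t+1}..o_{T-1} | S_t = i)
    beta = [[1.0] * n]
    for t in range(T - 2, -1, -1):
        nxt = beta[0]
        beta.insert(0, [sum(trans[i][j] * emis[j][obs[t + 1]] * nxt[j]
                            for j in range(n)) for i in range(n)])
    # Expected transition counts, up to the common factor 1 / P(obs)
    counts = [[sum(alpha[t][i] * trans[i][j] * emis[j][obs[t + 1]] * beta[t + 1][j]
                   for t in range(T - 1)) for j in range(n)] for i in range(n)]
    # Row-normalise; a zero a_ij yields a zero count, so the topology is kept
    return [[c / sum(row) for c in row] for row in counts]


left_to_right = [[0.8, 0.2, 0.0],
                 [0.0, 0.7, 0.3],
                 [0.0, 0.0, 1.0]]
emis = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
new_trans = reestimate_transitions([0, 0, 1, 0, 1, 1, 1], [1.0, 0.0, 0.0],
                                   left_to_right, emis)
```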

In the fields where HMMs are most widespread, they are generally used within a machine-learning framework, in problems with a large database (the so-called training data) from which the most appropriate HMM can be selected and estimated, sometimes with a large number of hidden states. The resulting models have high predictive power, which is what matters there, since the typical task is to classify or identify new data. Although a similar situation can arise in some ecological problems (e.g. modelling precipitation series or remote sensing data), it is more common to have a small dataset and to take a more statistical approach, in the sense of wanting confidence intervals for the parameter estimates and hypothesis tests both on the parameter values of a given model and between alternative models. These questions were rarely treated in detail in the traditional HMM literature, but they have been receiving more attention recently, which will undoubtedly help extend the use of HMMs in ecology and in other disciplines with similar needs.

Besides the possibility of using bootstrapping, for a given topology the maximum likelihood estimates of the parameters (obtained with the EM algorithm) are asymptotically normal (Bickel et al. 1998), so it is possible to obtain confidence intervals (Visser et al. 2000) and to carry out likelihood-ratio tests on their values (Giudici et al. 2000). These tests, however, cannot be used to decide between models with different numbers of states: under the null hypothesis of fewer states, some parameters of the larger model are not identifiable, so the usual chi-squared approximation to the distribution of the test statistic does not hold. Model selection criteria can be used to choose the most appropriate model from a set of candidates (Visser et al. 2002, MacDonald and Zucchini 1997), but the theoretical foundation of this approach is not fully established. Nevertheless, there are several approaches to the problem (e.g. Stinchcombe and White 1998) and ongoing research that should soon provide fully validated alternative methods.
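As an illustration of the model selection criteria mentioned above, and subject to the same caveat about their theoretical justification, the Python sketch below counts the free parameters of a fully connected discrete HMM and compares candidate numbers of states by AIC or BIC. It assumes no structural zeros in the transition matrix; the helper `select_n_states` and the log-likelihood values in the usage line are hypothetical.

```python
import math


def n_free_params(n_states, n_symbols, estimate_initial=True):
    """Free parameters of a fully connected discrete HMM. Each row of T and E,
    and the initial distribution, sums to 1, losing one degree of freedom."""
    k = n_states * (n_states - 1) + n_states * (n_symbols - 1)
    if estimate_initial:
        k += n_states - 1
    return k


def aic(loglik, k):
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n_obs):
    return -2.0 * loglik + k * math.log(n_obs)


def select_n_states(logliks, n_symbols, n_obs, criterion="bic"):
    """logliks maps number of states -> maximised log-likelihood (e.g. the best
    of several EM runs); returns the number of states with the lowest score."""
    def score(n):
        k = n_free_params(n, n_symbols)
        return bic(logliks[n], k, n_obs) if criterion == "bic" else aic(logliks[n], k)
    return min(logliks, key=score)


# Hypothetical maximised log-likelihoods for 1, 2 and 3 hidden states
best = select_n_states({1: -100.0, 2: -60.0, 3: -59.5}, n_symbols=2, n_obs=50)
```

BIC penalises each extra parameter by log(n_obs) rather than 2, so for all but very short sequences it favours fewer states than AIC.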

References

Aas, K., Eikvil, L. and Huseby, R.B. 1999. Applications of hidden Markov chains in image analysis. Pattern Recognition 32: 703-713.

Baldi, P., Chauvin, Y., Hunkapiller, T. and McClure, M.A. 1994. Hidden Markov models of biological primary sequence information. Proceedings of the National Academy of Sciences USA 91: 1059-1063.

Baldi, P. and Brunak, S. 1998. Bioinformatics, the Machine Learning Approach. MIT Press.

Baum, L.E., Petrie, T., Soules, G. and Weiss, N. 1970. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics 41: 164-171.

Bautista, S. 1999. Regeneración post-incendio de un pinar (Pinus halepensis, Miller) en ambiente semiárido. Erosión del suelo y medidas de conservación a corto plazo. PhD thesis, Universidad de Alicante, Alicante. 238 pp.

Bautista, S. and Vallejo, V.R. 2002. Spatial variation of post-fire plant recovery in Aleppo pine forests. In: Fire and Biological Processes (eds. Trabaud, L. and Prodon, R.), pp. 13-24. Backhuys Publishers, Leiden.

Bellot, J., Bautista, S. and Meliá, N. 2000. Post-fire regeneration in a semiarid pine forest as affected by the previous vegetation spatial pattern. In: Mediterranean Desertification. Research results and policy implications, EUR 19303 (eds. Balabanis, P., Peter, D., Ghazi, A. and Tsogas, M.), pp. 343-350. European Commission, Luxembourg.

Bengio, Y. 1999. Markovian models for sequential data. Neural Computing Surveys 2: 129-162.

Bickel, P.J., Ritov, Y. and Rydén, T. 1998. Asymptotic normality of the maximum-likelihood estimator for general hidden Markov models. Annals of Statistics 26: 1614-

Cappé, O. 2001a. Ten years of HMMs. URL: www.tsi.enst.fr/~cappe/docs/hmmbib.html

Cappé, O. 2001b. H2M: A set of MATLAB/OCTAVE functions for the EM estimation of mixtures and hidden Markov models. URL: www.tsi.enst.fr/~cappe/h2m/

Cappé, O., Moulines, E. and Rydén, T. 2005. Inference in Hidden Markov Models. Springer.

Caswell, H. 2001. Matrix Population Models. Construction, Analysis and Interpretation. Second edition. Sinauer Associates, Inc. Publishers, Sunderland, Massachusetts.

Dale, M.R.T. 1999. Spatial Pattern Analysis in Plant Ecology. Cambridge University Press, Cambridge.

Dale, M.R.T. and Mah, M. 1998. The use of wavelets for spatial pattern analysis in ecology. Journal of Vegetation Science 9: 805-814.

Dale, M.R.T., Dixon, P., Fortin, M.J., Legendre, P., Myers, D.E. and Rosenberg, M.S. 2002. Conceptual and mathematical relationships among methods for spatial analysis. Ecography 25: 558-577.

Dempster, A.P., Laird, N.M. and Rubin, D.B. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society Series B 39: 1-38.

Durbin, R., Eddy, S., Krogh, A. and Mitchison, G. 1998. Biological sequence analysis. Probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge.

Efron, B. and Tibshirani, R.J. 1993. An introduction to the bootstrap. Chapman and Hall, New York.

Elliott, R.J., Aggoun, L. and Moore, J.B. 1995. Hidden Markov Models: Estimation and Control. Springer-Verlag, New York.

Fine, S., Singer, Y. and Tishby, N. 1998. The Hierarchical Hidden Markov Model: Analysis and Applications. Machine Learning 32: 41-62.

Fortin, M.J. and Dale, M.R.T. 2005. Spatial Analysis. A Guide for Ecologists. Cambridge University Press, Cambridge.

Franke, A., Caelli, T. and Hudson, R.J. 2004. Analysis of movements and behavior of caribou (Rangifer tarandus) using hidden Markov models. Ecological Modelling 173: 259-270.

Franke, A., Caelli, T., Kuzyk, G. and Hudson, R.J. 2006. Prediction of wolf (Canis lupus) kill-sites using hidden Markov models. Ecological Modelling 197: 237-246.

Ghahramani, Z. and Jordan, M.I. 1997. Factorial hidden Markov models. Machine Learning 29: 245-273.

Giudici, P., Rydén, T. and Vandekerkhove, P. 2000. Likelihood-Ratio Tests for Hidden Markov Models. Biometrics 56: 742-747.

Greig-Smith, P. 1952. The use of random and contiguous quadrats in the study of the structure of plant communities. Annals of Botany 16: 293-316.

Greig-Smith, P. 1979. Pattern in vegetation. Journal of Ecology 67: 755-779.

Guilford, T., Roberts, S., Biro, D. and Rezek, I. 2004. Positional entropy during pigeon homing II: navigational interpretation of Bayesian latent state models. Journal of Theoretical Biology 227: 25-38.


Harte, D.S. 2005. Package "HiddenMarkov": Discrete Time Hidden Markov Models. Statistics Research Associates, Wellington. URL: www.statsresearch.co.nz/software.html

Heckerman, D. 1996. A tutorial on learning with Bayesian networks. Technical Report MSR-TR-95-06. Microsoft Research, Redmond.

Hill, M.O. 1973. The intensity of spatial pattern in plant communities. Journal of Ecology 61: 225-235.

Kunsch, H., Geman, S. and Kehagias, A. 1995. Hidden Markov random fields. The Annals of Applied Probability 5: 577-602.

MacDonald, I.L. and Zucchini, W. 1997. Hidden Markov and Other Models for Discrete-Valued Time Series. Chapman and Hall, London.

Murphy, K. 1998. Hidden Markov Model (HMM) Toolbox for Matlab. URL: www.cs.ubc.ca/~murphyk/Software/HMM/hmm.html

Novák, D., Cuesta-Frau, D., Al-ani, T., Aboy, M., Mico, P. and Lhotská, L. 2004. Speech Recognition Methods Applied to Biomedical Signals Processing. Proceedings of the 26th Annual International Conference of the IEEE EMBS, San Francisco, CA, pp. 118-

Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo.

Penny, W.D. and Roberts, S.J. 1998. Gaussian Observation Hidden Markov Models for EEG analysis. Technical Report TR-98-12, Imperial College, London.

Rabiner, L.R. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77: 257-286.

Renshaw, E. and Ford, E.D. 1984. The description of spatial pattern using two-dimensional spectral analysis. Vegetatio 56: 75-85.

Ripley, B.D. 1978. Spectral analysis and the analysis of pattern in plant communities. Journal of Ecology 66: 965-981.

Rodríguez, F. and Bautista, S. 2001. Patch-gap analysis of presence-absence data in vegetation transect using hidden Markov models, with application to the characterisation of post-fire plant pattern disturbance in a semiarid pine forest. In: Ecosystems and Sustainable Development III, Advances in Ecological Sciences 10 (eds. Brebbia, C.A., Villacampa, Y. and Usó, J.L.), pp. 801-809. WIT Press, Southampton.

Rodríguez, F. and Bautista, S. 2006. Modelos ocultos de Markov para el análisis de patrones espaciales. Ecosistemas 2006/3. URL: www.revistaecosistemas.net/articulo.asp?Id=433&Id_Categoria=1&tipo=portada

Rue, H. and Held, L. 2005. Gaussian Markov Random Fields. Theory and Applications. Chapman & Hall/CRC, Boca Raton.

Sansom, J. 1998. A hidden Markov model for rainfall using breakpoint data. Journal of Climate 11: 42-53.

Sansom, J. 1999. Large scale variability of rainfall through hidden semi-Markov models of breakpoint data. Journal of Geophysical Research 104 (D24): 31631-31643.

Sansom, J. and Thompson, C.S. 2003. Mesoscale spatial variation of rainfall through a hidden semi-Markov model of breakpoint data. Journal of Geophysical Research 108 (D8): 8379.

Sierra, B. 2006. Aprendizaje automático: conceptos básicos y avanzados. Aspectos prácticos utilizando el software Weka. Pearson-Prentice Hall, Madrid.

Smyth, P. 1997. Belief networks, hidden Markov models, and Markov random fields: A unifying view. Pattern Recognition Letters 18: 1261-1268.

Smyth, P., Heckerman, D. and Jordan, M.I. 1997. Probabilistic independence networks for hidden Markov probability models. Neural Computation 9: 227-269.

Stinchcombe, M. and White, H. 1998. Consistent specification testing with nuisance parameters present only under the alternative. Econometric Theory 14: 295-324.

Tucker, B.C. and Anand, M. 2005. On the use of stationary versus hidden Markov models to detect simple versus complex ecological dynamics. Ecological Modelling 185: 177-

Ver Hoef, J.M. and Cressie, N. 1997. Using hidden Markov chains and empirical Bayes change-point estimation for transect data. Environmental and Ecological Statistics 4: 247-264.

Viovy, N. and Saint, G. 1994. Hidden Markov Models Applied to Vegetation Dynamics Analysis Using Satellite Remote Sensing. IEEE Transactions on Geoscience and Remote Sensing 32: 906-917.

Visser, I., Raijmakers, M.E.J. and Molenaar, P.C.M. 2000. Confidence intervals for hidden Markov model parameters. British Journal of Mathematical and Statistical Psychology 53: 317-327.

Visser, I., Raijmakers, M.E.J. and Molenaar, P.C.M. 2002. Fitting hidden Markov models to psychological data. Scientific Programming 10: 185-199.

Viterbi, A.J. 1967. Error bounds for convolutional codes and an asymptotically optimal decoding algorithm. IEEE Transactions on Information Theory 13: 260-269.

Waggoner, P. and Stephens, G. 1970. Transition probabilities for a forest. Nature 225: 1160-1161.

Winters-Hilt, S. 2006. Hidden Markov model variants and their application. BMC Bioinformatics 7 (Suppl. 2): S14.

Wootton, J. 2001. Prediction in complex communities: analysis of empirically derived Markov models. Ecology 82: 580-598.

Figure 7.1. Number of articles per year, since 1983, in international journals indexed in the Science Citation Index (SCI) that include the term "hidden Markov" in the title (1047 publications in total; horizontal axis: year, 1983 to 2007). Search date: 18 April 2007.

Figure 7.2. Representation of a Markov chain, showing the states of the system and the transition probabilities between them.

Figure 7.3. Representation of a first-order Markov chain, showing the states of the system at successive times ($S_{t-1}$, $S_t$, $S_{t+1}$). The direct arrows between states indicate the conditional (in)dependence relations. For a first-order chain, the probability that $S_t$ takes a given value depends only on the value of the state at the immediately preceding time, $S_{t-1}$, and not on the values of the states at earlier times.

Figure 7.4. Representation of a first-order HMM as a Bayesian network, with hidden states $S_{t-1}$, $S_t$, $S_{t+1}$ and observations $Y_{t-1}$, $Y_t$, $Y_{t+1}$.

$$T = \begin{pmatrix} 0.9 & 0.1 \\ 0.3 & 0.7 \end{pmatrix}, \qquad E = \begin{pmatrix} 0.8 & 0.2 \\ 0.1 & 0.9 \end{pmatrix}$$

Figure 7.5. Diagram of the hidden states and allowed transitions in an example of a first-order discrete HMM. The model has two states, with the transition probabilities given in matrix T, and two symbols, with the emission probabilities given in matrix E. Although the emissions are not drawn in the diagram, the number of symbols is given by the number of columns of E.
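As a worked check of what the matrices of Figure 7.5 imply (a derivation added for illustration, not part of the original figure), the stationary distribution $\pi$ of $T$, the long-run frequencies of the two symbols and the mean run length in each state are:

```latex
\begin{aligned}
&\pi T = \pi,\ \ \pi_1 + \pi_2 = 1 \;\Longrightarrow\; 0.1\,\pi_1 = 0.3\,\pi_2 \;\Longrightarrow\; \pi = (0.75,\ 0.25),\\
&P(Y = 1) = 0.75 \cdot 0.8 + 0.25 \cdot 0.1 = 0.625, \qquad P(Y = 2) = 0.75 \cdot 0.2 + 0.25 \cdot 0.9 = 0.375,\\
&\mathrm{E}[\text{run length in state } i] = \frac{1}{1 - T_{ii}}:\qquad \frac{1}{1 - 0.9} = 10, \qquad \frac{1}{1 - 0.7} \approx 3.33.
\end{aligned}
```

The run lengths are geometric, since leaving state $i$ is a Bernoulli event with probability $1 - T_{ii}$ at each step; this is the restriction that hidden semi-Markov models relax.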

[Figure 7.6 panels: hidden states; observations; hidden states inferred with the Viterbi algorithm.]

Estimated parameters:

$$T = \begin{pmatrix} 0.96 & 0.04 \\ 0.07 & 0.93 \end{pmatrix}, \qquad E = \begin{pmatrix} 0.89 & 0.11 \\ 0.27 & 0.73 \end{pmatrix}$$

Figure 7.6. Sequences of hidden states and observations (length 50) simulated with the model of Figure 7.5; parameters estimated from the observation sequence with the Baum-Welch algorithm; and sequence of hidden states inferred with the Viterbi algorithm.
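The simulation and decoding steps of Figure 7.6 can be sketched in Python: the code below draws 50 steps from the model of Figure 7.5 and recovers the most probable hidden path with the Viterbi algorithm, working in log space. The uniform initial distribution `PI` is an assumption (the figure does not give one), the seed is arbitrary so the sequences will not match the figure, and the Baum-Welch estimation step is omitted.

```python
import math
import random

# Model of Figure 7.5, with states and symbols indexed 0 and 1.
T = [[0.9, 0.1], [0.3, 0.7]]
E = [[0.8, 0.2], [0.1, 0.9]]
PI = [0.5, 0.5]  # assumption: initial distribution not given in the figure


def simulate(n, seed=1):
    """Draw n hidden states and observations from the HMM (PI, T, E)."""
    rng = random.Random(seed)
    s = rng.choices([0, 1], weights=PI)[0]
    states, obs = [], []
    for _ in range(n):
        states.append(s)
        obs.append(rng.choices([0, 1], weights=E[s])[0])
        s = rng.choices([0, 1], weights=T[s])[0]
    return states, obs


def log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, pi=PI, trans=T, emis=E):
    """Most probable hidden-state path and its joint log-probability."""
    n = len(pi)
    delta = [log(pi[i]) + log(emis[i][obs[0]]) for i in range(n)]
    backptr = []
    for y in obs[1:]:
        prev, delta, ptr = delta, [], []
        for j in range(n):
            # Best predecessor of state j at this step
            best = max(range(n), key=lambda i: prev[i] + log(trans[i][j]))
            ptr.append(best)
            delta.append(prev[best] + log(trans[best][j]) + log(emis[j][y]))
        backptr.append(ptr)
    # Backtrack from the best final state
    path = [max(range(n), key=lambda i: delta[i])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    return path[::-1], max(delta)


states, obs = simulate(50)
decoded, logp = viterbi(obs)
agreement = sum(a == b for a, b in zip(states, decoded)) / len(states)
```

Comparing `decoded` with the true `states`, as `agreement` does, mirrors the visual comparison between the first and third panels of the figure.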
