2013 CSE-SRS Poster

42

Methods

If the geographical space defined by a given leaf has sufficient utility, a base model is trained on the environmental and occurrence data within that space. Utility is defined as the ratio of the number of observed species locations to the total size of the geographic space.
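A minimal Python sketch of the utility test follows. It assumes that each leaf's geographic space is represented by a count of grid cells and a count of cells containing observed occurrences. The `Region` type, the `utility` and `should_train_base_model` helpers, and the `MIN_UTILITY` cutoff are hypothetical names; the poster does not state what threshold counts as sufficient utility.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A leaf's geographic space, assumed here to be a set of grid cells."""
    n_cells: int        # total size of the geographic space, in grid cells
    n_occurrences: int  # number of cells with an observed species location


def utility(region: Region) -> float:
    """Ratio of observed species locations to the total geographic space."""
    if region.n_cells == 0:
        return 0.0
    return region.n_occurrences / region.n_cells


# Hypothetical cutoff for illustration only; not taken from the poster.
MIN_UTILITY = 0.01


def should_train_base_model(region: Region, min_utility: float = MIN_UTILITY) -> bool:
    """Train a base model on this leaf only if its utility meets the cutoff."""
    return utility(region) >= min_utility
```

Under these assumptions, a leaf of 5,000 cells containing 120 occurrences has a utility of 0.024 and would receive a base model, while a leaf of 10,000 cells with 20 occurrences (utility 0.002) would not.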

Once all the base models have generated predictions, these predictions are aggregated, with each base model’s predictions weighted by the relative size of the geographic extent used to train it. The final prediction at each location is the weighted average of the predictions of all base models at that location. Our work uses Maximum Entropy (Maxent) modeling as the base model.
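The aggregation step can be sketched as follows. The sketch assumes, as the phrase "all base models at that location" suggests, that every base model produces a prediction at every location of interest, and that extent sizes are measured in a common unit such as grid cells. `aggregate_predictions` is a hypothetical helper, not part of any Maxent package.

```python
from typing import List, Sequence


def aggregate_predictions(
    predictions: Sequence[Sequence[float]],
    extent_sizes: Sequence[float],
) -> List[float]:
    """Weighted average of base-model predictions at each location.

    predictions[m][i] is base model m's prediction at location i;
    extent_sizes[m] is the size of the geographic extent model m was trained on.
    """
    if not predictions or len(predictions) != len(extent_sizes):
        raise ValueError("need at least one base model and one extent size per model")
    total = sum(extent_sizes)
    # Weight each model by the relative size of its training extent (weights sum to 1).
    weights = [size / total for size in extent_sizes]
    n_locations = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions))
        for i in range(n_locations)
    ]
```

For example, two models with predictions `[0.2, 0.8]` and `[0.6, 0.4]`, trained on extents of 300 and 100 cells, receive weights 0.75 and 0.25 and yield approximately `[0.3, 0.7]`. Normalizing the weights first is what makes the weighted sum equal to a weighted average.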

Figure: Partition Data (HDDT) → Partitioned Data (Occurrences + Environment) → Generate Base Model (Model Training) → Aggregate Predictions → Overall Predictions

Model

We propose a model that recursively partitions the geographic space into regions of appropriate size to serve as input to local, or “base”, models, and then aggregates the weighted predictions of the base models into the final prediction.

We use a Hellinger Distance Decision Tree (HDDT) to recursively partition the space, with each leaf of the tree defining a particular geographic space. Because HDDT is skew-insensitive, it remains effective even when species observations are few relative to unobserved locations.
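The split criterion behind HDDT is the Hellinger distance between the class-conditional distributions induced by a split. For a binary split into branches L and R it is d_H = sqrt(Σ_{j∈{L,R}} (sqrt(|X+_j| / |X+|) − sqrt(|X−_j| / |X−|))²), where X+ and X− are the occurrence and unobserved locations. Because it compares within-class proportions rather than raw counts, it does not depend directly on the class ratio. The Python sketch below is a simplified illustration, not the authors' implementation. It assumes points are (longitude, latitude, label) tuples with label 1 for an observed occurrence and 0 for an unobserved location, splits only on the two coordinates at midpoints between distinct values, and stops at hypothetical `max_depth` and `min_points` limits. All function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

# (longitude, latitude, label); label 1 = observed occurrence, 0 = unobserved
Point = Tuple[float, float, int]


def hellinger_distance(left: List[Point], right: List[Point]) -> float:
    """Hellinger distance of a binary split, the HDDT split criterion."""
    pos = sum(p[2] for p in left) + sum(p[2] for p in right)
    neg = len(left) + len(right) - pos
    if pos == 0 or neg == 0:
        return 0.0  # a single-class node cannot be separated further
    total = 0.0
    for branch in (left, right):
        branch_pos = sum(p[2] for p in branch)
        branch_neg = len(branch) - branch_pos
        total += (math.sqrt(branch_pos / pos) - math.sqrt(branch_neg / neg)) ** 2
    return math.sqrt(total)


def best_split(points: List[Point]) -> Optional[Tuple[float, int, float]]:
    """Return (distance, axis, threshold) maximizing the Hellinger distance."""
    best = None
    for axis in (0, 1):  # 0 = longitude, 1 = latitude
        values = sorted({p[axis] for p in points})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2  # midpoint keeps both branches non-empty
            left = [p for p in points if p[axis] <= threshold]
            right = [p for p in points if p[axis] > threshold]
            d = hellinger_distance(left, right)
            if best is None or d > best[0]:
                best = (d, axis, threshold)
    return best


def partition(points: List[Point], depth: int = 0,
              max_depth: int = 3, min_points: int = 20) -> List[List[Point]]:
    """Recursively split the space; each returned list is one leaf's points."""
    if depth >= max_depth or len(points) < min_points:
        return [points]
    split = best_split(points)
    if split is None or split[0] == 0.0:
        return [points]
    _, axis, threshold = split
    left = [p for p in points if p[axis] <= threshold]
    right = [p for p in points if p[axis] > threshold]
    return (partition(left, depth + 1, max_depth, min_points)
            + partition(right, depth + 1, max_depth, min_points))
```

For four points on a line, with occurrences at longitudes −2 and −1 and unobserved locations at 1 and 2, the best split is at longitude 0 with distance √2 (the maximum for a binary split), and `partition(points, max_depth=1, min_points=2)` returns two leaves. A leaf that ends up with no occurrences would then fail the utility test and receive no base model.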

Background

The geographical distribution of a species is defined by the confluence of three factors: biotic conditions, abiotic conditions, and movement conditions. Each is elaborated upon in the following diagram.

Together, these conditions define the variables of the geographical space G. Species distribution models are correlative methods that estimate, from observed locations, the area with suitable abiotic conditions for a species, known as G_A. Often neglected, however, is the implicit effect of the size of the geographic space itself. If the space is too large, it is likely to produce a drastic imbalance between the numbers of observed and unobserved locations; if it is too small, it may capture only a fraction of the species’ suitable conditions. Our work reduces these effects by ensuring that input data come from a geographic extent of reasonable size.

Results

The method outperforms the base models on all metrics measured, including AUROC (area under the ROC curve) and AUPR (area under the precision-recall curve), and its predictions display visibly finer-grained detail. Shown below is a species distribution prediction generated by the partitioning method for the North American songbird Bell’s Vireo.

Species: Vireo bellii. Data Source: GBIF
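For reference, the two metrics named above can be computed from binary labels (1 = observed occurrence) and predicted scores. This dependency-free Python sketch computes AUROC as the probability that a randomly chosen occurrence outscores a randomly chosen non-occurrence (ties count one half), and approximates AUPR by average precision, which is one common estimator among several; the poster does not say which estimator it used. The function names are hypothetical.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U statistic)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of precision@k over the ranks k of the positives."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank locations from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision at this positive's rank
    return total / n_pos
```

For labels `[1, 0, 1, 0]` and scores `[0.9, 0.8, 0.7, 0.1]`, `auroc` gives 0.75 and `average_precision` gives 5/6 (about 0.833). Because AUPR depends on how precisely the rare positive class is ranked, it is generally more sensitive than AUROC to the kind of class imbalance that large geographic extents produce.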

Abstract

Knowledge of the potential distributions of species is important for the continued development of conservation strategies. One method that assists in this process is species distribution modeling, which models species’ niche requirements by combining occurrence data with ecological and environmental variables. We develop a method for making species distribution models more robust by partitioning the environmental extent, whose size can vary significantly. Decision trees recursively partition the extent, and local predictions from many base models are aggregated. The method improves upon state-of-the-art techniques.

Data, Inference Analytics, and Learning Lab @ ND

Reid A. Johnson, Nitesh V. Chawla
Computer Science and Engineering

University of Notre Dame

Recursively Partitioning the Geographic Space Using Decision Trees
Species Distribution Modeling

Quick Summary

Full Detail