Improving the correlation hunting in a large quantity of SOM component planes
Classification of agro-ecological variables related with productivity in the sugar cane culture.
Miguel BARRETO, Andrés Pérez-Uribe
MINISTERIO DE AGRICULTURA Y DESARROLLO RURAL, asocaña
Self-organizing maps (SOMs) can be seen as a data visualization technique that reduces the dimensionality of data through a self-organizing clustering algorithm.
The problem that data visualization attempts to solve is that humans cannot visualize high dimensional data.
These techniques can be used to improve the understanding of high dimensional data by visualizing information in a low dimensional space.
A SOM presents high-dimensional data in a low-dimensional space by placing points that are close in the high-dimensional space close together in the low-dimensional space.
From a computational point of view, the self-organizing model is both a projection method which maps high-dimensional data space into low-dimensional space (reduction of dimensionality), and a clustering method, so that similar data samples tend to be mapped to nearby neurons.
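The projection and clustering behavior described above can be illustrated with a minimal sketch of the online SOM update rule. The grid size, learning-rate schedule, neighborhood schedule, and toy data are assumptions chosen for illustration; they are not taken from the slides.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position of the unit whose weights are closest to x."""
    return min(weights, key=lambda u: math.dist(weights[u], x))


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    # Assumption: each unit starts from a randomly chosen training sample.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood radius
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Gaussian neighborhood measured on the map grid, not in data space.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


# Two well-separated toy groups in a 3-dimensional input space.
low = [[0.1 * i, 0.0, 0.1 * i] for i in range(5)]
high = [[10.0 - 0.1 * i, 10.0, 10.0 - 0.1 * i] for i in range(5)]
som = train_som(low + high, rows=3, cols=3)
bmu_low = best_matching_unit(som, low[0])
bmu_high = best_matching_unit(som, high[0])
```

Samples from the two groups end up on different map units, which is the sense in which the map both projects the data onto a 2-D grid and clusters it.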
Component planes
[Figure: input vectors 1 to n and the resulting component planes V1, TMAS, P1AS and Ra1AS.]
To improve the analysis of the relationships between variables and/or their influence on the outputs of the system, it is possible to slice the self-organizing map in order to visualize its so-called component planes.
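Slicing a map into component planes amounts to reading one coordinate of every unit's weight vector at a time. The sketch below assumes the codebook is stored as a dictionary from grid position to weight vector; the variable names come from the slides, while the numeric values are invented for illustration.

```python
def component_planes(weights, rows, cols, names):
    """Return one rows x cols grid per input variable (one plane per component)."""
    return {
        name: [[weights[(r, c)][k] for c in range(cols)] for r in range(rows)]
        for k, name in enumerate(names)
    }


names = ["V1", "TMAS", "P1AS", "Ra1AS"]
# Hypothetical 2x2 codebook; the numbers are made up for the example.
codebook = {
    (0, 0): [1.0, 22.0, 80.0, 400.0],
    (0, 1): [0.0, 24.0, 60.0, 450.0],
    (1, 0): [1.0, 23.0, 75.0, 420.0],
    (1, 1): [0.0, 25.0, 50.0, 480.0],
}
planes = component_planes(codebook, rows=2, cols=2, names=names)
```

Each plane has the same shape as the map, so the same grid position in two planes refers to the same unit.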
Example: clustering of the SOM easily reveals distinct gene expression patterns. Results of a reanalysis of a lymphoma study (Junbai Wang et al., 2002).
a) 42 DLBCL samples; in the SOM color scale, red indicates high expression and blue indicates low expression. b) The cluster numbers represent the group of genes contained.
Correlation hunting
The task of organizing similar component planes in order to find correlating components is called correlation hunting.
The expression "correlation" does not include just linear correlations, but also nonlinear and local or partial correlations between variables.
However, when the number of components is large it is difficult to determine which planes are similar to each other.
A new SOM can be used to reorganize the component planes in order to perform the correlation hunting. The main idea is to place correlated components close to each other.
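One way to prepare the component planes for such a reorganization is to flatten each plane into a vector and rescale it, so that every plane becomes one training sample for the second SOM. The sketch below shows only this preparation step, plus a direct nearest-pair search as a simple stand-in for the second map. The min-max scaling and the Euclidean distance are assumptions, not choices stated in the slides, and this distance does not detect inversely related planes.

```python
import math


def plane_to_vector(plane):
    """Flatten a 2-D plane and rescale it to [0, 1] so that scales are comparable."""
    flat = [v for row in plane for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [0.0] * len(flat)  # constant plane: no pattern to compare
    return [(v - lo) / (hi - lo) for v in flat]


def most_similar_pair(planes):
    """Return the names of the two planes with the closest normalized patterns."""
    vectors = {name: plane_to_vector(p) for name, p in planes.items()}
    names = sorted(vectors)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return min(pairs, key=lambda ab: math.dist(vectors[ab[0]], vectors[ab[1]]))


# Toy planes with hypothetical names.
toy_planes = {
    "A": [[1.0, 2.0], [3.0, 4.0]],
    "B": [[3.0, 5.0], [7.0, 9.0]],  # same pattern as A on a different scale
    "C": [[4.0, 3.0], [2.0, 1.0]],  # reversed pattern
}
pair = most_similar_pair(toy_planes)
```

In the approach of the slides, the normalized vectors would be fed to a second SOM instead of being compared pairwise, so that similar planes land on nearby units of that second map.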
An advantage of using a SOM for component plane projection is that the placements of the component planes can be shown on a regular grid. In addition, an ordered presentation of similar components is automatically generated. A disadvantage is that the choice of grouping variables is left to the user.
More component planes …
Heart disease
279 component planes
This database contains 13 attributes (which have been extracted from a larger set of 75).
Clustering of SOM component planes based on the SOM distance matrix
The U-matrix has been used as an effective cluster distance function. It visualizes the distances between each map unit and its neighbors, which makes it possible to see the cluster structure of the SOM.
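A simplified U-matrix can be computed as the average distance between each unit's weight vector and those of its grid neighbors. The sketch below assumes a rectangular grid with 4-connected neighbors and stores one value per unit; other U-matrix variants also store the individual unit-to-unit distances.

```python
import math


def u_matrix(weights, rows, cols):
    """Average distance from each unit's weight vector to its 4-neighbors."""
    u = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(weights[(r, c)], weights[(nr, nc)]))
            # A 1x1 map has no neighbors; leave its value at 0.0.
            u[r][c] = sum(dists) / len(dists) if dists else 0.0
    return u


# Hypothetical 1x3 map with one-dimensional weights.
toy_weights = {(0, 0): [0.0], (0, 1): [1.0], (0, 2): [10.0]}
u = u_matrix(toy_weights, rows=1, cols=3)
```

High values mark units that lie far from their neighbors, that is, borders between clusters; low values mark the interior of a cluster.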
Use Vellido's algorithm to partition the map
Vellido's algorithm provides a partitioning of the map into a set of base clusters, whose number equals the number of local minima of the U-matrix. It is then used to obtain different partitioning levels of the clustering of the SOM.
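The idea of one base cluster per local minimum of the U-matrix can be illustrated as follows. This is not a reproduction of Vellido's algorithm: it is a rough sketch that assumes a rectangular grid, 4-connected neighbors, and no ties between U-matrix values, and that assigns every unit to the minimum reached by steepest descent.

```python
def neighbours(r, c, rows, cols):
    """Yield the 4-connected grid neighbors of unit (r, c)."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield nr, nc


def local_minima(u):
    """Units whose U-matrix value is strictly lower than all of their neighbors'."""
    rows, cols = len(u), len(u[0])
    return [(r, c) for r in range(rows) for c in range(cols)
            if all(u[r][c] < u[nr][nc] for nr, nc in neighbours(r, c, rows, cols))]


def base_clusters(u):
    """Label each unit with the local minimum reached by steepest descent."""
    rows, cols = len(u), len(u[0])
    labels = {}
    for r in range(rows):
        for c in range(cols):
            cur = (r, c)
            while True:
                best = min(neighbours(*cur, rows, cols),
                           key=lambda n: u[n[0]][n[1]], default=cur)
                if u[best[0]][best[1]] >= u[cur[0]][cur[1]]:
                    break  # no strictly lower neighbor: cur is a local minimum
                cur = best
            labels[(r, c)] = cur
    return labels


# Hypothetical 2x3 U-matrix with two basins.
toy_u = [[1.0, 5.0, 2.0],
         [4.0, 6.0, 3.0]]
minima = local_minima(toy_u)
labels = base_clusters(toy_u)
```

Coarser partitioning levels could then be obtained by merging neighboring base clusters, for example those separated by the lowest U-matrix ridge; the merging criterion is left open here.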
The agricultural productivity of a geographic area depends on many agro-ecological variables, such as soil and terrain characteristics, climatic constraints, human behavior and management.
Soil
Management
Climate
Genotype
Productivity
A new approach
1358 experiments
Sowing Growing Harvest
Soil, Management, Climate, Genotype
Each agroecological event is unique in time and space. However, events can share characteristics, and grouping events that behave similarly helps to discover why and how the agroecological variables affect crop development and, therefore, agricultural productivity.
The variables
Climate variables. Continuous data.
Average Temperature (TempAvg) / After seed (AS) / Before Harvest (BH)
Average Relative Humidity (RHAvg) / After seed (AS) / Before Harvest (BH)
Radiation (Rad) / After seed (AS) / Before Harvest (BH)
Precipitation (Prec) / After seed (AS) / Before Harvest (BH)
Soil variables.
Order (Ord) / 3 Orders (Ord1, Ord2, Ord3) / Nominal Data
Texture (Tex) / Ordinal Data
Depth (Dee) / Ordinal Data
Topographic variables.
Landscape (Ls) / 3 Landscapes (Ls1, Ls2, Ls3) / Nominal Data
Slope (Sl) / Ordinal Data
Other variables.
Water Balance (WB) / Ordinal Data
Variety (Var) / 3 varieties (V1, V2, V3) / Nominal Data
Production
Total: 54
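The nominal variables above are listed with one component per category (Ord1 to Ord3, Ls1 to Ls3, V1 to V3), which suggests a one-of-n coding. The sketch below assumes that coding; the slides do not describe the encoding explicitly, and the category labels are hypothetical.

```python
def one_hot(prefix, value, categories):
    """Encode a nominal value as one binary component per category."""
    return {f"{prefix}{i}": 1.0 if value == cat else 0.0
            for i, cat in enumerate(categories, start=1)}


# Hypothetical category labels; only the component names V1..V3 come from the slides.
varieties = ["variety_a", "variety_b", "variety_c"]
encoded = one_hot("V", "variety_b", varieties)
```

With this coding each category gets its own component plane, so a plane such as V1 shows where on the map the events with that variety are concentrated.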
BMUs of the component planes: productivity, radiation 1 month before harvest (Ra1BH) and radiation 1 month after seed (Ra1AS).
Conclusions
Visualizing agroecological variables is very important but difficult because of the high dimensionality of the data. The SOM algorithm is a powerful technique for dealing with this problem, although it serves as an exploratory analysis tool.
This study presents a methodology to enhance the analysis of component planes. The methodology improves correlation hunting in the component planes with a tree-structured representation of clusters based on the SOM distance matrix.
By analyzing the resulting groups of agro-ecological variables and cultivated zones, it was possible, as an example of the application of the methodology, to find a relationship between radiation after seed, radiation before harvest, and medium-to-high productivity.
We now aim to develop data mining and visualization techniques, based on the aforementioned methodology, to improve decision support in the sugar cane culture.