Application of rule induction techniques for detecting the possible impact of endocrine disruptors on the North Sea ecosystems
Tim Verslycke 1, Peter Goethals 1,2, Gert Vandenbergh 1, Karen Callebaut 3 & Colin Janssen 1
1 Laboratory of Environmental Toxicology and Aquatic Ecology, Ghent University
2 Institute for Forestry and Game Management
3 Ecolas n.v.
Relational database

Table: Endocrine
2598 240 26 mammalian Human MCF-7 cells In vitro Laboratory 6 days 10 µM Technical grade; E-screen

Table: Chemicals
ChemID: 240
ChemNameNl: DDT
CAS: 50-29-3
ChemForm: C14H9Cl5
Molweight: 354.49
BP: 260°C
MP: 108°C
Pressure: 1.9E-7 mm Hg at 20°C
Solubility: 3.1-3.4 µg/l
LogKow: 6.19
Phase: Solid

Table: References
RefID: 26
Authors: Soto, A.M., Chung, K.L., Sonnenschein, C.
Year: 1994
Source: Environ. Health Perspect., 102:380-383
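The example record above links the three tables through shared keys: the values 240 and 26 in the Endocrine record match ChemID 240 and RefID 26. A minimal sketch of that linkage with Python's built-in sqlite3 module follows. The Endocrine column names (RecordID, Duration) are assumptions made for illustration, since the slide shows only the values, and only a few columns of each table are reproduced.

```python
import sqlite3


def build_example_db():
    """Create an in-memory database holding the single example record."""
    conn = sqlite3.connect(":memory:")
    conn.executescript('''
        CREATE TABLE Chemicals (
            ChemID INTEGER PRIMARY KEY, ChemNameNl TEXT, CAS TEXT, LogKow REAL);
        CREATE TABLE "References" (
            RefID INTEGER PRIMARY KEY, Authors TEXT, Year INTEGER, Source TEXT);
        -- Endocrine column names are hypothetical; the slide lists values only.
        CREATE TABLE Endocrine (
            RecordID INTEGER PRIMARY KEY, ChemID INTEGER, RefID INTEGER, Duration TEXT);
    ''')
    conn.execute("INSERT INTO Chemicals VALUES (240, 'DDT', '50-29-3', 6.19)")
    conn.execute(
        'INSERT INTO "References" VALUES (?, ?, ?, ?)',
        (26, "Soto, A.M., Chung, K.L., Sonnenschein, C.", 1994,
         "Environ. Health Perspect., 102:380-383"),
    )
    conn.execute("INSERT INTO Endocrine VALUES (2598, 240, 26, '6 days')")
    return conn


def joined_record(conn):
    """Resolve the chemical and reference keys of the Endocrine record."""
    query = '''
        SELECT e.RecordID, c.ChemNameNl, e.Duration, r.Authors, r.Year
        FROM Endocrine AS e
        JOIN Chemicals AS c ON c.ChemID = e.ChemID
        JOIN "References" AS r ON r.RefID = e.RefID
    '''
    return conn.execute(query).fetchone()


row = joined_record(build_example_db())
print(row)
```

Keeping chemicals and references in their own tables means each compound and each publication is stored once, however many effect records refer to it.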
Rule induction techniques
Data mining (analysis) techniques:
1) Clustering methods (which data are related or ‘similar’), e.g. cluster analysis
2) Classification methods (how variables are related, using only classes (numerical or not) = rules among variables), e.g. decision trees
3) Regression methods (quantitative description of the relation between variables), e.g. multivariate regression
Rule induction techniques
Classification and decision trees: induction of rules from datasets
• which variables are related, e.g. which variables are mainly related to endocrine disruptive effects in animals
• how variables are related (quantitative rules making use of threshold values or classes), e.g. when the hormone concentration is higher than value A, estrogenic effects of type X will occur
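A threshold rule of the kind "when concentration is higher than value A, an effect occurs" can be induced from labelled instances by trying candidate thresholds and keeping the one with the fewest misclassifications. The sketch below is a single-variable decision stump, far simpler than the tree learners available in WEKA. The function name and the toy data are hypothetical and not taken from the ED-North database.

```python
def induce_threshold_rule(values, labels):
    """Find threshold t minimising errors of the rule 'value > t -> effect'.

    values: numeric measurements; labels: True where an effect was observed.
    Returns (threshold, errors), or None if all values are identical.
    """
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = None
    for t in candidates:
        errors = sum((v > t) != lab for v, lab in zip(values, labels))
        if best is None or errors < best[1]:
            best = (t, errors)
    return best


# Hypothetical toy data: concentrations and whether an effect was seen.
concentrations = [0.1, 0.5, 1.0, 5.0, 10.0, 20.0]
effect_seen = [False, False, False, True, True, True]
print(induce_threshold_rule(concentrations, effect_seen))
```

A full decision tree repeats this search over all variables and then recurses on each resulting subset of instances.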
Rule induction techniques
WEKA data mining software: runs in a DOS command window, but also has a visual Java interface
Induced rule set
Rule set performance indicators
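The slide does not list which performance indicators were used. Two that are commonly reported for a classifier are the fraction of correctly classified instances and Cohen's kappa, which corrects that fraction for agreement expected by chance. A minimal sketch, with a hypothetical function name and toy labels:

```python
from collections import Counter


def accuracy_and_kappa(actual, predicted):
    """Return (fraction correctly classified, Cohen's kappa)."""
    n = len(actual)
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Agreement expected by chance, from the marginal class frequencies.
    expected = sum(
        actual_counts[c] * predicted_counts[c] for c in actual_counts
    ) / (n * n)
    if expected == 1:
        # Only one class present on both sides: kappa is undefined, report 0.
        return observed, 0.0
    return observed, (observed - expected) / (1 - expected)


# Hypothetical toy labels: 5 'effect' and 5 'none', one error in each class.
actual = ["effect"] * 5 + ["none"] * 5
predicted = ["effect"] * 4 + ["none"] + ["none"] * 4 + ["effect"]
print(accuracy_and_kappa(actual, predicted))
```

A kappa near 0 means the rule set does no better than guessing from class frequencies, even when the fraction of correct classifications looks high.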
Applications on ED-North database
Example on crustacean data
1) Prediction of endocrine disruptive effects based on physical/chemical properties of chemicals
2) Prediction of estrogenic effect of chemicals on the crustaceans in the database
3) Which factors (flow, concentration, duration, ...) affect this estrogenicity
1) Which molecular characteristics are related to estrogenic effects
This exercise on the ED-North database illustrated that data mining can help to find relations between:
Type of organisms
Test and environmental conditions
Estrogenic effects
Compounds and their structure
General discussion
Data mining helps to find errors and outliers in the data set, and provides insights that improve further data collection and the development of databases
Interaction between data miners and domain experts (ecologist, ecotoxicologist) is very important:
1) it is easy to find ‘reliable nonsense’ rules when important variables are excluded from the analysis (need for expertise of the ecotoxicologist)
2) the parameter settings, and insight into how to tune them, have a very important impact on the richness of the outcome of the data mining exercise (need for data mining expertise)
General discussion
The collected data set itself influences the outcome of the analysis to a large extent:
1) it is important to collect data that cover the whole range (variables and their values/classes), and stratification of the instances is necessary
2) the selection of variable classes can affect the results to a large extent (e.g. larval-adult problem, number of effect classes, ...)
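Stratification of the instances can be sketched as a split that preserves the class proportions in both the training and the test part, so that rare effect classes are not lost by chance. The function name, parameters, and toy instances below are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(instances, class_of, test_fraction=0.25, seed=0):
    """Split instances into (train, test), keeping class proportions.

    class_of: function returning the class label of an instance.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for inst in instances:
        by_class[class_of(inst)].append(inst)
    train, test = [], []
    for members in by_class.values():
        members = members[:]  # copy so the caller's grouping is not reordered
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Hypothetical toy instances: (instance id, effect class).
toy = [(i, "effect") for i in range(8)] + [(i, "none") for i in range(8, 12)]
train, test = stratified_split(toy, class_of=lambda inst: inst[1])
print(len(train), len(test))
```

With the toy data, the 2:1 ratio between the two classes is the same in the training part and in the test part.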
Conclusions
Data mining reveals which gaps exist in the database and delivers information for sustainable data collection and management
Data mining delivers insight into the dataset: generation of knowledge from data
Highly unpredictable parts of the dataset are useful for focusing further research
General reliable rules are promising for decision support in environmental management
It is important to be aware that data mining explores correlations, not causal relations! Control by experts or further research (validation) is always necessary