Cyberinfrastructure Research at Virginia Tech, July 2004 ...

Igneous Rocks, Terranes and Crustal Evolution:

Solid Earth Science Cyberinfrastructure Research at Virginia Tech

Project Plan

http://www.vt.edu/

July 2004

Contact: A.K. Sinha
Department of Geosciences
Virginia Tech
Blacksburg, VA 24061
E-mail: [email protected]

Report available at http://pitlab.geol.vt.edu

GEON Research Working Group at Virginia Tech

Principal Investigator

o A.K. Sinha …………………………… Department of Geosciences, Virginia Tech

Other Participants

o Cindy Stover………………………….. Department of Geosciences, Virginia Tech

Graduate Student Participants

o Ari Mitra………………………………. Department of Geosciences, Virginia Tech

o Alex Zendel…………………………… Department of Geography, Virginia Tech

o Amine Chigani………………………... Department of Computer Science, Virginia Tech

o Jihane Najdi…………………………… Department of Computer Science, Virginia Tech

o Matt Phillip……………………………. Department of Computer Science, Virginia Tech

o Raghavendra Nyamagoudar…………… Department of Computer Science, Virginia Tech

o Satish Tedapelli………………………... Department of Computer Science, Virginia Tech

Undergraduate Participants

o Andrew Owusu-Asiedu……………….. Department of Business Info Tech, Virginia Tech

o Ryan Walker…………………………... Department of Geosciences, Virginia Tech

o Keely Larson………………………….. Department of Geosciences, Virginia Tech

All Other Past Participants

o Boyan Brodaric……………………….. Geological Survey of Canada

o Murray Journay……………………….. Geological Survey of Canada

o Calvin Barnes…………………………. Texas Tech University

o Art Snoke……………………………… University of Wyoming

o Clinton Smyth…………………………. GeoReference Inc., Vancouver

o Naren Ramkrishnan……………………. Department of Computer Science, Virginia Tech

And the research team at the San Diego Supercomputer Center

CONTENTS

Project Summary
GEON Research at Virginia Tech
General Workflow of Rock Explorer
    1. What is Workflow?
    2. Introduction to Rock Explorer
    3. General Workflow of Rock Explorer
    4. Component Specifications
    5. Product Prospective
The Igneous Rock Database
    1. Introduction and General Description
    2. Storing data that describe geologic bodies and sample locations
    3. Handling Metadata
    4. Handling heterogeneous data sources
    5. Digital Images of Samples
    6. Connectivity of this database to the overall GEON system
    7. Reference Dataset
Ontology & Earth Sciences
    1. What is Ontology
    2. The Need for and Use of Ontologies
    3. Technical Aspects of Ontology Development Section
Tool Box
    1. Introduction
    2. Implementation in Java
    3. Tool Categories
        A. Calculation
            a. Example: Norm Calculation
            b. Example: Mineral Formula Calculation
        B. Classification
            a. Example: Use of SVGs and Igneous Rocks
        C. Modeling
            a. Partial Melting Modeling
    4. Application and Use
    5. Classification Diagrams and Tools for Igneous Rocks
Data Mining in Earth Sciences
    1. What is Data Mining
    2. Components of Data Mining Algorithms
    3. How is data mining different from statistical analysis
    4. Ontology driven data mining research in solid earth science at Virginia Tech
Information Integration Scenario
    1. A-Type Integration Scenario
Summary Statement
References
Rock Classification Diagrams Reference

Section I. Project Summary

The GEON (GEOscience Network) research project is responding to the pressing need in the geosciences to interlink and share multidisciplinary data sets to understand the complex dynamics of Earth systems. To rise to this challenge, we have formed a coalition of IT researchers, representing key technology areas, and Earth Science researchers, representing a broad cross-section of Earth Science sub-disciplines. The need to manage the vast amounts of Earth science data was recognized through NSF-sponsored meetings, which gave birth to the Geoinformatics initiative. The creation of GEON will provide the critical initial infrastructure necessary to facilitate Geoinformatics and other research initiatives, such as EarthScope.

Creating the GEON cyberinfrastructure to integrate, analyze, and model 4D data poses fundamental IT research challenges due to the extreme heterogeneity of geoscience data formats, storage and computing systems and, most importantly, the ubiquity of "hidden semantics" and differing conventions, terminologies, and ontological frameworks across disciplines. GEON IT research focuses on modeling, indexing, semantic mediation, and visualization of multi-scale 4D data, and creation of a prototype GEONgrid, to provide the geoscience community an IT head start in facing the research challenges posed by understanding the complex dynamics of Earth systems. An important contribution will be embarking on the definition of a Unified Geosciences Language System (UGLS), to enable semantic interoperability. The GEONgrid leverages experience gained in the National Partnership for Advanced Computational Infrastructure (NPACI) program, and the TeraGrid Distributed Terascale Facility. We will create a portal to provide access to the GEON environment, which will include advanced query interfaces to distributed, semantically-integrated databases, Web-enabled access to shared tools, and seamless access to distributed computational, storage, and visualization resources and data archives.

Two testbed regions, the mid-Atlantic and the Rocky Mountains, have been identified to define the GEON geoscience challenges, though the system will be able to accommodate national and international research activities. These testbed regions were selected due to the variety of geological issues embodied within them requiring interlinking of multiple disciplinary databases and also because they are areas of expertise for the GEON geoscience research team. The results of GEON research will significantly impact large multi-scale geoscience research programs such as Earthscope, as well as individuals and smaller groups of researchers, thereby leading to an intellectual transformation of the entire science. Recognizing this potential, the U.S. Geological Survey has joined as a major partner and has made creation of key GEON databases a priority effort over the next several years. Via DLESE, GEON will become an important resource for sharing knowledge about the Earth for a variety of audiences, including K-12 students and teachers.

Many disciplinary geoscience database projects are already underway, indicating the readiness of the community to participate in such a national-scale effort. As NSF and other agencies begin to invest in these database creation efforts, the need for a cyberinfrastructure that will enable integration of these databases becomes pressing. Various GEON-like grid efforts, such as GriPhyN, NEESGrid, and BIRN, have all indicated the readiness of the IT community to provide the necessary interoperable infrastructure, and testify to the value of integrating IT with major science and education initiatives. It is now the opportune moment to start the GEON program, to usher the geosciences into the era of Geoinformatics and accelerate geoscience research in a timely manner. In sum, GEON is an IT-based Geoscience revolution that will play a critical role in a more holistic understanding of the dynamics of Earth systems. It will also create new scientific paradigms and renew the excitement in the community of the post-plate tectonics era.

Section II. GEON Research at Virginia Tech

Studies of magmatic processes in geoscience research provide fundamental information on the physical and chemical evolution of continents through time. Diverse tectonic processes of collision, extension, and transpression leave a rich geologic record saturated with igneous rocks. Plutons often provide our only direct monitor of the thermal budget of a region, and they also yield information about uplift histories and paleogeophysical properties of rocks through time. The utilization of such critical existing knowledge and information in the geosciences to conduct comprehensive interdisciplinary research is always hampered by the lack of organizational structures, despite such data being already available in the literature. In our ongoing research we will construct a geospatially-referenced information system on plutons, develop web-based information access mechanisms, and collaborate with researchers conducting similar efforts. With these collaborative efforts we will be able to help form a comprehensive geoscience information system for research and education purposes.

This research program envisions constructing a field-based database on plutons and their ages to discover new relationships between magmatism, crustal evolution and other disciplinary geologic records preserved in both space and time. Future research that integrates correlations among the evolution of sedimentary basins, extensional/compressive tectonism, metamorphism, and the geophysical signatures of such varied processes will very likely be possible only through recognition of the thermal budget of the crust/mantle through both space and time. It must also be emphasized that our understanding of near-surface processes is closely linked to information on the thermal interplay between deep crust and mantle. Therefore all geologic data that contribute to this connection of “Planetary Energetics and Dynamics” (NSF Geosciences Beyond 2000) must be at the forefront of databases designed to study dynamic earth processes. Evidence of crustal instability enhanced by thermal perturbations, as recorded by plutons, is often preserved in exposed roots of mountain belts, and the proposed database on igneous (plutonic) rocks through time within the Appalachian orogen will not only yield correlations with other geologic processes (e.g. style of thrusting, basin development, cooling rates, paleogeothermal gradients, tectonic setting) but also provide a firm basis for a national database leading to a digital earth model.

Figure 2.1: Geologic map of the central Appalachian orogen (compiled from State Geological Surveys)

The Appalachian Orogen is a continental-scale mountain belt that provides a geologic template to examine the growth and breakup of continents through plate tectonic processes. The record spans a period in excess of 1000 million years, and preserves the only known example in the world of one and a half completely preserved Wilson Cycles (the opening and closing of oceans). Complex assembly of plates through collision can be recognized in the rock record of the mid-Atlantic Appalachian orogen through time. It constitutes a great scientific challenge for earth scientists to readily and clearly identify data for the critical separation of overprinted processes involved in the creation of mountain belts. The Paleozoic Era in the mid-Atlantic region provides evidence of multiple collisional events (Taconic, Acadian and Alleghanian), the causes and effects of which are the subject of numerous leading research activities.

Figure 2.2: Regional distribution of igneous rocks in the central Appalachian orogen compiled by Virginia Tech. The extensive distribution of igneous rocks with associated databases facilitates discovery of new hypotheses for the origin of terranes and accretionary orogens.

In order to develop an IT based understanding of the cause and effects of geological processes associated with crustal evolution, our research in GEON will focus most of its resources towards developing an integrated view of crustal evolutionary processes represented in the Appalachian orogen.

Some key scientific questions in the Appalachian orogen that are likely to utilize information from the igneous rock record:

o How do recognizable events (deformation, metamorphism, magmatism, sedimentation) relate to one another during collision, during extension, and during exhumation? How do we link or relate such events in the broader plate tectonic picture?

o How do we identify terranes that may be involved in multiple orogenies and ascertain their role in orogeny?

o What are the geologic scenarios for thin-skinned terrane accretion?

o What is the relationship between rheology and deformation at all crustal levels? (A research recommendation of the NSF-sponsored workshop report on New Departures in Structural Geology and Tectonics, 2003.)

o What is the relationship between tectonic settings and paleo-geothermal gradients?

In summary, our research activities would give the community an opportunity to address scientifically unique questions that will arise through integrating the proposed database with other disciplinary databases (specifically geochronology, metamorphism, experimental petrology, structural geology, stratigraphy, magnetic, gravity, seismic, geothermal) as a function of space and time. This would lead eventually to a 4-D visualization of the thermal-mechanical evolution of the continental crust (of significance to the EARTHSCOPE initiative) and, most importantly, provide the next generation of geoscientists a valuable educational and research tool.

Figure 2.3 shows the plan to systematically build the cyberinfrastructure, including both object and process ontologies, to facilitate a more robust integrated understanding of crustal evolution.

Figure 2.3: Progressive development of the cyberinfrastructure with emphasis on igneous rocks, time and crustal evolution.

Section III. General Workflow of Rock Explorer

1. What is Workflow?

In this part of the paper, we use the term workflow to describe a series of structured activities and computations (Munindar P. Singh, Mladen A. Vouk) that arise in designing software applications. However, workflows (especially scientific workflows) have a broader focus. Many scientists and engineers use these workflows to design, execute, monitor, and communicate their analytical procedures with minimal effort. These workflows provide the abstractions that enable effective communication between domain experts and IT experts.

2. Introduction to Rock Explorer

Rock Explorer is a prototype web-based portal that is being designed by Virginia Tech to allow the user to explore national and international databases, tools and ontologies for different types of rocks: igneous, metamorphic and sedimentary. This portal will also permit access to information about rocks from the two GEON testbeds (Mid Atlantic and Rocky Mountains) as well as other parts of North America. Figure 3.1 is a proposed web page layout of the portal page for Rock Explorer.

Figure 3.1: A proposed web layout of the portal page

In figure 3.2 we show our proposed web page interface to the igneous rocks portlet. This page provides access to databases, ontologies, and data analysis tools related to igneous rocks. Icon-based access to the regional igneous rock database (i.e. Mid Atlantic Region) as well as to other reference databases will be possible. This interface will also host access to animations that show the distribution of igneous rocks as a function of time and other attributes. We envision this activity providing the end user with a geospatial map (with zooming capabilities) where connectivity to databases can be established by clicking on a polygon. For example, information regarding name, age, records of rock and mineral analyses as well as images contained in the database can be highlighted through the use of a pop-up box.

The relationships among the database content, tools, and services are represented in a high-level workflow diagram (figure 3.3). This diagram shows the various research activities being conducted at Virginia Tech.

3. General Workflow of Rock Explorer

The workflow (figure 3.3) highlights the different components of Rock Explorer. Each component represents a research area that Virginia Tech’s team is serving within the GEON project. There are four main nodes to this prototype web application: databases, ontologies, tools, and data mining. In addition, this tool presents a prototype solution to an integration scenario that shows how geoscientists can use ontologically based queries to extract information from databases, and use their findings to develop new knowledge.

4. Component Specifications

First, the databases section of Rock Explorer allows access to databases related to igneous rocks. This section directly accesses Virginia Tech’s igneous rock data warehouse. In addition, it provides links to reference rock databases available on the web. The data accessed are then available to be downloaded, analyzed, or processed inside the tool box section (Figure 3.4).

Second, Rock Explorer makes the many levels of igneous rock ontologies developed by Virginia Tech available for analysis and, if required, modification to make integration more robust. For readability, these ontologies are represented in different formats: some are class diagrams, others are tree structures. Like the raw data sets, these ontologies are available to the community for modification or for the addition of new ontologies. It is anticipated that the San Diego Supercomputer Center will be the guardian of these ontologies as they are developed by the earth science community (Figure 3.5).

[Portal page layout: GEON Rock Explorer, with navigation tabs Home, Igneous Rocks, Metamorphic Rocks, Sedimentary Rocks, and TestBeds. Goal statement: Access, Analysis, Visualization, and Modeling of Rock Data.]

Figure 3.2: A proposed web interface of the igneous rocks portlet

Figure 3.3: Workflow diagram that highlights the different components of Rock Explorer. Each component represents a research area being investigated at Virginia Tech

The third component of the workflow represents the tool box section. In this part of Rock Explorer, the user is able to access and apply different data analysis tools to their data. In figure 3.6, the Tool Selector actor facilitates choosing a tool from three major categories: calculation, classification, or modeling. Under each of these categories, the user will find different analysis tools, several of which have already been developed and made available by Virginia Tech. Others can be readily implemented and added as this prototype application evolves into a national cyberinfrastructure project. More extensive visualization tools will be available through the GEON portal. Based on the tool selected, the Output Selector actor decides the output format from the results of the selected tool and/or the user's format preferences (discussed later in the tool box section).

Figure 3.6: Workflow diagram showing types of data analysis tools developed at Virginia Tech and available for web based application. Some classification related activities have already been implemented by the research team at SDSC
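The roles of the Tool Selector and Output Selector actors can be sketched in a few lines of Python. This is a minimal illustration, not the Rock Explorer implementation: only the three category names come from the text, while the registry, the function names, the `oxide_total` placeholder tool, and the default output formats are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry: category -> tool name -> callable.
# Only the three category names are taken from the text.
TOOLBOX: Dict[str, Dict[str, Callable[[dict], dict]]] = {
    "calculation": {},
    "classification": {},
    "modeling": {},
}


def register(category: str, name: str):
    """Decorator that files a tool under one of the three categories."""
    if category not in TOOLBOX:
        raise ValueError(f"unknown category: {category}")

    def decorator(func):
        TOOLBOX[category][name] = func
        return func

    return decorator


@register("calculation", "oxide_total")
def oxide_total(sample: dict) -> dict:
    """Placeholder calculation: sum the oxide weight percents of a sample."""
    return {"kind": "table", "total": sum(sample.values())}


def select_tool(category: str, name: str) -> Callable[[dict], dict]:
    """Tool Selector: look a tool up by category and name."""
    return TOOLBOX[category][name]


def select_output(result: dict, preferred: Optional[str] = None) -> str:
    """Output Selector: a user preference wins, else a default per result kind."""
    defaults = {"table": "csv", "diagram": "svg"}  # assumed defaults
    return preferred or defaults.get(result["kind"], "text")


tool = select_tool("calculation", "oxide_total")
result = tool({"SiO2": 70.0, "Al2O3": 14.0})
output_format = select_output(result)
```

Keeping the registry separate from the tools means a new tool can be added by registering one function, which matches the expectation that further tools are added as the prototype evolves.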

For demonstration purposes, figures 3.7a and 3.7b (from Efrat Jaeger, San Diego Supercomputer Center) illustrate the workflow for modal classification, which uses mineral abundances to assign a rock its name. Figure 3.7a shows that the mineral abundance data for a given ssID are extracted from the modalData database and sent to the classifier along with the appropriate diagram. The result is a name for the rock, which can be utilized for additional queries.

Figure 3.7a: Workflow diagram of modal classification for naming igneous rocks
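The modal classification step can be sketched as follows, assuming an in-memory SQLite table in place of the modalData database. The column names, the sample values, and the single simplified diagram are assumptions for illustration: the field boundaries loosely follow a QAP-style granitoid subdivision (20 to 60 % quartz, split by the plagioclase fraction of total feldspar) and are not the diagrams used by the GEON classifier.

```python
import sqlite3

# Simplified stand-in for one classification diagram (an assumption):
# upper bound of P / (A + P) -> field name, for rocks with 20-60 % quartz.
GRANITOID_FIELDS = [
    (0.10, "alkali feldspar granite"),
    (0.65, "granite"),
    (0.90, "granodiorite"),
    (1.00, "tonalite"),
]


def fetch_modal_data(conn, ss_id):
    """Extract the mineral abundances stored for one ssID."""
    row = conn.execute(
        "SELECT quartz, alkali_feldspar, plagioclase "
        "FROM ModalData WHERE ssID = ?",
        (ss_id,),
    ).fetchone()
    if row is None:
        raise KeyError(f"no modal data for ssID {ss_id}")
    return row


def classify(quartz, alkali_feldspar, plagioclase, fields):
    """Return the name of the diagram field the sample plots in."""
    total = quartz + alkali_feldspar + plagioclase
    if total <= 0:
        raise ValueError("mineral abundances must sum to a positive value")
    quartz_pct = 100.0 * quartz / total
    if not 20.0 <= quartz_pct <= 60.0:
        return "outside this diagram"
    plag_ratio = plagioclase / (alkali_feldspar + plagioclase)
    for upper, name in fields:
        if plag_ratio <= upper:
            return name
    return "outside this diagram"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ModalData (ssID INTEGER PRIMARY KEY, quartz REAL, "
    "alkali_feldspar REAL, plagioclase REAL)"
)
# Hypothetical samples, not records from the Virginia Tech database.
conn.executemany(
    "INSERT INTO ModalData VALUES (?, ?, ?, ?)",
    [(1, 30.0, 30.0, 40.0), (2, 25.0, 5.0, 70.0)],
)
rock_name = classify(*fetch_modal_data(conn, 1), GRANITOID_FIELDS)
```

Passing the diagram to `classify` as data mirrors the workflow, in which the classifier receives the abundances together with the appropriate diagram.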

Inside the classifier (Figure 3.7b), the modal classification is divided into finer descriptive levels. At each level, a diagram is chosen according to the region of the classification diagram from the previous level, and the new classification diagram yields a sub-type category name for the rock.

Figure 3.7b: Workflow representation of components inside the Classifier actor in figure 3.7a
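The level-by-level behaviour inside the classifier can be illustrated with a small sketch in which each diagram is a function and the region returned at one level selects the diagram for the next. The two diagrams, their thresholds, and the region names are hypothetical and deliberately coarse.

```python
from typing import Callable, Dict, List

Diagram = Callable[[Dict[str, float]], str]


def quartz_level(modes: Dict[str, float]) -> str:
    """Level 1 (hypothetical): split by quartz share of Q + A + P."""
    q_pct = 100.0 * modes["Q"] / (modes["Q"] + modes["A"] + modes["P"])
    if q_pct > 60.0:
        return "quartz-rich"
    if q_pct >= 20.0:
        return "granitoid"
    return "quartz-poor"


def feldspar_level(modes: Dict[str, float]) -> str:
    """Level 2 (hypothetical): split granitoids by plagioclase fraction."""
    plag_ratio = modes["P"] / (modes["A"] + modes["P"])
    if plag_ratio > 0.65:
        return "plagioclase-dominant"
    return "alkali-feldspar-bearing"


# Region from the previous level -> diagram to use at the next level.
NEXT_DIAGRAM: Dict[str, Diagram] = {"granitoid": feldspar_level}


def classify_by_levels(modes, first: Diagram, next_diagram) -> List[str]:
    """Walk the levels until a region has no finer diagram attached."""
    path = []
    diagram = first
    while diagram is not None:
        region = diagram(modes)
        path.append(region)
        diagram = next_diagram.get(region)
    return path


levels = classify_by_levels({"Q": 30.0, "A": 30.0, "P": 40.0},
                            quartz_level, NEXT_DIAGRAM)
```

The returned list records the region found at every level, so the finest entry serves as the sub-type name while the earlier entries remain available for coarser queries.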

5. Product Prospective

The Rock Explorer workflow represents our vision of how a geoscientist would like to use these data analysis tools in a web-based environment. It also serves as a platform for collaborative research with the San Diego Supercomputer Center (SDSC) team. Our intention is to provide a prototype that SDSC will use to develop the cyberinfrastructure for the earth sciences.

[Page layout: GEON Rock Explorer, Page 3.

Reference Databases (click on a link to go to the reference database): Virginia Tech Igneous Rock Database; GEOROC Database; Other Databases.

Mid Atlantic Region Database Files (click on a file link to access or download the file; all files are in Microsoft Access format): WholeRock_GeoChemistry Table; Minerals Table; Isotope and Radiometric; ModalData Table; PublishedAgesAndInitial; Texture Table; Fracture Table; Fabric Table; Inclusions Table.]

Figure 3.4: Shows a draft of the web page layout of the databases section of Rock Explorer

Figure 3.5: Workflow diagram that shows how ontologies related to igneous rocks will be accessed

This part of the section describes the tools used to design the workflow.

Ptolemy II: Most of the diagrams in this section were created using the Ptolemy II Version 4.0-Beta software. This software framework was developed as part of the Ptolemy project at the University of California at Berkeley (Electrical Engineering & Computer Science). It is a Java-based component assembly framework with a graphical user interface called Vergil. The Ptolemy project studies modeling, simulation, and design of concurrent, real-time, embedded systems. The project is named after Claudius Ptolemaeus, the second century Greek astronomer, mathematician, and geographer. For more information refer to the Ptolemy Project web page.

Kepler: Kepler is a system that combines high-level workflow design with execution and runtime interaction, access to local and remote data, and local and remote service invocation. The development of Kepler was based on the dataflow-oriented Ptolemy II system. It inherits many advanced features from Ptolemy, and numerous extensions and features have been added recently to support scientific workflows. Kepler is a collaboration between computer and domain scientists from the SEEK project (http://seek.ecoinformatics.org), the GEON project (http://www.geongrid.org), the Ptolemy II software project (http://ptolemy.eecs.berkeley.edu), and the SDM Center (http://sdm.lbl.gov/sdmcenter). It is a cross-project, open source activity, with an active and growing community of developers and users.

Section IV. The Igneous Rock Database

1. Introduction and General Description

This database schema was designed utilizing concept maps for igneous rocks and attributes associated with intrusive igneous bodies (plutons). Based on the attributes, we have developed a spatially enabled database schema that organizes data describing the geochemistry, mineralogy and radiometrics of igneous rock samples, and the geological bodies, enclaves, fractures, fabrics, textures and inclusions to which they are geologically and geographically related. Our geodatabase system also accesses images of geologic samples taken at varying scales. Hand samples, thin sections and individual images are linked to spatially documented samples and provide the end user a means to explore igneous rocks from outcrop to microscopic level.

2. Storing Data that Describe Geologic Bodies and Sample Locations

The primary function of this database is to store data that describe geological bodies and samples acquired in the field. Hence, the “GeologicBodyAndLocation” and “Sample” tables are central to the database and provide a unique identifier for every geologic entity. Using a geologic body’s “BodyID”, tabular data describing its geometry and enclaves can be accessed. This BodyID also links to polygon GIS layers mapped at three scales ranging from 1:500,000 to 1:24,000. The “PublishedAgesAndInitial” table, which stores data regarding the age of bodies as derived by geologists using a variety of methods, also contains a BodyID. Through this link, the numerical age of a body can be displayed on a GIS layer as annotation, or the symbology of the GIS layer may be based on these ages (e.g. older igneous bodies could be displayed with darker shades of blue and younger bodies with lighter shades).
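The BodyID link and the age-based symbology can be sketched with SQLite and a simple colour ramp. The table names and the BodyID key come from the text; the other column names, the two bodies, their ages, and the colour end points are assumptions made only for the illustration.

```python
import sqlite3

LIGHT_BLUE = (200, 220, 255)  # assumed colour for the youngest body
DARK_BLUE = (0, 0, 100)       # assumed colour for the oldest body


def age_to_blue(age_ma, youngest, oldest):
    """Map an age to a hex colour: the older the body, the darker the blue."""
    span = oldest - youngest
    t = 0.0 if span == 0 else (age_ma - youngest) / span
    channels = [round(lo + t * (hi - lo))
                for lo, hi in zip(LIGHT_BLUE, DARK_BLUE)]
    return "#{:02x}{:02x}{:02x}".format(*channels)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GeologicBodyAndLocation (
    BodyID INTEGER PRIMARY KEY,
    BodyName TEXT
);
CREATE TABLE PublishedAgesAndInitial (
    BodyID INTEGER REFERENCES GeologicBodyAndLocation(BodyID),
    AgeMa REAL
);
-- Hypothetical bodies and ages, for illustration only.
INSERT INTO GeologicBodyAndLocation VALUES (1, 'Body A'), (2, 'Body B');
INSERT INTO PublishedAgesAndInitial VALUES (1, 300.0), (2, 450.0);
""")

# Follow the BodyID link from each body to its published age.
rows = conn.execute(
    "SELECT g.BodyID, a.AgeMa "
    "FROM GeologicBodyAndLocation AS g "
    "JOIN PublishedAgesAndInitial AS a ON a.BodyID = g.BodyID"
).fetchall()
ages = [age for _, age in rows]
colors = {body_id: age_to_blue(age, min(ages), max(ages))
          for body_id, age in rows}
```

A GIS layer keyed on BodyID could then look up its fill colour in `colors`, or use the age itself as an annotation.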

Initially, every sample that is stored in this database is registered in the “Sample” table where it receives a unique identifier, its “ssID”. This ssID provides a direct link to data stored in over 16 other tables and 4 GIS feature classes. Geochemical data, extracted from both bulk rock and mineral analyses, are stored in the “WholeRock_Geochemistry” and “Minerals” tables, respectively. Depending on the particular isotope that was analyzed, data generated from isotope and radiometric analyses are stored in 1 of 6 radiometric tables shown in the bottom right corner of figure 4.2. The ssID also serves as the portal to data concerning a sample’s modal data, texture, fabric, fractures, fluid inclusions, and melt inclusions. The samples are georeferenced in the “SamplePoints” GIS feature class. Hence, the interconnectedness of the ssID will allow all of the data in these tables to be displayed geographically. For example, a middleware tool such as Isoplot may read data from one of the 6 radiometric tables and compute numeric ages. The output of this computation may be stored in the “PublishedAgesAndInitial” table and then displayed geographically via the web-based GEON GIS map viewer. Thus, this database provides a means for geologists to explore the geotemporal nature of their data, as well as the data provided by other GEON participants.
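Because the ssID appears in every sample-related table, all data for one sample can be gathered with the same lookup. The sketch below assumes SQLite and uses three of the table names mentioned in the text; the column names and the single sample record are hypothetical.

```python
import sqlite3

# Table names taken from the text; the list is fixed in code because table
# names cannot be passed as SQL parameters.
SAMPLE_TABLES = ("WholeRock_Geochemistry", "Minerals", "ModalData")


def records_for_sample(conn, ss_id, tables=SAMPLE_TABLES):
    """Collect every record that carries the given ssID, table by table."""
    found = {}
    for table in tables:
        cursor = conn.execute(
            f'SELECT * FROM "{table}" WHERE ssID = ?', (ss_id,)
        )
        columns = [description[0] for description in cursor.description]
        found[table] = [dict(zip(columns, row)) for row in cursor.fetchall()]
    return found


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sample (ssID INTEGER PRIMARY KEY);
CREATE TABLE WholeRock_Geochemistry (ssID INTEGER, SiO2 REAL);
CREATE TABLE Minerals (ssID INTEGER, MineralName TEXT);
CREATE TABLE ModalData (ssID INTEGER, Quartz REAL);
-- Hypothetical sample, for illustration only.
INSERT INTO Sample VALUES (1);
INSERT INTO WholeRock_Geochemistry VALUES (1, 70.0);
INSERT INTO ModalData VALUES (1, 30.0);
""")

records = records_for_sample(conn, 1)
```

A table with no rows for the sample simply yields an empty list, so a caller can tell which analyses exist for a given ssID.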

3. Handling Metadata

This database schema also contains tables that house metadata regarding geologic bodies and the samples extracted from them. The “References” table contains information about the source of the data stored in other tables, most of which link to it directly via a foreign key. This information includes the authors, article title, year of publication and the journal in which the article is found. The “AnalyticalMethods” table describes the methods used to obtain the data that are housed in other tables. For example, the geochemical data contained within the whole-rock geochemistry table may have been obtained using ICPMS, Electron Microprobe (EMP) or multiple methods. The methods used in each individual analysis (record) are documented via a foreign key that is directly linked to the “AnalyticalMethods” table.
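The foreign-key links to the metadata tables can be illustrated with SQLite. The table names “References” and “AnalyticalMethods” come from the text; the key and column names (RefID, MethodID, and so on) and the inserted records are assumptions, and the actual schema in figure 4.2 may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE "References" (
    RefID INTEGER PRIMARY KEY,
    Authors TEXT, Title TEXT, Year INTEGER, Journal TEXT
);
CREATE TABLE AnalyticalMethods (
    MethodID INTEGER PRIMARY KEY,
    MethodName TEXT
);
CREATE TABLE WholeRock_Geochemistry (
    ssID INTEGER,
    SiO2 REAL,
    RefID INTEGER REFERENCES "References"(RefID),
    MethodID INTEGER REFERENCES AnalyticalMethods(MethodID)
);
-- Hypothetical records, for illustration only.
INSERT INTO "References"
    VALUES (1, 'Hypothetical Author', 'Hypothetical title', 2000,
            'Hypothetical Journal');
INSERT INTO AnalyticalMethods VALUES (1, 'ICPMS');
INSERT INTO WholeRock_Geochemistry VALUES (1, 70.0, 1, 1);
""")


def method_for_analysis(conn, ss_id):
    """Follow the foreign key from an analysis to its documented method."""
    row = conn.execute(
        "SELECT m.MethodName FROM WholeRock_Geochemistry AS w "
        "JOIN AnalyticalMethods AS m ON m.MethodID = w.MethodID "
        "WHERE w.ssID = ?",
        (ss_id,),
    ).fetchone()
    return None if row is None else row[0]


def rejects_unknown_method(conn):
    """An analysis that cites an undocumented method should be refused."""
    try:
        conn.execute(
            "INSERT INTO WholeRock_Geochemistry VALUES (2, 65.0, 1, 99)"
        )
    except sqlite3.IntegrityError:
        return True
    return False


method_name = method_for_analysis(conn, 1)
```

Enforcing the keys in the database means that every stored analysis can be traced back to a documented source and method.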

4. Handling Heterogeneous Data Sources

Geoscientists may store their data in a variety of database systems with heterogeneous and inconsistent table structures. To allow them to share these data easily, data conversion utilities must be developed so that a geologist’s data can be successfully appended to the geodatabase discussed here. Once these data are uploaded into the eventual GEON implementation of this georelational database, the scientist can explore his/her data via the many GEON tools, such as rock classifiers and statistical analyses. More importantly, this ontologically integrated geodatabase will enable the scientist to compare and contrast the uploaded data with data that were previously shared by others in the GEON community.
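A data conversion utility of the kind described above could, at its simplest, rename a contributor's columns to those of the target schema and refuse records that cannot be mapped completely. The sketch below is hypothetical throughout: the target field list, the contributor's column names, and the mapping are invented for the example.

```python
from typing import Dict

# Hypothetical subset of target fields in the shared geodatabase.
TARGET_FIELDS = ("ssID", "SiO2", "Al2O3")

# Hypothetical mapping for one contributor's spreadsheet headers.
CONTRIBUTOR_MAP = {
    "sample_no": "ssID",
    "SiO2_wt%": "SiO2",
    "Al2O3_wt%": "Al2O3",
}


def convert_record(record: Dict[str, object],
                   column_map: Dict[str, str]) -> Dict[str, object]:
    """Rename source columns to target fields; unmapped columns are dropped."""
    converted = {}
    for source_name, value in record.items():
        target_name = column_map.get(source_name)
        if target_name is not None:
            converted[target_name] = value
    missing = [name for name in TARGET_FIELDS if name not in converted]
    if missing:
        raise ValueError(f"record lacks required fields: {missing}")
    return converted


converted = convert_record(
    {"sample_no": 7, "SiO2_wt%": 70.0, "Al2O3_wt%": 14.0, "notes": "n/a"},
    CONTRIBUTOR_MAP,
)
```

Keeping one mapping per contributor confines the heterogeneity to small tables of column names, while the appended records all share the target structure.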

Figure 4.1: Demonstration of the spatial-temporal capabilities in the Virginia Tech Igneous Rock Database

5. Digital Images of Samples

This geodatabase system also accesses images of geological samples captured at varying scales. Thin sections can be shown in their entirety, as viewed with the naked eye, as well as in microscopic images of mineral crystals. Because these images are linked to georeferenced samples, the images coupled with this database structure provide geologists with a means to explore the earth from global to microscopic scales.

6. Connectivity of this Database to the Overall GEON System

Finally, this database will be interconnected to the Igneous Rock Ontology, which is discussed in more detail in the following section of this paper. This interconnectedness will allow geologists to access data residing in this database using ontologically driven, text-based queries. For example, if a geoscientist wishes to examine the isotope data, the ontology system can direct the request to the appropriate tables in the database; in this example these tables would be the 6 radiometric tables in Figure 4.2 and the “PublishedAgesAndInitial” table. The ontology system will also serve as the connection between this database and the GEON workflow and toolbox. The classifier described in section III of this paper must access the correct table in the database and then obtain data from the appropriate fields within it. For the integration scenario discussed in section II to function correctly, the ontology must be able to direct the classifier to the ModalData table and extract values from the geochemical fields that are necessary to plot the data on the appropriate classification diagram.

Figure 4.2: Igneous Rock Database Schema utilized by Virginia Tech researchers
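How an ontology might direct a text-based request to the right tables can be sketched as a lookup over a small concept hierarchy. The concept names and the hierarchy are assumptions; “PublishedAgesAndInitial” and “ModalData” are named in the text, while the six radiometric table names are placeholders because the text does not give them.

```python
from typing import Dict, List

# Hypothetical fragment of an ontology: concept -> narrower concepts.
NARROWER: Dict[str, List[str]] = {
    "isotope data": ["radiometric analyses", "published ages"],
}

# Concept -> database tables. The radiometric names are placeholders.
TABLES: Dict[str, List[str]] = {
    "radiometric analyses": [f"RadiometricTable{i}" for i in range(1, 7)],
    "published ages": ["PublishedAgesAndInitial"],
    "modal data": ["ModalData"],
}


def tables_for(concept: str) -> List[str]:
    """Resolve a concept, and everything narrower than it, to table names."""
    tables = list(TABLES.get(concept, []))
    for child in NARROWER.get(concept, []):
        tables.extend(tables_for(child))
    return tables


isotope_tables = tables_for("isotope data")
```

With a structure of this kind a user asks for a concept such as isotope data without needing to know how many tables hold it, and a classifier asking for modal data is pointed to the ModalData table.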

7. Reference Dataset

A reference dataset for granites is being compiled to augment the existing database of igneous rocks, as part of providing these data to the scientific community through a web interface. This dataset has been compiled in Microsoft Access format, following the igneous rock database schema (developed by the Virginia Tech GEON team), from published literature and from contributions primarily from J. B. Whalen of the Geological Survey of Canada. This reference dataset presently consists of over 430 records of whole rock geochemical (major and trace element) analyses from different type localities of the world. Our database on granites utilizes their genesis/source for subdivision into four main types: A, S, I and M. This letter classification is widely accepted by the scientific community and is used to represent granites from various geologic environments.

The examination of these granites through various classification schemes (e.g., Whalen et al., 1987; Chappell & White, 2001; Eby, 1990) helps geologists understand fundamental rock-forming processes. The chemical parameters in these classification schemes are indicative of the tectonic environment (Pitcher, 1983) and also constrain the genesis of the granites from various sources, e.g. mantle or crust. This in turn helps us develop our understanding of crustal evolution. The availability of such reference databases provides a needed resource for evaluating deposits of economic minerals like tin, copper, tungsten and molybdenum, as they are commonly associated with certain types of granites.

The user will have the option of accessing the Virginia Tech Reference Database as well as existing databases such as GEOROC, USGS (Pluto database), and PetDB directly through the web, thereby enhancing web querying capabilities and easing the process of comparing data from different datasets. This reference dataset on granites is also geared to help achieve the integration scenario discussed in Section VIII.

Section V. Ontology & Earth Sciences

1. What is Ontology?

In order for an agent to ask queries and make statements about a subject domain, a conceptualization of that domain needs to be described. Ontologies, which are explicit specifications of domain conceptualizations (Gruber, 1993), describe the entities in a subject domain, the relations among them, and the processes and functions that apply to them (Farquhar, Fikes, Pratt, and Rice, 1995). One of the most important goals in developing ontologies is sharing a common understanding of the structure of information among people or software agents (Noy and McGuinness, 2001). Other important reasons include: “enabling reuse of a specific domain knowledge in other domains, making domain assumptions explicit so that they can be changed as knowledge about the domain changes, separating domain knowledge from operational knowledge, and analyzing domain knowledge” (Noy and McGuinness, 2001).

2. The Need for and Use of Ontologies

In earth sciences, data sources consist not only of databases but also of the analysis tools (Section VI) used to analyze the information contained in these databases. Ontologies help to manage the interoperation between these different resources (Goble, Stevens, Ng, Bechhofer, Paton, Baker, Peim, and Brass, 2001). We emphasize the need to develop ontologies at multiple levels of granularity to facilitate exploration of data sources using an ontologically structured approach.

Ontologies are also used as a computational pathway for answering queries. Given a query, we can search for a set of appropriate paths in the ontologies and retrieve information from the target resources corresponding to the paths found (Lu and Hsu, 2003). For instance, the integration scenario introduced in Section II and discussed in detail in Section VIII will use the igneous rock ontologies to answer geological queries. This method makes the details of querying web resources clear to geologists and allows them to seek and use geological resources on the web more efficiently. We also recognize that schema-based queries of databases access only the object view, whereas a knowledge base requires the application of both process and object ontologies.

3. Technical Aspects of Ontology Development

In order to use our ontologies in a web-based system environment, we have developed them using the Web Ontology Language (OWL). OWL is a language for defining and instantiating web ontologies (Smith, Welty, and McGuinness, 2004). An OWL ontology may include descriptions of classes, properties, instances of classes, and relationships between these instances (Smith, Welty, and McGuinness, 2004). Classes describe concepts in the domain. Any class can also have subclasses, which represent concepts that are more specific than the superclass (Noy and McGuinness, 2001).

In Figure 5.1, we show the class diagram of the high-level conceptual model that exists between the concept Body and its constituents: rocks, structural features, etc. Specific measurements of attribute elements (single- or multi-valued) taken in the field or laboratory are related to each other through relationships and cardinality. For example, the class Solid is related to the class Rock through a “consists of” relationship, together with an explicit cardinality expression stating that one Solid body may consist of multiple types of rocks. Similarly, an instance of the class Rock contains minerals and chemical elements.

The class diagram also presents the inheritance hierarchy that illustrates the class-subclass concept in relation to the igneous rock ontology. An instance of the class Body represents any substance in one of two states: solid or liquid. Based on properties associated with a Body, for example temperature and pressure, a body can be classified as either a solid or a melt. Although our ultimate objective is to characterize igneous sources through complete solid and melt ontologies, our current research focuses on solid igneous bodies. The class Igneous represents the concept of an igneous rock. Properties of this class represent the attributes of the concept. These properties could be instances of other classes. In our hierarchy, the Igneous class has two subclasses: Plutonic and Volcanic.
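The class-subclass relationships described above can be mirrored in ordinary Java types. The following is only a minimal sketch and not the project's OWL ontology: the class names follow the text, while the fields, the constructors, and the choice to make Igneous a subclass of Rock are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the Body / Solid / Rock / Igneous hierarchy.
// Class names follow the text; fields and constructors are assumptions.
abstract class Body {
    final String name;
    Body(String name) { this.name = name; }
}

// A solid body "consists of" one or more rocks (one-to-many cardinality).
final class Solid extends Body {
    final List<Rock> rocks = new ArrayList<>();
    Solid(String name) { super(name); }
    void addRock(Rock rock) { rocks.add(rock); }
}

// A melt is the liquid state of a body; left empty in this sketch.
final class Melt extends Body {
    Melt(String name) { super(name); }
}

// A rock contains minerals; only mineral names are stored here.
class Rock {
    final String sampleId;
    final List<String> minerals = new ArrayList<>();
    Rock(String sampleId) { this.sampleId = sampleId; }
}

// Assumption: Igneous is modeled as a kind of Rock.
class Igneous extends Rock {
    Igneous(String sampleId) { super(sampleId); }
}

final class Plutonic extends Igneous {
    Plutonic(String sampleId) { super(sampleId); }
}

final class Volcanic extends Igneous {
    Volcanic(String sampleId) { super(sampleId); }
}
```

In this form, a Solid holding both a Plutonic and a Volcanic sample expresses the "consists of" relationship with multiple rock types, and any code written against Igneous accepts either subclass.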

In Figure 5.1, every class represents a separate concept which can have its own ontology. A number of ontologies have already been developed, for instance for the concepts of unit, space, and earth layer (SWEET, 2004). However, little research has been done to create ontologies for other concepts that are equally important to earth sciences. Furthermore, the existing ontologies are very generic and need to be adapted to meet the needs of geoscientists. Such ontologies are usually limited to a specific domain and are not adaptable for describing other domains.

We realize that these activities will yield prototype ontologies, but they represent our understanding of the igneous rock domain. We anticipate community participation in exploring our ontologies and in providing feedback for the development of a more robust and general ontology for igneous rocks. With this approach, the fusion of all ontologies will result in a framework for the earth sciences as a whole.

Figure 5.1: Class diagram representing object view ontology for igneous rocks

Section VI. Tool Box

1. Introduction

Virginia Tech is contributing the Tool Box, a set of practical tools, to the GEON project for use over the internet by geoscientists. These tools are written in Java and will be accessible via a Java Server Page (JSP). The Tool Box consists of tools in three categories: Calculation, Classification, and Modeling. The user selects the desired tool with the Tool Selector and can also select how the output is presented. The color selector provides a choice of colors and the symbol selector provides a choice of symbols. The output selector lets the user choose the output format through the color and symbol selectors. The user can also choose the scale to be used for the output (log, semilog, etc.).

2. Implementation in Java

In order to utilize geological data, it is often useful to perform numerical computations on them. Numerical computations that deal with the classification of rocks and minerals are being implemented.

The mineral data are stored in a Microsoft Access database. A Java class connects to the database via JDBC, reads the data, and performs the analysis. The results are returned to the Java Server Page (JSP) that called the Java class and are displayed to the user on a web page.

3. Tool Categories

A. Calculation Tools

Geologists use various computational tools to analyze and interpret data. Methods for norm, radiometric age, temperature, etc. perform calculations on geological data, and the results are used in further analysis such as classification and modeling. These methods are part of the Tool Box as calculation tools.

a. Example: Norm Calculation

The norm is a calculated mineralogic composition based on the conversion of a whole rock chemical analysis into the formulas of common minerals. The oxides in the rock analysis are allocated, following a prescribed set of rules, to simple end-member formulas of the rock-forming and common accessory minerals. The norm allows a chemical analysis of a rock to be recast in terms of the common minerals by which it can be classified. This is particularly useful for rocks that are too fine-grained for gathering modal data (Philpotts, 1990).

The norm calculation is implemented in Java. Some modifications (Hollocher, 2004) to this methodology have been incorporated. Our current implementation reads sample data from a tab-delimited text file and calculates the weight norms for the data. These weight norms are written to a tab-delimited text file. The norm calculations are then used for rock classification. The web interface for the norm will accept as input a set of rock samples, or an individual sample, in a specific format and will output the norm calculations in tabular form. The user can also upload a file of sample data through the web interface to obtain the norm calculation for those data. Geologists can use this web interface to analyze their own data.
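The input step described above can be illustrated with a small reader for one tab-delimited row. This is a sketch only and does not perform the norm calculation itself; the column layout (sample identifier first, oxide weight percents after it), the class name SampleLineParser, and the treatment of empty cells are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reader for one row of a tab-delimited sample file.
final class SampleLineParser {

    // Pairs each header name with the value in the same column of the row.
    // Column 0 is assumed to hold the sample identifier and is skipped.
    static Map<String, Double> parseOxides(String headerLine, String dataLine) {
        String[] names = headerLine.split("\t", -1);
        String[] cells = dataLine.split("\t", -1);
        if (names.length != cells.length) {
            throw new IllegalArgumentException("header and row have different column counts");
        }
        Map<String, Double> oxides = new LinkedHashMap<>();
        for (int i = 1; i < names.length; i++) {
            String cell = cells[i].trim();
            // Assumption: an empty cell means "not analyzed" and is stored as 0.0.
            oxides.put(names[i].trim(), cell.isEmpty() ? 0.0 : Double.parseDouble(cell));
        }
        return oxides;
    }
}
```

A LinkedHashMap is used so that the oxides keep the column order of the file, which makes it straightforward to write results back out in the same tabular layout.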

b. Example: Mineral Formula Calculation

Feldspars, pyroxene, olivine, garnet, mica, and epidote are common minerals found in most igneous rocks. We have developed calculation tools as part of a prototype for web-based analysis of mineralogic data.

The input for each of these mineral tools is the mineral composition in weight percent. From these values the tool derives the molecular proportion of each oxide, the atomic proportion of oxygen contributed by each oxide, and the number of anions on the basis of the accepted number of oxygen atoms for the mineral; these quantities are then used to calculate the number of ions in the mineral formula.
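The sequence of quantities listed above can be sketched as follows. This is a minimal illustration of the general oxygen-basis recalculation, not the project's code: the class name MineralFormula, the three-oxide table, and the rounded molecular weights are assumptions, and a real tool would cover many more oxides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical cation recalculation on a fixed oxygen basis.
final class MineralFormula {

    // Per oxide: {molecular weight (g/mol), cations per formula, oxygens per formula}.
    private static final Map<String, double[]> OXIDES = new LinkedHashMap<>();
    static {
        OXIDES.put("SiO2", new double[] {60.084, 1, 2});
        OXIDES.put("MgO",  new double[] {40.304, 1, 1});
        OXIDES.put("FeO",  new double[] {71.844, 1, 1});
    }

    // Returns the number of cations contributed by each oxide,
    // normalized so that the formula contains oxygenBasis oxygen atoms.
    static Map<String, Double> cations(Map<String, Double> weightPercent, double oxygenBasis) {
        Map<String, Double> cationProportions = new LinkedHashMap<>();
        double oxygenTotal = 0.0;
        for (Map.Entry<String, Double> e : weightPercent.entrySet()) {
            double[] c = OXIDES.get(e.getKey());
            if (c == null) {
                throw new IllegalArgumentException("unknown oxide: " + e.getKey());
            }
            double molecularProportion = e.getValue() / c[0];
            cationProportions.put(e.getKey(), molecularProportion * c[1]);
            oxygenTotal += molecularProportion * c[2];
        }
        // Scale so that the oxygen atoms sum to the chosen basis.
        double factor = oxygenBasis / oxygenTotal;
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : cationProportions.entrySet()) {
            result.put(e.getKey(), e.getValue() * factor);
        }
        return result;
    }
}
```

For olivine the formula is conventionally calculated on the basis of four oxygen atoms, so an analysis close to pure forsterite should return roughly one Si and two Mg per formula unit.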

These tools will be accessible via a web interface that accepts input data through a form on an HTML web page or through an uploaded tab-delimited file. Once the calculations are performed by our Java program, the results will be returned via JSP. With such an interface, computations can be done without downloading and/or installing any tool. It also makes these tools platform independent.

The Virginia Tech igneous rock database includes mineral compositions and can be accessed through this computation tool. For example, the data for the mineral olivine from the Baltimore mafic complex can be accessed through this tool to provide subclasses of mineral names within the olivine family.

The first step in acquiring the subclasses of a sample of olivine is to select the desired data from the database. As discussed in the data section of this paper, each sample of data is uniquely identified by an ssID value. After the desired data is selected as pictured in Figure 6.1 the data is plotted as outlined in Figure 6.2.

Figure 6.1: Table showing partial analysis for mineral olivine

As seen in Figure 6.2, the samples that have been selected are classified into different olivine subclasses. The graphics for representing the subclasses are discussed in the following section.

Figure 6.2: SVG based graphics for discussing minerals of olivine group.

B. Classification Tools for Minerals and Rocks

The sample data can be classified by rock name, mineral, tectonic settings, rock affinity etc. The various methods to classify the data are included in the Tool Box as classification tools.

a. Example: Use of SVGs and Igneous Rocks

GEON research at Virginia Tech is aimed at providing some of the computational tools and graphics for web-based use. Current research has emphasized the coding of scalable vector graphics (SVG) for igneous rock/mineral classification. As SVG is a language for describing two-dimensional graphics in XML, it allows various types of graphics objects, such as text and vector graphic shapes (lines, curves, polygons, etc.), to be implemented in a web environment.

The code for binary and ternary plots for the igneous rock/mineral classification is written in SVG and includes a header which specifies the dimension and orientation of the images. A few examples shown below include polygons within triangles where individual polygons represent a rock association.
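As an illustration of how such a diagram might be generated from Java, the sketch below assembles an SVG document containing labeled polygons. The class TernarySvg and its methods are hypothetical, and the use of a nested title element to carry the polygon name is an assumption about one possible way of attaching a label that viewers typically display on hover; the project's own SVG files may be written differently.

```java
import java.util.Locale;

// Hypothetical helper that writes SVG markup for classification fields.
final class TernarySvg {

    // Builds one polygon element; the nested <title> carries the field name.
    // Labels are assumed to contain no XML special characters.
    static String polygon(double[][] points, String label) {
        StringBuilder pts = new StringBuilder();
        for (double[] p : points) {
            if (pts.length() > 0) {
                pts.append(' ');
            }
            pts.append(String.format(Locale.ROOT, "%.1f,%.1f", p[0], p[1]));
        }
        return "<polygon points=\"" + pts + "\" fill=\"none\" stroke=\"black\">"
                + "<title>" + label + "</title></polygon>";
    }

    // Wraps the shapes in an SVG header that fixes the image dimensions.
    static String document(int width, int height, String... shapes) {
        StringBuilder svg = new StringBuilder();
        svg.append("<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"").append(width)
           .append("\" height=\"").append(height).append("\">\n");
        for (String shape : shapes) {
            svg.append("  ").append(shape).append('\n');
        }
        return svg.append("</svg>\n").toString();
    }
}
```

Locale.ROOT is passed to String.format so that coordinates are always written with a decimal point, regardless of the locale of the server that generates the file.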

Every image is drawn using an SVG header, which initializes the dimensions of the triangle. The triangle is drawn first using the polygon element and is annotated at the vertices. The polygons inside the triangle are then drawn and annotated with the rock types. The SVG images show the name of a polygon when the mouse moves over it; this was incorporated for better visualization of the images. These images will be used in the modal classification tool. A demonstration workflow of such a classification tool is shown in Figures 3.5a and 3.5b. In the modal classification tool, the PointInPolygon module plots the sample data over an appropriate SVG diagram, and the Classifier module analyzes such diagrams and classifies the sample data. This classification procedure is carried out at multiple levels. At each level, an SVG diagram is chosen according to the region of the point(s) to be classified in the previous level, and a new region for the point(s) is calculated according to the transition table, the region of the point(s), and their mineral information contained in this level’s diagram. Either the region is classified and given a rock name, or it leads to a different SVG diagram.
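The geometric test at the core of such a module can be sketched as below. This is not the project's PointInPolygon module; it is a generic even-odd ray casting test together with a conversion from ternary proportions to plane coordinates. The method names, the unit-side triangle, and the vertex placement are assumptions for illustration.

```java
// Hypothetical geometry helpers for plotting a sample on a ternary diagram.
final class PointInPolygon {

    // Converts ternary proportions to x/y in a unit-side triangle with
    // component b at (0, 0), c at (1, 0) and a at the apex (0.5, sqrt(3)/2).
    // The y axis points up here, unlike SVG screen coordinates.
    static double[] ternaryToXY(double a, double b, double c) {
        double sum = a + b + c;
        double fa = a / sum;
        double fc = c / sum;
        return new double[] {fc + fa / 2.0, fa * Math.sqrt(3.0) / 2.0};
    }

    // Even-odd ray casting: counts how many polygon edges a horizontal ray
    // from (x, y) towards +x crosses; an odd count means the point is inside.
    static boolean contains(double[][] polygon, double x, double y) {
        boolean inside = false;
        for (int i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
            double xi = polygon[i][0], yi = polygon[i][1];
            double xj = polygon[j][0], yj = polygon[j][1];
            boolean crosses = (yi > y) != (yj > y)
                    && x < (xj - xi) * (y - yi) / (yj - yi) + xi;
            if (crosses) {
                inside = !inside;
            }
        }
        return inside;
    }
}
```

A classifier built on this would test the converted point against each field polygon of the current diagram and use the name of the matching field either as the rock name or as the key into the transition table that selects the next diagram.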

Figure 6.3 : Zr/4-Nb*2-Y

Figure 6.4 : Th-Hf/3-Ta

The SVG diagrams are grouped into the following classes: 1) Tectonic setting classifier for granites, 2) Element concentration rock name classifier, 3) Tectonic setting classifier for basalts, 4) Source classifier, 5) Modal QAPF rock name classifier, 6) Magma association classifier, 7) Magma type classifier.

C. Modeling Tools

a. Partial Melting Modeling

Partial melting is the process by which source rocks melt, and modeling is the numerical expression of that process. By modeling partial melting we gain insight into the source and, ultimately, into the rock’s origin (Figure 6.5).

The partial melting modeling tool accepts two matrices as input: a source rock data matrix and a melting mode matrix. These matrices can be uploaded via a web page or read from a database. From these two matrices and a stored matrix of mineral values, the source rock bulk distribution coefficient and the melting mode distribution coefficient are calculated. From these two coefficient matrices we then compute enrichment factor matrices for both the source rock data matrix and the melting mode matrix for any given value of F. Finally, chondrite-normalized values are calculated from the enrichment factor matrices.
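The text does not state which melting equation the tool uses. As one plausible reading, the sketch below applies the standard non-modal batch melting relation, in which the enrichment factor is C_L/C_0 = 1 / (D0 + F(1 - P)), where D0 is the bulk distribution coefficient of the source mode, P is that of the melting mode, and F is the melt fraction. The class name BatchMelting and the choice of this equation are assumptions, not a description of the project's implementation.

```java
// Hypothetical batch melting helpers for a single trace element.
final class BatchMelting {

    // Bulk coefficient: sum over minerals of (weight fraction x mineral/melt Kd).
    // Used for both the source mode (D0) and the melting mode (P).
    static double bulkCoefficient(double[] fractions, double[] kd) {
        if (fractions.length != kd.length) {
            throw new IllegalArgumentException("fractions and Kd values must align");
        }
        double sum = 0.0;
        for (int i = 0; i < fractions.length; i++) {
            sum += fractions[i] * kd[i];
        }
        return sum;
    }

    // Assumed model: non-modal batch melting, C_L / C_0 = 1 / (D0 + F (1 - P)).
    static double enrichment(double d0, double p, double f) {
        return 1.0 / (d0 + f * (1.0 - p));
    }
}
```

Under this reading, multiplying the enrichment factor by the source concentration and dividing by the chondrite value for the same element would give the chondrite-normalized melt composition mentioned in the text.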

The steps listed have been implemented in Java and are available for dehydration melting, amphibolite melting, eclogite melting, granulite melting, hydrous graywacke melting, and tonalite melting.

4. Application and Use

The Tool Box offers several advantages. Many of the tools are currently available, but they are part of different packages, and the user has to download and/or install them in order to use them. Many of these tools are proprietary and expensive. The Tool Box will gather the tools in one place, free of cost, and will provide most of the functionality of the proprietary software. The source code for the tools will be available to all; users can download the source and modify it to suit their requirements, add features, or make corrections in the future. As the Tool Box will have a web interface, the tools will be platform independent. The only software required will be a Java Plug-in for the web browser. The Tool Box also provides ease of use. Geologists need not go through a learning curve to understand the workings of the software packages. They only need to set some parameters and pass the input (in the required format); the interface of the Tool Box will take care of the rest by making the software transparent to the user. When combined with other components of the GEON project, such as data mining, the Tool Box will help in the interpretation of geological data towards answering key science questions through better visualization and analysis.

5. Classification Diagrams and Tools for Igneous Rocks

Table 1: Classification Diagrams and Tools for Igneous Rocks Implemented for Web-based Research (References are in the reference section of this paper)

Classification                               Images                      Reference

Tectonic Setting Classifier for Granites     Nb vs Y                     14
                                             Ta vs Yb                    14
                                             Rb vs (Y + Nb)              14
                                             Rb-(Yb + Ta)                14
                                             Hf-Rb/10-Ta * 3             22
                                             Hf-Rb/30-Ta * 3             22

Element Concentration Rock Name Classifier   TAS Alkalis-Silica          5
                                             Molecular Normative Composition
                                             Na2O + K2O vs SiO2          1
                                             Nb/Y vs Zr/TiO2             20

Tectonic Setting Classifier for Basalts      Ti vs V                     16
                                             Zr-Zr/Y                     13

Source Classifier A, I, S, M Types           Ga-Al-Zr                    19

Modal QAPF Rock Name Classifier              Q-A-P-F (Plutonic rocks)    17
                                             Q-A-P-F (Volcanic rocks)    18

Magma Association Classifier                 Cr-Ti (OFB and LKT)         10
                                             Zr-Ti (LKT, CAB, OFB)       12
                                             Zr-Ti/100-Sr/2              12
                                             Na2O+K2O-FeOt-MgO           3
                                             SiO2-Na2O+K2O               3
                                             SiO2-FeOt/MgO               8
                                             FeOt/MgO-FeOt               8
                                             FeOt/MgO-TiO2               8
                                             MnO*10-TiO2-P2O5*10         9
                                             SiO2-K2O                    2
                                             SiO2-Na2O+K2O
                                             SiO2-K2O                    6
                                             SiO2-Al2O3                  6
                                             SiO2-FeOt/(FeOt+MgO)        6
                                             Mw%-Fw                      6
                                             Cw%-FMw%                    6

Magma Type Classifier                        ACNK-ANK                    6

Section VII. Data Mining in Earth Sciences

1. What is Data Mining?

Data mining is the analysis of observational data sets to find unsuspected relationships and to summarize the data in novel ways.

The definition above refers to ‘observational data sets’ as opposed to ‘experimental data’. Data mining algorithms are usually applied to data sets collected for some purpose other than data mining analysis, so the data mining activity has no control over data collection. The relationships found by a data mining algorithm are expected to be novel. They are often referred to as models or patterns. These relationships must be statistically significant (not occurring merely by chance).

2. Components of Data Mining Algorithms

Model or Pattern Structure: A model is a high-level, global description of a data set. It may be descriptive or inferential. Descriptive models summarize the data in a concise and convenient way. Examples of descriptive models include models for the overall probability distribution of the data (density estimation) and the partitioning of data into groups (cluster analysis). Inferential models make a statement about the population from which the data were drawn or about likely future values. Examples of inferential models include regression models and mixture models. In contrast, a pattern is a local feature of the data, perhaps holding for only a few records or variables. Patterns represent departures from the general run of the data: a pair of variables that have a particularly high correlation, a set of records that always score the same on some variables, and so on.

Score function: This helps in judging the quality of a fitted model. Given a data set, there might be many possible models that can describe it. The purpose of a score function is to rank the models. Examples of score functions include the mean squared error and the least squares criterion.

Optimizing and Search method: The data mining algorithm tries to optimize a score function by searching over different models and pattern structures. Efficient computational methods are required for finding the parameters for a model that optimize the score function.
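The interplay of score function and search can be shown with a very small example. The sketch below scores candidate models by mean squared error and picks the best one by exhaustive search over a short list; the class name ModelScore and the representation of a model as a function from one input to one prediction are illustrative assumptions, and realistic algorithms search far larger model spaces with more efficient methods.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical score function and exhaustive search over candidate models.
final class ModelScore {

    // Mean squared error of the model's predictions against observed y values.
    static double meanSquaredError(DoubleUnaryOperator model, double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double residual = y[i] - model.applyAsDouble(x[i]);
            sum += residual * residual;
        }
        return sum / x.length;
    }

    // Returns the index of the candidate with the lowest score.
    static int bestModel(DoubleUnaryOperator[] candidates, double[] x, double[] y) {
        int best = 0;
        double bestScore = Double.POSITIVE_INFINITY;
        for (int k = 0; k < candidates.length; k++) {
            double score = meanSquaredError(candidates[k], x, y);
            if (score < bestScore) {
                bestScore = score;
                best = k;
            }
        }
        return best;
    }
}
```

Because the score function is passed the model as a plain function, the same ranking code can be reused for any model structure that produces a numeric prediction.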

Data Management Strategy: While a data mining algorithm is executed on a large database, the data sets have to be handled efficiently. Designing a database to store the data sets so that accessing subgroups of data is as fast as possible, choosing the proper data structures, and deciding which data sets need to be read into computer memory are all part of the data management strategy.

Data mining is a highly interdisciplinary activity. Statistics and mathematics play an important role in modeling the data (MacKay, 1992). Parallel processing techniques are used for handling large sets of data (Maniatty and Zaki, 2000). Visualization is essential to better understand the numerical output produced by the data mining algorithms (Thearling et al., 2001).

3. How is Data Mining Different from Statistical Analysis?

The key difference between statistics and data mining algorithms is that statistics is concerned with primary analysis: the data are collected (often using standard experimental design techniques) with particular questions in mind and are then analyzed to answer those questions. Data mining is used for secondary data analysis, that is, finding patterns and relationships that were not anticipated initially. Data mining thus helps in extracting hidden knowledge from databases. The typical tasks in statistical analysis are fitting a model, testing a hypothesis, and estimating confidence intervals. The tasks in data mining are finding patterns (e.g., association rules), classification (e.g., Bayes classifiers and neural networks), and grouping (e.g., clustering).

Figure 7.1: Broad Classification of GeoRoc Data set.

4. Ontology Driven Data Mining Research in Solid Earth Science at Virginia Tech

The data mining research at Virginia Tech is focused on combining ontologies and data mining techniques to design novel algorithms for earth sciences. Preliminary efforts in this area, focused on the domain of biology, can be found in Reinoso-Castillo et al. (2003). As described in Section V, an ontology specifies the terms or concepts in a domain and the relationships that exist between them. Ontologies thus represent a user’s prior knowledge about the domain. The data sets of the domain can be structured by associating them with the concepts of the ontology. Applying data mining algorithms to hierarchically structured data sets allows relationships to be discovered at multiple levels of abstraction, as opposed to applying the algorithms to unstructured data sets at a single level of abstraction. Different users may have different perceptions of the same data. Hence, they can supply their own ontologies to structure the data for analysis. The incorporation of ontologies into data mining algorithms thus facilitates multiple views of the same data.

As a first step, the rock data collected from the “GeoRoc” database (http://georoc.mpch-mainz.gwdg.de/) are being analyzed using common data mining techniques. The data set consists of geochemical analyses of different types of rocks, geospatially distributed all over the world. In order to apply data mining algorithms to this data set, we have created a hierarchical representation of it that reflects recognized plate tectonic settings, as shown in Figure 7.1. Each of the classes can be further described using a detailed ontology. For instance, the detailed ontology of the class convergent margins is shown in Figure 7.2.

Figure 7.2: Simplified concept map representing the sub-classes within the class of Convergent Margin

Convergent margin settings include geometrical relationships between the upper plate and the subducted (lower) plate. In addition, the composition (continental, oceanic) of both plates leads to well-recognized geochemical affinities. Similarities and differences between these environments are further constrained by the rate of subduction, the angle of subduction, and the age of the plates involved in convergent margin settings. Similar ontologies exist for the other classes as well. Through data mining techniques we explore patterns and differences at various geospatial resolutions for a given class. The knowledge represented by ontologies helps in choosing the level of abstraction desired in applying the data mining algorithms. For example, in the case of convergent margins, one user may choose to compare continental margins and oceanic arcs. For this purpose, the data sets belonging to all the sub-concepts of the concept “continental margins” in the hierarchy should be retrieved. Similarly, all the data sets corresponding to “oceanic arc” are retrieved. Another user may choose to compare oceanic arcs with fast and slow subducted plates based on a cutoff. This is more specific and at a lower level of abstraction than the earlier example, so only the specific data sets corresponding to these settings are needed. Application of data mining techniques at multiple geospatial scales is likely to yield new knowledge on the geological processes operative at different scales. Development of algorithms that can be used at different scales is necessary because the volume of data is expected to be significantly different when the analysis is conducted for a single volcanic center versus an entire arc. Research on data mining associated with sparse data will be coupled with methods available for the analysis of large data sets (Ramakrishnan and Bailey-Kellogg, 2002). Application of models for spatial data (Ramakrishnan et al., 2004) is also under consideration.
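The retrieval of data sets for a concept and all of its sub-concepts, as described above, amounts to a traversal of the concept hierarchy. The following is a minimal sketch under stated assumptions: the class Concept, its methods, and the idea of identifying data sets by string keys are hypothetical and are not taken from the project's software.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node in a concept hierarchy with data sets attached to concepts.
final class Concept {
    final String name;
    final List<Concept> children = new ArrayList<>();
    final List<String> dataSets = new ArrayList<>();

    Concept(String name) { this.name = name; }

    // Builder-style helpers so a hierarchy can be written inline.
    Concept child(Concept c) { children.add(c); return this; }
    Concept dataSet(String id) { dataSets.add(id); return this; }

    // Depth-first search for a concept by name; returns null if it is absent.
    Concept find(String target) {
        if (name.equals(target)) {
            return this;
        }
        for (Concept c : children) {
            Concept hit = c.find(target);
            if (hit != null) {
                return hit;
            }
        }
        return null;
    }

    // Collects the data sets of this concept and of all of its sub-concepts.
    List<String> collectDataSets() {
        List<String> all = new ArrayList<>(dataSets);
        for (Concept c : children) {
            all.addAll(c.collectDataSets());
        }
        return all;
    }
}
```

Choosing a concept higher in the hierarchy returns more data sets at a coarser level of abstraction, while choosing a leaf concept returns only the data sets attached to that specific setting.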

Section VIII. Information Integration Scenario

1. A Geoscientist’s Integration Scenario

In order to justify ontologically driven knowledge discovery, we show an integration scenario where accessing a database through simple ontological constraints is necessary. Although more complex scenarios are more appropriate for geologic studies, this case study clearly identifies a problem not adequately addressed by existing ontologies, provides informal semantics for the objects and relations included in the ontology, and provides a motivation for ontology development. We present the scenario in the form of a text-based query: “What is the distribution and U/Pb concordant zircon ages of A-type plutons in VA? How about their 3-D geometry?” (Figure 8.1)

Figure 8.1: A Schematic representation of a geoscientist’s information integration problem

To answer such a query, we have to follow a certain flow of information. The logical flow would include, but is not limited to, these steps:

1) Locate the state of Virginia.
2) Identify all igneous rocks from the geologic map.
3) Access the database on igneous rocks.
4) Discriminate between volcanic and plutonic rocks.
5) Filter mineralogical and geochemical data for the plutons.
6) Apply discriminant functions to classify plutons as A-type.
7) Access the age and geochronological database.
8) Use zircon as the mineral for identifying the U/Pb concordant age.
9) Locate the gravity database.
10) Overlay gravity data over polygons of A-type plutons in Virginia.
11) Deploy a tool for calculating the shape of the pluton that best fits the gravity data.
12) Display the 3-D geometry of the plutons.

These logical functions represent the decomposition of the primary query into several sub-queries (Figure 8.2), each of which uses web-based access to data and tools.

Summary Statement

Our research activities utilize the concept that extracting knowledge from static databases requires well designed organizational structures that are able to identify relationships between resources being analyzed. Queries across multiple databases, which contain information that may be logically linked to each other, require organization of the concepts represented in the data. Our research recognizes that schema-based queries of databases access the object view (within igneous rocks), but geologic research also needs to include integration of processes that affect or produce the objects. Therefore, the research goal at Virginia Tech is to create a prototype of a computer based knowledge environment that specifically reflects the logic used by a geoscientist, with the recognition that his/her primary interest lies in understanding processes that have affected the rock record through time.

Figure 8.2: Workflow of querying information and the output format selection process

References

Altintas, I., Berkley, C., Jaeger, E., Jones, M., Ludascher, B., & Mock, S., 2004. Kepler: An Extensible System for Design and Execution of Scientific Workflows. http://www.sdsc.edu/~ludaesch/Paper/ssdbm04-kepler.pdf

Chappell, B.W. & White, A.J.R., 2001. Two contrasting granite types: 25 years later. Australian Journal of Earth Sciences, v. 48, p. 489-499.

cSIS (CyberSTRUCTURE information system), 2003, http://www.sci.uidaho.edu/cyber/

Deer, W.A., Howie, R.A., & Zussman, J., 1992. An introduction to The Rock-Forming Minerals. Longman Publishers.

Eby, G.N., 1990. The A-type granitoids: a review of their occurrence and chemical characteristics and speculations on their petrogenesis. Lithos, v. 26, p. 115-134.

Farquhar, A., Fikes, R., Pratt W. & Rice, J., 1995. Collaborative Ontology Construction for Information Integration. Knowledge Systems Laboratory, Department of Comp. Sci., Stanford University. http://www-ksl.stanford.edu/KSL_Abstracts/KSL-95-63.html

Goble, C.A., Stevens, R., Ng, G., Bechhofer, S., Paton, N.W., Baker, P.G., Peim, M., & Brass, A., 2001. Transparent access to multiple bioinformatics information sources. IBM Systems Journal, v. 40, no. 2, p. 532-551.

Gruber, T.R., 1993. Toward Principles for the Design of Ontologies Used for Knowledge Sharing. Knowledge Systems Laboratory, Department of Comp. Sci., Stanford University. http://ksl-web.stanford.edu/KSL_Abstracts/KSL-93-04.html

Hollocher, K., 2004. Calculation of a Norm from a Bulk Chemical Analysis. http://www.union.edu/PUBLIC/GEODEPT/COURSES/petrology/norms.htm

King, P.L., White, A.J.R., Chappell, B.W. & Allen, C.M., 1997. Characterization and origin of aluminous A-type granites from the Lachlan Fold Belt, Southeastern Australia. Journal of Petrology, v. 38, no. 3, p. 371-391.

Lu, J., & Hsu, C., 2003. Query Answering Using Ontologies in Agent-based Resource Sharing Environment for Biological Web Information Integration. 18th International Joint Conference on Artificial Intelligence.

MacKay, D., 1992. Bayesian interpolation. Neural Computation, v. 4, no. 3, p. 415-447.

Maniatty, W., & Zaki, M., 2000. A requirements analysis for parallel KDD systems. IPDPS Workshop.

Munindar P. Singh, & Mladen A. Vouk, 1996. Scientific Workflows: Scientific Computing Meets Transactional Workflows. NSF Workshop on Workflow and Process Automation in Information Systems

Noy, N.F., & McGuinness, D.L., 2001. Ontology Development 101: A Guide to Creating Your First Ontology. Knowledge Systems Laboratory, Department of Comp. Sci., Stanford University. http://www.ksl.stanford.edu/people/dlm/papers/ontology101/ontology101-noy-mcguinness.html

Philpotts, A.R., 1990. Principles of Igneous and Metamorphic Petrology. Prentice Hall, New Jersey.

Pitcher, W.S., 1983. Granite type and tectonic environment. In: Mountain Building Processes, ed. Hsu, K. Academic Press, London, p. 19-40.

Ramakrishnan, N., & Bailey-Kellogg, C.K., 2002. Sampling Strategies for Mining in Data-Scarce Domains, IEEE/AIP Computing in Science and Engineering (CiSE), Vol. 4, No. 4

Ramakrishnan, N., Bailey-Kellogg, C.K., Satish, T., & Pandey, V., 2004. Gaussian Processes for Active Data Mining, submitted to ACM KDD-2004.

Reinoso-Castillo, J., Silvescu, A., Caragea, D., Pathak, J., & Honavar, V., 2003. Information Extraction and Integration from Heterogeneous, Distributed, Autonomous Information Sources: A Federated, Query-Centric Approach. In: IEEE International Conference on Information Integration and Reuse.

Sinha, A.K., Zendel, A., Brodaric, B., & Barnes, C., 2004. Schema to Ontology for Igneous Rocks: implications for the development of a cyber infrastructure for the earth sciences, in press.

Smith, M.K., Welty, C., & McGuinness, D.L., 2004. OWL Web Ontology Language Guide. W3C. http://www.w3.org/TR/2004/REC-owl-guide-20040210

SWEET (Semantic web for earth and environmental terminology), 2004, http://sweet.jpl.nasa.gov/ontology

Thearling, K., Becker, B., DeCoste, D., Mawby, B., Pilote, M., & Sommerfield, D., 2001. Visualizing Data Mining Models. Morgan Kaufmann.

Whalen, J.B., Currie, K.L. & Chappell, B.W., 1987. A-type granites: geochemical characteristics, discrimination and petrogenesis. Contributions to Mineralogy and Petrology, v. 95, p. 407-419.

Rock Classification Diagrams Reference

1. Cox, K.G., Bell, J.D., and Pankhurst, R.J., 1979. The interpretation of igneous rocks. George Allen & Unwin, London, United Kingdom (GBR).

2. Gill, J.B., 1981. Orogenic andesites and plate tectonics. Springer Verlag, Berlin, Federal Republic of Germany (DEU).

3. Irvine, T.N., and Baragar, W.R.A., 1971. A guide to the chemical classification of the common volcanic rocks. Canadian Journal of Earth Sciences, v. 8, no. 5, p. 523-548.

4. Jensen, L.S., 1976. A new cation plot for classifying subalkalic volcanic rocks. Ontario Geological Survey Miscellaneous Paper, no. 66, 22 pp.

5. LeBas, M.J., LeMaitre, R.W., Streckeisen, A., and Zanettin, B., 1986. A chemical classification of volcanic rocks based on the total alkali-silica diagram. Journal of Petrology, v. 27, p. 745-750.

6. Maniar, P.D., and Piccoli, P.M., 1989. Tectonic discrimination of granitoids. Geological Society of America Bulletin, v. 101, no. 5, p. 635-643.

7. Meschede, M., 1986. A method of discriminating between different types of mid-ocean ridge basalts and continental tholeiites with the Nb-Zr-Y diagram. Chemical Geology, v. 56, Issues 3-4, p. 207-218.

8. Miyashiro, A., 1974. Volcanic rock series in island arcs and active continental margins. American Journal of Science, v. 274, no. 4, p. 321-355.

9. Mullen, E.D., 1983. MnO/TiO2/P2O5; a minor element discriminant for basaltic rocks of oceanic environments and its implications for petrogenesis. Earth and Planetary Science Letters, v. 62, Issue 1, p. 53-62.

10. Pearce, J.A., 1975. Basalt geochemistry used to investigate past tectonic environment in Cyprus. Tectonophysics, v. 25, Issues 1-2, p. 41-67.

11. Pearce, J.A., 1996. Relationships between high field strength element geochemistry and tectonic setting of volcanic rocks. 30th International Geological Congress abstracts, vol. 30, v. 2, p. 367.

12. Pearce, J.A., and Cann, J.R., 1973. Tectonic setting of basic volcanic rocks determined using trace element analyses. Earth and Planetary Science Letters, v. 19, Issue 2, p. 290-300.

13. Pearce, J.A., and Norry, M.J., 1979. Petrogenetic implications of Ti, Zr, Y and Nb variations in volcanic rocks. Contributions to Mineralogy and Petrology, v. 69, p. 33-47.

14. Pearce, J.A., Harris, N.B.W., and Tindle, A.G., 1984. Trace element discrimination diagrams for the tectonic interpretation of granitic rocks. Journal of Petrology, v. 25, p. 956-983.

15. Pearce, T.H., Gorman, B.E., and Birkett, T.C., 1977. The relationship between major element chemistry and tectonic environment of basic and intermediate volcanic rocks. Earth and Planetary Science Letters, v. 36, Issue 1, p. 121-132.

16. Shervais, J.W., 1982. Ti-V plots and the petrogenesis of modern and ophiolitic lavas. Earth and Planetary Science Letters, v. 59, Issue 1, p. 101-118.

17. Streckeisen, A., 1976. To each plutonic rock its proper name. Earth-Science Reviews, v. 12, p. 1-33.

18. Streckeisen, A., 1979. Classification and nomenclature of volcanic rocks, lamprophyres, carbonatites, and melilitic rocks: recommendations and suggestions of the IUGS Subcommission on the Systematics of Igneous Rocks. Geology, v. 7, p. 331-335.

19. Whalen, J.B., Currie, K.L., and Chappell, B.W., 1987. A-type granites: geochemical characteristics, discrimination and petrogenesis. Contributions to Mineralogy and Petrology, v. 95, p. 407-419.

20. Winchester, J.A., and Floyd, P.A., 1977. Geochemical discrimination of different magma series and differentiation products using immobile elements. Chemical Geology, v. 20, p. 325-343.

21. Wood, D.A., 1980. The application of a Th-Hf-Ta diagram to problems of tectonomagmatic classification and to establishing the nature of crustal contamination of basaltic lavas of the British Tertiary Volcanic Province. Earth and Planetary Science Letters, v. 50, Issue 1, p. 11-30.
