PetDB SedDB
EarthChem SESAR
GfG Digital Data Collections
Objective: develop, maintain, & operate community-driven, sustainable information systems that
- support the long-term preservation of data
- enable data discovery and use for the broad community
- advance a culture of “open access” to data (& samples) and the creation of a global data network
- support data analysis to empower science
- facilitate data integration via interoperability & open access interfaces
GfG Projects
PetDB: Data Collection for Geochemistry & Petrology of Ocean Floor Igneous & Metamorphic Rocks
SedDB: Data Collection for Geochemistry of Marine Sediments
EarthChem: Geochemistry Information Network
SESAR: Registry for Earth Samples
- Administration of global unique identifiers for samples
- Global Sample Catalog
Grants
PetDB: K. Lehnert, K. Block; Sept 2007 to Aug 2010
SedDB: K. Lehnert, S. L. Goldstein; subcontracts: R. Murray (BU), N. Pisias (OSU), W. Snyder (Boise State); Jul 2005 to Jun 2008
EarthChem: K. Lehnert; collaborative with D. Walker (U Kansas); Sept 2005 to Aug 2010
SESAR: K. Lehnert, S. L. Goldstein, S. Vinay, C. Lenhardt; Apr 2006 to Mar 2009
GfG/MGDS Management: K. Lehnert, S. Carbotte; Apr 2007 to Mar 2010
Budgets
FY2008: $983,669
By project:
- SedDB 23%
- GfG Mgmt 16%
- SESAR 21%
- PetDB
- EarthChem 17%
By category:
- Salaries 76%
- Subcontracts 13%
- Participant support 4%
- Travel 5%
- Other 1%
- Hardware & Software 1%
GfG Components
1. Information Technology Services
2. Data Management
3. Community Building
4. Education
Projects: PetDB, SedDB, EarthChem, SESAR
1. IT Services
Database Development & Administration
- Modeling & metadata standards for geochemical and sample data
- Data submission tools & procedures
- QC & data ingestion tools & procedures
- Maintenance of databases
Access Interfaces
- Web application development & maintenance
- Interoperability (web services, XML development)
- Visualization & data analysis tools
Systems Operation & Maintenance
LDEO/CIESIN: Sri Vinay (Ass. Director), Robert Arko, Branko Djapic, Brian Falk, Annie Gerard, Yuanliang Liu, Frank Pascuzzi, David Strom
U Kansas/KGS: Jason Ash, Eileen Jones
Toward an Integrated GfG System
Built an architecture that is used across all GfG projects. It will make the operation and enhancement of existing systems, and the development of potential future systems, more efficient, cost-effective, and sustainable.
Generic, modular, flexible, scalable, secure
Geochemical Data Model
- Implementation completed for SedDB 1.0
- Will be implemented for EarthChem in Fall 2007
- Will be implemented for PetDB in Winter 2007/8
Features:
- Analytical metadata at the level of individual measurements
- Description of spatial and temporal components of samples and measurements
- Ability to integrate data at any level of sample granularity
- Tracking of relationships between samples and sub-samples
- Capability to store ‘derived’ (model) types of observed values
- GeoSciML compliant
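The sample-granularity features listed above can be illustrated with a minimal Python sketch. It is not the GfG schema: the class names, fields, and sample identifiers are all hypothetical, chosen only to show how a parent link between samples and sub-samples lets measurements be gathered at any level.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Measurement:
    """One measured value carrying its own analytical metadata (hypothetical fields)."""
    parameter: str
    value: float
    technique: str
    derived: bool = False  # True for 'derived' (model) values rather than observed ones


@dataclass
class Sample:
    """A sample or sub-sample; `parent` records the sub-sample relationship."""
    sample_id: str
    # Excluded from repr/compare to avoid recursing through parent <-> children links.
    parent: Optional["Sample"] = field(default=None, repr=False, compare=False)
    measurements: List[Measurement] = field(default_factory=list)
    children: List["Sample"] = field(default_factory=list)

    def add_subsample(self, sample_id: str) -> "Sample":
        child = Sample(sample_id, parent=self)
        self.children.append(child)
        return child

    def lineage(self) -> List[str]:
        """Identifiers from the top-level sample down to this one."""
        node, path = self, []
        while node is not None:
            path.append(node.sample_id)
            node = node.parent
        return list(reversed(path))

    def all_measurements(self) -> List[Measurement]:
        """Measurements on this sample and on every sub-sample beneath it."""
        found = list(self.measurements)
        for child in self.children:
            found.extend(child.all_measurements())
        return found


# Hypothetical example: a core, one section of it, and a mineral separate.
core = Sample("CORE-1")
section = core.add_subsample("CORE-1/SEC-2")
separate = section.add_subsample("CORE-1/SEC-2/CPX")
section.measurements.append(Measurement("SiO2", 49.8, "XRF"))
separate.measurements.append(Measurement("87Sr/86Sr", 0.7031, "TIMS"))
```

Calling `core.all_measurements()` collects values from every level, while `separate.lineage()` walks back up to the parent core.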
Geochemical Data Framework
- Framework for executing complex queries against a geochemical relational database
- Consists of database query models and a data cache that support the web interfaces and web services
- Developed using Object Oriented Design (OOD) methodologies on a Java/J2EE platform supported by a WebLogic application server
Status:
- Implementation for SedDB 1.0 (alpha)
- Implementation for PetDB in Summer 2008
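The pairing of a query model with a data cache can be sketched in a few lines. The framework itself is described as Java/J2EE on WebLogic; the Python below is only an illustration of the general idea, using an in-memory SQLite database and `functools.lru_cache` as stand-ins. The table name, columns, and values are hypothetical.

```python
import sqlite3
from functools import lru_cache

# Hypothetical, tiny stand-in for a geochemical relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chem (sample_id TEXT, parameter TEXT, value REAL)")
conn.executemany(
    "INSERT INTO chem VALUES (?, ?, ?)",
    [("S1", "SiO2", 49.8), ("S1", "MgO", 8.1), ("S2", "SiO2", 51.2)],
)


@lru_cache(maxsize=128)
def values_for(parameter, min_value):
    """Query model: samples whose `parameter` is at least `min_value` (cached)."""
    rows = conn.execute(
        "SELECT sample_id, value FROM chem "
        "WHERE parameter = ? AND value >= ? ORDER BY sample_id",
        (parameter, min_value),
    ).fetchall()
    return tuple(rows)  # immutable, so callers cannot alter the cached result


first = values_for("SiO2", 50.0)   # runs the SQL
second = values_for("SiO2", 50.0)  # served from the cache
```

A cache of this kind only pays off when the same query is repeated by a web interface or a web service, and it would need invalidating whenever new data are ingested.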
Web Services
Analytical Data Serving
- EarthChem XML (ECML) and GeoSciML
Geospatial Data Serving
- WMS/WFS implemented for PetDB data
- WMS/WFS for SedDB by AGU Fall 2007
- SESAR Samples
Data Loading Services
- Geochemical data validation and loading
  - Implemented for PetDB, SedDB, EarthChem
- SESAR sample registration via external systems (e.g. IODP)
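For readers unfamiliar with WMS, the sketch below shows how a client could assemble an OGC WMS GetMap request. The parameter names follow the WMS 1.1.1 specification; the endpoint URL and the layer name are placeholders, not the actual PetDB service, and the choice of version 1.1.1 is an assumption.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_getmap_url(base_url, layer, bbox, width=800, height=400):
    """Build an OGC WMS 1.1.1 GetMap request URL.

    bbox is (min_lon, min_lat, max_lon, max_lat) in EPSG:4326.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)


# Placeholder endpoint and layer name, for illustration only.
url = build_getmap_url("https://example.org/wms", "petdb_samples", (-180, -90, 180, 90))

# Round-trip check: decode the query string back into its parameters.
query = parse_qs(urlparse(url).query, keep_blank_values=True)
```

Any WMS-aware client, GeoMapApp included, issues requests of this general shape; a WFS request differs in returning the features themselves rather than a rendered image.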
IT Services: 2007 Highlights
New hires: Frank Pascuzzi, Yuanliang Liu, & Brian Falk
Redesign/restructuring of all GfG web sites
- Consistent architecture/menu items across all projects
- Addition of event list application
Releases and new features
- SedDB 1.0 alpha release
- SESAR Catalog Search alpha release
- Improved EarthChem XML schema & portal interface
- SESAR XML schema for IODP sample registration
- WMS/WFS for PetDB in GeoMapApp
- Map interfaces for PetDB, SedDB, SESAR
Redesign of GfG web sites
SedDB 1.0 (alpha)
SESAR Catalog Search (alpha)
IT Services: Foci for the Next Year
PetDB
- Migration to the new GfG architecture
SESAR
- SESAR Catalog Search: move to production
- Web application for MyGeosamples
- Web application for online sample registration
SedDB
- SedDB 1.0: move to production
- WMS/WFS for cores and specific datasets (MARGINS focus sites)
- Implementation of tools developed at Boise State
Geochemistry Data Library
Geochemistry Data Library - WDC
Repository for datasets
- ‘Raw’ datasets ingested into data collections
- Electronic supplements from journals
- Data compilations submitted by PIs
Request from journals to serve electronic supplements with geochemical data
- Geology
- Geological Magazine
Enhances long-term archiving of data collections
“iGeoCuration” SESAR Online Sample Curation System
Service to repositories/museums to manage their collections and provide public access
- Modules to manage administrative metadata (e.g. storage, acquisition, loans, collections)
- Service to develop & operate public & administrative web interfaces to collections
Advantages for repositories:
- No IT infrastructure & IT personnel needed
- No maintenance and risk & contingency management needed
- Access from anywhere by authorized individuals
- Platform independent
Contract with Harvard Geological Museum
Interest from marine repositories
2. Data Management
- Data solicitation
- Data compilation & documentation
- Data entry & ingestion
- Quality control
- User support
LDEO/CIESIN: Karin Block, Annika Johansson, Rusty Lotti, Branko Djapic, Brian Falk
Interns: Maxine Paul, Peregrine Gerard-Little
Integrating Data Ingestion
Redesigned data entry forms to accommodate requirements of Geochemical Data Model
- Common for all data collections
- Implemented for SedDB 1.0
- Challenge: Data management needs vs. community acceptance
Adjusted DataEndorser to new data entry forms
Data Management: 2007 Highlights
Compiled & ingested ca. 150,000 chemical values to PetDB, SedDB, EarthChem datasets
[Figure: number of values ingested per year, 2005 to 2007, for PetDB, SedDB, and DLDS]
Data Management: 2007 Highlights
Deep Lithosphere Dataset
- Compiled & ingested complete Wilshire (1988) xenoliths data
- Received new voluntary data submissions (Ducea, Lee, Basu, Ionov, Klump)
- Developing chemistry-based classification for peridotite xenoliths based on the dataset
  - Collaboration with R. Stern, M. Ducea, E. Anthony
  - PhD student Urmidola Raye (UT Dallas) internship at GfG
  - Presentation by K. Block at AGU 2007
Workshop at EarthScope National Meeting (Monterey, March 2007)
- Convened by K. Lehnert, M. Ducea, R. Keller, K. Block
- Attended by 20 people
- Recommendation for future developments/expansion of dataset
Data Management: 2007 Highlights
New EC Dataset: Central Atlantic Magmatic Province (CAMP)
- Collaboration with Claude Herzberg (Rutgers University)
- Trained Rutgers undergraduate students in data management & compilation
- 18 publications compiled and entered
- Data will be available from PetDB and EarthChem
New SedDB Dataset: Global Mn-nodule Chemistry
- USGS Open File report by F. Manheim
- Definitive dataset for Mn nodules (ca. 750 samples)
- Compiled by undergraduate Maxine Paul; research with the dataset is planned
Data Management: Success & Challenges
Successes:
- Voluntary contributions increasing (PetDB, CAMP, Deep Lithosphere)
- Increased number of NSF requests to PIs to submit data
Challenges:
- Need policy for unpublished data / moratorium periods
- Need data submission templates & procedures
- Increased user support
Data Management: Foci for 2008
- Test & improve data submission templates
- Define policies & procedures for data submission
- Migrate PetDB data to GCDM
- Migrate Deep Lithosphere Dataset to GCDM
- Complete SedDB compilation of test beds
- Grow EarthChem datasets
- Encourage voluntary contributions
3. Community Building
Goals
- Achieve broad use & involvement
- Improve integration of data across systems, disciplines, organizational, & political structures
- Create new organizational structures to support, manage, and operate a Geoinformatics infrastructure
Approaches & Activities
- Understand the social & cultural challenges
- Educate & involve the user community
- Develop & promote technical solutions & policies
- Augment partnerships & collaborations
- Encourage & participate in community efforts
Understand the Social & Cultural Challenges
Collaboration with social & information scientists
Educate & Involve Community
Workshops, Courses, Exhibits, Presentations, AC
- 4 EarthChem Workshops
- 1 SESAR workshop
- 2 Short Courses
- 4 Exhibits
AGU 2006, Goldschmidt 2007
Publications
Newsletters / list servers
Web sites
Educate & Involve Community
ESF Field Workshop Barberton Drilling Project, Sept 2007
Upcoming Presentations
GSA 2007 (Oct 28-31)
- Lehnert: “My Data. Your Data. Our Data! - Overcoming Cultural Barriers to Open Data Sharing”
- Lehnert: “The PetDB Data Collection: Impact on Science” (poster)
- Walker: “EarthChem: New Developments”
Joint MARGINS-IFREE Workshop IBM SubFac (Nov 9-11)
- Block: “Online Geochemical Database Resources for Teaching: The Subduction Factory and Beyond”
- Johansson: “SedDB - A Global Geochemical Database of Marine Sediment Geochemistry”
- Lehnert: “Databases - US Perspective”
AGU 2007 (Dec 10-14)
- Lehnert: “Community-Based Development of Standards for Geochemical and Geochronological Data”
- Lehnert: “Building a Global Data Network for Studies of Earth Processes at the World's Plate Boundaries”
- Block: “A Chemistry-Based Classification for Peridotite Xenoliths”
- Block: “Fostering Education and Research Goals Through Partnerships Between
Upcoming Short Courses
- GSA 2007, Denver: Tuesday, Oct 30, noon
- AGU 2007, San Francisco: Sunday, Dec 9, 2-5pm
Develop & Promote Technical Solutions & Policies
Data & Metadata Standards
EarthChem Working Groups & Workshops
Defined standards and templates for data reporting.
- “Reporting of Geochemical Data and Metadata” (Lamont, April 2007)
- “GeoEarthScope and EarthChem - EARTHTIME Effort” (Kansas, April 2007)
- “EarthChem: (U-Th)/He Geo/Thermochronology” (Kansas, May 2007)

Metadata Type                        Relevance  Example
Measured Parameter                   mandatory  87Sr/86Sr
Technique                            mandatory  TIMS
Instrument                           optional   VG
Laboratory                           mandatory  Washington State Univ
Analyst                              optional   J. Vervoort
Analysis Date                        optional   03/17/2006
Sample Preparation                   optional   Ground in agate mill
Chemical Treatment                   optional   Leached in 1N HCl
Reference Sample Name                mandatory  NBS-987
Reference Value Measured             mandatory  0.710246
Reference Uncertainty                mandatory  0.000017
Reference Uncertainty Unit           mandatory  2 sigma absolute
Number of Measurements Averaged (N)  mandatory  24
Primary Analytical Metadata
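A submission tool could enforce the mandatory/optional split in the table above with a check along the following lines. This is a hedged sketch, not the DataEndorser logic: the function name and the dictionary-based record format are assumptions, and only the field names and relevance levels come from the table.

```python
# Relevance of each primary analytical metadata field, as listed in the table.
RELEVANCE = {
    "Measured Parameter": "mandatory",
    "Technique": "mandatory",
    "Instrument": "optional",
    "Laboratory": "mandatory",
    "Analyst": "optional",
    "Analysis Date": "optional",
    "Sample Preparation": "optional",
    "Chemical Treatment": "optional",
    "Reference Sample Name": "mandatory",
    "Reference Value Measured": "mandatory",
    "Reference Uncertainty": "mandatory",
    "Reference Uncertainty Unit": "mandatory",
    "Number of Measurements Averaged (N)": "mandatory",
}


def missing_mandatory(record):
    """Return the mandatory fields that are absent or blank in `record`."""
    missing = []
    for name, relevance in RELEVANCE.items():
        value = record.get(name)
        is_empty = value is None or not str(value).strip()
        if relevance == "mandatory" and is_empty:
            missing.append(name)
    return missing


# Example record using values from the table's Example column (optional fields omitted).
complete = {
    "Measured Parameter": "87Sr/86Sr",
    "Technique": "TIMS",
    "Laboratory": "Washington State Univ",
    "Reference Sample Name": "NBS-987",
    "Reference Value Measured": 0.710246,
    "Reference Uncertainty": 0.000017,
    "Reference Uncertainty Unit": "2 sigma absolute",
    "Number of Measurements Averaged (N)": 24,
}

# A deliberately incomplete copy: blank technique, no laboratory.
incomplete = dict(complete, Technique="  ")
del incomplete["Laboratory"]
```

Here `missing_mandatory(complete)` returns an empty list, while the incomplete record is reported as lacking its technique and laboratory.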
Develop & Promote Technical Solutions & Policies
Interoperability Standards
Development & Implementation
- GeoSciML
- OGC (WMS, WFS) implemented for PetDB, SedDB
- XML (EarthChemXML, SESAR registration)
- Controlled vocabularies
- IGSN implementation
Develop & Promote Technical Solutions & Policies
IGSN Implementation
SESAR Workshop “Implementing the International Geo Sample Number in Geological Sample Curation”
- Established procedures for sample registration.
  - Trusted Agents (e.g. IODP)
  - Pre-populated MGDS forms
- IGSN implemented at NGDC Index of Marine & Lacustrine Samples.
- Collections registered: Scripps, WHOI, Lamont, ODP/DSDP, URI, ARF, USPRR, Harvard
Develop & Promote Technical Solutions & Policies
Policies
New Geochemical Society Data Policy
- Initiated at Goldschmidt 2006, approved at Goldschmidt 2007
Editors Roundtable, Goldschmidt 2007
- Reached agreement among editors of major scientific journals on the principles of a joint policy for data publication.
- To be approved at Goldschmidt 2008.
GSA-GPPC Open Access Working Group
Develop & Promote Technical Solutions & Policies
“Global Data Networks” workshop (Kiel, May 2007), convened by S. Carbotte, K. Lehnert, S. Tsuboi, W. Weinrebe
Agreed on statements of principle and recommendations to address technical, procedural, and organizational issues of open global data sharing.
Partnerships & Collaborations
National
- LEPR: Integration with experimental data
- Interop proposal with State Surveys & USGS (“Geosciences Information Network”)
- GEON: SedDB in Paleo Integration Project
- NGDC: IGSN implementation, marine repositories
International
- GEOROC - Germany (Al Hofmann, Baerbel Sarbas)
- GFZ Potsdam - Germany/International (Jens Klump)
- Mexico: MexDB (Luca Ferrari)
- China: Eastern China Geochemical Database (Shan Gao)
- South Africa: Economic Geology Research Institute (Allan Wilson)
- Australia: Geoscience Australia (Lesley Wyborn), CSIRO (Simon Cox)
Community Organization
Planning a National Geoinformatics System
- Organization of NSF workshop (Walker)
- Participation in NGS Task Force & WG (Walker, Lehnert, Vinay)
- NGS Planning proposal to EAR July 2007
- Co-convened AGU 2006 Town Hall meeting (Lehnert)
- Co-convened GSA 2007 Town Hall meeting (Walker, Lehnert)
Community Building: Tasks 2008
- Short courses at regional universities & colleges
- Editors Roundtable Goldschmidt 2008
- Science session at Goldschmidt 2008
  - Conveners: Lehnert, Salters, Kubicki
- EGU 2008 Workshop with ICDP/IODP: “Data Infrastructure for Scientific Drilling”
- International Working Groups
  - Follow-up of Kiel workshop
  - Advance EarthChem International Network (workshop?)
- Involvement in NGS
4. Education
- Courses
- Internships
- Educational materials
Summer Interns 2007
- Maxine Paul (Columbia University): SedDB
- Urmidola Raye (UT Dallas): EarthChem (w/ R. Stern)
- Naya Sou (Rutgers University): EarthChem - CAMP dataset (w/ C. Herzberg)
- Stephanie Bloomer (Rutgers University): EarthChem - CAMP dataset (w/ C. Herzberg)
- Peregrine Gerard-Little (Columbia University): EarthChem, PetDB
Metrics of Success
User statistics
- PetDB: 2000-2500 unique users per month
- EarthChem: from a few hundred to ca. 1100 unique users per month
Citations
- 28 publications in 2007 citing PetDB, 168 publications since 2000
- 5 publications citing EarthChem
- Too early for SedDB, need complete dataset
EarthChem Statistics
User Statistics: Unique Users per Month
[Figure: number of unique users per month, Oct 2006 to Sep 2007, for EarthChem, PetDB, SedDB, SESAR, and GfG]