Deutscher Wetterdienst DAR Metadata Catalog Markus Heene, DWD markus.heene@dwd.de

Jan 02, 2016

Page 1: Deutscher Wetterdienst DAR Metadata Catalog Markus Heene, DWD markus.heene@dwd.de.

Deutscher Wetterdienst

DAR Metadata Catalog

Markus Heene, DWD markus.heene@dwd.de


DAR Metadata Catalog 24.09.2010 Page: 2

Agenda

Welcome

Notes

Performance Test - Infrastructure

High level architecture
– Geonetwork
– terraCatalog

Performance Tests
– Requirements
– Preconditions
– Results
– Remarks

Resources


Notes

The presented results are from May 2009

Both software solutions have since released newer versions
– Geonetwork 2.6
– terraCatalog 3.0

The findings of the Performance Study were made available to both


Performance Test - Infrastructure

[Diagram: test client, one server running Tomcat 5.5, one server running Oracle 10g]

Application Server:
CPU: 4 AMD Opteron 1800 MHz
RAM: 9186716 kB


Geonetwork: High level architecture

Geonetwork (versions 2.2 and 2.4)

– Servlet Container
• Main development targets Jetty (migration to other servlet containers such as Tomcat or OC4J is possible)
• Geonetwork consists of 3 different web applications which can interact
• Different frameworks are used for the development: Jeeves, Struts, Spring, …
• For the next generation of Geonetwork a system architecture redesign is announced: remove the Jeeves framework (“Bringing data and metadata closer together”, FOSS4G 2008, Cape Town, by Jeroen Ticheler)

– Metadata handling
• Metadata XML file is stored as a “large object” in the database (support for different vendors)
• Search is mainly based on a Lucene index outside of the database
• <gmd:fileIdentifier> is limited to varchar2(250) in the basic installation
• Building the Lucene index takes a very long time

– Additional remarks
• Open source software
• Stable solution so far (migration to another servlet container needs time)
• Version 2.2 implements only some of the CSW queries
• Some Z39.50 support is available; so far there is only limited experience with it inside DWD
• Production installations with up to 25.000 records are running (as far as we could find)


terraCatalog 2.3: High level architecture

terraCatalog 2.3

– Servlet Container
• Main development for Tomcat (migration possible but not tried)
• terraCatalog consists of different web applications which can interact
• Consistent usage of frameworks through all web applications

– Metadata handling
• Metadata XML file is stored in the database and “mapped” into a relational model (database support for PostgreSQL and Oracle)
• Search is a function of the database (Oracle Spatial and Text)
• Mapping into the relational model causes conflicts with XML documents (e.g. title is limited to varchar2(255), same for abstract and keywords), so valid ISO-conformant XML documents could not be imported into terraCatalog
• The Oracle Spatial datatype could store only half of the world, so special treatment is necessary for the whole globe; we found Oracle errors in certain situations

– Additional remarks
• Commercial software with support
• Much more complete implementation of CSW compared to Geonetwork 2.2
• No Z39.50 search functionality, so additional investment is necessary
• Production installations with up to 25.000 records are running
• We found some bugs: SQL injection, Oracle errors, import of valid XML documents not possible, error when exporting metadata as an XML document
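
The column limits mentioned on the two architecture slides suggest checking records before an import. The following is a minimal sketch of such a pre-import check, not part of either product. It assumes ISO 19139 records, assumes that the limits count characters rather than bytes, and assumes the element paths shown; the function name `oversized_fields` is hypothetical. Keywords are left out for brevity.

```python
import xml.etree.ElementTree as ET

# ISO 19139 namespaces
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Limits quoted on the slides. The mapping of each limit to an XML path
# is an assumption made for this sketch.
LIMITS = [
    ("Geonetwork", "fileIdentifier",
     "gmd:fileIdentifier/gco:CharacterString", 250),
    ("terraCatalog", "title",
     "gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString", 255),
    ("terraCatalog", "abstract",
     "gmd:identificationInfo/*/gmd:abstract/gco:CharacterString", 255),
]


def oversized_fields(xml_text):
    """Return (product, field, length, limit) for every value that exceeds a limit."""
    root = ET.fromstring(xml_text)
    problems = []
    for product, field, path, limit in LIMITS:
        for node in root.findall(path, NS):
            length = len(node.text or "")
            if length > limit:
                problems.append((product, field, length, limit))
    return problems
```

A record for which `oversized_fields` returns a non-empty list would probably need to be shortened or handled separately before import.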


Performance Tests - Requirements

Requirements based on WMO and INSPIRE

WMO (see WIS-TechSpec-8, DAR Catalogue Search and Retrieval, Technical Specification 1.1)
– Response time < 2 sec
– 40 combined searches (keyword and bounding box) per second
– Minimum of 20 active sessions

INSPIRE
– Response time < 3 sec
– Minimum of 30 active sessions

DWD
– Minimum of 100.000 metadata records
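
The thresholds above can be written down as data so that a measurement can be checked against them mechanically. This is a minimal sketch under stated assumptions: the class `Requirement`, the function `unmet` and the sample measurement at the end are hypothetical and do not come from the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Requirement:
    source: str
    max_response_s: Optional[float] = None          # response time must be below this
    min_sessions: Optional[int] = None              # active sessions
    min_combined_per_s: Optional[int] = None        # combined searches per second
    min_records: Optional[int] = None               # metadata records in the catalog


# Values taken from the slide
REQUIREMENTS = [
    Requirement("WMO", max_response_s=2.0, min_sessions=20, min_combined_per_s=40),
    Requirement("INSPIRE", max_response_s=3.0, min_sessions=30),
    Requirement("DWD", min_records=100_000),
]


def unmet(req, response_s, sessions, combined_per_s, records):
    """List the criteria of `req` that a measurement does not satisfy."""
    failures = []
    if req.max_response_s is not None and response_s >= req.max_response_s:
        failures.append("response time")
    if req.min_sessions is not None and sessions < req.min_sessions:
        failures.append("sessions")
    if req.min_combined_per_s is not None and combined_per_s < req.min_combined_per_s:
        failures.append("combined search")
    if req.min_records is not None and records < req.min_records:
        failures.append("records")
    return failures


# Hypothetical measurement, for illustration only
for req in REQUIREMENTS:
    print(req.source, unmet(req, response_s=2.5, sessions=30,
                            combined_per_s=10, records=140_000))
```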


Performance Tests - Preconditions

Importing Metadata
– Practical package size was 5.000 metadata records in an archive
– Import takes a lot of time (5.000 records ~ 45 to 60 minutes)
– Importing metadata into terraCatalog generates GBs of redo logs (200 MB per minute)
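
To get a feeling for what these figures mean at the DWD target of 100.000 records, they can be extrapolated. This is only a back-of-the-envelope sketch: it assumes that import time and redo-log volume scale linearly with the number of packages, which the slides do not state, and the redo-log rate applies to terraCatalog only.

```python
PACKAGE_SIZE = 5_000               # records per archive (from the slide)
MINUTES_PER_PACKAGE = (45, 60)     # observed range per package (from the slide)
REDO_MB_PER_MINUTE = 200           # terraCatalog redo-log rate (from the slide)


def import_estimate(total_records):
    """Return (packages, (min_minutes, max_minutes), (min_redo_mb, max_redo_mb))."""
    packages = -(-total_records // PACKAGE_SIZE)   # ceiling division
    minutes = tuple(packages * m for m in MINUTES_PER_PACKAGE)
    redo_mb = tuple(m * REDO_MB_PER_MINUTE for m in minutes)
    return packages, minutes, redo_mb


packages, minutes, redo_mb = import_estimate(100_000)
print(packages, "packages")
print(minutes[0] / 60, "to", minutes[1] / 60, "hours")
print(redo_mb[0] / 1000, "to", redo_mb[1] / 1000, "GB of redo logs")
```

Under the linearity assumption, 100.000 records correspond to 20 packages, roughly 15 to 20 hours of import time and 180 to 240 GB of redo logs.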

Formulate queries in CSW 2.0.2
– The challenge was to describe a query that both systems understood (limited CSW implementation in Geonetwork 2.2)
– Parameterize the query for different result sets (e.g. search title for “zyx”: 0 hits, search title for “gts”: 136.511 hits)
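
The queries used in the study are not shown on the slides. As an illustration of what a parameterized CSW 2.0.2 request for a combined search (title plus bounding box) might look like, here is a sketch that builds a GetRecords document. The queryable names `dc:title` and `ows:BoundingBox`, the wildcard characters, the coordinate order in the envelope and the function name `build_getrecords` are all assumptions; a real catalog may expect different values.

```python
import xml.etree.ElementTree as ET

CSW = "http://www.opengis.net/cat/csw/2.0.2"
OGC = "http://www.opengis.net/ogc"
GML = "http://www.opengis.net/gml"

for prefix, uri in (("csw", CSW), ("ogc", OGC), ("gml", GML)):
    ET.register_namespace(prefix, uri)


def build_getrecords(title_term, bbox=None, max_records=10):
    """Build a GetRecords request; bbox is (min_x, min_y, max_x, max_y) or None."""
    root = ET.Element(f"{{{CSW}}}GetRecords", {
        "service": "CSW",
        "version": "2.0.2",
        "resultType": "results",
        "maxRecords": str(max_records),
        # Prefixes used only inside attribute/text values must be declared by hand.
        "xmlns:dc": "http://purl.org/dc/elements/1.1/",
        "xmlns:ows": "http://www.opengis.net/ows",
    })
    query = ET.SubElement(root, f"{{{CSW}}}Query", {"typeNames": "csw:Record"})
    ET.SubElement(query, f"{{{CSW}}}ElementSetName").text = "brief"
    constraint = ET.SubElement(query, f"{{{CSW}}}Constraint", {"version": "1.1.0"})
    parent = ET.SubElement(constraint, f"{{{OGC}}}Filter")
    if bbox is not None:
        # Combined search: both conditions must hold.
        parent = ET.SubElement(parent, f"{{{OGC}}}And")

    like = ET.SubElement(parent, f"{{{OGC}}}PropertyIsLike",
                         {"wildCard": "%", "singleChar": "_", "escapeChar": "\\"})
    ET.SubElement(like, f"{{{OGC}}}PropertyName").text = "dc:title"   # assumed queryable
    ET.SubElement(like, f"{{{OGC}}}Literal").text = f"%{title_term}%"

    if bbox is not None:
        min_x, min_y, max_x, max_y = bbox
        box = ET.SubElement(parent, f"{{{OGC}}}BBOX")
        ET.SubElement(box, f"{{{OGC}}}PropertyName").text = "ows:BoundingBox"
        env = ET.SubElement(box, f"{{{GML}}}Envelope")
        # Axis order is an assumption; it depends on the coordinate reference system.
        ET.SubElement(env, f"{{{GML}}}lowerCorner").text = f"{min_x} {min_y}"
        ET.SubElement(env, f"{{{GML}}}upperCorner").text = f"{max_x} {max_y}"

    return ET.tostring(root, encoding="unicode")


# The two title terms from the slide, the second one as a combined search
print(build_getrecords("zyx"))
print(build_getrecords("gts", bbox=(-10.0, 40.0, 30.0, 60.0)))
```

Swapping only `title_term` gives the different result-set sizes mentioned above while the rest of the request stays identical for both systems.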


Performance Tests - Results

Criteria | Geonetwork | terraCatalog | Comments
text search | o | + | Geonetwork: < 10.000 hits
wildcard search | o | + | Geonetwork: < 10.000 hits
geographic search | o | - | Geonetwork: < 10.000 hits; terraCatalog: incorrect results (different result sets between terraCatalog, Geonetwork and a simple direct SQL query); specific requests (including the equator): database exceptions
time search | o | + |
combined search | o | - | Geonetwork: < 10.000 hits; terraCatalog: timeout (no results after 15 min)
error handling | o | o | insufficient
number of hits | o | + | Geonetwork: < 10.000 hits
number of resultsets | o | o | Geonetwork: < 250 records; terraCatalog: < 400 records
sessions | + | + |


+ (fulfilled), - (failed), o (partially)


Performance Tests - Results

INSPIRE

Requirement | Criteria | Geonetwork | terraCatalog
(Datasets) | > 100.000 | + | +
response time | < 3 sec | o | o
sessions | > 30 | + | +

WMO

Requirement | Criteria | Geonetwork | terraCatalog
(Datasets) | > 100.000 | + | +
response time | < 2 sec | o | o
combined search | > 40 | o | -
sessions | > 20 | + | +

+ (fulfilled), - (failed), o (partially)
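
The slides do not describe how the test client produced these measurements. As a rough sketch of how response times under a given number of parallel sessions could be collected, the following uses a thread pool. The function `measure_sessions` is hypothetical, and `fake_search` is a stand-in for a real CSW request, used here only so that the example runs without a catalog.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_sessions(search, sessions, requests_per_session=5):
    """Call `search` from `sessions` parallel threads; return all response times in seconds."""
    def one_session(session_id):
        timings = []
        for _ in range(requests_per_session):
            start = time.perf_counter()
            search()
            timings.append(time.perf_counter() - start)
        return timings

    with ThreadPoolExecutor(max_workers=sessions) as pool:
        per_session = list(pool.map(one_session, range(sessions)))
    # Flatten the per-session lists into one list of timings
    return [t for timings in per_session for t in timings]


def fake_search():
    """Placeholder for a real catalog query (assumption for this sketch)."""
    time.sleep(0.01)


timings = measure_sessions(fake_search, sessions=30)
print("requests:", len(timings))
print("slowest response below 3 sec:", max(timings) < 3.0)
```

Replacing `fake_search` with a function that posts a query to the catalog would give timings that can be compared with the 2 sec and 3 sec limits.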


Performance Tests - Remarks

Geonetwork fails to meet the requirement if the result set contains more than 10.000 hits (response time scales with the size of the result set)

Geonetwork installation with 140.000 metadata records
– First access of the GUI takes minutes!

Geonetwork 2.2: deployment of the web app with around 3000 metadata records takes hours

terraCatalog fails to meet the requirement for combined searches

terraCatalog could not meet the response time requirement for geographical searches

terraCatalog produces errors if the search touches the equator

Fuzzy search for title, abstract, keywords … is a nice feature

terraCatalog is up to 60 times faster than Geonetwork in simple queries

Other solutions like geowaySDI.NODE have also been tested only with 25.000 records

Currently it looks as if neither system is capable of handling 140.000 metadata records according to the requirements of INSPIRE and WMO


Resources

WMO Wiki: http://www.wmo.int/pages/prog/www/WIS/wiswiki/tiki-index.php?page=geonetworkdoc

Geonetwork: http://geonetwork-opensource.org/

BlueNet: http://anzlicmet.bluenet.utas.edu.au/

con terra: http://www.conterra.de/


Q&A