LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

Dec 21, 2015

Page 1: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

LIEGE 2010 ESPON Meeting

How to Use and Feed the ESPON Data Base?

Page 2: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

Outline

Introduction

How to Query the ESPON Data Base (… and Extract Data and Meta Data Sets)?

Demo (part one): The new ESPON Data Base Query Interface

What’s in a Data Set? What’s in a Metadata Set?

Metadata: some feedback

Demo (part two): How to Upload a Data Set and a Metadata Set?

The Manual Checking Phase and Classification of Themes

Demo (part three): How to Register into the ESPON Data Base?

Conclusion

Page 4: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

Context

The ESPON Data Base is a Web-based application designed and developed by partners of the ESPON Data Base 2013 Project

The first phase of this Priority 3 ESPON Project runs from mid-2008 to February 2011

« The goal of this project is to develop and manage a geo-referenced information system, taking into account the ESPON themes of applied research, their aims and geography to be covered. It will include a comprehensive database to be used within the ESPON 2013 Programme and an additional one to be published on the ESPON website. »

TPG_guidance_Scientific_Platform_and_Tools, May 2010

Page 5: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

ESPON Data Base 2013 Project in a Nutshell

Page 6: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

ESPON Data Base 2013 Project in a Nutshell

12 Challenges: the core of the project
1. Collection of Basic Regional Data
2. Harmonization of Time Series
3. World / Regional Data
4. Regional / Local Data
5. Social / Environmental Data
6. Urban Data
7., 8. and 9. ESPON Data Base Application
10. Spatial Analysis for Quality Control
11. Enlargement to Neighborhood
12. Individual Data and Surveys

Data and Metadata
• Metadata are probably more important than Data

Methods
• Technical Reports that provide clear solutions and identify shortcomings and dead-ends

Page 7: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

ESPON Data Base 2013 Project in a Nutshell

Applications
• OLAP Program for NUTS to GRID Conversion
• Specific Program of Text Mining for the elaboration of the ESPON Thesaurus
• Code in R language for outlier detection
• ESPON Data Base Application

« The ESPON database combines the data from all projects: raw data, indicators and typologies. The TPG work related to any of the concepts measured by the indicators and/or typologies should make use of the ESPON database in order to ensure that results between TPGs are comparable. It also allows easier reproduction of results as the data is available in the ESPON database and can therefore be used by everyone. »

TPG_guidance_Scientific_Platform_and_Tools, May 2010

Page 8: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

ESPON Data Base 2013 Application

ESPON Data Base 2013
• A repository gathering different indicators
  • made available for ESPON Projects
  • provided by ESPON Projects
• A Web interface on top of this repository, accessible through the ESPON Web site, that allows users
  • to download data (and metadata) sets
  • to upload data (and metadata) sets

About the ESPON Data Base content
• See the dedicated Interactive Workshop Session 5

Page 9: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

ESPON Data Base 2013 Application

History of the ESPON Data Base 2013

• First version: presented during the Malmö Seminar and online in November 2009
  • Some Data and Metadata sets
  • Data and Metadata Set Formats as Excel Files
  • A simple Query Interface

• Second version: presented during the Alcala Seminar and online in June 2010
  • More Data and Metadata sets
  • A metadata editor built with Geonetwork
  • A more elaborate Query Interface

Page 10: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

ESPON Data Base 2013 Application

Third (and final) version of the ESPON DB 2013 developed during the first phase of the ESPON DB 2013 Project

• presented today and online at the end of December 2010
• improvements until the end of February 2011

What’s new in this version?

• More and more Data and Metadata sets
• A login/password management interface
• A back-office interface for its administration
• An upload interface that guides users through entering data and metadata sets
• A new Metadata editor
• A new and even more advanced Query (Download) Interface

Page 15: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

Advantages and requirements of an online DB

Allows data to be shared across a large community

Allows online data exploration and discovery through a user-friendly interface

Requires compliance with some syntactic rules (computers are not good at detecting ALL human mistakes)

• wrong units, wrong indicators, indicators without values, etc.

Requires compliance with some semantic rules (violations are VERY difficult to detect automatically) in order to avoid:

• ambiguity (different entities of the real world appear as one in the database)
• duplication (one entity of the real world corresponds to several entities in the database)

Page 16: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

Problems before the metadata editor V2

(1) Syntactic issues

• Badly formatted dates (e.g. “06/2009”)

• Badly formatted indicator values (“ 634.7”)

• Badly formatted Booleans (mix of “TRUE”, “FAUX”, “YES”)

• Modified names for metadata fields

• Alien, non-data or non-metadata text (leftover comments or forgotten copy-paste results), visible or HIDDEN
  • E.g. in a data file with 120+ columns, 2 hidden columns in the middle containing territorial unit names
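
The syntactic problems listed above lend themselves to automatic checks. Here is a minimal Python sketch, assuming conventions the slides do not specify (year-month-day dates, a dot as decimal separator, and TRUE/FALSE as the only Boolean literals). The function names are hypothetical and are not taken from the ESPON application:

```python
import re
from datetime import datetime

# Assumed conventions (not stated on the slides): year-month-day dates,
# dot decimal separator, and the literals TRUE/FALSE for Booleans.
BOOLEANS = {"TRUE", "FALSE"}
NUMBER = re.compile(r"-?\d+(\.\d+)?")


def check_date(value: str) -> bool:
    """Accept year-month-day dates such as 2009-06-30; reject e.g. 06/2009."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def check_number(value: str) -> bool:
    """Reject stray whitespace, thousands separators and free text."""
    return NUMBER.fullmatch(value) is not None


def check_boolean(value: str) -> bool:
    """Accept only the agreed Boolean literals."""
    return value in BOOLEANS


def faulty_fields(record: dict, checks: dict) -> list:
    """Return the names of the fields that fail their check."""
    return [name for name, check in checks.items()
            if not check(record.get(name, ""))]
```

Returning the list of faulty field names, rather than a single pass/fail flag, is what would let an upload interface highlight the fields the user has to correct.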

Page 17: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

Problems before the metadata editor V2

(1) Syntactic issues

• End of paragraph symbols (carriage return) in names

• Adding other metadata/data items not required by the profile

• Changing the order of metadata fields or changing the order of the data columns

• These errors are usually not very difficult to find and correct (MANUALLY), but correcting them is time-consuming…

• They sometimes intersect with other software bugs and spawn new types of errors…

Page 18: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

Problems before the metadata editor V2

(1) Syntactic issues

• Lack of correspondence between the dataset and the metadata file: indicator code and label code
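
A correspondence check of this kind can be sketched as a comparison of two sets of indicator codes. The function name and the example codes are hypothetical, and the sketch assumes the codes have already been read from the data file header and from the metadata file:

```python
def code_mismatches(data_codes, metadata_codes):
    """Compare the indicator codes used in a data file and its metadata file."""
    data, meta = set(data_codes), set(metadata_codes)
    return {
        # Codes that appear as data columns but are not described anywhere.
        "missing_in_metadata": sorted(data - meta),
        # Codes that are described but have no data column.
        "missing_in_data": sorted(meta - data),
    }
```

An empty list on both sides means the two files describe exactly the same indicators.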

Page 19: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

Problems before the metadata editor V2

(2) Semantic issues

• Using various unofficial names for projects, organizations and territorial unit codes complicates cross-identification between multiple files or multiple deliveries and results in duplicated objects in the database

• Incorrect descriptions for measure units
  • “Inhabitants per km2”, “MIO euro”
  • “number of employees” instead of “employees”
  • “%”, “index” or “ratio” instead of “none” (indicator methodology)

Page 20: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

Problems before the metadata editor V2

(2) Semantic issues

• Hasty copy/paste between indicator name and description (no new information)

• Decreasing precision in the metadata as we advance in the dataset

  • No more methodology description
  • Indicator name and description become the same
  • Description is too short to understand what the indicator is about

• Wrong indicator values

Page 21: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

Problems before the metadata editor V2

(2) Semantic issues

• Ambiguity and duplication can easily decrease the value of a big database

• Ambiguity: e.g. the same name for different indicators

• Duplication: e.g. different names for the same indicator (“Total population”, “Average annual total population”, “Absolute population”, “Population male and female”,… or worse “Espon_Project_XX_A01”)
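
One way to surface such duplicates is to map every known label onto a single canonical code and to set aside the labels that cannot be resolved. The sketch below assumes a hand-maintained alias table; the table contents and the code `pop_t` are illustrative only:

```python
from collections import defaultdict


def group_by_code(labels, alias_to_code):
    """Group delivered indicator labels by their canonical code.

    Labels missing from the (hypothetical) alias table end up under the
    key None, so that they can be reviewed manually.
    """
    groups = defaultdict(list)
    for label in labels:
        code = alias_to_code.get(label.strip().lower())
        groups[code].append(label)
    return dict(groups)
```

Any code that collects more than one label points to a duplicated indicator. An opaque label such as “Espon_Project_XX_A01” cannot be resolved automatically and stays in the manual review group.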

Page 22: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

Problems before the metadata editor V2

(2) Semantic issues

• Confusion between the indicator methodology and the estimation methodology

• Indicator methodology - general part
  • Population is a count variable, GDP per capita is the ratio GDP/pop_t, etc.
  • Important for new/complex indicators (typologies, indexes, etc.)

• Value methodology - specific part
  • Which methods of estimation/correction were applied (interpolation, adjusting with higher NUTS, etc.) and their approximate reliability
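
The distinction can be made concrete by attaching the two kinds of methodology to different objects: the general part to the indicator, the specific part to each value. This is only an illustrative data model with hypothetical field names, not the schema of the ESPON Data Base:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Indicator:
    code: str
    name: str
    indicator_methodology: str  # general part, shared by all values


@dataclass
class Value:
    indicator: Indicator
    unit_code: str              # territorial unit the value refers to
    year: int
    value: float
    # Specific part: how this particular value was obtained.
    # None is used here to mean "observed, not estimated".
    value_methodology: Optional[str] = None


def estimated(values):
    """Return only the values that carry an estimation note."""
    return [v for v in values if v.value_methodology is not None]
```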

Page 23: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

Solutions

For syntactic errors:
• Automatic error checking at upload time
• Assistance for the user (highlighting the faulty fields)

For semantic errors:
• Changing metadata filling from an “editing algorithm with a lot of copy-paste” into an “editing algorithm with a lot of browse and pick”
• Some metadata values that are already known can be filled in automatically (like contact details)
• Already existing indicators are classified (by our UL colleagues) into themes and are (unambiguously and uniquely) coded based on methodology: the indicator ontology
• Already known geographic objects are stored in a spatial ontology and “alien” units may be rejected
• Outliers may be detected
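
The slides do not say which outlier test is used (the project's own code is written in R). As one common possibility, here is a Python sketch of the interquartile-range rule; both the rule and the threshold `k` are assumptions made for illustration:

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag values lying more than k interquartile ranges outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]
```

Flagged values are candidates for manual checking, not errors by definition: a capital region can legitimately lie far from the rest of the distribution.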

Page 26: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

Suggestions for improving (meta)data quality

(1) A data delivery containing hundreds of indicators is difficult to fill in and to use

• Rank the data in your delivery by importance and interest

• Provide one set of data/metadata files titled “10_best_indicators_Project_X”

• Provide another set of data/metadata files titled “Database_Project_X” for the other indicators

Page 27: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

Suggestions for improving (meta)data quality

(2) Importance of the methodology field of the indicator

• Integrating an indicator into the ESPON DB is worthwhile if it can be re-used for other purposes.

• This matters in particular for complex indicators (typologies, model outputs), where users need to understand what is behind the calculation

• Avoid mentioning ONLY “cf. Final Report of XXX Project for further information”, which is of little use in an online database

Page 28: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

Suggestions for improving (meta)data quality

(2) Importance of the methodology field of the indicator

• Example: the Labour Market typology (ESPON 2006 Database). What is behind the typology?

ESPON 2006 Database metadata

ESPON 2013 Database metadata (filled in thanks to the report of the project)

Page 29: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

Suggestions for improving (meta)data quality

(3) The dataset has to cover at least the entire ESPON Area (EU27+4) and, if possible, the Candidate Countries.

(4) For indicators described in the NUTS delineation, try to provide the information at the different NUTS levels (NUTS0, NUTS1, NUTS2 and NUTS3 if possible)

(5) Systematically state in the dataset the NUTS version of the territorial units

Example of metadata file (figure, with callouts 3, 4 and 5)
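
NUTS codes encode their level in their length: a two-letter country code (NUTS0) followed by one character per additional level, down to NUTS3. This makes it easy to see which levels a delivery covers. The helper names in the sketch below are hypothetical. The NUTS version cannot be derived from a code and still has to be stated explicitly in the dataset:

```python
import re

# Two-letter country code plus up to three further characters.
NUTS_CODE = re.compile(r"[A-Z]{2}[A-Z0-9]{0,3}")


def nuts_level(code: str) -> int:
    """Return 0-3 for a NUTS-style code, based on its length."""
    if not NUTS_CODE.fullmatch(code):
        raise ValueError(f"not a NUTS-style code: {code!r}")
    return len(code) - 2


def levels_covered(codes):
    """Return the sorted list of NUTS levels present in a delivery."""
    return sorted({nuts_level(c) for c in codes})
```

For example, `levels_covered(["BE", "BE3", "BE33"])` shows that NUTS3 is absent from the delivery.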

Page 30: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

Suggestions for improving (meta)data quality

(6) For territorial units that do not belong to the NUTS nomenclature, provide the ESPON Database Project with a precise nomenclature, with names and, if possible, shapefiles to locate the territorial units.

Shapefile and dataset describing UMZ nomenclature, version 2000 (provided by ESPON Database Project)

Page 31: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

Suggestions for improving metadata quality

(7) If you estimate missing values for basic indicators, mention it in the label and explain the methodology used for filling the gaps in your dataset.

Label for estimated data in Bulgaria (ESPON 2013 Database, Basic indicators)

Page 32: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

ESPON DB operational indicator ontology

Purpose:
• To obtain unique and unambiguous codes for the indicators in the database (database problem)
• To allow a classification of indicators, easing data discovery and exploration (user problem)

Idea for the ESPON thesaurus:
• Merging several classifications (themes, subthemes)
• Producing a synthesis of these classifications
• One indicator can belong to one or more themes and subthemes
• The description of the indicator is further enriched by adding keywords
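
Because one indicator can belong to several themes, the classification is a many-to-many relation. A minimal sketch of a theme index for data discovery follows; the theme names and indicator codes used with it are invented for illustration and are not the actual ESPON thesaurus:

```python
from collections import defaultdict


def index_by_theme(indicator_themes):
    """Invert {indicator code: [themes]} into {theme: {indicator codes}}."""
    index = defaultdict(set)
    for code, themes in indicator_themes.items():
        for theme in themes:
            index[theme].add(code)
    return index
```

Keywords could be indexed in the same way, giving users a second entry point besides themes and subthemes.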

Page 33: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

ESPON DB operational indicator ontology

Idea for coding scheme:
• Use the methodology of the indicator as a basis for creating an abbreviated code
• Leave aside everything that doesn’t relate strictly to the indicator (like spatial, temporal or resolution descriptors)
• An indicator code is composed of several parts
  • Base indicator part (GDP, pop)
  • Restrictions, derivations, methods of calculation (m, av, ch)
  • Level of measurement (density ratio, count, etc.)
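
The three-part structure can be sketched as a small value type. The underscore separator and the abbreviation `cnt` for “count” are assumptions made for this example, since the slides do not show the actual code syntax:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class IndicatorCode:
    base: str                       # base indicator part, e.g. "pop" or "GDP"
    modifiers: Tuple[str, ...] = () # restrictions/derivations, e.g. ("m", "av")
    level: str = ""                 # level of measurement (abbreviation assumed)

    def render(self, sep: str = "_") -> str:
        """Join the non-empty parts into one abbreviated code."""
        parts = [self.base, *self.modifiers]
        if self.level:
            parts.append(self.level)
        return sep.join(parts)
```

Spatial, temporal and resolution descriptors are deliberately absent from the type, in line with the rule that the code describes the indicator only.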

Page 36: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

Agenda

31st Dec. 2010
• The new ESPON DB 2013 Web Application online
• Test and Survey

28th Feb. 2011
• Final Report of the ESPON DB 2013 Project
• End of the First Phase of the ESPON DB 2013 Project (closure of the scientific activities)

Page 37: LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

LIEGE 2010 ESPON Meeting

Thank You for Your Attention