Improving volunteered geographic data quality using semantic similarity measurements

Arnaud Vandecasteele - Rodolphe Devillers
Memorial University of Newfoundland, Canada

8th International Symposium on Spatial Data Quality, 30 May - 1 June 2013

Jan 27, 2015

Transcript
Page 1: Improving volunteered geographic data quality using semantic similarity measurements

Improving volunteered geographic data quality using semantic similarity measurements

Arnaud Vandecasteele - Rodolphe Devillers
Memorial University of Newfoundland, Canada

8th International Symposium on Spatial Data Quality, 30 May - 1 June 2013

Page 2: Improving volunteered geographic data quality using semantic similarity measurements

Outline

Introduction

Semantic Similarity
● P-Rank algorithm
● Tobler's Law

OSM Semantic Plugin
● Description
● Examples

Conclusion

Page 3: Improving volunteered geographic data quality using semantic similarity measurements

Introduction
National Mapping Agencies

What makes National Mapping Agencies authoritative?

● Positional Accuracy
● Completeness
● Attribute Accuracy

ISO 19113, ISO 19115, ..., ISO 19157

Page 4: Improving volunteered geographic data quality using semantic similarity measurements

Introduction
Geographic Information Quality viewed as a Project Management Triangle

Page 5: Improving volunteered geographic data quality using semantic similarity measurements

Introduction
Geographic Information Quality viewed as a Project Management Triangle

Really?

Page 6: Improving volunteered geographic data quality using semantic similarity measurements

Introduction

Could another map be authoritative*?

* and cheap, fast, accurate and, in the best of worlds, free

Page 7: Improving volunteered geographic data quality using semantic similarity measurements

Introduction
Volunteered Geographic Information (VGI)

Page 8: Improving volunteered geographic data quality using semantic similarity measurements

Introduction
Volunteered Geographic Information (VGI)

“the widespread engagement of large numbers of private citizens, often with little in the way of formal qualifications, in the creation of geographic information”
Goodchild - 2007

Page 9: Improving volunteered geographic data quality using semantic similarity measurements

Introduction
The OpenStreetMap project

OpenStreetMap (OSM) is a collaborative project to create a free editable map of the world

Started in 2004

+ 1 million
+ 1.8 billion nodes
+ 180 million ways
+ 1.9 million relations

Source: http://wiki.openstreetmap.org/wiki/Stats

Page 10: Improving volunteered geographic data quality using semantic similarity measurements

Introduction
Data Quality & Volunteered Geographic Information

What about Data Quality?

Good geometric accuracy
Haklay - 2010, Girres and Touya - 2010, Ludwig et al. - 2011

But

Geographic coverage is a patchwork
Goodchild - 2007

Semantics can be inconsistent
Ballatore et al. - 2012, Mooney and Corcoran - 2012

Page 11: Improving volunteered geographic data quality using semantic similarity measurements

Introduction

VGI changed the way we produce, publish and share Geographic Information

BUT

Semantic Quality is still an important issue

Research Problem

How to improve semantic quality using a VGI approach?

Page 12: Improving volunteered geographic data quality using semantic similarity measurements

Semantic Similarity
What is Semantic Similarity?

How to describe a forest in OpenStreetMap:

landuse=forest
natural=wood

One concept, different representations!

Q: When should we use landuse=forest rather than natural=wood?*

11 different answers and no real general agreement

* https://help.openstreetmap.org/questions/324/when-should-we-use-landuseforest-rather-than-naturalwood

Page 13: Improving volunteered geographic data quality using semantic similarity measurements

Semantic Similarity
How to measure the semantic similarity?

Different models exist:

● Geometric Model
● Feature Model
● Alignment Model
● Network Models
● Transformation Model

Semantic similarity applied to VGI:

● Mooney and Corcoran - 2012: point pattern analysis and semantic pattern
● Ballatore et al. - 2012: semantic network created from the OpenStreetMap Wiki

[Figure: natural=wood and landuse=forest. How to measure their similarity?]

Page 14: Improving volunteered geographic data quality using semantic similarity measurements

Semantic Similarity
Semantic Network from the OSM Wiki: how does it work?

Page 15: Improving volunteered geographic data quality using semantic similarity measurements

Semantic Similarity
Semantic Network from the OSM Wiki

Source: OSM Wiki
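As a rough illustration of what such a network could look like in code, the sketch below treats wiki pages as nodes and hyperlinks between pages as directed edges. This representation is an assumption, and the page names and links are invented for the example; they are not actual OSM Wiki link data.

```python
from collections import defaultdict

# Hypothetical (source page, linked page) pairs; NOT real OSM Wiki link data.
links = [
    ("landuse=forest", "Key:landuse"),
    ("landuse=forest", "natural=wood"),
    ("natural=wood", "Key:natural"),
    ("natural=wood", "landuse=forest"),
]


def build_network(links):
    """Return (out_links, in_links) adjacency maps for a directed graph."""
    out_links = defaultdict(set)
    in_links = defaultdict(set)
    for source, target in links:
        out_links[source].add(target)  # pages that `source` references
        in_links[target].add(source)   # pages that reference `target`
    return out_links, in_links


out_links, in_links = build_network(links)
```

Keeping both directions at hand matters for the next step, since the similarity measure used in the talk looks at incoming and outgoing references.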

Page 16: Improving volunteered geographic data quality using semantic similarity measurements

Semantic Similarity
P-Rank Algorithm

Measuring semantic similarity

Two entities are similar if:

1. They are referenced by similar entities
2. They reference similar entities

[Figure: diagrams with entities A, B and C]
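The slides name the P-Rank algorithm but do not give its formula. The sketch below follows the usual formulation of P-Rank: starting from the identity, similarity is recomputed iteratively as a weighted mix of the average similarity of in-neighbours (rule 1) and out-neighbours (rule 2). The function name, parameter names, default values and the toy graph are assumptions made for illustration, not the plugin's implementation.

```python
from itertools import product


def p_rank(nodes, edges, lam=0.5, decay=0.8, iterations=10):
    """Iteratively compute a P-Rank style similarity for every pair of nodes.

    lam weights in-link evidence against out-link evidence (assumed default);
    decay damps the similarity propagated from neighbours (assumed default).
    """
    in_nb = {n: [] for n in nodes}
    out_nb = {n: [] for n in nodes}
    for src, dst in edges:
        out_nb[src].append(dst)
        in_nb[dst].append(src)

    # Start from the identity: a node is only similar to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}

    def mean_sim(xs, ys, prev):
        # Average similarity over all neighbour pairs; 0 if either side is empty.
        if not xs or not ys:
            return 0.0
        return sum(prev[x, y] for x in xs for y in ys) / (len(xs) * len(ys))

    for _ in range(iterations):
        prev, sim = sim, {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                sim[a, b] = 1.0
            else:
                sim[a, b] = decay * (
                    lam * mean_sim(in_nb[a], in_nb[b], prev)              # rule 1
                    + (1 - lam) * mean_sim(out_nb[a], out_nb[b], prev)    # rule 2
                )
    return sim


# Toy graph: A and B are both referenced by C and both reference D.
nodes = ["A", "B", "C", "D"]
edges = [("C", "A"), ("C", "B"), ("A", "D"), ("B", "D")]
sim = p_rank(nodes, edges)
print(sim["A", "B"])  # 0.8 with the default parameters
```

In this toy graph A and B satisfy both rules at once, so their score reaches the maximum allowed by the decay factor, while C and D share no neighbours and stay at 0.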

Page 17: Improving volunteered geographic data quality using semantic similarity measurements

Semantic Similarity
Semantic similarity and Geography

Tobler's first law of geography

“all things are related, but nearby things are more related than distant things”
Tobler - 1970

Page 18: Improving volunteered geographic data quality using semantic similarity measurements

Semantic Similarity
Applying Tobler's first law to semantic similarity

[Figure: new object A in a city, surrounded by P-Rank score labels]
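One way to read this slide is that the P-Rank scores between a candidate tag and the tags of surrounding objects are combined so that nearer objects count more. The slides do not say how the combination is done; the sketch below assumes a simple inverse-distance weighted mean. The weighting function, the function name, the coordinates and the scores are all hypothetical.

```python
import math


def weighted_similarity(new_xy, candidate_tag, neighbours, p_rank_score):
    """Inverse-distance weighted mean of the similarity between a candidate
    tag and the tags of nearby objects (nearer objects weigh more)."""
    total, weight_sum = 0.0, 0.0
    for (x, y), tag in neighbours:
        distance = math.hypot(x - new_xy[0], y - new_xy[1])
        weight = 1.0 / (1.0 + distance)  # assumed weighting; not from the slides
        total += weight * p_rank_score(candidate_tag, tag)
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


# Hypothetical similarity scores and neighbour positions, for illustration only.
scores = {
    ("landuse=forest", "natural=wood"): 0.8,
    ("landuse=forest", "highway=residential"): 0.1,
}
neighbours = [
    ((0.0, 0.0), "natural=wood"),          # at the new object's location
    ((3.0, 4.0), "highway=residential"),   # 5 units away
]
result = weighted_similarity(
    (0.0, 0.0), "landuse=forest", neighbours, lambda a, b: scores[(a, b)]
)
```

With these made-up numbers the nearby wood dominates the distant road, which is the behaviour Tobler's law suggests.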

Page 19: Improving volunteered geographic data quality using semantic similarity measurements

OpenStreetMap Semantic Plugin

Java OpenStreetMap Editor (JOSM)

[Figure: OSM editor usage stats (source: OSM Wiki)]

Page 20: Improving volunteered geographic data quality using semantic similarity measurements

OpenStreetMap Semantic Plugin
Description

Page 21: Improving volunteered geographic data quality using semantic similarity measurements

OpenStreetMap Semantic Plugin (aka OSMantic)
Description

How similar are they?

P-Rank scores:

A - B: 0.18
A - C: 0.35
A - D: 0.05
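Read as a ranking problem, the three scores on this slide can be ordered to find the closest candidate for A. The sketch below uses the slide's values; the function name is hypothetical and the plugin's actual selection logic is not described in the slides.

```python
# P-Rank scores between A and candidates B, C and D, as shown on the slide.
scores = {"B": 0.18, "C": 0.35, "D": 0.05}


def rank_candidates(scores):
    """Return candidates ordered from most to least similar."""
    return sorted(scores, key=scores.get, reverse=True)


ranking = rank_candidates(scores)
most_similar = ranking[0]
```

Here C comes first, followed by B and then D.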

Page 22: Improving volunteered geographic data quality using semantic similarity measurements

Examples - Creation of a new object

[Figure: new object]

Page 23: Improving volunteered geographic data quality using semantic similarity measurements

OpenStreetMap Semantic Plugin
Examples - Editing an existing object

Page 24: Improving volunteered geographic data quality using semantic similarity measurements

OpenStreetMap Semantic Plugin
Examples - Semantic Similarity Evaluation

Page 25: Improving volunteered geographic data quality using semantic similarity measurements

Conclusion

Semantic similarity can be used to enhance the quality of VGI datasets

The OSM Semantic plugin uses a collaborative approach to reduce potential semantic inconsistencies

How to improve the results:
● Using the Taginfo database to identify the most used tags
● By mixing the geographic and the semantic approaches (Ballatore + Mooney)

The next big question:

When will VGI be the next authoritative dataset?

Page 26: Improving volunteered geographic data quality using semantic similarity measurements

Questions?

Rodolphe Devillers
Marine Geomatics Lab
http://www.marinegis.com/
Memorial University of Newfoundland

Acknowledgements

● Natural Sciences and Engineering Research Council of Canada (NSERC)
● Andrea Ballatore for sharing his results