Wikimedia/British Library map mapping project review and latest update How to find 50,000 maps in a haystack of 1,000,000 images; geolocate them, and categorise them ... on a budget of no not many euros. James Heald, Wikimedia volunteer (User:Jheald) Kimberly Kowal, British Library [email protected]

Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

Aug 21, 2015


James Heald
Transcript
Page 1: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

Wikimedia/British Library map mapping project

– review and latest update

How to find 50,000 maps in a haystack of 1,000,000 images; geolocate them, and categorise them

... on a budget of no not many euros.

James Heald, Wikimedia volunteer

(User:Jheald)

Kimberly Kowal, British Library

[email protected]

Page 2: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

1,000,000 images

Fantastic, but …

Page 3: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

Very limited metadata

Page 4: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

Very limited metadata

Commons said no bulk upload

Page 5: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

Volunteer response…

Create a subject index by book…

Page 6: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

… encouraging images to be uploaded by the book (20,000 so far – majority by one user)

Page 7: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

… however, manual categorisation of images is very, very time-consuming.

Page 8: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

Could anything be done more automatically…

?

Page 9: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

Could anything be done more automatically… ?

Maps: natural classification, given co-ordinates

Page 10: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

So: find the maps on Flickr, and tag them…

Page 11: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

… using the index to drive the process

31 Oct

Page 12: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

… using the index to drive the process

31 Oct

Page 13: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

… using the index to drive the process

31 Oct

Page 14: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

… using the index to drive the process

03 Nov

Page 15: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

… using the index to drive the process

17 Dec

Page 16: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

… using the index to drive the process

19 Dec

Page 17: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

But how many maps were there ?

Oct 31

Page 18: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

But how many maps were there ?

Oct 31

Page 19: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

But how many maps were there ?

Nov 2

Page 20: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

But how many maps were there ?

Nov 7

Page 21: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

But how many maps were there ?

Nov 14

Page 22: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

But how many maps were there ?

Dec 1

Page 23: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

But how many maps were there ?

Dec 10

Page 24: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

But how many maps were there ?

Dec 17

Page 25: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

But how many maps were there ?

Dec 28

Page 26: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

50,000 maps in all:

                        classmark  detailed
                totals      index     index
                ------  ---------  --------
misc             16074      14091      1983
Europe           13136       6254      6882
British Isles     7191        269      6922
North America     6758       1524      5234
  USA             5782       1209      4573
Asia              2736       1280      1456
Africa            2300       1075      1225
South America      895        659       236

-- including 20,000 found independently by @Quasimondo, machine-assisted using his own pattern recognition methods

Page 27: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

Next step:

Geo-location, using the Klokan/BL Georeferencer

(Free alternatives are also available)

Page 28: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

Next step:

10x more images than the BL has ever attempted before

Page 29: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

Success allows the old map to be laid over the top of a modern one

Page 30: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

Pilot run of 3,000 completed

Page 31: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

Pilot run of 3,000 completed

Now characterised by location …

Page 32: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

... and scale

Page 33: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

All that is needed to identify individual continents …

Page 34: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

… countries …

Page 35: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

… nation …

… nations …

Page 36: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

… cities …

Page 37: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

… and beyond.

Page 38: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

Ready to be uploaded to Commons…

Page 39: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

Ready to be uploaded to Commons…

… almost

Page 40: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

To do list:

Better subject identification

Reasonable Commons categorisation

Page 41: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

To do/1: Subject identification

Current: OSM Nominatim, 4 votes out of 5
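
The slide gives only "4 votes out of 5". One plausible reading, sketched below, is that five sample points on each georeferenced map are reverse-geocoded and a subject is accepted only when at least four of the five agree. The choice of sample points, the function name `vote_subject`, and the example inputs are assumptions made for illustration, not details taken from the project.

```python
from collections import Counter
from typing import Optional, Sequence


def vote_subject(names: Sequence[Optional[str]], needed: int = 4) -> Optional[str]:
    """Return the place name that at least `needed` sample points agree on.

    `names` holds one reverse-geocoded name per sample point (None where the
    lookup returned nothing). Returns None when no name reaches the threshold.
    """
    counts = Counter(name for name in names if name is not None)
    if not counts:
        return None
    winner, votes = counts.most_common(1)[0]
    return winner if votes >= needed else None


# Hypothetical lookup results for five sample points on one map
samples = ["France", "France", "France", "France", "Spain"]
print(vote_subject(samples))
```

A strict threshold like this favours precision: a map straddling two countries yields no subject at all rather than a wrong one.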

Page 42: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

To do/1: Subject identification

Small features: Look up on Wikidata, find plausible candidate
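
A minimal sketch of what "find plausible candidate" could look like, assuming the candidates have already been fetched from Wikidata as (label, latitude, longitude) tuples and that "plausible" means "closest to the map centre, within a cut-off distance". The 2 km cut-off, the function names, and the approximate coordinates are illustrative assumptions.

```python
import math
from typing import Iterable, Optional, Tuple


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_candidate(
    centre: Tuple[float, float],
    candidates: Iterable[Tuple[str, float, float]],
    max_km: float = 2.0,  # assumed cut-off, not from the slides
) -> Optional[str]:
    """Return the label of the closest candidate within `max_km`, else None."""
    best, best_d = None, max_km
    for label, lat, lon in candidates:
        d = haversine_km(centre[0], centre[1], lat, lon)
        if d <= best_d:
            best, best_d = label, d
    return best


# Approximate, illustrative coordinates only
candidates = [("York Minster", 53.962, -1.082), ("Clifford's Tower", 53.956, -1.080)]
print(nearest_candidate((53.962, -1.082), candidates))
```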

Page 43: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

To do/1: Subject identification

Large features: can be over-cautious

Need better idea of size of candidate features…

Page 44: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

To do/1: Subject identification

Large features: … so compare typical existing maps

Page 45: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

To do/2: Categorisation

Principle on Commons is to refine into groups of 'human manageable' size.

~ 4 to 40 images (larger for series)

Good for humans, less good for machines... wildly different categorisation depths & naming

Page 46: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

To do/2: Categorisation

Routine upload and management categories ... straightforward enough.

Maps from collection uploaded on <date>

Maps from collection uploaded on <date> with categorisation to confirm

Images from <book>

but then ...

Page 47: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

To do/2: Categorisation

Countries:
  Old maps of <country>
  Old maps of part of <country>

Cities:
  Old maps of <city>
  Old maps of cities in <country>
  Old maps of cities in <part of country>
  + "<city>" itself ?

Features: (ie buildings, castles, cathedrals, battlefields, etc)
  <Feature> / Plans of <Feature>
  Plans of <feature-type>s in <place>
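
The naming patterns on this slide can be sketched as simple string templates. The function names and example values below are hypothetical, and actual Commons category names may not follow these patterns exactly; the point of the slide is that naming and depth vary widely.

```python
from typing import List


def categories_for_country(country: str, whole: bool = True) -> List[str]:
    """Category for a map of a whole country, or of only part of it."""
    if whole:
        return [f"Old maps of {country}"]
    return [f"Old maps of part of {country}"]


def categories_for_city(city: str, country: str) -> List[str]:
    """City-level category plus the country-level grouping for cities."""
    return [f"Old maps of {city}", f"Old maps of cities in {country}"]


def categories_for_feature(feature: str, feature_type: str, place: str) -> List[str]:
    """Categories for a plan of a single feature (building, castle, etc).

    The plural is formed by appending "s", as in the slide's
    "<feature-type>s"; irregular plurals (e.g. "church") would need
    separate handling.
    """
    return [f"Plans of {feature}", f"Plans of {feature_type}s in {place}"]


print(categories_for_city("York", "England"))
print(categories_for_feature("Conwy Castle", "castle", "Wales"))
```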

Page 48: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

To do/3: Strengthening Wikidata

<feature-type> should be given by P31 ("Instance of") -> church, castle, cathedral, battlefield, etc

But data often not yet there...

Need to supply: WP category mining (care needed: "category spillage"), databases (if PD), etc.
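
As a rough illustration of reading the feature type from P31, the sketch below builds a SPARQL query for a single item. Using the Wikidata Query Service is an assumption here (the slides do not say how the lookup is done), and `instance_of_query` is a hypothetical helper; an empty result set would correspond to the "data often not yet there" case.

```python
def instance_of_query(qid: str) -> str:
    """Build a SPARQL query listing the P31 ("instance of") values of one item."""
    # Accept only well-formed item ids such as "Q42", to keep the query safe
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item id: {qid!r}")
    return (
        "SELECT ?type ?typeLabel WHERE {\n"
        f"  wd:{qid} wdt:P31 ?type .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


# "Q42" is used only as a syntactically valid id, not as an example of a map feature
print(instance_of_query("Q42"))
```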

Page 49: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

To do list

There is work to do…

But with some work (and some human mop-up), automated upload + reasonable categorisation should be possible.

Page 50: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

State of play

Georeferencing is underway

Index pages now have “to georef” templates.

Page 51: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

State of play

Main progress page is live

Page 52: Mapping the maps - review and latest update (GlamWiki, Apr 12, 2015)

Conclusions:

Tiered levels of wiki-pages leading to image searches can be used to drive large projects

Even ad-hoc rough indexes are useful

Commons's own old maps should be next (~ 60,000)

Georeferencing is fun -- come and give it a try