Wikidata: Verifiable, Linked Open Knowledge That Anyone Can Edit

Dario Taraborelli @readermeter

National Institutes of Health • September 23, 2016

Wikimedia Research https://www.mediawiki.org/wiki/Wikimedia_Research

The altmetrics manifesto http://altmetrics.org/manifesto/

A short history of Wikipedia

A website that anyone can edit

The largest reference work on the internet

A multi-language online encyclopedia

Wikipedia: unintended outcomes

accelerate the dissemination of scholarship

provide an infrastructure for open scientific research

enable distributed fact-checking and curation of scientific knowledge

Outline

1. Wikipedia as the front matter to all research

2. A new kind of open knowledge

3. Wikidata: Collaboratively curated linked open data

4. WikiCite: Building the sum of all human citations

5. Applications and opportunities for open science

6. Concluding remarks

1. Wikipedia as the front matter to all research

“Wikipedia is not the bottom layer of authority, nor the top, but in fact the highest layer without formal vetting. In this unique role, it serves as an ideal bridge between the validated and unvalidated Web.”

Casper Grathwohl, Chronicle of Higher Education

http://chronicle.com/article/article-content/125899/

Top sources of DOI lookups

http://crosstech.crossref.org/2014/02/many-metrics-such-data-wow.html http://blog.crossref.org/2016/05/https-and-wikipedia.html

wikipedia.org

World’s most accessed online medical resources

Heilman and West (2015) doi.org/10.2196/jmir.4069

Most visited resource on Ebola in West Africa

Heilman (2016) http://tinyurl.com/jfuyduv

Most used website in Liberia, Sierra Leone and Guinea for information on Ebola during the 2014 outbreak

Ahead of CNN, the CDC and the WHO

2. A new kind of open knowledge

Schmachtenberg et al. (2014) http://lod-cloud.net [CC BY-SA]

Challenges

Biases / errors

Coverage

Diversity and inclusiveness

Verifiability

Machine-readable linked open data

Editable by anyone

Supporting human + algorithmic curation

Comprehensive

Transparently verifiable

3. Wikidata: Collaboratively curated linked open data

Wikidata

Free knowledge base that anyone can edit

Launched in 2012

Integrated with Wikipedia and other sister projects

Statistics (September 2016): over 20M items, over 100M statements

Wikidata: growth

Active editors, English Wikipedia vs. Wikidata (chart): http://reportcard.wmflabs.org/graphs/active_editors

Wikidata: growth

Very active editors, English Wikipedia vs. Wikidata (chart): http://reportcard.wmflabs.org/graphs/very_active_editors

Wikidata’s anatomy

https://www.wikidata.org/wiki/Wikidata:Introduction

Wikidata’s anatomy

Linked data, San Francisco, Jeblad https://commons.wikimedia.org/wiki/File:Linked_Data_-_San_Francisco.svg [CC BY-SA]
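
The anatomy shown on these slides (an item with labels and statements; each statement a property-value pair that may carry qualifiers and references) can be sketched as a small data model. This is a simplified illustration, not Wikidata's actual data model or API: real values are typed, and statements also carry ranks. The identifiers Q62 (San Francisco), P31 (instance of), Q515 (city), P17 (country), Q30 (United States), P143 (imported from Wikimedia project) and Q328 (English Wikipedia) are used for illustration and are worth checking on wikidata.org.

```python
from dataclasses import dataclass, field


@dataclass
class Snak:
    """A single property-value pair, e.g. ("P17", "Q30")."""
    prop: str
    value: str


@dataclass
class Statement:
    """A main claim plus optional qualifiers and references (each reference is a list of snaks)."""
    claim: Snak
    qualifiers: list[Snak] = field(default_factory=list)
    references: list[list[Snak]] = field(default_factory=list)

    def is_sourced(self) -> bool:
        # In this sketch a statement counts as sourced if it has at least one reference.
        return len(self.references) > 0


@dataclass
class Item:
    qid: str
    labels: dict[str, str]
    statements: list[Statement] = field(default_factory=list)

    def unsourced(self) -> list[Statement]:
        return [s for s in self.statements if not s.is_sourced()]


sf = Item(
    qid="Q62",
    labels={"en": "San Francisco"},
    statements=[
        Statement(Snak("P31", "Q515")),  # instance of: city (no reference attached)
        Statement(
            Snak("P17", "Q30"),  # country: United States
            references=[[Snak("P143", "Q328")]],  # imported from: English Wikipedia
        ),
    ],
)

print([s.claim.prop for s in sf.unsourced()])  # ['P31']
```

Keeping references attached to individual statements, rather than to the whole item, is what makes per-statement verifiability queries possible.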

Wikidata: queries

Birthplaces of people employed by MIT

SPARQL: https://t.co/cDR4Lt7V6P
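
A query like the one linked above is sent to the Wikidata Query Service as an HTTP GET request. The sketch below is an assumption about what such a query could look like, not the query behind the link: it uses P108 (employer), P19 (place of birth) and P625 (coordinate location), and assumes Q49108 is the item for MIT. The endpoint https://query.wikidata.org/sparql with a `format=json` parameter is the service's standard interface; the User-Agent string is a placeholder.

```python
import json
import urllib.request
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed identifiers (check on wikidata.org): P108 = employer, P19 = place of birth,
# P625 = coordinate location, Q49108 = Massachusetts Institute of Technology.
MIT_BIRTHPLACES = """
#defaultView:Map
SELECT DISTINCT ?person ?personLabel ?birthplace ?birthplaceLabel ?coord WHERE {
  ?person wdt:P108 wd:Q49108 ;
          wdt:P19 ?birthplace .
  ?birthplace wdt:P625 ?coord .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def query_url(sparql: str) -> str:
    """Build a GET URL for the Wikidata Query Service that requests JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


def run_query(sparql: str) -> list[dict]:
    """Fetch result bindings (needs network access).

    Wikimedia asks clients to identify themselves; the User-Agent below is a placeholder.
    """
    request = urllib.request.Request(
        query_url(sparql),
        headers={"User-Agent": "wikidata-example/0.1 (placeholder contact)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]
```

The `#defaultView:Map` comment only affects the query.wikidata.org web interface, where it plots the `?coord` column on a map; API clients ignore it.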

Wikidata: queries

Authors with a known location and ORCID

SPARQL: http://tinyurl.com/h2lqv9y
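
The linked query does not say how "known location" is defined; this sketch assumes it means the coordinates of the author's employer, using P496 (ORCID iD), P108 (employer) and P625 (coordinate location). It also shows how to flatten the standard SPARQL 1.1 JSON results format into rows and how to parse the coordinate literals the query service returns, which are WKT points with longitude first. The sample response is synthetic, with placeholder values.

```python
# Assumed modelling: P496 = ORCID iD, P108 = employer, P625 = coordinate location;
# "known location" is approximated by the employer's coordinates.
AUTHORS_WITH_ORCID = """
SELECT ?author ?authorLabel ?orcid ?coord WHERE {
  ?author wdt:P496 ?orcid ;
          wdt:P108 ?employer .
  ?employer wdt:P625 ?coord .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
"""


def flatten(results: dict) -> list[dict[str, str]]:
    """Turn a SPARQL 1.1 JSON results document into plain {variable: value} rows."""
    variables = results["head"]["vars"]
    return [
        {v: binding[v]["value"] for v in variables if v in binding}
        for binding in results["results"]["bindings"]
    ]


def parse_point(wkt: str) -> tuple[float, float]:
    """Parse a WKT literal such as "Point(lon lat)" into (longitude, latitude)."""
    inner = wkt[wkt.index("(") + 1 : wkt.rindex(")")]
    lon, lat = (float(x) for x in inner.split())
    return lon, lat


# Synthetic response in the standard results shape (placeholder values, not real data).
sample = {
    "head": {"vars": ["author", "orcid", "coord"]},
    "results": {"bindings": [{
        "author": {"type": "uri", "value": "http://www.wikidata.org/entity/Q_EXAMPLE"},
        "orcid": {"type": "literal", "value": "0000-0000-0000-0000"},
        "coord": {"type": "literal", "value": "Point(10.5 20.25)"},
    }]},
}
rows = flatten(sample)
print(parse_point(rows[0]["coord"]))  # (10.5, 20.25)
```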

Expert curation of scientific open data

Benjamin Good (2016) Opportunities and challenges presented by Wikidata in the context of biocuration http://tinyurl.com/hk9qrmz

Sample of current biomedical content in Wikidata

● All human and mouse genes and proteins (Swiss-Prot)
● All Gene Ontology terms
● All Human Disease Ontology terms
● All FDA-approved drugs
● 109 reference microbial genomes

Mitraka et al. (2015) Semantic Web Applications for the Life Sciences
Burgstaller-Muelbacher et al. (2016) Database
Putman et al. (2016) Database

Expert curation of scientific open data

Gene Wiki: Wikidata SPARQL examples https://bitbucket.org/sulab/wikidatasparqlexamples/overview

Get all known drug-drug interactions for Methadone via its ChEMBL ID

Get a list of all diseases known to be treated by Metformin

Get a list of all diseases that might be treated by Metformin
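
The first two Gene Wiki examples could be written roughly as follows. This is a sketch, not the queries in the linked repository: it assumes P592 (ChEMBL ID), P769 (significant drug interaction) and P2175 (medical condition treated), and it looks the drug up by its exact English label rather than by item ID, so label matching is exact and case-sensitive. No ChEMBL ID for Methadone is hard-coded; the caller supplies it.

```python
import re

# Assumed property IDs: P592 = ChEMBL ID, P769 = significant drug interaction,
# P2175 = medical condition treated.


def drug_interactions_query(chembl_id: str) -> str:
    """SPARQL for drugs recorded as interacting with the compound that has this ChEMBL ID."""
    if not re.fullmatch(r"CHEMBL\d+", chembl_id):
        raise ValueError(f"not a ChEMBL compound ID: {chembl_id!r}")
    return f"""
SELECT ?drug ?drugLabel ?other ?otherLabel WHERE {{
  ?drug wdt:P592 "{chembl_id}" ;
        wdt:P769 ?other .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""


def diseases_treated_query(drug_label: str) -> str:
    """SPARQL for conditions a drug is recorded as treating, found via its English label."""
    # Escape backslashes and quotes so the label is a valid SPARQL string literal.
    safe = drug_label.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
SELECT DISTINCT ?disease ?diseaseLabel WHERE {{
  ?drug rdfs:label "{safe}"@en ;
        wdt:P2175 ?disease .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""
```

An interaction may be stated on only one of the two drug items, so a more thorough version would also match the reverse direction (`?other wdt:P769 ?drug`).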

4. WikiCite: Building the sum of all human citations

Randall Munroe, Wikipedian protester http://tinyurl.com/p3rodlb [CC BY]

Benjamin Good (2016) Opportunities and challenges presented by Wikidata in the context of biocuration http://tinyurl.com/hk9qrmz

the disappearance of provenance

http://bit.ly/SumOfAllCitations

a provenance-preserving answer engine

The sum of all human knowledge
+
The sum of all data and sources backing human knowledge

The molecular origins of insulin go at least as far back as the simplest unicellular [[eukaryotes]].<ref name='LeRoith'>{{cite journal | vauthors = LeRoith D, Shiloach J, Heffron R, Rubinovitz C, Tanenbaum R, Roth J | title = Insulin-related material in microbes: similarities and differences from mammalian insulins | journal = Can. J. Biochem. Cell Biol. | volume = 63 | issue = 8 | pages = 839–49 | year = 1985 | pmid = 3933801 | doi = 10.1139/o85-106 }}</ref> Apart from animals, insulin-like proteins are also known to exist in Fungi and Protista kingdoms.

References in Wikipedia
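
Citations like the one above live inside wikitext as cite templates, so a first step toward a citation repository is extracting their parameters. The regex-based sketch below is an illustration only: the function name is hypothetical, and it ignores nested templates and pipes inside wiki links, which a proper wikitext parser (for example mwparserfromhell) would handle.

```python
import re

# Matches {{cite journal | ... }} up to the first closing braces (no nested templates).
CITE_RE = re.compile(r"\{\{\s*cite journal\s*\|(.*?)\}\}", re.IGNORECASE | re.DOTALL)


def parse_cite_journal(wikitext: str) -> list[dict[str, str]]:
    """Return one {parameter: value} dict per cite journal template found."""
    citations = []
    for body in CITE_RE.findall(wikitext):
        params = {}
        for part in body.split("|"):
            if "=" not in part:
                continue  # skip positional or empty parameters
            key, _, value = part.partition("=")  # split on the first "=" only
            params[key.strip()] = value.strip()
        citations.append(params)
    return citations


# The example sentence and reference shown above.
sample = (
    "The molecular origins of insulin go at least as far back as the simplest unicellular "
    "[[eukaryotes]].<ref name='LeRoith'>{{cite journal | vauthors = LeRoith D, Shiloach J, "
    "Heffron R, Rubinovitz C, Tanenbaum R, Roth J | title = Insulin-related material in "
    "microbes: similarities and differences from mammalian insulins | journal = Can. J. "
    "Biochem. Cell Biol. | volume = 63 | issue = 8 | pages = 839–49 | year = 1985 | "
    "pmid = 3933801 | doi = 10.1139/o85-106 }}</ref>"
)
print(parse_cite_journal(sample)[0]["doi"])  # 10.1139/o85-106
```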

WikiCite: goals

Build a repository of all Wikimedia citations and bibliographic metadata

Design data models and technology to improve the coverage, quality, standards-compliance and machine-readability of citations and bibliographic metadata in Wikimedia projects

@wikicite • meta.wikimedia.org/wiki/WikiCite
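
Turning a parsed citation into Wikidata statements is largely a mapping from template parameters to properties. The mapping below is an assumed, partial one (P1476 title, P356 DOI, P698 PubMed ID, P478 volume, P433 issue, P304 page(s), P577 publication date, P2093 author name string, P1433 published in), and the function name is hypothetical. Values that must point to an existing item, such as the journal, are returned separately for reconciliation instead of being written as text.

```python
# Assumed mapping from cite-template parameters to Wikidata properties.
CITE_TO_WIKIDATA = {
    "title": "P1476",   # title
    "doi": "P356",      # DOI
    "pmid": "P698",     # PubMed ID
    "volume": "P478",   # volume
    "issue": "P433",    # issue
    "pages": "P304",    # page(s)
    "year": "P577",     # publication date (a real edit needs a date value with year precision)
}


def to_statements(cite: dict[str, str]) -> tuple[list[tuple[str, str]], dict[str, str]]:
    """Map one parsed citation to (property, value) pairs.

    Returns the literal statements plus the values that must first be matched
    to existing Wikidata items before they can be stored.
    """
    statements = [(prop, cite[key]) for key, prop in CITE_TO_WIKIDATA.items() if cite.get(key)]
    # Each Vancouver-style author becomes an "author name string" (P2093)
    # until it can be linked to a person item.
    for name in cite.get("vauthors", "").split(","):
        if name.strip():
            statements.append(("P2093", name.strip()))
    # "published in" (P1433) points to an item, so the journal title needs reconciliation.
    needs_item = {"P1433": cite["journal"]} if cite.get("journal") else {}
    return statements, needs_item


# Parameters from the insulin reference above (author list shortened).
cite = {
    "vauthors": "LeRoith D, Shiloach J",
    "journal": "Can. J. Biochem. Cell Biol.",
    "volume": "63",
    "pmid": "3933801",
    "doi": "10.1139/o85-106",
}
statements, pending = to_statements(cite)
print(pending)  # {'P1433': 'Can. J. Biochem. Cell Biol.'}
```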

Vision

Technology

Community

Scale

Licensing

Independence

https://tools.wmflabs.org/sqid/#/view?id=P2860

All biomedical OA review articles of the last 5 years

The Zika corpus

Open citation graph layer

Bibliographic metadata layer

Expert annotation layer

Encyclopedic layer (e.g. pathogen transmission process)

5. Applications

Co-author graphs for individual researchers

SPARQL: http://tinyurl.com/zml3jox
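
Once authorship is stored as statements, a co-author graph is a simple aggregation. The sketch below assumes the input is a list of (work, author) pairs, for instance the rows of a query over the author property (P50); the function name and the IDs W1, A, B and C are hypothetical placeholders. Edge weights count shared works.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_graph(authorship: list[tuple[str, str]]) -> dict[str, dict[str, int]]:
    """Build a weighted, undirected co-author graph from (work, author) pairs."""
    authors_by_work: dict[str, set[str]] = defaultdict(set)
    for work, author in authorship:
        authors_by_work[work].add(author)

    graph: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for authors in authors_by_work.values():
        for a, b in combinations(sorted(authors), 2):
            graph[a][b] += 1  # edge weight = number of works shared by a and b
            graph[b][a] += 1
    return {author: dict(neighbours) for author, neighbours in graph.items()}


pairs = [("W1", "A"), ("W1", "B"), ("W2", "A"), ("W2", "B"), ("W2", "C")]
print(coauthor_graph(pairs)["A"])  # {'B': 2, 'C': 1}
```

The ego network of one researcher is then just that researcher's entry in the returned mapping.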

Most cited authors in the Zika research corpus (+ filtering by journal, OA status, type of statement)

SPARQL: http://tinyurl.com/jb8da68
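
Ranking authors within a corpus combines two layers: citation edges ("cites work", P2860) and authorship of the cited works. The sketch below assumes both are already available as plain Python data; the function name and all IDs are hypothetical, and the `keep` predicate stands in for filters such as journal or open access status.

```python
from collections import Counter
from typing import Callable, Iterable


def most_cited_authors(
    cites: Iterable[tuple[str, str]],
    authors: dict[str, list[str]],
    keep: Callable[[str], bool] = lambda work: True,
    top: int = 10,
) -> list[tuple[str, int]]:
    """Rank authors by citations their works receive inside a corpus.

    cites:   (citing work, cited work) pairs, i.e. "cites work" (P2860) edges
    authors: cited work -> list of its authors
    keep:    optional filter on cited works (e.g. by journal or OA status)
    """
    counts: Counter = Counter()
    for _citing, cited in set(cites):  # count each citing/cited pair once
        if keep(cited):
            for author in authors.get(cited, []):
                counts[author] += 1
    return counts.most_common(top)


edges = [("P1", "W1"), ("P2", "W1"), ("P3", "W2"), ("P3", "W2")]  # last edge is a duplicate
written_by = {"W1": ["A", "B"], "W2": ["A"]}
print(most_cited_authors(edges, written_by))  # [('A', 3), ('B', 2)]
```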

Semi-automated recommendation of entities, missing statements and references for unsourced statements

https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool

Semi-automated recommendation of entities, missing statements and references for unsourced statements

https://meta.wikimedia.org/wiki/Grants:Project/WikiFactMine https://twitter.com/larswillighagen/status/774614483394236416

Tools for crowdsourcing entity matching / disambiguation

http://www.generalist.org.uk/blog/2014/wikidata-identifiers-and-the-odnb-where-next/ http://www.generalist.org.uk/blog/2014/wikidata-and-identifiers-part-2-the-matching-process/
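
Crowdsourced matching tools typically propose candidate matches and leave the decision to a person. The sketch below is a generic string-similarity shortlist using Python's `difflib`, not the matching process described in the posts above; the function names, the 0.85 cutoff and the item IDs X1 to X3 are hypothetical.

```python
import difflib
import unicodedata


def normalise(name: str) -> str:
    """Lowercase, strip accents and collapse whitespace before comparing names."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def match_candidates(
    name: str, labels: dict[str, str], cutoff: float = 0.85
) -> list[tuple[str, float]]:
    """Rank existing items (id -> label) by similarity to an external record's name.

    The result is a shortlist for a human to confirm or reject; nothing is
    merged automatically.
    """
    target = normalise(name)
    scored = []
    for item_id, label in labels.items():
        score = difflib.SequenceMatcher(None, target, normalise(label)).ratio()
        if score >= cutoff:
            scored.append((item_id, round(score, 3)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical item IDs and labels.
labels = {"X1": "Ada Lovelace", "X2": "Ada Lovelace-King", "X3": "Charles Babbage"}
print(match_candidates("ada  lovelace", labels))  # [('X1', 1.0)]
```

Lowering the cutoff widens the shortlist (here it would also surface X2), trading more review work for fewer missed matches.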

read/write interfaces for biocuration

New opportunities for linked open knowledge curation and discovery

all statements citing a New York Times article

the most popular scholarly journals used as citations for statements in any item that is a subclass of economics

all statements citing the works of Joseph Stiglitz

all statements citing journal articles by physicists from Oxford University

all statements citing a journal article that was retracted

all statements citing a source that cites a journal article that was retracted

https://meta.wikimedia.org/wiki/WikiCite_2016/Report/Group_5
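
The last two examples follow provenance through the citation graph: first the sources a statement's references point to (typically via "stated in", P248), then the works those sources cite (P2860). The sketch below assumes both layers are already loaded as dictionaries and that retracted works are given as a plain set; how retractions are actually recorded is left open here, and the function name and IDs are hypothetical.

```python
def statements_citing_retracted(
    references: dict[str, set[str]],
    cites: dict[str, set[str]],
    retracted: set[str],
) -> tuple[set[str], set[str]]:
    """Return (direct, indirect) statement IDs.

    direct:   statements whose references include a retracted work
    indirect: statements whose references include a source that itself cites
              a retracted work (one hop along the citation graph)
    """
    direct = {stmt for stmt, sources in references.items() if sources & retracted}
    indirect = {
        stmt
        for stmt, sources in references.items()
        if any(cites.get(source, set()) & retracted for source in sources)
    }
    return direct, indirect


# Hypothetical toy data: statements s1..s3, works A..D; A is the retracted paper.
references = {"s1": {"A"}, "s2": {"B"}, "s3": {"C"}}
cites = {"B": {"A"}, "C": {"D"}}
direct, indirect = statements_citing_retracted(references, cites, retracted={"A"})
print(direct, indirect)  # {'s1'} {'s2'}
```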

More reliable data for altmetrics services

https://www.altmetric.com/blog/new-source-alert-wikipedia/

6. Concluding remarks

Dominant biocuration paradigm

● Cost of ad-hoc parsing of API responses or flat-file data
● Ambiguous or non-existent xrefs
● Persistence of funding
● Too much information to curate

B. Good (2016) Opportunities and challenges presented by Wikidata in the context of biocuration http://tinyurl.com/hk9qrmz

A new paradigm for biocuration

● Reduce API/parser proliferation
● Force up-front integration
● Facilitate coordination
● Ensure that if funding is lost, data is not
● Leverage community input

B. Good (2016) Opportunities and challenges presented by Wikidata in the context of biocuration http://tinyurl.com/hk9qrmz

T. Putman (2016) Centralizing content and distributing labor: a community model for curating the very long tail of microbial genomes https://doi.org/10.6084/m9.figshare.3201796.v1

Accelerate the discoverability, reusability, and societal impact of open access

Support new forms of open curation and distributed fact-checking

Provide long-term, sustainable infrastructure to support open science

Benefit from large-scale distribution of data in the linked data ecosystem

Thank you

Acknowledgments: Daniel Mietchen, Jonathan Dugan, Lydia Pintscher, Cameron Neylon, James Hare, James Heilman, Magnus Manske, Egon Willighagen, the Gene Wiki team (especially Andra Waagmeester, Tim Putman, Benjamin Good), the ContentMine team, the University of Chicago Knowledge Lab, all WikiCite 2016 participants and Wikidata Source Metadata project contributors.

Additional image credits

Library, National Park Service Collection thenounproject.com/term/library/191/ [CC0]
Robot, Creative Stall thenounproject.com/term/robot/132360/ [CC BY]
Open Access logo commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_transparent.svg [CC0]

@readermeter • @Wikidata • @WikiCite • @WikiResearch

A short history of NIH and Wikimedia

● 2002: article National Institutes of Health started on English Wikipedia
● 2003: MEDLINE
● 2004: PubMed
● 2005: PubMed Central
  ○ along with Template:PMC
● 2007: WikiProject National Institutes of Health
  ○ along with Template:National Institutes of Health
● 2009: first Wikipedia Academy in the US took place at NIH
  ○ Susannah Fox: “Shared Kismet: Wikipedia and the NIH”
  ○ triggers Guidelines for Participating in Wikipedia from NIH
● 2012: bot imports multimedia from PMC into Wikimedia Commons
  ○ triggers formation of JATS for Reuse working group
● 2015: Template:NIH properties on Wikidata
● 2016: First papers using Wikidata queries appear in PMC