
The sum of all human knowledge in the age of machines

A new research agenda for Wikimedia

Dario Taraborelli • Wikimedia Foundation

Big Dive, 16 June 2015

Non-profit running Wikipedia and sister projects

Mission: support the creation and dissemination of collaboratively produced free knowledge.

250+ employees, mostly based in San Francisco

6th most popular web property on the planet by traffic

35M articles in 288 languages

26M media files

60M triples

A conversation

Academic research on Wikipedia

rise and decline of the editor population

gender gap and content biases

contributor motivation

asymmetries in content and provenance of contributions

socio-technical systems governing quality control

Wikipedia’s rise and decline

https://meta.wikimedia.org/wiki/Research:The_Rise_and_Decline

Human curated knowledge in the age of machines

the long-form encyclopedia

Outline

1. sourcing information

2. consuming information

3. distributing content

A new research agenda

Distributed innovation: how we work

1. Sourcing information

Goats

1. Sourcing information

● What’s the role of humans in sourcing and verifying information when answers to most questions are readily available from search engines?

● Should Wikipedia start integrating algorithmically extracted sources in its contents?

● Should Wikipedia further invest in supporting human generated citations?

2. Consuming information

O. Keyes (2015) The Mobile Singularity is already here. Wikipedia and the Mobile Web

Bite-sized consumption

Structured contributions

Manipulating fragments

[Diagram: article fragments: long-form text, media, references, geocoordinates, structured data]

decoupled article

Decoupling the article

long-form article

2. Consuming information

● Can we transform Wikipedia contents to make them suitable to bite-sized consumption?

● How to accelerate extraction of structured data from Wikipedia and its use in Wikidata?

● How to design effective lightweight contribution funnels around structured data and content fragments?

● How to support programmatic manipulation of content fragments?

3. Distributing content

The paradox of reuse

Routing attention

Women in Science

Wikipedia needs your help

The English Wikipedia article Women in Science needs contributors from a more global perspective. Help expand it!

Routing attention


3. Distributing content

● How can we design content distribution systems that do not intermediate Wikipedia?

● How do we leverage content syndication to route (expert) attention to the source?

A new research agenda

Designing and evaluating systems to:

1. preserve and increase transparent sourcing of information

2. break down long-form articles into their constituents

3. optimize content consumption as a function of access

4. enable lightweight contribution/manipulation of structured data / fragments

5. leverage content distributed / syndicated by 3rd parties

6. prioritize work and route contributors to the site, as a function of demand

Distributed innovation: how we work

Open knowledge curation ecosystem

Humans

Cyborgs

Machines

Wikimedia Research as a platform

Wikimedia Research & Data team

Edit/article quality classifiers

Automated link recommendations

Article creation recommendations

Fundraiser testing and optimization

Scaling Wikimedia Research

1:100,000,000

Approximate ratio of full-time data scientists at WMF to monthly unique visitors

Formal collaborations

Stanford University

GroupLens, University of Minnesota

Oxford Internet Institute

Los Alamos National Laboratory

https://wikimediafoundation.org/wiki/Open_access_policy

Open data

https://meta.wikimedia.org/wiki/Research:Data

Open data: pageviews

http://www.wikipediatrends.com

Open data: clickstream

Wulczyn, E; Taraborelli, D (2015): Wikipedia Clickstream. http://dx.doi.org/10.6084/m9.figshare.1305770

Open data: tuples

https://www.wikidata.org/wiki/Wikidata:Data_access

http://tools.wmflabs.org/wikidata-todo/tempo_spatial_display.html

Open data: real-time changes

https://wikitech.wikimedia.org/wiki/RCStream

Conclusions

Questions?

dario@wikimedia.org

@readermeter @wikiresearch

Image credits

Election Night Crowd, Wellington, 1931. https://www.flickr.com/photos/nationallibrarynz_commons/3326203787 (CC0)

King Billy of Dalkey Island. https://www.flickr.com/photos/paulodonnell/5937678226 (CC BY)

Secretary at typewriter, 1912. https://www.flickr.com/photos/muohio_digital_collections/3192197470 (CC0)

"Getting em up" at U.S. Naval Training Camp, Seattle, Washington, ca. 1917 - ca. 1918. https://www.flickr.com/photos/usnationalarchives/5505933145 (CC0)
