Mendeley, putting data into the hands of researchers Kris Jack, PhD Data Mining Team Coordinator

May 28, 2015

Technology

Kris Jack

I was invited to give a keynote presentation at the RecSysTEL Workshop (http://bit.ly/b2Bg2J) on 2010/09/30.

The presentation covers Mendeley's tools for researchers and the data sets that we made available for the dataTEL challenge, which are intended to provide new large-scale data for researchers working on recommendation systems.

The event was really enjoyable and the participants were excited about Mendeley.
Transcript
Page 1: Mendeley, putting data into the hands of researchers

Mendeley, putting data into the hands of

researchers

Kris Jack, PhD
Data Mining Team Coordinator

Page 2: Mendeley, putting data into the hands of researchers

“All the time we are very conscious of the huge challenges that human society has now – curing cancer, understanding the brain for Alzheimer’s [...].

But a lot of the state of knowledge of the human race is sitting in the scientists’ computers, and is currently not shared [...] We need to get it unlocked so we can tackle those huge problems.”

Page 3: Mendeley, putting data into the hands of researchers

➔ idea behind mendeley

➔ our features

➔ our technical challenges and solutions

➔ what does this mean for you?

Summary

Page 4: Mendeley, putting data into the hands of researchers

works like this:

1) Install “Audioscrobbler”

2) Listen to music

3) Last.fm builds your music profile and recommends music you might also like... and it’s the world’s biggest open music database

Last.fm

Mendeley

Page 5: Mendeley, putting data into the hands of researchers

Last.fm            Mendeley

music libraries    research libraries

artists            researchers

songs              papers

genres             disciplines

Page 6: Mendeley, putting data into the hands of researchers

➔ idea behind mendeley

➔ our features

➔ our technical challenges and solutions

➔ what does this mean for you?

Summary

Page 7: Mendeley, putting data into the hands of researchers

Mendeley helps researchers work smarter

Page 8: Mendeley, putting data into the hands of researchers

Mendeley extracts research data..

Install Mendeley Desktop

Mendeley helps researchers work smarter

Page 9: Mendeley, putting data into the hands of researchers

..and aggregates research data in the cloud

Mendeley extracts research data..

Mendeley helps researchers work smarter

Page 10: Mendeley, putting data into the hands of researchers

By doing this, Mendeley makes science more collaborative and transparent

Page 11: Mendeley, putting data into the hands of researchers
Page 12: Mendeley, putting data into the hands of researchers
Page 13: Mendeley, putting data into the hands of researchers
Page 14: Mendeley, putting data into the hands of researchers
Page 15: Mendeley, putting data into the hands of researchers
Page 16: Mendeley, putting data into the hands of researchers
Page 17: Mendeley, putting data into the hands of researchers
Page 18: Mendeley, putting data into the hands of researchers

➔ idea behind mendeley

➔ our features

➔ our technical challenges and solutions

➔ what does this mean for you?

Summary

Page 19: Mendeley, putting data into the hands of researchers

500,000+ users; the 20 largest userbases:

University of Cambridge
Stanford University
MIT
University of Michigan
Harvard University
University of Oxford
Sao Paulo University
Imperial College London
University of Edinburgh
Cornell University
University of California at Berkeley
RWTH Aachen
Columbia University
Georgia Tech
University of Wisconsin
UC San Diego
University of California at LA
University of Florida
University of North Carolina

39,000,000+ articles

Page 20: Mendeley, putting data into the hands of researchers

we can only use algorithms that scale up

related research

search

readership statistics

most frequent tags

+ dozens of other services

Page 21: Mendeley, putting data into the hands of researchers

most frequent tags on our scale

related research

readership statistics

search

most frequent tags

Page 22: Mendeley, putting data into the hands of researchers

for each document                  (called 39,000,000 times)
    for each tag in document       (called ~3 times per document)
        increment count for tag    (called ~39,000,000 x 3 = ~117,000,000 times)
sort tags by frequency

most frequent tags on our scale
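
The counting loop on this slide can be sketched on a single machine as follows. This is only an illustration in Python, not Mendeley's code: the function name `most_frequent_tags` and the sample documents are hypothetical.

```python
from collections import Counter

def most_frequent_tags(documents, top_n=None):
    """Count every tag across all documents, then sort by frequency."""
    counts = Counter()
    for document in documents:          # outer loop: once per document
        for tag in document["tags"]:    # inner loop: once per tag (the slide estimates ~3 per document)
            counts[tag] += 1            # increment count for tag
    # most_common returns (tag, count) pairs in descending order of count
    return counts.most_common(top_n)

# Hypothetical sample data standing in for the real catalog
documents = [
    {"title": "A", "tags": ["recsys", "tel", "data"]},
    {"title": "B", "tags": ["recsys", "hadoop"]},
    {"title": "C", "tags": ["recsys", "data"]},
]
top_two = most_frequent_tags(documents, top_n=2)
print(top_two)  # [('recsys', 3), ('data', 2)]
```

With 39,000,000 documents and roughly 3 tags each, the increment runs about 39,000,000 x 3 = 117,000,000 times, which is why a single sequential pass like this becomes the bottleneck.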

Page 23: Mendeley, putting data into the hands of researchers

for each document
    for each tag in document
        increment count for tag
for each tag counted
    emit the tag and frequency
sort tags by frequency

solution: distributed computing

map reduce

Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In Proceedings of OSDI 2004, San Francisco, CA, 2004.
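
To make the map reduce idea concrete, here is a minimal in-process sketch in Python. It only imitates the map, shuffle, and reduce phases on one machine and does not use Hadoop or any real cluster API. The function names (`map_tags`, `shuffle`, `reduce_tag`, `tag_frequencies`) and the sample documents are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain

def map_tags(document):
    """Map step: emit a (tag, 1) pair for every tag in one document."""
    return [(tag, 1) for tag in document["tags"]]

def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_tag(tag, values):
    """Reduce step: sum the partial counts collected for one tag."""
    return tag, sum(values)

def tag_frequencies(documents):
    # Each map call is independent, so on a cluster the calls could run in parallel.
    mapped = chain.from_iterable(map_tags(d) for d in documents)
    grouped = shuffle(mapped)
    reduced = [reduce_tag(tag, values) for tag, values in grouped.items()]
    # Sort by descending frequency, then by tag name for a stable order.
    return sorted(reduced, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical sample data
documents = [
    {"tags": ["recsys", "tel", "data"]},
    {"tags": ["recsys", "hadoop"]},
    {"tags": ["recsys", "data"]},
]
frequencies = tag_frequencies(documents)
print(frequencies)  # [('recsys', 3), ('data', 2), ('hadoop', 1), ('tel', 1)]
```

The point of the split is that neither the map nor the reduce function needs to see the whole data set, so the work can be spread over many machines.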

Page 24: Mendeley, putting data into the hands of researchers

hadoop

Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In Proceedings of OSDI 2004, San Francisco, CA, 2004.

solution: distributed computing

Page 25: Mendeley, putting data into the hands of researchers

support vector machines

hidden markov models

Page 26: Mendeley, putting data into the hands of researchers

conditional random fields

Isaac G. Councill, C. Lee Giles, Min-Yen Kan. (2008) ParsCit: An open-source CRF reference string parsing package. In Proceedings of LREC 08, Marrakesh, Morocco.

Page 27: Mendeley, putting data into the hands of researchers

deduplication

file hash check

crowd sourcing new articles from users

39,000,000 canonical documents

document fingerprinting

collapse metadata and update canonical docs

metadata comparison
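
A rough sketch of how two of these steps, the file hash check and the metadata comparison, might feed a store of canonical documents is given below. The slide does not specify the order of the steps or the matching rules, so everything here is an assumption: the `Catalog` class, the SHA-1 hash, and the title-plus-year key are hypothetical, and document fingerprinting is left out entirely.

```python
import hashlib

def file_hash(content: bytes) -> str:
    """File hash check: identical bytes always give an identical digest."""
    return hashlib.sha1(content).hexdigest()

def metadata_key(metadata: dict) -> tuple:
    """Metadata comparison (assumed rule): normalised title plus year."""
    title = "".join(ch for ch in metadata["title"].lower() if ch.isalnum())
    return (title, metadata.get("year"))

class Catalog:
    """Hypothetical in-memory store of canonical documents."""

    def __init__(self):
        self.by_hash = {}      # file hash -> canonical id
        self.by_metadata = {}  # metadata key -> canonical id
        self.canonical = {}    # canonical id -> collapsed metadata

    def add(self, content: bytes, metadata: dict) -> int:
        digest = file_hash(content)
        key = metadata_key(metadata)
        doc_id = self.by_hash.get(digest)
        if doc_id is None:
            doc_id = self.by_metadata.get(key)
        if doc_id is None:
            # No match found: create a new canonical document.
            doc_id = len(self.canonical)
            self.canonical[doc_id] = {}
        # Collapse metadata: fill in fields the canonical record lacks.
        record = self.canonical[doc_id]
        for field, value in metadata.items():
            if value is not None:
                record.setdefault(field, value)
        self.by_hash[digest] = doc_id
        self.by_metadata.setdefault(key, doc_id)
        return doc_id

catalog = Catalog()
first = catalog.add(b"pdf-1", {"title": "MapReduce: Simplified Data Processing", "year": 2004})
same_file = catalog.add(b"pdf-1", {"title": "mapreduce simplified data processing", "year": 2004, "venue": "OSDI"})
same_meta = catalog.add(b"pdf-2", {"title": "MapReduce - Simplified Data Processing", "year": 2004})
other = catalog.add(b"pdf-3", {"title": "ParsCit", "year": 2008})
print(first, same_file, same_meta, other, len(catalog.canonical))
```

Checking the cheap exact match (the hash) before the fuzzier metadata match is a design choice of this sketch, not something stated on the slide.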

Page 28: Mendeley, putting data into the hands of researchers

pig

statistics

Page 29: Mendeley, putting data into the hands of researchers

readerrank

Page 30: Mendeley, putting data into the hands of researchers

currently tf-idf similarity between documents

developing collaborative filtering
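
For readers unfamiliar with tf-idf similarity, a minimal pure-Python sketch follows. The slide does not say which weighting or similarity measure Mendeley uses, so the choices here (term frequency normalised by document length, `log(N / df)` as the inverse document frequency, cosine similarity) are assumptions, as are the function names and the sample texts.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {term: (count / len(doc)) * math.log(n / df[term]) for term, count in tf.items()}
        )
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical sample texts, tokenised by whitespace
texts = [
    "recommender systems for researchers",
    "recommender systems for music",
    "hidden markov models",
]
vectors = tfidf_vectors([text.split() for text in texts])
print(cosine(vectors[0], vectors[1]))  # shares three terms, so greater than zero
print(cosine(vectors[0], vectors[2]))  # shares no terms, so 0.0
```

Related documents would then be the ones with the highest similarity score to a given document.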

Page 31: Mendeley, putting data into the hands of researchers

contact recommendations

currently recommendations based on contact network

developing version based on interests
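
One common way to recommend contacts from a contact network is to rank people by the number of contacts they share with the user. The slide does not say which method Mendeley uses, so the sketch below is an assumed heuristic, and the function name `recommend_contacts` and the sample network are hypothetical.

```python
from collections import Counter

def recommend_contacts(network, user, top_n=3):
    """Rank non-contacts by how many contacts they share with `user`."""
    direct = network.get(user, set())
    mutual = Counter()
    for contact in direct:
        for candidate in network.get(contact, set()):
            # Skip the user and anyone who is already a contact.
            if candidate != user and candidate not in direct:
                mutual[candidate] += 1
    # Sort by descending mutual-contact count, then by name for a stable order.
    ranked = sorted(mutual.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]

# Hypothetical undirected contact network
network = {
    "ana": {"ben", "cat"},
    "ben": {"ana", "dan", "eve"},
    "cat": {"ana", "dan"},
    "dan": {"ben", "cat"},
    "eve": {"ben"},
}
suggestions = recommend_contacts(network, "ana")
print(suggestions)  # [('dan', 2), ('eve', 1)]
```

A version based on interests, as mentioned on the slide, would need a different signal, such as overlap between users' libraries, instead of the contact graph.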

Page 32: Mendeley, putting data into the hands of researchers

➔ idea behind mendeley

➔ our features

➔ our technical challenges and solutions

➔ what does this mean for you?

Summary

Page 33: Mendeley, putting data into the hands of researchers

access to data

Page 34: Mendeley, putting data into the hands of researchers

datatel data set

online catalog

online article view logs

article tags

library readership

library stars

Page 35: Mendeley, putting data into the hands of researchers

Mendeley's API

Page 36: Mendeley, putting data into the hands of researchers

*new* you can get all of the articles in a group - data for you to test related research algos?

Page 37: Mendeley, putting data into the hands of researchers

Mashups with data on:

Chemical compounds

Locations

Alzheimer’s research

Grant funding

Twitter streams

Mendeley's API

Page 38: Mendeley, putting data into the hands of researchers
Page 39: Mendeley, putting data into the hands of researchers

want more?

let us know...

Page 40: Mendeley, putting data into the hands of researchers

“All the time we are very conscious of the huge challenges that human society has now – curing cancer, understanding the brain for Alzheimer’s [...].

But a lot of the state of knowledge of the human race is sitting in the scientists’ computers, and is currently not shared [...] We need to get it unlocked so we can tackle those huge problems.”

Page 41: Mendeley, putting data into the hands of researchers

www.mendeley.com

we're hiring!