Theodore Vasiloudis, @thvasilo
Swedish Institute of Computer Science (SICS)

Uncovering concepts and similarities at scale

Mar 20, 2017


Data & Analytics

Transcript
Page 1: Uncovering concepts and similarities at scale

Theodore Vasiloudis, @thvasilo
Swedish Institute of Computer Science (SICS)

Uncovering concepts and similarities at scale

Page 2: Uncovering concepts and similarities at scale

Similarity

Page 3: Uncovering concepts and similarities at scale

Objects

Page 4: Uncovering concepts and similarities at scale

Objects

Users, words, artists, movies, …

Page 5: Uncovering concepts and similarities at scale

Objects and correlations

Friendships, co-occurrences, transitions, …

Page 6: Uncovering concepts and similarities at scale

Similarity?

Page 7: Uncovering concepts and similarities at scale

You shall know a word by the company it keeps.

- J. R. Firth, 1957

Page 8: Uncovering concepts and similarities at scale

You shall know an object by the company it keeps.

Page 9: Uncovering concepts and similarities at scale

Object with company

Page 10: Uncovering concepts and similarities at scale

Object with company

Page 11: Uncovering concepts and similarities at scale

Object in context

Page 12: Uncovering concepts and similarities at scale

Similar objects have similar contexts

Page 13: Uncovering concepts and similarities at scale

Similar objects have similar contexts

Similar objects are exchangeable

Page 14: Uncovering concepts and similarities at scale

Similar objects have similar contexts

Similar objects are exchangeable

How can we discover such objects?

Page 15: Uncovering concepts and similarities at scale

Calculating similarities

Similar objects are one neighbor apart

Page 16: Uncovering concepts and similarities at scale

Calculating similarities

Compare edges to shared neighbors

Page 17: Uncovering concepts and similarities at scale

Calculating similarities

Transform to two-hop graph

Page 18: Uncovering concepts and similarities at scale

Calculating similarities

Transform to two-hop graph

Page 19: Uncovering concepts and similarities at scale

Calculating similarities

Correlation graph

Page 20: Uncovering concepts and similarities at scale

Calculating similarities

Correlation graph

Similarity graph

Page 21: Uncovering concepts and similarities at scale

Calculating similarities

edges // (i, j, correlation)
pairs = edges.join(edges).on(j) // join on the shared neighbor k: ((i, j), (corr_ik, corr_jk))
similarities = pairs.mapValues((corr_ik, corr_jk) => f(corr_ik, corr_jk))
    .reduceByKey(_ + _) // sum the contributions of all shared neighbors
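
The join on this slide can be tried out on plain Scala collections, without a cluster. The following is a minimal sketch and not the code from the talk: the object name `TwoHopSimilarity`, the `Edge` alias, and the choice of `f` are assumptions made for illustration, since the slide leaves `f` unspecified.

```scala
// Two-hop similarity on plain Scala collections (no Spark needed).
// Hypothetical names; the slide only gives the join/mapValues/reduceByKey outline.
object TwoHopSimilarity {
  // An edge (i, k, corr): object i is correlated with neighbor k.
  type Edge = (String, String, Double)

  // f combines the two correlations that objects i and j have with a shared
  // neighbor k. The slide does not define f, so the caller must supply it.
  def similarities(
      edges: Seq[Edge],
      f: (Double, Double) => Double
  ): Map[(String, String), Double] = {
    // Grouping by neighbor k plays the role of the self-join on k.
    val byNeighbor: Map[String, Seq[(String, Double)]] =
      edges.groupBy(_._2).view.mapValues(_.map(e => (e._1, e._3))).toMap

    // For every shared neighbor, emit one contribution per ordered pair (i, j).
    val contributions: Seq[((String, String), Double)] = for {
      (_, objs)   <- byNeighbor.toSeq
      (i, corrIk) <- objs
      (j, corrJk) <- objs
      if i != j
    } yield ((i, j), f(corrIk, corrJk))

    // Equivalent of reduceByKey(_ + _): sum the contributions per pair.
    contributions.groupMapReduce(_._1)(_._2)(_ + _)
  }
}
```

With the edges `("a","x",1.0)`, `("b","x",0.5)`, `("a","y",1.0)`, `("b","y",1.0)`, `("c","y",0.2)` and `f` set to the minimum of the two correlations (an arbitrary choice), the pair `("a","b")` receives 0.5 from neighbor `x` plus 1.0 from neighbor `y`, so 1.5 in total. Objects that share no neighbor never appear in the result, which is what keeps the output sparse.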

Page 22: Uncovering concepts and similarities at scale

First example: Natural Language

Page 23: Uncovering concepts and similarities at scale

Second example: Music

Page 24: Uncovering concepts and similarities at scale

Third example: Molecular biology

Page 25: Uncovering concepts and similarities at scale

Discovering concepts

Page 26: Uncovering concepts and similarities at scale

Discovering concepts

Concept = group of inter-similar objects

Page 27: Uncovering concepts and similarities at scale

Discovering concepts

Cluster objects in the similarity graph

Concept = group of inter-similar objects

Page 28: Uncovering concepts and similarities at scale

Discovering concepts

Cluster objects in the similarity graph

Algorithm based on SLPA by Xie et al.

Concept = group of inter-similar objects
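
For readers unfamiliar with SLPA (speaker-listener label propagation), the sketch below shows the basic idea on an unweighted, undirected graph. It is plain SLPA in a simplified form, not the variant used in the talk, which is only described as "based on" SLPA. The object name `SlpaSketch`, the parameter names and defaults, and the deterministic tie-breaking are all assumptions made for illustration.

```scala
import scala.util.Random

// Minimal SLPA sketch. Hypothetical names and defaults, for illustration only.
object SlpaSketch {
  def run(
      neighbors: Map[Int, Seq[Int]], // adjacency lists of the similarity graph
      iterations: Int = 20,          // number of listening rounds
      threshold: Double = 0.2,       // drop labels rarer than this share of a memory
      seed: Long = 42L
  ): Map[Int, Set[Int]] = {
    val rng = new Random(seed)
    // Each node starts with a memory that holds only its own id as a label.
    var memory: Map[Int, Vector[Int]] =
      neighbors.keys.map(n => n -> Vector(n)).toMap

    for (_ <- 1 to iterations; listener <- rng.shuffle(neighbors.keys.toSeq.sorted)) {
      // Every neighbor "speaks" one label drawn uniformly from its memory,
      // so labels it has stored more often are more likely to be sent.
      val heard = neighbors(listener).map { speaker =>
        val mem = memory(speaker)
        mem(rng.nextInt(mem.size))
      }
      if (heard.nonEmpty) {
        // The listener stores the most frequent label it heard.
        // Simplification: ties go to the smallest id instead of a random pick.
        val counts = heard.groupBy(identity).map { case (l, ls) => (l, ls.size) }
        val best = counts.maxBy(_._2)._2
        val winner = counts.collect { case (l, c) if c == best => l }.min
        memory = memory.updated(listener, memory(listener) :+ winner)
      }
    }

    // Post-processing: keep labels whose share of the memory reaches the threshold.
    // A node may keep several labels, which allows overlapping groups.
    memory.map { case (node, mem) =>
      val kept = mem.groupBy(identity).collect {
        case (label, occ) if occ.size.toDouble / mem.size >= threshold => label
      }.toSet
      node -> kept
    }
  }
}
```

Nodes that end up sharing a label form one group, which corresponds to a concept in the terminology of these slides. Because labels only travel along edges, objects in disconnected parts of the similarity graph can never be placed in the same concept.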

Page 29: Uncovering concepts and similarities at scale
Page 30: Uncovering concepts and similarities at scale
Page 31: Uncovering concepts and similarities at scale
Page 32: Uncovering concepts and similarities at scale

Example concepts

Page 33: Uncovering concepts and similarities at scale

Fine-grain artist concepts

Page 34: Uncovering concepts and similarities at scale

Use for recommendation

Page 35: Uncovering concepts and similarities at scale

Use for recommendation

Graph structure gives us unique possibilities

Page 36: Uncovering concepts and similarities at scale

[Figure: graph of music artists. Node labels include Wu-Tang Clan, Gza/Genius, Rza, Method Man, Ghostface Killah, Nas, Jay-Z, The Notorious B.I.G., Jedi Mind Tricks, Immortal Technique, Twiztid, Dark Lotus, Psychopathic Rydas, and Violent J, among others.]

Use for recommendation

Use case: Graph analytics

Page 37: Uncovering concepts and similarities at scale

Use for recommendation

Use case: Walk the graph

Page 38: Uncovering concepts and similarities at scale

Use for recommendation

Use case: Walk the graph

Page 39: Uncovering concepts and similarities at scale

Use for recommendation

Use case: Walk the graph
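
The slides do not say how the walk is performed. One simple possibility, shown here purely as an assumption, is a greedy walk that starts at a seed object and repeatedly moves to the most similar object not yet visited. The object name `GraphWalk` and the graph representation are hypothetical.

```scala
// Greedy walk over a similarity graph. Hypothetical sketch, not the talk's method.
object GraphWalk {
  // graph: node -> (neighbor -> similarity)
  def walk(
      graph: Map[String, Map[String, Double]],
      start: String,
      steps: Int
  ): List[String] = {
    @annotation.tailrec
    def loop(current: String, visited: Set[String], path: List[String], left: Int): List[String] = {
      // Only neighbors that have not been visited yet are candidates.
      val candidates = graph
        .getOrElse(current, Map.empty[String, Double])
        .filter { case (n, _) => !visited(n) }
      if (left == 0 || candidates.isEmpty) path.reverse
      else {
        // Move to the most similar unvisited neighbor.
        val next = candidates.maxBy(_._2)._1
        loop(next, visited + next, next :: path, left - 1)
      }
    }
    loop(start, Set(start), Nil, steps)
  }
}
```

On a toy graph with edges a-b (0.9), a-c (0.4), b-c (0.7), and c-d (0.8), a three-step walk from `a` visits `b`, then `c`, then `d`. A walk like this produces a sequence in which each recommendation is close to the previous one, rather than a flat list of the seed's nearest neighbors.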

Page 40: Uncovering concepts and similarities at scale

Scalability

Page 41: Uncovering concepts and similarities at scale

Locality: Similar objects are close in the graph

Sparseness: Most objects are unrelated

Scalability

Page 42: Uncovering concepts and similarities at scale

Locality: Similar objects are close in the graph

Sparseness: Most objects are unrelated

Scalability

Page 43: Uncovering concepts and similarities at scale

Scalability

Locality + Sparseness = Scalability

Billion words on laptop

Google books on Amazon EC2

Page 44: Uncovering concepts and similarities at scale

Scalability

Locality + Sparseness = Scalability

● Similarity calculation is done in one iteration
● Runtimes in the order of minutes vs. hours/days for word2vec and collaborative filtering.
● Unsupervised learning method, ideal for implicit usage data.

Page 45: Uncovering concepts and similarities at scale

Future work

Apply

Page 46: Uncovering concepts and similarities at scale

Future work

Apply

as recommendation system

Page 47: Uncovering concepts and similarities at scale

Future work

Apply

as recommendation system

to find higher-order dynamics …

Page 48: Uncovering concepts and similarities at scale

[Figure: graph of music artists with the same node labels as on Page 36 (Wu-Tang Clan, Nas, Jay-Z, Twiztid, Dark Lotus, and others).]

Page 49: Uncovering concepts and similarities at scale

[Figure: the same artist graph, with additional nodes labeled Node 1 and Node 3 through Node 11.]

Page 50: Uncovering concepts and similarities at scale

Thank you!

@thvasilo
[email protected]

Code: github.com/sics-dna/concepts

Page 51: Uncovering concepts and similarities at scale

Similarity calculation

Page 52: Uncovering concepts and similarities at scale

Similarity calculation

Page 53: Uncovering concepts and similarities at scale

Binary correlation example

Page 54: Uncovering concepts and similarities at scale

Binary correlation example

Similarity:

Page 55: Uncovering concepts and similarities at scale

Binary correlation example

Similarity:

i

j

Page 56: Uncovering concepts and similarities at scale

Binary correlation example

Similarity:

σ_{i,j} = 4/5

i

j