Introduction Method Results Summary
Contextualization of Topics
browsing through terms, authors, journals and cluster allocations
Rob Koopman (1), Shenghui Wang (1), Andrea Scharnhorst (2)
(1) OCLC Research, (2) DANS-KNAW
ISSI 2015
Aug 12, 2015
Introduction
What are the essence and boundaries of a scientific field?
There are different ways to find clusters in scientific literature, based on connectivity in terms of authorship, citations, language similarity, etc.
Ambiguous nature in science
Ariadne: interactive context explorer
Ariadne is an interactive interface which allows users to explore the context of entities such as authors, journals, topical terms, etc.
It builds on semantic indexing statistically computed from a large-scale bibliographic corpus
It was originally implemented to explore 1M topical terms, 3M authors, 35K journals and 700+ Dewey decimal classes associated with 65M articles.
Research questions
Q1: How does the Ariadne algorithm work on a much smaller, field-specific dataset?
Q2: Can we use Ariadne to label the clusters produced by the different methods?
Q3: Can we use Ariadne to compare different clustering solutions?
LittleAriadne
LittleAriadne: context explorer over Astrophysics data
Offline: generates a semantic representation for each entity
Online: finds the most related entities and uses multidimensional scaling to display them
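The online display step can be sketched as follows. This is a minimal illustration, not the LittleAriadne implementation: it assumes classical (Torgerson) multidimensional scaling, whereas the slides do not say which MDS variant is used, and the entity vectors are hypothetical toy values. Cosine similarities are turned into chord distances between unit vectors so that the distances are Euclidean.

```python
import numpy as np

def cosine_similarity_matrix(vectors):
    # Normalise each row to unit length; dot products are then cosines.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def classical_mds(distances, n_components=2):
    # Classical MDS: double-centre the squared distances, then use the
    # leading eigenvectors (scaled by sqrt of eigenvalues) as coordinates.
    n = distances.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (distances ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Hypothetical semantic vectors for four entities (toy values).
names = ["[subject:dwarf novae]", "[subject:outbursts]",
         "[subject:cosmology]", "[subject:dark energy]"]
vectors = np.array([
    [1.0, 0.2, 0.0],
    [0.9, 0.3, 0.1],
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 0.9],
])

similarity = cosine_similarity_matrix(vectors)
# Chord distance between unit vectors: sqrt(2 - 2 * cosine).
distances = np.sqrt(np.clip(2.0 - 2.0 * similarity, 0.0, None))
coords = classical_mds(distances, n_components=2)

for name, (x, y) in zip(names, coords):
    print(f"{name}: ({x:.3f}, {y:.3f})")
```

In the 2D layout, entities with similar vectors end up close together, which is the property the visual interface relies on.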
An example article
Article ID ISI:000276828000006
Title On the Mass Transfer Rate in SS Cyg
Abstract The mass transfer rate in SS Cyg at quiescence, estimated from the observed luminosity of the hot spot, is log M-tr = 16.8 +/- 0.3. This is safely below the critical mass transfer rates of log M-crit = 18.1 (corresponding to log T-crit(0) = 3.88) or log M-crit = 17.2 (corresponding to the “revised” value of log T-crit(0) = 3.65). The mass transfer rate during outbursts is strongly enhanced
Author [author:smak j]
ISSN [issn:0001-5237]
Subject [subject:accretion, accretion disks] [subject:cataclysmic variables] [subject:disc instability model] [subject:dwarf novae] [subject:novae, cataclysmic variables] [subject:outbursts] [subject:parameters] [subject:stars] [subject:stars dwarf novae] [subject:stars individual ss cyg] [subject:state] [subject:superoutbursts]
Cluster label [cluster:a 19] [cluster:b 16] [cluster:c 15] [cluster:d 51] [cluster:e 17] [cluster:f 1]
Six different clustering solutions
x   Source     y = #Cluster   #Cluster in LittleAriadne
a   cwts 1.8   23             23
b   UMSI       23             23
c   oclc 20    20             20
d   hu         139            48
e   sts        5664           229
f   ECOOM      15             15
Entities in the Astrophysics dataset
There are in total 90,343 entities associated with 111,616 astrophysics articles
59 journals
27,027 author names (no disambiguation applied)
39,577 topical terms
23,322 subjects (extracted from "Author Keywords" and "Keywords Plus")
358 cluster labels (source + cluster id)
Build semantic representation
Basic assumptions
An entity can be represented by its context
Entities which share more context are more likely to be related
Context is the textual environment where an entity occurs
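The basic assumptions can be illustrated with a small sketch. It assumes, for illustration only, that the "context" of an entity is the bag of words of the articles in which it occurs; the function name `build_context_counts` and the three toy articles are hypothetical and not taken from the dataset.

```python
from collections import Counter, defaultdict

def build_context_counts(articles):
    """Map each entity to a Counter of the words in the articles it occurs in."""
    contexts = defaultdict(Counter)
    for article in articles:
        words = article["text"].lower().split()
        for entity in article["entities"]:
            contexts[entity].update(words)
    return contexts

# Hypothetical toy articles; real input would be titles and abstracts.
articles = [
    {"text": "mass transfer rate in dwarf novae",
     "entities": ["[author:smak j]", "[subject:dwarf novae]"]},
    {"text": "outbursts of dwarf novae and mass transfer",
     "entities": ["[subject:dwarf novae]", "[subject:outbursts]"]},
    {"text": "dark energy and inflation",
     "entities": ["[subject:cosmology]"]},
]

contexts = build_context_counts(articles)

# Entities that share more context words are more likely to be related.
shared_stellar = set(contexts["[author:smak j]"]) & set(contexts["[subject:outbursts]"])
shared_unrelated = set(contexts["[author:smak j]"]) & set(contexts["[subject:cosmology]"])
print(sorted(shared_stellar))
print(sorted(shared_unrelated))
```

Here the author and the subject "outbursts" never appear in the same article, yet they share context words through "dwarf novae" articles, while the author shares nothing with the cosmology subject.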
Dimension reduction using Random Projection
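[Figure: random projection illustrated with entities from the example article: the term "mass transfer rate", [subject:outbursts], [subject:stars], [subject:parameters], [author:smak j], [cluster:a 19] and [issn:0001-5237]]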
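A minimal sketch of dimension reduction by random projection follows. The slides do not state which kind of random matrix is used, so a dense Gaussian matrix is assumed here; the function name `random_projection`, the dimensions and the input data are hypothetical.

```python
import numpy as np

def random_projection(matrix, n_components, seed=0):
    # Multiply by a random Gaussian matrix scaled by 1/sqrt(n_components);
    # pairwise distances are then approximately preserved
    # (Johnson-Lindenstrauss lemma).
    rng = np.random.default_rng(seed)
    n_features = matrix.shape[1]
    projection = rng.standard_normal((n_features, n_components)) / np.sqrt(n_components)
    return matrix @ projection

# Hypothetical high-dimensional context vectors for five entities.
rng = np.random.default_rng(42)
high_dim = rng.random((5, 2000))
low_dim = random_projection(high_dim, n_components=256)

original = np.linalg.norm(high_dim[0] - high_dim[1])
projected = np.linalg.norm(low_dim[0] - low_dim[1])
print(f"distance before: {original:.2f}, after: {projected:.2f}")
```

The projected vectors are much shorter than the originals, yet distances between entities change only slightly, so similarity computations can run on the reduced representation.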
From semantic representation to visualisation and more
Each entity has its semantic representation
Cosine similarity between entities can be computed very fast; the 2D visualisation is built on it
For each article, we collect the semantic representations of all the entities it involves, and take their average as the article's semantic representation
We apply a standard K-means clustering method to cluster these articles based on their semantic representations
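The article-level steps above can be sketched as follows. This is an illustration under stated assumptions: the entity vectors are hypothetical 2D toy values, and K-means is written out as plain Lloyd iterations in NumPy rather than taken from whatever implementation the authors used.

```python
import numpy as np

def article_vector(entity_names, entity_vectors):
    # An article is represented by the mean of its entities' vectors.
    return np.mean([entity_vectors[name] for name in entity_names], axis=0)

def kmeans(points, k, n_iter=50, seed=0):
    # Plain Lloyd's algorithm: assign to nearest centroid, then recompute.
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical entity vectors (toy values, not real semantic representations).
entity_vectors = {
    "[subject:dwarf novae]": np.array([1.0, 0.0]),
    "[author:smak j]": np.array([0.9, 0.1]),
    "[subject:outbursts]": np.array([0.8, 0.2]),
    "[subject:cosmology]": np.array([0.0, 1.0]),
    "[subject:dark energy]": np.array([0.1, 0.9]),
    "[subject:inflation]": np.array([0.2, 0.8]),
}
articles = [
    ["[subject:dwarf novae]", "[author:smak j]"],
    ["[subject:dwarf novae]", "[subject:outbursts]"],
    ["[subject:cosmology]", "[subject:dark energy]"],
    ["[subject:dark energy]", "[subject:inflation]"],
]

points = np.array([article_vector(a, entity_vectors) for a in articles])
labels, centroids = kmeans(points, k=2)
print(labels)
```

The two stellar articles fall into one cluster and the two cosmology articles into the other; which cluster receives which numeric label depends on the random initialisation.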
Experiment 1: Exploring context
Now we can explore
Let’s start with stars
An overview of all journals
Experiment 2: Labelling clusters
What is cluster a 2?
Cluster ID   Top 9 most related topical terms
a 2          "cosmology", "dark energy", "density perturbations", "cosmologies", "planck", "cosmological", "spatial curvature", "inflationary", "inflation"
b 2          "cosmology", "cosmological constant", "cosmologies", "cosmological", "universes", "dark energy", "quadratic", "tensor", "planck"
c 17         "power spectrum", "cosmological parameters", "cmb", "last scattering", "anisotropies", "microwave background", "power spectra", "planck", "cosmic microwave"
d 28         "density perturbations", "inflationary", "inflation", "dark energy", "scale invariant", "spatial curvature", "cosmological perturbations", "inflationary models", "cosmologies"
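A ranking like the one in the table can be sketched as a nearest-neighbour query: the cluster label is treated as an entity, and topical terms are ranked by cosine similarity to it. The function `most_related` and all vector values below are hypothetical; the real vectors come from the semantic representation built offline.

```python
import numpy as np

def most_related(query, candidates, vectors, top_k=3):
    # Rank candidate terms by cosine similarity to the query entity.
    q = vectors[query] / np.linalg.norm(vectors[query])
    scored = []
    for name in candidates:
        v = vectors[name]
        scored.append((float(q @ v / np.linalg.norm(v)), name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

# Hypothetical toy vectors for one cluster label and four topical terms.
vectors = {
    "[cluster:a 2]": np.array([0.9, 0.1, 0.0]),
    "cosmology": np.array([1.0, 0.0, 0.0]),
    "dark energy": np.array([0.8, 0.3, 0.0]),
    "planck": np.array([0.6, 0.6, 0.1]),
    "dwarf novae": np.array([0.0, 0.1, 1.0]),
}
terms = ["cosmology", "dark energy", "planck", "dwarf novae"]

top_terms = most_related("[cluster:a 2]", terms, vectors, top_k=3)
print(top_terms)
```

The returned terms are only candidate descriptors; turning them into a proper cluster label still requires expert judgement.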
Experiment 3: Comparing clustering solutions
Cluster labels are treated as entities
Let’s compare
Highly similar clustering solutions
Partially agreeing clustering solutions
An overview of all clustering solutions
Summary
We present a method and an interface that allow visual exploration through the contexts of entities
We can provide the most related topical terms for clusters, although expert knowledge is needed to transform them into real labels/topics
LittleAriadne provides a visual way of comparing different clustering solutions
Our naïve way of clustering is worth exploring further
Future extensions
Add more types of entities, such as citations, publishers, conferences, etc., to provide richer context
Add direct links to articles to answer information retrieval needs
Study context sensitivity
compare "young" and "young"
Thank you
http://thoth.pica.nl/astro/relate
Rob Koopman
Shenghui Wang
Andrea Scharnhorst