Research Interests: Their Dynamics, Structures and Applications in Personalized Web Search
Yi Zeng (1), Erzhong Zhou (1), Xu Ren (1), Yulin Qin (1,3), Ning Zhong (1,2), Zhisheng Huang (4)
1. International WIC Institute, Beijing University of Technology, China
2. Maebashi Institute of Technology, Japan
3. Carnegie Mellon University, USA
4. Vrije University Amsterdam, the Netherlands
Research Interests : Their Dynamics, Structures and Applications in Personalized Web Search
About how user interests (more specifically, the research interests of scientists) can be quantitatively analyzed and used in personalized Web search (invited talk at the Microsoft Research Asia NLC Group).
Transcript
Web Intelligence Consortium
The Large Knowledge Collider Project
13 partner institutions (from 11 countries, 2 from Asia)
a platform for infinitely scalable querying and reasoning on the linked data-web.
Motivation
Vague/Incomplete queries over large scale data.
(How to get more refined queries to reduce the size of the result set?).
Large scale data vs most relevant data for a specific user.
Diversity for different users in the context of large scale data.
Realizing Diversity of Users by user interests.
Understanding the structural and dynamical characteristics of user interests is the foundation for its utilization in Web search refinement.
The Acquisition, Structure and Dynamics of Research Interests
Why?
Human Learning Theory [Bransford 2000]
Basic Level Advantage [Rogers 2007]
How?
Identifying key interests
Utilizing interests for the unification of knowledge retrieval and reasoning.
What if the interests change dynamically? Do they really change all the time, and if so, how?
Different Interests Evaluation Functions
(Frequency) Cumulative Interest:
$$CI(t(i), n) = \sum_{j=1}^{n} y_{t(i), j}.$$
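The cumulative interest can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: it assumes that y_{t(i), j} is the number of papers mentioning interest term t(i) in time interval j, and the yearly counts below are made up.

```python
def cumulative_interest(counts_per_interval, n):
    """CI(t(i), n): sum of y_{t(i), j} over the first n time intervals."""
    return sum(counts_per_interval[:n])


# Hypothetical yearly counts of papers whose titles mention "search".
search_counts = [1, 0, 3, 2, 4]

# Cumulative interest after each year.
ci_by_year = [
    cumulative_interest(search_counts, n)
    for n in range(1, len(search_counts) + 1)
]
print(ci_by_year)  # [1, 1, 4, 6, 10]
```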
An analysis of cumulative interests in different time intervals. (Paul Erdos, with more than 1400 papers involved)
Statistical characteristics analysis: all the points lie around a straight line, and the Shapiro-Wilk test gives a significance value of 0.058, which is greater than 0.05, so normality cannot be rejected and the distribution of Erdos's publication numbers over the years is consistent with a normal distribution. The cumulative interests of an author may follow different kinds of distributions.
The "basic level advantage" [Rogers 2007]: concepts at the basic level are used more frequently than other terms [Wisniewski 1989].
Weights of Interests
Users' attention is divided if they hold various interests at the same time.
Each interest has its ups and downs, which can be discovered from the change in its weight relative to the other interests.
An analysis of Ricardo Baeza-Yates’ weighted interests w(t(i), j).
$$w(t(i), j) = \frac{y_{t(i), j}}{\sum_{i=1}^{n} y_{t(i), j}}$$
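As a rough sketch of the weighting idea, the weight of an interest in one time interval is its count divided by the total count of all interests in that interval. The function name and the counts are hypothetical, and y_{t(i), j} is again assumed to be a per-interval publication count.

```python
def interest_weights(counts_in_interval):
    """w(t(i), j) = y_{t(i), j} / sum_i y_{t(i), j} for a single interval j."""
    total = sum(counts_in_interval.values())
    if total == 0:
        # No publications in this interval: every weight is zero.
        return {term: 0.0 for term in counts_in_interval}
    return {term: y / total for term, y in counts_in_interval.items()}


# Hypothetical counts for one year.
counts_one_year = {"web": 6, "search": 3, "query": 1}
weights = interest_weights(counts_one_year)
for term, w in weights.items():
    print(term, round(w, 2))
```

Comparing these weights across consecutive intervals shows which interests gain or lose ground relative to the others, even when the raw counts all grow.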
Obtaining the Retained Interests
Besides frequency, what else is important for correctly obtaining retained interests?
Forgetting mechanism in cognitive memory retention
(exponential function model, power function model) [Anderson, Schooler 1991].
Pictures from: [Schooler 1993] Schooler, L. J. & Anderson, J. R.: Recency and Context: An Environmental Analysis of Memory. In Proceedings of the Fifteenth Annual Conference of the Cognitive Science Society, pp. 889-894, 1993.
(Frequency and Recency) Memory Retention:
$$P = A e^{-bT}; \qquad P = A T^{-b}$$
Obtaining the Retained Interests (cont.)
[Zeng 2009a] Cognitive Memory Retention Based Starting Point for Query Extension and Granular Selection, Yi Zeng, Haiyan Zhou, Ning Zhong, Yulin Qin, Shengfu Lu, Yiyu Yao, Yang Gao. In: Cognitive Memory Component (v1), LarKC deliverable 2-3-1, Coordinated by Jose Quesada and Yi Zeng, March 30, 2009.
[Zeng 2009b] Yi Zeng, Yiyu Yao, Ning Zhong. DBLP-SSE: A DBLP Search Support Engine, In: Proceedings of the 2009 IEEE/WIC/ACM International Conference on Web Intelligence, IEEE Computer Society, Milan, Italy, September 15-18, 2009.
[Maanen 2009] Leendert van Maanen, Julian N. Marewski. Recommender Systems for Literature Selection: A Competition between Decision Making and Memory Models, CogSci 2009, July 31-August 1, 2009.
(Frequency and Recency) Exponential Model for Interest Retention:
$$EIR(i) = \sum_{j=1}^{n} m(i, j) \, A e^{-b T_{i,j}}$$
(Frequency and Recency) Power Model for Interest Retention:
$$PIR(i) = \sum_{j=1}^{n} m(i, j) \, A T_{i,j}^{-b}$$
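A minimal sketch of the two retention models follows. Several things here are assumptions rather than facts from the talk: m(i, j) is read as the number of times interest i appears in year j, T_{i,j} as the number of years elapsed between year j and the evaluation year, and the parameter values A = 1.0 and b = 0.5 are placeholders chosen only for illustration.

```python
import math

A = 1.0  # assumed scale parameter (placeholder value)
B = 0.5  # assumed decay parameter (placeholder value)


def exponential_retention(appearances, now, a=A, b=B):
    """EIR(i) = sum_j m(i, j) * A * exp(-b * T_ij), with T_ij = now - year."""
    return sum(m * a * math.exp(-b * (now - year))
               for year, m in appearances.items())


def power_retention(appearances, now, a=A, b=B):
    """PIR(i) = sum_j m(i, j) * A * T_ij ** (-b); requires every year < now."""
    return sum(m * a * (now - year) ** (-b)
               for year, m in appearances.items())


# Two hypothetical interests with the same total frequency (5 papers each).
old_interest = {2000: 5}
recent_interest = {2007: 2, 2008: 3}

for name, data in [("old", old_interest), ("recent", recent_interest)]:
    print(name,
          round(exponential_retention(data, now=2009), 3),
          round(power_retention(data, now=2009), 3))
```

Both models rank the recent interest above the old one although their cumulative counts are equal, which is the effect that frequency alone cannot capture.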
Obtaining the Top N Interests
A comparative study of total research interests from 1990 to 2008 and retained interests in 2009 (based on both the power law and exponential law models).
Difference on the contribution values from papers published in different years.
• Retained interest vs future interests.
• publication numbers are within [200, 300]
• top 9 interests
• 2001 to 2008
• 140 persons
• 51.14% predict 5 out of 9 interests.
• Spearman rank correlation: rho = 0.66
• 1-tail t-test: 0.02 (close to statistically significant)
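The comparison between retained and future interests can be illustrated with two small helpers: one counts how many of the predicted top N interests reappear, the other computes Spearman's rho for two rankings of the same items. This is a sketch under assumptions: the exact evaluation protocol of the study is not given on the slide, the "future" list below is invented, and the rho formula used here is the simple one that is valid only when there are no ties.

```python
def top_n(scores, n):
    """The n highest-scoring terms, best first (ties broken alphabetically)."""
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ordered[:n]]


def overlap(predicted, actual):
    """Number of predicted interests that also occur in the actual list."""
    return len(set(predicted) & set(actual))


def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two tie-free rankings of the same items."""
    n = len(rank_a)
    position_b = {item: r for r, item in enumerate(rank_b)}
    d_squared = sum((r - position_b[item]) ** 2
                    for r, item in enumerate(rank_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Retained-interest scores (first three values from the talk's example table).
retained = {"web": 7.81, "search": 5.59, "retrieval": 3.19}
predicted = top_n(retained, 2)
future = ["search", "mining"]  # hypothetical future interests
print(predicted, overlap(predicted, future))

rho = spearman_rho(["web", "search", "query", "retrieval"],
                   ["search", "web", "query", "retrieval"])
print(round(rho, 2))
```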
Building and Analyzing the Structure of Research Interests
Observed Phenomenon:
[1] Main research interests (pivotal nodes) change dynamically all the time, with older ones disappearing and new ones emerging.
[2] Relations among research interests vary as time passes (they strengthen or weaken).
[3] Main research interests are closely related to each other. (The closeness grows stronger over time, which brings the degree of separation to around 2-3. This indicates that, for an author, research interests are not isolated but highly relevant to each other.)
[4] Many top research interests (pivotal nodes) remain active in the interest network (e.g. search, analysis, match).
Figure 7. Ricardo's research interest dynamic evolution network from 1991 to 2009 (based on the DBLP publication list, with 232 papers involved). The network is a graph with weighted edges and weighted vertices.
An Author’s Research Interest Evolution Network
Statistical Characteristics on the Dynamics of Total Research Interests
Not a pure random Process !
There might be some universal characteristics and hidden rules!
Pictures from math.ucsd.edu. and math.tsukuba.ac.jp
Figure 2: Power-law distribution of the weights of research interests for Leonhard Euler (publication list from Euler's Archive, with 856 papers), Paul Erdos (publication list from Erdos' publication collection project (1929-1989) and MathSciNet (1990-2004), with 1437 papers involved; titles in German, French and Hungarian were translated with Google translation and Babylon translation), and Ricardo Baeza-Yates (from DBLP). Titles were preprocessed to handle meaningless words, tense, singular and plural forms, third person, etc.
Figure 12. Zdzislaw Pawlak’s Interest connection network (1984-2008, with 62.1% interests directly connected to “rough”).
Timing characteristics of research interests
Dynamic characteristics of Interest Longest Duration and Interest Cumulative Duration.
Figure 9: Ricardo's research interest lasting time and appear time distribution statistics.
• a few large spikes in the plot, corresponding to very long interest longest duration and interest cumulative duration for some research interests : non-Poisson process;
Figure 9(b): the probability of having n research interests whose lasting time is a fixed time interval τ.
Statistical distribution approximation: $$P(\tau) \sim \tau^{-\alpha}$$, with α = 2.30 (±0.26) and α' = 1.64 (±0.18) (by linear fit).
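An exponent "by linear fit" is commonly estimated as the slope of a straight line fitted to the data on log-log axes. The sketch below assumes the distribution has the power-law form P(τ) ~ τ^(-α), uses ordinary least squares, and runs on synthetic data; it is not the authors' fitting procedure, and it produces no error bars.

```python
import math


def fit_power_law_exponent(taus, probs):
    """Least-squares slope of log P against log tau; returns alpha in P ~ tau**-alpha."""
    xs = [math.log(t) for t in taus]
    ys = [math.log(p) for p in probs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope


# Synthetic data that follow an exact power law with exponent 2.
taus = [1, 2, 4, 8]
probs = [t ** -2.0 for t in taus]
alpha = fit_power_law_exponent(taus, probs)
print(round(alpha, 3))
```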
Inspired by Human Dynamics [Barabasi 2005]
Explanations on the Observed Power Law Distributions
What causes the “Scale-free Phenomenon” in research interests?
Researchers are likely to work around a few more general topics, while their more specific topics change from time to time but stay around, or closely related to, these general topics.
The picture is from: Peter Csermely. Weak Links: Stabilizers of Complex Systems from Proteins to Social Networks, Springer, 2006.
The "rich get richer" effect [Simon 1955].
[Simon 1955] Simon, H.: On a class of skew distribution functions. Biometrika 42, 425–440, 1955.
[Barabasi 1999] Barabasi, A.L. and Albert, R.: Emergence of scaling in random networks. Science 286, 509–512, 1999.
A Comparative Study of Different Interest Evaluation Methods
Interests Longest Duration:
$$ILD(t(i), n) = \max(ID(t(i)), n).$$
Interests Cumulative Duration:
$$ICD(t(i), n) = \sum_{n'=1}^{n} (ID(t(i)), n').$$
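The two duration measures can be sketched as follows. The reading of "interest duration" (ID) used here is an assumption: each duration is taken to be the length, in years, of one uninterrupted run of years in which the interest appears. ILD is then the longest run and ICD the sum of all runs. The function names and the sample years are hypothetical.

```python
def interest_durations(years):
    """Lengths of the maximal runs of consecutive years in which an interest appears."""
    durations = []
    run = 0
    previous = None
    for year in sorted(set(years)):
        if previous is not None and year == previous + 1:
            run += 1            # the current run continues
        else:
            if run:
                durations.append(run)
            run = 1             # a new run starts
        previous = year
    if run:
        durations.append(run)
    return durations


def interest_longest_duration(years):
    """ILD: the longest single run."""
    return max(interest_durations(years), default=0)


def interest_cumulative_duration(years):
    """ICD: the sum of all runs."""
    return sum(interest_durations(years))


years_with_interest = [2001, 2002, 2003, 2006, 2008, 2009]
print(interest_durations(years_with_interest))            # [3, 1, 2]
print(interest_longest_duration(years_with_interest))     # 3
print(interest_cumulative_duration(years_with_interest))  # 6
```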
Zhisheng Huang’s Interests Evaluation from CI, ILD and ICD
Social Network based Group Interests Models
How to acquire the top N interests?
• An example of Group Interest:
[Figure: interest terms shown for Carlos Castillo (Web, PageRank, Network, Spam, Search, Detection, Analysis, Link, Content) and for Ricardo A. Baeza-Yates (Web, Search, Retrieval, Information, Query, Analysis, Challenge, Engine, Mining).]
• Group Interest Function:
$$GI(t(i), u) = \sum_{c=1}^{m} E(t(i), u, c), \qquad E(t(i), u, c) = \begin{cases} 1 & (t(i) \in I_c^{topN}) \\ 0 & (t(i) \notin I_c^{topN}) \end{cases}$$
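The group interest function counts, for a term t(i) and a user u, how many of the m members c of the user's group have that term among their own top N interests. A minimal sketch, in which the member identifiers and their interest sets are hypothetical:

```python
def group_interest(term, top_n_by_member):
    """GI(t(i), u): number of group members c whose top N interests contain the term."""
    return sum(1 for interests in top_n_by_member.values() if term in interests)


# Hypothetical top N interests of three members of user u's group.
top_n_by_member = {
    "member_a": {"web", "search", "spam"},
    "member_b": {"search", "retrieval"},
    "member_c": {"web", "search"},
}

candidate_terms = {"web", "search", "retrieval", "mining"}
group_scores = {t: group_interest(t, top_n_by_member) for t in candidate_terms}

# Rank the terms by group interest, highest first (ties broken alphabetically).
ranked = sorted(group_scores, key=lambda t: (-group_scores[t], t))
print(ranked)
```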
Overlap of User Interests and Group Interests
Top 9 Retained Interests | Top 9 Group Retained Interests
Web 7.81 | Search 35
Search 5.59 | Retrieval 30
Retrieval 3.19 | Web 28
Information 2.27 | Information 26
Query 2.14 | System 19
Engine 2.10 | Query 18
Mining 1.26 | Analysis 14
Challenge … | Text …
Analysis … | Model …
Top 9 interests retention of a user and his group interests retention. (Ricardo A. Baeza-Yates, based on May 2008 version of SwetoDBLP).
A Step Forward: Semantic Similarity for Obtaining More Accurate Interest Descriptions
Consistent interests without consideration of semantic similarity.
[Figure: interest terms shown for Carlos Castillo (Web, PageRank, Network, Spam, Search, Detection, Analysis, Link, Content) and for Ricardo A. Baeza-Yates (Web, Search, Retrieval, Information, Query, Analysis, Challenge, Engine, Mining).]
Consistent interests with consideration of semantic similarity.
[Figure: interest terms shown for Carlos Castillo (Web, PageRank, Network, Spam, Search, Detection, Analysis, Link, Content) and for Ricardo A. Baeza-Yates (Web, Search, Retrieval, Information, Query, Analysis, Challenge, Engine, Mining).]
Semantic Similarity and Interests Re-ranking
Semantic similarity is judged by the Normalized Google Distance [Rudi and Paul 2007]:
$$NGD(x, y) = \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log M - \min\{\log f(x), \log f(y)\}}$$
Normalized Google Distance
interest x | interest y | NGD | interest x | interest y | NGD
search | retrieval | 0.529 | logic | reasoning | 0.239
search | query | 0.483 | logic | semantic | 0.276
search | pagerank | 0.490 | ontology | semantic | -0.003
retrieval | query | 0.403 | reasoning | semantic | 0.050
retrieval | pagerank | 0.497 | logic | ontology | 0.332
query | pagerank | 0.460 | ontology | reasoning | 0.080
Similarity threshold: NGD(x, y) < 0.3
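The Normalized Google Distance is straightforward to compute once hit counts are available. In the sketch below, f(x) and f(y) are the numbers of pages containing each term, f(x, y) the number containing both, and M the total number of indexed pages. The hit counts are invented for illustration, no search engine is queried, and treating pairs below the 0.3 value from the slide as "similar" is an assumption about how that threshold is applied.

```python
import math


def ngd(fx, fy, fxy, m):
    """Normalized Google Distance from hit counts f(x), f(y), f(x, y) and index size M."""
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(m) - min(log_fx, log_fy))


SIMILARITY_THRESHOLD = 0.3  # value from the slide; the direction of the test is assumed


def is_similar(fx, fy, fxy, m, threshold=SIMILARITY_THRESHOLD):
    return ngd(fx, fy, fxy, m) < threshold


# Invented hit counts: two terms that almost always occur together ...
print(round(ngd(1000, 1000, 900, 10**6), 3), is_similar(1000, 1000, 900, 10**6))
# ... and two terms that rarely co-occur.
print(round(ngd(10000, 1000, 100, 10**6), 3), is_similar(10000, 1000, 100, 10**6))
```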
Google, Bing as the Knowledge base.
A comparative study of interests ranking without and with re-ranking strategy
Semantic Similarity and Interests Re-ranking (cont.)
Interests Re-ranking Function:
$$rank'(x) = \begin{cases} rank(x), \quad rank'(y) = rank(x) + 1 & (rank(x) < rank(y)) \\ rank(y) + 1, \quad rank'(y) = rank(y) & (rank(y) < rank(x)) \end{cases}$$
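One way to read the re-ranking function: when two interests x and y are judged semantically similar, the lower-ranked one is moved to the position directly after the higher-ranked one. The sketch below implements that reading. How the items in between are shifted is an assumption (they simply move down by one), and the sample ranking is hypothetical.

```python
def rerank_pair(ranking, x, y):
    """Move the lower-ranked of x and y to the position right after the higher-ranked one."""
    ranking = list(ranking)  # work on a copy
    index_x, index_y = ranking.index(x), ranking.index(y)
    upper, lower = (x, y) if index_x < index_y else (y, x)
    ranking.remove(lower)
    ranking.insert(ranking.index(upper) + 1, lower)
    return ranking


# Hypothetical interest ranking; "ontology" and "semantic" are treated as similar.
before = ["ontology", "web", "logic", "semantic"]
after = rerank_pair(before, "ontology", "semantic")
print(after)  # ['ontology', 'semantic', 'web', 'logic']
```

The function is symmetric in its two arguments, so the caller does not need to know in advance which of the pair is ranked higher.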
[Figure: interest rankings under CI(t(i), n), PRI(t(i), n) and ERI(t(i), n): (a) without semantic similarity based re-ranking; (b) with semantic similarity based re-ranking.]
An extension of the current FOAF vocabulary in the Semantic Web community, following the definition of "user interests" in the above slide. It describes user interests quantitatively from various perspectives.
Integration of WI and e-FOAF:interests by FOAF community
By Balthasar A.C. Schopman from Vrije University Amsterdam
Integration of WI and e-FOAF:interests by FOAF community (cont.)
The wi:ComplexInterest concept as graph with relations:
This photo is taken by Professor Lora Aroyo from Vrije University Amsterdam at Vocamp 2010.
Computer Scientists’ Research Interests Dataset
We analyzed research interests of all the computer scientists in DBLP from different perspectives.
We released the computer scientists' research interest RDF dataset (0.19 billion triples): http://wiki.larkc.eu/csri-rdf
The Utilization of e-FOAF:interests Vocabulary
Accessing user interests and downloading them as an RDF file.
The utilization of the interests dataset.
The SPARQL endpoint for DBLP user interests is available at http://www.wici-lab.org/wici/dblp-sse/
Dieter & Frank 2007
Bring User Interests to Literature Search Refinement
User interests
“ They come to formal education with a range of prior knowledge, skills, beliefs, and concepts that significantly influence what they notice about the environment and how they organize and interpret it. This, in turn, affects their abilities to remember, reason, solve problems, and acquire new knowledge. ” [Bransford 2000]
Humans acquire new knowledge based on pre-existing knowledge. People with different background knowledge will understand the same knowledge source in different ways.
Literature search systems are for researchers to acquire knowledge for their needs based on their queries.
Pre-existing Knowledge Search+ Acquired
Knowledge
Useful literature that is relevant to the query and the authors' research interests
Search Refinement by Interests from Different Perspectives
Vague/incomplete queries may produce too many results that the users have to wade through.
Research interests may be closely related to search tasks.
Research interests can be evaluated from various perspectives.
(1) Cumulative Interests;
(2) Retained Interests;
(3) Interests Longest Duration;
(4) Interests Cumulative Duration;
(5) Group interests;
DBLP-SSE : DBLP Search Support Engine
Log in: Dieter Fensel
Top 9 interests: Web, Service, Semantic, Architecture, Model, Ontology, Knowledge, Computing, Language
Query: Artificial Intelligence
List 1: without current interests constraints (Top 5 results)
* PROLOG Programming for Artificial Intelligence, Second Edition.
* Artificial Intelligence Architectures for Composition and Performance Environment.
* Artificial Intelligence in Music Education: A Critical Review.
* Music, Intelligence and Artificiality. Artificial Intelligence and Music Education.
* Musical Knowledge: What can Artificial Intelligence Bring to the Musician?
* ...
List 2: with current interests constraints (Top 5 results)
* Web Intelligence and Artificial Intelligence in Education.
* Artificial Intelligence Exchange and Service Tie to All Test Environments (AI-ESTATE)-A New Standard for System Diagnostics.
* Semantic Model for Artificial Intelligence Based on Molecular Computing.
* Open Information Systems Semantics for Distributed Artificial Intelligence.
* Artificial Intelligence and Financial Services.
* …
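The effect of interest constraints on the example above can be imitated with a toy ranking function: keep the titles that match the query, then order them by how many of the user's top interest terms they contain. This is only an illustration of the idea, not how DBLP-SSE is implemented; the function name, the word-matching rule and the cut-off are all assumptions. The titles and interest terms are taken from the example.

```python
import re


def refine(titles, query, interests, top_k=5):
    """Titles containing the query, ordered by the number of interest terms they mention."""
    wanted = {term.lower() for term in interests}
    matches = [t for t in titles if query.lower() in t.lower()]

    def score(title):
        words = set(re.findall(r"[a-z]+", title.lower()))
        return len(words & wanted)

    # sorted() is stable, so titles with equal scores keep their original order.
    return sorted(matches, key=score, reverse=True)[:top_k]


titles = [
    "PROLOG Programming for Artificial Intelligence, Second Edition.",
    "Web Intelligence and Artificial Intelligence in Education.",
    "Semantic Model for Artificial Intelligence Based on Molecular Computing.",
    "Artificial Intelligence in Music Education: A Critical Review.",
]
interests = ["Web", "Service", "Semantic", "Architecture", "Model",
             "Ontology", "Knowledge", "Computing", "Language"]

refined = refine(titles, "Artificial Intelligence", interests, top_k=2)
for title in refined:
    print(title)
```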
The DBLP dataset
Web Semantic
Knowledge
Sub datasets pre-selection
Search Results without any Refinement
Search Results with Interests-based Refinement
http://www.wici-lab.org/wici/dblp-sse/
User Evaluation of Refinement Strategy
Participants 7 DBLP authors:
Preference order 100% :
Preference order 100% :
Preference order 83.3% :
Preference order 16.7% :
List 2, List 3   List 1
List 2   List 3
List 2   List 3   List 1
List 3   List 2   List 1
Social Relation Based Search Refinement: Let Your Friends Help You!. Xu Ren, Yi Zeng, Yulin Qin, Ning Zhong, Zhisheng Huang, Yan Wang, and Cong Wang. Proceedings of the 2010 International Conference on Active Media Technology, Lecture Notes in Computer Science 6335, 475-485, 2010.
Scalability for Query Time
Criterion | Unrefined query | Refined query based on interests | Interest based selection before querying
Query time | medium | much slower | the fastest
Results | may be very far from user needs | much closer to user needs | equivalent to refined query based on interests
With selection: approximately 80% of the time can be saved.
The Effect of Query Constraints Numbers
Recall and Spent Time (Unrefined queries vs Interest-based Selection)
As the data scale grows, recall stays almost the same as for unrefined queries, while the ratio of spent time grows almost linearly.
Sometimes one can get a higher recall while the ratio of spent time is lower.
Context-Aware Linked Life Data Search
Utilizing user interests to refine vague and incomplete search
Publications related to this talk
Research Interests : Their Dynamics, Structures and Applications in Web Search Refinement. Yi Zeng, Erzhong Zhou, Yulin Qin, and Ning Zhong. Proceedings of the 2010 IEEE/WIC/ACM International Conference on Web Intelligence, IEEE Computer Society, Toronto, Canada, August 31- September 3, 2010.
User Interests: Definition, Vocabulary, and Utilization in Unifying Search and Reasoning. Yi Zeng, Yan Wang, Zhisheng Huang, Danica Damljanovic, Ning Zhong, and Cong Wang. Proceedings of the 2010 International Conference on Active Media Technology, Lecture Notes in Computer Science 6335, 98-107, 2010.
Social Relation Based Search Refinement: Let Your Friends Help You!. Xu Ren, Yi Zeng, Yulin Qin, Ning Zhong, Zhisheng Huang, Yan Wang, and Cong Wang. Proceedings of the 2010 International Conference on Active Media Technology, Lecture Notes in Computer Science 6335, 475-485, 2010.
Normalized Medline Distance and Its Utilization in Context-aware Life Science Literature Search. Yan Wang, Cong Wang, Yi Zeng, Zhisheng Huang, Vassil Momtchev, Bo Andersson, Xu Ren, and Ning Zhong. Proceedings of the 4th Chinese Semantic Web Symposium, August 19-21, 2010 (Recommended to Tsinghua Science and Technology, Elsevier).
User-centric Query Refinement and Processing Using Granularity Based Strategies. Yi Zeng, Ning Zhong, Yan Wang, Yulin Qin, Zhisheng Huang, Haiyan Zhou, Yiyu Yao, and Frank van Harmelen. Knowledge and Information Systems, Springer.
DBLP-SSE: A DBLP Search Support Engine, Yi Zeng, Yiyu Yao, Ning Zhong. In: Proceedings of the 2009 IEEE/WIC/ACM International Conference on Web Intelligence, IEEE Computer Society, Milan, Italy, September 15-18, 2009.