Parallelizing Random Walk with Restart for Large-Scale Query Recommendation
Meng-Fen Chiang, Tsung-Wei Wang and Wen-Chih Peng
Department of Computer Science, National Chiao Tung University (R.O.C.)
Outline: Introduction, Related Work, Problem Definition, Parallel RWR
Mining Temporal Following Patterns in Parallel
[Figure: system flow]
• User Access Logs
• Temporal Following Pattern Mining
– Parameters: 1. window size 2. bin size
– Item ID : <Item List> ...
• Recommendation Graph Construction
• Random Walk with Restart
– Item ID : <Item List> ...
– Query Items: Item 1, Item 2, ...
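The last stage of the flow above, Random Walk with Restart from a query item, can be sketched as a simple power iteration. This is a minimal serial sketch, not the authors' parallel implementation: the graph representation (a dict of weighted out-edges), the restart probability of 0.15, the fixed iteration count, the dangling-node handling, and the function names are all assumptions made for illustration.

```python
def random_walk_with_restart(graph, query, restart=0.15, iters=50):
    """Return {item: score} for a walk that restarts at `query`.

    graph: {item: {neighbor: weight}} (assumed format of the recommendation graph).
    restart: probability of jumping back to the query item (assumed value).
    """
    nodes = set(graph)
    for nbrs in graph.values():
        nodes.update(nbrs)
    nodes.add(query)

    scores = {n: 0.0 for n in nodes}
    scores[query] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for u, p in scores.items():
            nbrs = graph.get(u, {})
            total = sum(nbrs.values())
            if total == 0:
                # Dangling item: send its walking mass back to the query (assumption).
                nxt[query] += (1 - restart) * p
                continue
            for v, w in nbrs.items():
                nxt[v] += (1 - restart) * p * w / total
        nxt[query] += restart  # restart mass; scores always sum to 1
        scores = nxt
    return scores


def recommend(graph, query, k=10):
    """Top-k items by RWR score, excluding the query item itself."""
    scores = random_walk_with_restart(graph, query)
    ranked = sorted((n for n in scores if n != query),
                    key=lambda n: scores[n], reverse=True)
    return ranked[:k]


def recommend_many(graph, queries, k=10):
    """Each query is independent of the others, so this loop is the part
    that could be distributed across workers."""
    return {q: recommend(graph, q, k) for q in queries}
```

For example, `recommend_many(graph, ["Item 1", "Item 2"])` returns one ranked item list per query item, mirroring the "Item ID : <Item List>" output in the figure.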
Temporal Following Relation
• Frequent QA browsing behaviors of users within a pre-defined time window
– E.g., window size = 150 sec.
• Yahoo! Asia Knowledge Plus (AKP)
– Duration: 1 week in July 2009
– #clicks: 90 M
– #items: 4 M
– #users: 2 M
• Performance evaluation
– Quality study
– Scalability study
– Case study
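The temporal following relation defined above (items browsed by the same user within a pre-defined time window) can be illustrated with a small map/aggregate sketch. The click-log format `(user, timestamp_sec, item)`, the function names, and the `min_count` frequency threshold are hypothetical; the bin size parameter mentioned in the slides is not modeled here because its definition is not given in this transcript.

```python
from collections import Counter, defaultdict


def emit_following_pairs(clicks, window=150):
    """Map-like step: yield (item, follower) pairs.

    clicks: iterable of (user, timestamp_sec, item) tuples (assumed log format).
    A pair is emitted when the same user clicks `follower` after `item`
    and no more than `window` seconds later.
    """
    by_user = defaultdict(list)
    for user, ts, item in clicks:
        by_user[user].append((ts, item))
    for events in by_user.values():
        events.sort()
        for i, (t_i, item) in enumerate(events):
            for t_j, follower in events[i + 1:]:
                if t_j - t_i > window:
                    break  # events are sorted, so later clicks are also too late
                if follower != item:
                    yield item, follower


def aggregate_following(pairs, min_count=1):
    """Reduce-like step: count followers per item and keep the frequent ones.

    min_count is a hypothetical threshold for what counts as "frequent".
    """
    counts = defaultdict(Counter)
    for item, follower in pairs:
        counts[item][follower] += 1
    result = {}
    for item, ctr in counts.items():
        kept = {f: c for f, c in ctr.items() if c >= min_count}
        if kept:
            result[item] = kept
    return result
```

With a 150-second window, a log in which two users both click item A and then item B within the window yields `{"A": {"B": 2}}`, which can serve as weighted edges of the recommendation graph.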
Quality Study
• User access logs
– Train: 80%
– Test: 20%
• Ground truth
– For each item I clicked by user U
– The set of items clicked by U after I within T sec.
• Measure the similarity with historical user click logs
– Item-precision
– Item-recall
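The evaluation setup above can be sketched as follows. The ground-truth construction follows the slide's definition; the slides do not spell out item-precision and item-recall, so the usual set-based definitions are assumed here (hits divided by the number of recommended items, and hits divided by the number of ground-truth items). The log format and function names are hypothetical.

```python
from collections import defaultdict


def ground_truth(test_clicks, window):
    """For each (user U, item I), the set of items U clicked after I within `window` sec.

    test_clicks: iterable of (user, timestamp_sec, item) tuples (assumed log format).
    """
    by_user = defaultdict(list)
    for user, ts, item in test_clicks:
        by_user[user].append((ts, item))
    truth = defaultdict(set)
    for user, events in by_user.items():
        events.sort()
        for i, (t_i, item) in enumerate(events):
            for t_j, later in events[i + 1:]:
                if t_j - t_i > window:
                    break
                if later != item:
                    truth[(user, item)].add(later)
    return dict(truth)


def item_precision_recall(recommended, truth_items):
    """Set-based precision and recall of one recommendation list (assumed definitions)."""
    rec, truth = set(recommended), set(truth_items)
    hits = len(rec & truth)
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall
```

Averaging these two values over all test (user, item) pairs gives one precision and one recall figure per recommendation method.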
Quality Study (contd.)
– Top-k hot items in the category of test item (HC)
– Temporal following pattern (TFP)
– RWR based on temporal following pattern (RWRTFP)
• Higher precision & recall
Scalability Study
• Temporal following pattern (TFP)
– 4.1M items
– 40 sec.
• RWR based on temporal following pattern (RWRTFP)
– Size of input data
– #computing nodes
Scalability Study (contd.)
• Computational cost is significantly reduced as the number of machines increases
• More queries make the computation more efficient
– 0.74 sec. (2K queries) vs. 0.49 sec. (10K queries)
Case Study
• Query item
– “What can I do if I do not have Word?”
Conclusion
• Proposes a parallel RWR for multiple query recommendation
– Parallelize mining frequent navigation behavior
– Parallelize RWR
– Compute RWR for multiple queries in parallel
• The recommender system
– General
– Content-agnostic
Q & A
Temporal Following Pattern Mining
Mapper 1 : Emit temporal following pairs for each item
Mapper N : Emit temporal following pairs for each item
Reducer 1 : Aggregate temporal following relation for each item
Reducer N : Aggregate temporal following relation for