
A File-Based Approach for Recommender Systems in High-Performance Computing Environments

May 10, 2015


Simon Dooms

How can a recommender system be built without a database backend, so that it scales across an arbitrary number of computing nodes and cores?
Transcript
Page 1: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

A File-Based Approach for Recommender Systems in High-Performance Computing Environments

Simon Dooms

@sidooms

Page 2: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

Introduction

Is a database always the best option?

Intro Hardware Workflow Item User Calc Results Concl.

09/02/2011 Simon Dooms - Ghent University - RSmeetDB '11 2

[Chart: 0.5% vs. 99.5%]

Page 3: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

Hardware

Shared storage (RAID5)

InfiniBand ConnectX DDR

194 computing nodes:
• 8 cores @ 2.5 GHz
• 16 GB RAM
• 146 GB local storage


Page 4: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

Recommendation workflow

Phase 1: Item Similarity
Item Metadata → Item Similarity Calculation → Item Similarities

Phase 2: User Similarity
Consumptions + Item Similarities → User Similarity Calculation → User Similarities

Phase 3: Recommendation
Consumptions + Item Similarities + User Similarities → Recommendation Calculation

Page 5: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

Item similarity

Item Metadata → Item Similarity Calculation → Item Similarities

Page 6: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

Item similarity

[Figure: five nodes, each with two units labeled C]

Page 7: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

File buckets

MODULO

Example for 3 buckets
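
The slide names only the modulo operation and a three-bucket example, so the following is a minimal sketch of how modulo-based file buckets could work, not the author's implementation. The function names, the `bucket_<k>.txt` file naming, and the tab-separated record layout are all assumptions made for illustration.

```python
import os

NUM_BUCKETS = 3  # matches the "3 buckets" example on the slide


def bucket_for(item_id, num_buckets=NUM_BUCKETS):
    """Map an integer item ID to a bucket index with the modulo operator."""
    return item_id % num_buckets


def append_similarity(directory, item_id, other_id, score, num_buckets=NUM_BUCKETS):
    """Append one similarity record to the bucket file that owns item_id."""
    # Hypothetical file naming: one text file per bucket.
    name = f"bucket_{bucket_for(item_id, num_buckets)}.txt"
    with open(os.path.join(directory, name), "a", encoding="utf-8") as handle:
        handle.write(f"{item_id}\t{other_id}\t{score}\n")


def read_similarities(directory, item_id, num_buckets=NUM_BUCKETS):
    """Return the records for item_id by scanning only its own bucket file."""
    name = f"bucket_{bucket_for(item_id, num_buckets)}.txt"
    path = os.path.join(directory, name)
    records = []
    if not os.path.exists(path):
        return records
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            owner, other, score = line.rstrip("\n").split("\t")
            if int(owner) == item_id:
                records.append((int(other), float(score)))
    return records
```

With three buckets, items 7 and 4 both land in bucket 1 and item 3 lands in bucket 0, so a lookup for item 7 opens a single file instead of scanning every record.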


Page 8: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

Writing item similarities

[Figure: six units labeled C, Local Storage, Shared Storage]
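
The figure lists node-local and shared storage side by side. One plausible reading, assumed here rather than confirmed by the slides, is that each calculation writes its output to fast node-local disk first and moves the finished file to shared storage afterwards. The sketch below illustrates that pattern; the function name and directory arguments are hypothetical.

```python
import os
import shutil


def write_then_publish(lines, local_dir, shared_dir, filename):
    """Write all lines to node-local storage, then move the file to shared storage."""
    os.makedirs(local_dir, exist_ok=True)
    os.makedirs(shared_dir, exist_ok=True)

    # Step 1: many small writes go to the local disk of the node.
    local_path = os.path.join(local_dir, filename)
    with open(local_path, "w", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")

    # Step 2: a single move transfers the finished file to shared storage.
    # shutil.move copies and deletes when the two paths are on different filesystems.
    shared_path = os.path.join(shared_dir, filename)
    shutil.move(local_path, shared_path)
    return shared_path
```

Under this assumption the shared file system sees one transfer per finished file rather than one write per similarity record.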


Page 9: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

User Similarity

Consumptions + Item Similarities → User Similarity Calculation → User Similarities

Page 10: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

User Similarity

[Figure: four nodes and four units labeled C]

Page 11: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

Recommendation calculation

Consumptions + Item Similarities + User Similarities → Recommendation Calculation

Page 12: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

Recommendation calculation

[Figure: Item Similarities and User Similarities]

Page 13: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

Results

• Proof of concept implementation
• Cultural events dataset
– 5 months of data
– 53,000 items
– 1,700 users
– 14,000 => 6,800 consumptions


Number of nodes used: 10, 20, 40, 80, 160
Execution time scales inversely with the number of nodes
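
To make the inverse-scaling claim concrete, the sketch below computes what ideal 1/n scaling would predict for the node counts listed on the slide, relative to the 10-node run. These are values from an idealized model, not measurements from the talk; the function name and the choice of 10 nodes as baseline are assumptions.

```python
NODE_COUNTS = [10, 20, 40, 80, 160]  # node counts listed on the slide


def ideal_relative_time(nodes, baseline_nodes=10):
    """Execution time relative to the baseline if time is proportional to 1/nodes."""
    return baseline_nodes / nodes


# Idealized model only: each doubling of the node count halves the time.
relative_times = {n: ideal_relative_time(n) for n in NODE_COUNTS}
```

Under this model, going from 10 to 160 nodes would cut the execution time to one sixteenth of the baseline.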

Page 14: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

Conclusion

• A file-based approach for HPC
• Workflow as independent subjobs
• Workflow ≈ embarrassingly parallel
• Approach both scalable and memory efficient
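
The idea of independent subjobs can be sketched as follows for the item similarity phase. This is an illustration under stated assumptions, not the method from the talk: the Jaccard measure, the metadata-as-sets representation, and the names `jaccard` and `run_subjob` are all hypothetical stand-ins.

```python
from itertools import combinations, islice


def jaccard(a, b):
    """Jaccard similarity of two sets; a stand-in similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def run_subjob(job_id, num_jobs, metadata):
    """Compute similarities for this subjob's share of the item pairs.

    Every subjob enumerates the same ordered pair list and takes every
    num_jobs-th pair starting at position job_id, so subjobs never need
    to communicate with each other.
    """
    item_ids = sorted(metadata)
    my_pairs = islice(combinations(item_ids, 2), job_id, None, num_jobs)
    return {(i, j): jaccard(metadata[i], metadata[j]) for i, j in my_pairs}
```

Because each subjob depends only on its own `job_id`, the total number of jobs, and the read-only input, the subjobs could be submitted to separate cores or nodes in any order, and together they cover every item pair exactly once.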


Page 15: A File-Based Approach for Recommender Systems in High-Performance Computing Environments

Simon Dooms

@sidooms

A File-Based Approach for Recommender Systems in High-Performance Computing Environments

With the support of IWT Vlaanderen, Stevin Supercomputer Infrastructure at Ghent University, the Hercules Foundation and EWI