Talk given at CBMI 2013 (Veszprém, Hungary) on 19.06.2013
Transcript
Scalable high-dimensional indexing with Hadoop
TEXMEX team, INRIA Rennes, France
Denis Shestakov, PhD
denis.shestakov at {aalto.fi,inria.fr}
linkedin: linkedin.com/in/dshestakov
mendeley: mendeley.com/profiles/denis-shestakov
○ 1TB dataset => 1186 blocks of 1024MB size
○ Assuming 8-core nodes and the reported searching method: no scaling beyond 149 nodes (i.e., 8x149=1192 cores)
○ Solutions:
  ■ Smaller HDFS blocks, e.g., scaling up to 280 nodes with 512MB blocks
  ■ Re-visit the search process: e.g., partial loading of the lookup table
● Big data is here, but not the resources to process it
  ○ E.g., indexing & searching >10TB was not possible given the resources we had
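The scaling limit above follows from how Hadoop assigns work: each HDFS block becomes one map task, so with one task per core, nodes beyond ceil(blocks / cores_per_node) add no parallelism. A minimal sketch of that arithmetic (the function name and the one-task-per-core assumption are mine, not from the talk):

```python
import math

def max_useful_nodes(num_blocks, cores_per_node=8):
    # One map task per HDFS block, one task per core:
    # nodes beyond ceil(blocks / cores_per_node) sit idle.
    return math.ceil(num_blocks / cores_per_node)

# 1186 blocks of 1024MB on 8-core nodes -> 149 nodes, as reported.
print(max_useful_nodes(1186))  # → 149
```

Halving the HDFS block size roughly doubles the block count and hence the node bound, which is why 512MB blocks push the practical limit toward the ~280 nodes mentioned above.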
Things to share
● Our methods/system can be applied to audio datasets
  ○ No major changes expected
  ○ Contact me if interested
● Code for the MapReduce-eCP algorithm is available on request
  ○ Should run smoothly on your Hadoop cluster
  ○ Interested in comparisons
● Hadoop job history logs behind our experiments (not only those reported at CBMI) are available on request
  ○ They describe the indexing/searching of our dataset, with details on map/reduce task execution
  ○ Insights on better analysis/visualization are welcome
  ○ Job logs for the CBMI'13 experiments: http://goo.gl/e06wE