Illud: Utilizing Semantic Similarity for Image Search

Team Members: Kristene Aguinaldo, Seerat Aziz, and Kristian Wu
Advisor: Jorge Ortiz, Department of Electrical & Computer Engineering

Introduction
Search engines commonly use properties such as keywords to query and return the most appropriate results. However, this procedure does not always return the most relevant results. Because numerous image captioning datasets are available, our project explores the use of natural language processing to enhance image search. Through this project, we seek to:
• Bridge the gap between visual and textual communication
• Make texts more digestible by breaking them down and finding relevant images

Data Management Pipeline
[Diagram labels: Conceptual Captions (3.3M [caption, image] pairs); Doc2vec; LDA Topic Model; Caption; URL; Vector; Topic Neighborhood]

Execution Pipeline
[Diagram labels: Parse Document; Apply LDA Topic Model; StaySense Cosine Similarity on Records in Topic]

Results
Snippet of a Sample Text: Musical Instruments
A musical instrument is a device created to make musical sounds. Anything that makes a sound can be used as a musical instrument. The history of musical instruments goes back to the beginning of culture. People first used instruments as ritual: a hunter might use a trumpet to signal a successful hunt; a drum might be used in a religious ceremony. Cultures later composed and performed a set of sounds called a melody for entertainment. Musical instruments were needed. Some historians report that the earliest musical instrument was a simple flute. Many of the earliest musical instruments were made from animal skins, bone, wood, and other non-durable materials.

Resulting Images (Decreasing Cosine Similarity)

LDA Topic Model - Intertopic Distance Map
Fig 1: LDA Clusters
Fig 2: Doc2Vec Algorithm [1]
Fig 3, Fig 4, Fig 5, Fig 6

• Output quality was assessed through a survey answered by 54 people, who rated paragraph-to-image and paragraph-to-caption relevance on a scale of 1 to 4 (Fig 3)
• Survey participants responded that images seemed more relevant than their corresponding captions (Fig 5)

Challenges and Future Steps
Version 1 Challenges
• PostgreSQL was very slow at querying results from the database (all 3.3 million records were stored)
• Using KNN to search high-dimensional vectors was difficult
Version 2 Challenges
• The AWS Elasticsearch (ES) service does not allow custom plugins to be installed
• AWS ES stores 3.1 million results (35 GB maximum)
• The EC2 instance did not have enough storage to hold Conceptual Captions in a local ES index
Dataset Challenges
• Short captions and the LDA model resulted in many overlapping clusters
Next Steps
• Generate multiple captions per image so that the LDA model is more robust (short captions → noise)
• Check the correlation between image features and captions

Acknowledgements
We would like to thank our advisor, Professor Jorge Ortiz, for his input and guidance through this project. We would also like to thank our friends and family for their support and response to our project. Lastly, we would like to thank the ECE department for making this project happen.

References
1. Q. Le and T. Mikolov, "Distributed Representations of Sentences and Documents," in Proceedings of ICML 2014, 2014.
2. T. Doll, "LDA Topic Modeling," Towards Data Science, 24-Jun-2018. [Online]. Available: https://towardsdatascience.com/lda-topic-modeling-an-explanation-e184c90aadcd. [Accessed: 20-Apr-2019].
3. https://github.com/StaySense/fast-cosine-similarity

Methodology
• Doc2Vec: uses paragraph vectors to predict words in a paragraph and to provide the context of the paragraph [1]
• LDA Topic Model: statistical model for classifying text in a document into a set of topics [2]
• StaySense: fast vector scoring on Elasticsearch 6.4.x+ using vector embeddings [3]
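
The execution pipeline named on this poster (parse the document, apply the LDA topic model, then run cosine similarity on the records in that topic) could be sketched roughly as follows. This is an illustrative sketch in plain Python, not the team's implementation: the `Record` class, the `search_in_topic` function, the toy vectors, the topic ids, and the example.com URLs are all hypothetical. The vectors stand in for Doc2Vec embeddings, the topic ids stand in for LDA assignments, and the cosine scoring that the poster delegates to the StaySense Elasticsearch plugin is computed in-process here.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Record:
    """One stored [caption, image] pair (hypothetical schema)."""
    caption: str
    url: str
    vector: List[float]  # stands in for a Doc2Vec embedding of the caption
    topic: int           # stands in for the LDA topic assigned to the caption


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_in_topic(
    query_vector: List[float],
    query_topic: int,
    records: List[Record],
    top_k: int = 3,
) -> List[Tuple[float, Record]]:
    """Score only the records in the query's topic and return them by decreasing similarity."""
    candidates = [r for r in records if r.topic == query_topic]
    scored = [(cosine_similarity(query_vector, r.vector), r) for r in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy records with made-up vectors and topics, for illustration only.
RECORDS = [
    Record("a man playing a wooden flute", "https://example.com/flute.jpg", [0.9, 0.1, 0.0], 0),
    Record("a drum used in a ceremony", "https://example.com/drum.jpg", [0.6, 0.6, 0.1], 0),
    Record("a dog running on the beach", "https://example.com/dog.jpg", [0.9, 0.1, 0.0], 1),
]

if __name__ == "__main__":
    # Pretend the input paragraph was embedded as [1, 0, 0] and assigned to topic 0.
    for score, record in search_in_topic([1.0, 0.0, 0.0], 0, RECORDS):
        print(f"{score:.3f}  {record.caption}  {record.url}")
```

In this sketch the record in topic 1 is never scored, even though its vector matches the flute record exactly. Restricting the comparison to a single topic keeps the number of vectors scored per query small, which is one plausible motivation for pairing a topic model with cosine similarity.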