Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomerative Clustering Algorithm Based on Browsing History
Priagung Khusumanegara, Rischan Mafrur, and Deokjai Choi
School of Electronics & Computer Engineering, Chonnam National University
{priagung.123, rischanlab}@gmail.com
The 23rd IFIP World Computer Congress (WCC 2015), Tuesday, October 6
Outline
• Introduction
• Proposed System
• Data Collection
• Data Extraction
• URL Filtering Categories
• Modified Hierarchical Agglomerative Clustering
• Experimental Results
Introduction
Motivations:
• Smartphones provide many applications that support our daily activities, and the web browser is one of them.
• People spend a lot of time browsing to find information they are interested in.
• It is not easy to find the particular pieces of information they are interested in.
Proposed System:
• We propose a modified Hierarchical Agglomerative Clustering (MHAC) algorithm, running as a server-based application, that builds interest-focused profiles of smartphone users from their browsing history.
Proposed System
Figure: Illustration of Proposed System
Data Collection
• Data Collection Application
– It is built using the Funf framework, an extensible sensing and data processing framework for mobile devices.
• Number of participants
– 30 Chonnam National University students (aged 19-22)
• Period of data collection
– 1 month (July 2014)
Figure: Data Collection Application
Data Extraction
• Example of Collected Data
• We extract the resource-name part of each collected URL, because this part of the URL structure carries the information needed to analyze user interest.
• Example of data extraction
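Here is a minimal sketch of this extraction step using Python's standard `urllib.parse`. It assumes each collected record is a plain URL string. It also assumes that the host name, with any leading `www.` removed, is the part of the resource name worth keeping. The slides do not say exactly which part the authors retained, and the helper name `extract_resource_name` is hypothetical.

```python
from urllib.parse import urlsplit


def extract_resource_name(url: str) -> str:
    """Return the lower-cased host of a URL without a leading 'www.'.

    Hypothetical helper: assumes the host is the part of the URL's
    resource name that is kept for interest analysis.
    """
    # urlsplit only recognizes the host after "//", so give bare
    # entries such as "m.example.com/news" a scheme first.
    if "://" not in url:
        url = "http://" + url
    host = urlsplit(url).hostname or ""  # .hostname is already lower-cased
    return host[len("www."):] if host.startswith("www.") else host


for raw in ["https://www.dmoz.org/Computers/Internet/", "m.example.com/news?id=1"]:
    print(extract_resource_name(raw))
# prints:
# dmoz.org
# m.example.com
```

Reducing each URL to its host keeps the later category lookup small, because many pages on one site share a single directory label.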
URL Filtering Categories
• URL Labeling
– The URL data is labeled using a popular Web directory, the Open Directory Project (ODP) (dmoz.org).
• URL Filtering Categories
– The matrix of filtering categories:
| Category Group | Category Type |
|---|---|
| Business | Business/Economy, Job Search/Careers, Real Estate, and Shopping |
| Communications and search | Blog/Web Communication, Social Networks, Email, and Search Engines |
| General | Computer/Internet, Education, News/Media, and Reference |
| Lifestyle | Entertainment, Games, Arts, Humor, Religion, Restaurants/Food, and Travel |
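The labeling and filtering steps can be sketched as two dictionary lookups. First a domain is mapped to an ODP-style category type. Then that category type is mapped to its category group using the matrix of filtering categories. The group table is transcribed from the slide. The `DOMAIN_CATEGORY` entries are hypothetical placeholders for a real ODP lookup. Treating unlabeled domains as filtered out is also an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Category Group -> Category Types, transcribed from the matrix of filtering categories.
CATEGORY_GROUPS: Dict[str, List[str]] = {
    "Business": ["Business/Economy", "Job Search/Careers", "Real Estate", "Shopping"],
    "Communications and search": ["Blog/Web Communication", "Social Networks", "Email", "Search Engines"],
    "General": ["Computer/Internet", "Education", "News/Media", "Reference"],
    "Lifestyle": ["Entertainment", "Games", "Arts", "Humor", "Religion", "Restaurants/Food", "Travel"],
}

# Inverted index: Category Type -> Category Group.
TYPE_TO_GROUP: Dict[str, str] = {t: g for g, types in CATEGORY_GROUPS.items() for t in types}

# HYPOTHETICAL domain labels standing in for an ODP (dmoz.org) lookup.
DOMAIN_CATEGORY: Dict[str, str] = {
    "shop.example.com": "Shopping",
    "news.example.com": "News/Media",
    "social.example.com": "Social Networks",
}


def label(domain: str) -> Optional[Tuple[str, str]]:
    """Return (category type, category group), or None if the domain is unlabeled."""
    category = DOMAIN_CATEGORY.get(domain)
    if category is None:
        return None  # assumption: unlabeled URLs are filtered out
    return category, TYPE_TO_GROUP[category]


print(label("news.example.com"))  # ('News/Media', 'General')
print(label("unknown.example.org"))  # None
```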
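Modified Hierarchical Agglomerative Clustering
Step 28 of the algorithm: set the category group $C_{group_i}$ that has the maximum Degree of Interest as the user's interest, i.e. $\text{user interest} = \arg\max_{i} \text{Degree of Interest}(C_{group_i})$.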
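The selection rule in this step can be sketched as an argmax over category groups. The slides do not define Degree of Interest. Purely for illustration, this sketch assumes it is the share of a user's labeled visits that fall into each group. The function names are hypothetical, and the clustering stages that precede this step are not reproduced.

```python
from collections import Counter
from typing import Dict, List


def degree_of_interest(visit_groups: List[str]) -> Dict[str, float]:
    """ASSUMED definition: share of a user's labeled visits in each category group."""
    counts = Counter(visit_groups)
    total = len(visit_groups)
    return {group: n / total for group, n in counts.items()}


def user_interest(visit_groups: List[str]) -> str:
    """Return the category group with the maximum degree of interest.

    Ties go to the group seen first; raises ValueError if there are no visits.
    """
    doi = degree_of_interest(visit_groups)
    return max(doi, key=doi.get)


visits = ["General", "Lifestyle", "General", "Business", "General"]
print(degree_of_interest(visits))  # {'General': 0.6, 'Lifestyle': 0.2, 'Business': 0.2}
print(user_interest(visits))  # General
```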
Experimental Results
Figure: Degree of Users' Interests
Figure: Users' Interest Profile
Experimental Results (Cont’d)
• C4.5 is a well-known classification algorithm that generates decision trees from both continuous and discrete attributes; it serves as the baseline for comparison.
• Attributes:
– Total session time
– Total time a user stays at the site
– Total number of accessed pages during the whole session
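For readers unfamiliar with how C4.5 handles a continuous attribute such as total session time, here is a simplified sketch. The attribute's values are sorted, binary thresholds between adjacent distinct values are tried, and the best threshold is chosen by information gain. That split is then scored by gain ratio (gain divided by split information). This is a textbook illustration, not the configuration used in these experiments. It places thresholds at midpoints and omits C4.5's extra corrections for continuous attributes. The sample times and labels are made up.

```python
from collections import Counter
from math import log2
from typing import List, Optional, Tuple


def entropy(labels: List[str]) -> float:
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_threshold(values: List[float], labels: List[str]) -> Optional[Tuple[float, float, float]]:
    """Simplified C4.5-style binary split on one continuous attribute.

    Tries a threshold midway between each pair of adjacent distinct sorted
    values, keeps the one with the highest information gain, and returns
    (threshold, information gain, gain ratio); None if no split exists.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([y for _, y in pairs])
    best = None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # equal values cannot be separated by a threshold
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        # Split information: entropy of the partition sizes themselves.
        split_info = entropy(["left"] * len(left) + ["right"] * len(right))
        if best is None or gain > best[1]:
            best = ((lo + hi) / 2, gain, gain / split_info)
    return best


# Made-up total session times (minutes) with made-up interest labels.
times = [5, 8, 12, 30, 45, 60]
groups = ["Business"] * 3 + ["Lifestyle"] * 3
print(best_threshold(times, groups))  # (21.0, 1.0, 1.0)
```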
Experimental Results (Cont’d)
Figure: Execution time comparison with the C4.5 algorithm
Experimental Results (Cont’d)
| Memory Size (MB) | MHAC Accuracy (%) | C4.5 Accuracy (%) |
|---|---|---|
| 25 | 83.3 | 76.7 |
| 50 | 86.7 | 80.0 |
| 75 | 93.3 | 80.0 |
| 100 | 96.7 | 83.3 |
| 125 | 96.7 | 86.7 |
Table: Accuracy comparison with the C4.5 algorithm
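As a quick arithmetic check of the accuracy claim, the snippet below recomputes MHAC's margin over C4.5 at each memory size. It uses only the values reported in the accuracy table. MHAC leads at every size, by 6.6 to 13.4 percentage points.

```python
# (memory size in MB, MHAC accuracy %, C4.5 accuracy %), copied from the accuracy table.
ROWS = [(25, 83.3, 76.7), (50, 86.7, 80.0), (75, 93.3, 80.0), (100, 96.7, 83.3), (125, 96.7, 86.7)]

# Accuracy margin of MHAC over C4.5, in percentage points, per memory size.
margins = {mem: round(mhac - c45, 1) for mem, mhac, c45 in ROWS}
print(margins)  # {25: 6.6, 50: 6.7, 75: 13.3, 100: 13.4, 125: 10.0}
print(all(m > 0 for m in margins.values()))  # True
```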
Conclusion and Future Work
• We proposed a modified Hierarchical Agglomerative Clustering algorithm that automatically produces a user interest profile.
• The proposed algorithm can measure the degree of users' interests and infer the particular pieces of information they are interested in, based on their browsing history.
• Our approach outperforms the C4.5 algorithm in both execution time and accuracy at every memory size tested.
• In future work, we plan to implement the modified Hierarchical Agglomerative Clustering with MapReduce to improve the performance of the clustering algorithm.
Acknowledgements
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2012R1A1A2007014).