IEEE Papers

Zhiwen Yu, Zhiyong Yu, Xingshe Zhou, Christian Becker, Yuichi Nakamura, "Tree-Based Mining for Discovering Patterns of Human Interaction in Meetings," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 4, pp. 759-768, Apr. 2012

Discovering semantic knowledge is important for understanding and interpreting how people interact in a meeting discussion. In this paper, we propose a mining method to extract frequent patterns of human interaction based on the captured content of face-to-face meetings. Human interactions, such as proposing an idea, giving comments, and expressing a positive opinion, indicate user intention toward a topic or role in a discussion. Human interaction flow in a discussion session is represented as a tree. Tree-based interaction mining algorithms are designed to analyze the structures of the trees and to extract interaction flow patterns. The experimental results show that we can successfully extract several interesting patterns that are useful for interpreting human behavior in meeting discussions, such as frequent interactions, typical interaction flows, and relationships between different types of interactions.

Index Terms: Human interaction, interaction flow, interaction pattern, meeting, tree-based mining.
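The following is a rough illustration of how interaction flows can be stored as trees and mined for recurring structure. The sketch counts how many session trees contain each parent-to-child interaction pair and keeps the pairs that meet a minimum support. The tag abbreviations (`PRO` for proposing an idea, `COM` for a comment, `POS` for a positive opinion), the `Node` class and the restriction to single-edge patterns are all assumptions made for illustration. The paper's own algorithms mine richer subtree structures.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """One interaction in a discussion session (hypothetical tag set)."""
    tag: str
    children: list["Node"] = field(default_factory=list)


def edges(node: Node) -> set:
    """Collect the distinct (parent tag, child tag) pairs in one tree."""
    found = set()
    for child in node.children:
        found.add((node.tag, child.tag))
        found |= edges(child)
    return found


def frequent_edges(trees: list, min_support: int) -> dict:
    """Return parent-child interaction pairs that occur in at least min_support trees."""
    counts = Counter()
    for tree in trees:
        counts.update(edges(tree))  # a set, so each tree counts at most once per pair
    return {pair: n for pair, n in counts.items() if n >= min_support}


sessions = [
    Node("PRO", [Node("COM", [Node("POS")]), Node("POS")]),
    Node("PRO", [Node("COM")]),
    Node("PRO", [Node("POS")]),
]
print(frequent_edges(sessions, min_support=2))
```

Support is counted per tree rather than per occurrence. This way, a single long session that repeats a pattern cannot dominate the result.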

Tetsuya Nakatoh, Chengjiu Yin, Sachio Hirokawa, "Extraction and Disambiguation of Name of Place from Tourism Blogs," ssne, pp.73-78, 2011 First ACIS International Symposium on Software and Network Engineering, 2011

With the development of the Internet in recent years, tourism portal sites and tourism blog articles on the WWW have increased, and acquiring various kinds of tourism information has become easy. When gathering and classifying this information automatically from blog articles, however, it is not easy to automatically identify the place names that serve as keys. In this paper, we propose a method for automatically extracting place names from blog articles. We also attempt to disambiguate place names.
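The abstract does not describe how disambiguation is performed. The following is a minimal sketch of one common approach, shown under stated assumptions. Candidate place names are found by gazetteer lookup. When a name maps to several locations, the sketch picks the location whose context keywords overlap most with the words of the blog text. The `GAZETTEER` contents, the keyword sets and the overlap score are hypothetical and are not the authors' method.

```python
import re

# Hypothetical gazetteer: surface name -> {candidate location: context keywords}.
GAZETTEER = {
    "Fuchu": {
        "Fuchu, Tokyo": {"tokyo", "keio", "racecourse"},
        "Fuchu, Hiroshima": {"hiroshima", "onomichi", "setouchi"},
    },
    "Kyoto": {"Kyoto, Kyoto": {"temple", "gion", "arashiyama"}},
}


def extract_places(text: str) -> list:
    """Return gazetteer names that occur in the text as whole words."""
    return [name for name in GAZETTEER if re.search(rf"\b{re.escape(name)}\b", text)]


def disambiguate(name: str, text: str) -> str:
    """Pick the candidate location whose keywords overlap most with the text's words."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    candidates = GAZETTEER[name]
    return max(candidates, key=lambda loc: len(candidates[loc] & words))


post = "We took the Keio line to Fuchu and spent the afternoon at the racecourse."
for place in extract_places(post):
    print(place, "->", disambiguate(place, post))  # Fuchu -> Fuchu, Tokyo
```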

Content-based multimedia retrieval on nontextual documents is often constrained by the available metadata. User-generated tags constitute an important source of information about a resource. To enable search scenarios beyond traditional text-based search, such as exploratory and semantic search, this textual information must be complemented with semantic entities. Due to tag ambiguities and creative neologisms, automatic semantic annotation based on user tags is a major challenge. In this work, we show how to use context information and ontological knowledge to automatically assign semantic entities to user-generated tags for video data, enabling sophisticated semantic search on those entities. The algorithm combines co-occurrence and link graph analysis using Linked Data. A definition of context reliability in audio-visual content is also described.

Index Terms: Named Entity Recognition, Disambiguation, User-Generated Tags

Citation: Nadine Ludwig, Harald Sack, "Named Entity Recognition for User-Generated Tags," dexa, pp.177-181, 2011 22nd International Workshop on Database and Expert Systems Applications, 2011
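The following sketch makes the co-occurrence and link-graph idea concrete. Each ambiguous tag is assigned the candidate entity that shares the most links with, or links directly to, the candidates of the other tags on the same video. The candidate lists, the entity identifiers and the `LINKS` sets are hypothetical stand-ins for a Linked Data source such as DBpedia. The shared-link count is likewise an assumed score, not the authors' scoring or their context-reliability measure.

```python
# Hypothetical candidates per tag and outgoing links per entity (DBpedia-like stand-ins).
CANDIDATES = {
    "jaguar": ["Jaguar_Cars", "Jaguar_(animal)"],
    "rainforest": ["Rainforest"],
    "amazon": ["Amazon_River", "Amazon_(company)"],
}
LINKS = {
    "Jaguar_Cars": {"Automobile", "Coventry", "Tata_Motors"},
    "Jaguar_(animal)": {"Felidae", "Rainforest", "Amazon_River", "South_America"},
    "Rainforest": {"Amazon_River", "Biodiversity", "South_America"},
    "Amazon_River": {"South_America", "Rainforest", "Brazil"},
    "Amazon_(company)": {"E-commerce", "Seattle", "Jeff_Bezos"},
}


def relatedness(entity: str, context: list) -> int:
    """Count links shared with each context entity, plus direct links to it."""
    score = 0
    for other in context:
        score += len(LINKS[entity] & LINKS[other])
        score += int(other in LINKS[entity])
    return score


def annotate(tags: list) -> dict:
    """Map each tag to the candidate most related to the co-occurring tags' candidates."""
    result = {}
    for tag in tags:
        context = [c for t in tags if t != tag for c in CANDIDATES[t]]
        result[tag] = max(CANDIDATES[tag], key=lambda e: relatedness(e, context))
    return result


print(annotate(["jaguar", "rainforest", "amazon"]))
```

Here the co-occurring tags "rainforest" and "amazon" pull "jaguar" toward the animal rather than the car maker.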

Web usage mining, also known as Web log mining, is the application of data mining algorithms to Web access logs to find trends and regularities in Web users' traversal patterns. The results of Web usage mining have been used to improve Web site design, business and marketing decision support, user profiling, and Web server system performance. Web page prediction is a very important research area in Web technologies, and mining Web logs is useful for discovering Web path traversal patterns. This paper presents an efficient algorithm for predicting Web pages from large Web logs of the pages a user has visited. We assign a significance weight to each page based on the time the user spends on it, its visiting frequency, and the click events performed on it.

Index Terms: Weighted Association Rule, Web path traversal, Web usage mining

Citation: Rohit Agarwal, K.V. Arya, Shashi Shekhar, Rakesh Kumar, "An Efficient Weighted Algorithm for Web Information Retrieval System," cicn, pp.126-131, 2011 International Conference on Computational Intelligence and Communication Networks, 2011
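The abstract names three signals (time spent, visiting frequency and clicks) but not how they are combined. Purely for illustration, this sketch assumes that each signal is normalized by its maximum over all pages and mixed linearly with hypothetical coefficients `alpha`, `beta` and `gamma`. The predicted next page is then the heaviest page observed immediately after the current one within a session. This is not the paper's formula, and it does not implement weighted association rules.

```python
from collections import defaultdict

# Hypothetical log records: (session id, page, seconds spent on page, clicks on page).
LOG = [
    ("s1", "home", 10, 1), ("s1", "products", 120, 4), ("s1", "cart", 30, 2),
    ("s2", "home", 5, 1), ("s2", "products", 90, 3), ("s2", "cart", 45, 1),
    ("s3", "home", 8, 1), ("s3", "blog", 200, 0),
]


def page_weights(log, alpha=0.4, beta=0.4, gamma=0.2):
    """Hypothetical weight: linear mix of max-normalized time, visits and clicks."""
    time, visits, clicks = defaultdict(float), defaultdict(int), defaultdict(int)
    for _, page, seconds, n_clicks in log:
        time[page] += seconds
        visits[page] += 1
        clicks[page] += n_clicks
    max_t = max(time.values()) or 1  # guard against division by zero
    max_v = max(visits.values())
    max_c = max(clicks.values()) or 1
    return {
        p: alpha * time[p] / max_t + beta * visits[p] / max_v + gamma * clicks[p] / max_c
        for p in time
    }


def predict_next(log, current):
    """Return the heaviest page seen immediately after `current` in the same session."""
    weights = page_weights(log)
    successors = {
        page_b
        for (sess_a, page_a, *_), (sess_b, page_b, *_) in zip(log, log[1:])
        if sess_a == sess_b and page_a == current
    }
    return max(successors, key=weights.get) if successors else None


print(predict_next(LOG, "home"))  # products
```

Restricting candidates to observed successors keeps the prediction grounded in actual traversal paths. The weights then only break ties between plausible next pages.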

An Entity Relation Extraction Model Based on Semantic Pattern Matching
Chongqing, China, October 21-October 23
ISBN: 978-0-7695-4555-4
Tiezheng Nie, Derong Shen, Yue Kou, Ge Yu, Dejun Yue
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/WISA.2011.9

This paper proposes a relation extraction model based on semantic pattern matching in the Web environment. It consists of frequent pattern extraction, density-based pattern clustering, and pattern matching based on semantic similarity. First, based on the entities with known relations in a limited training set, we extract relation patterns containing these named entities from Web pages. Then, relations between entities in Web pages from specific domains can be extracted using these patterns. Experiments show the effectiveness and self-adaptiveness of our method in extracting relations between entities from a dynamic Web environment.

Index Terms: relation pattern extraction, pattern matching, semantic similarity

Citation: Tiezheng Nie, Derong Shen, Yue Kou, Ge Yu, Dejun Yue, "An Entity Relation Extraction Model Based on Semantic Pattern Matching," wisa, pp.7-12, 2011 Eighth Web Information Systems and Applications Conference, 2011
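The first and last steps can be sketched as follows: extracting patterns around entity pairs with known relations, then matching new text against those patterns. In this simplified version, a pattern is just the word sequence between the two entities. The paper's semantic similarity is replaced by plain token-level Jaccard similarity, using a hypothetical threshold of 0.5, and density-based pattern clustering is omitted. The training sentences and relation labels are illustrative only.

```python
import re
from typing import Optional


def tokens(text: str) -> list:
    """Lowercase word tokens; hyphenated words are split into parts."""
    return re.findall(r"[a-z]+", text.lower())


def extract_pattern(sentence: str, e1: str, e2: str) -> Optional[tuple]:
    """Return the words between entity e1 and a later entity e2, or None."""
    i, j = sentence.find(e1), sentence.find(e2)
    if i == -1 or j == -1 or i + len(e1) > j:
        return None
    return tuple(tokens(sentence[i + len(e1):j]))


def jaccard(a, b) -> float:
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def match_relation(sentence, e1, e2, patterns, threshold=0.5):
    """Label (e1, e2) with the relation of the most similar pattern, if similar enough."""
    middle = extract_pattern(sentence, e1, e2)
    if middle is None or not patterns:
        return None
    pattern, relation = max(patterns, key=lambda p: jaccard(middle, p[0]))
    return relation if jaccard(middle, pattern) >= threshold else None


# Training step: sentences containing entity pairs whose relation is already known.
TRAINING = [
    ("Larry Page co-founded Google in 1998.", "Larry Page", "Google", "founder_of"),
    ("Bill Gates was a co-founder of Microsoft.", "Bill Gates", "Microsoft", "founder_of"),
    ("Tim Cook is the chief executive of Apple.", "Tim Cook", "Apple", "ceo_of"),
]
PATTERNS = [(extract_pattern(s, a, b), rel) for s, a, b, rel in TRAINING]

print(match_relation("Steve Jobs co-founded Apple.", "Steve Jobs", "Apple", PATTERNS))
```

A semantic similarity measure would also let "was a co-founder of" match "co-founded", which exact token overlap misses.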

Mining Information of Anonymous User on a Social Network Service
Kaohsiung, Taiwan, July 25-July 27
ISBN: 978-0-7695-4375-8
Kyung Soo Cho, Jae Yoel Yoon, Iee Joon Kim, Ji Yeon Lim, Seung Kwan Kim, Ung-Mo Kim
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ASONAM.2011.19

A growing number of individuals now freely write their own opinions and information in Web spaces such as blogs and online cafes, and these spaces are developing into a new kind of service called the social network. Consequently, many researchers are actively studying social networks. A social network is not only a tool for providing real-time news but also influences opinions and policy decisions. It presents information in real time mainly as text, and its users upload their opinions and ideas from computers or mobile devices. These texts contain important and varied information, which could be obtained in real time if the semantics of texts on a social network could be extracted. Opinion mining should be useful for extracting these semantics. This paper proposes a novel method to infer information about anonymous users from the available relationship information and from the psychology reflected in their texts, and to understand the meaning of the content in depth.

Index Terms: Anonymous, Opinion mining, LIWC, Social network service, Relation number

Citation: Kyung Soo Cho, Jae Yoel Yoon, Iee Joon Kim, Ji Yeon Lim, Seung Kwan Kim, Ung-Mo Kim, "Mining Information of Anonymous User on a Social Network Service," asonam, pp.450-453, 2011 International Conference on Advances in Social Networks Analysis and Mining, 2011
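LIWC, listed in the index terms, scores a text by the percentage of its words that fall into psychological categories. The following is a minimal sketch of that kind of dictionary-based scoring. The three-category `LEXICON` is a hypothetical stand-in, because the real LIWC dictionaries are licensed and far larger. The paper's use of relationship information is not modeled.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon; real LIWC dictionaries are much larger and
# also contain wildcard stems such as "happ*".
LEXICON = {
    "posemo": {"happy", "love", "great", "good"},
    "negemo": {"sad", "hate", "angry", "bad"},
    "social": {"friend", "friends", "family", "we", "they"},
}


def category_percentages(text: str) -> dict:
    """Percentage of the text's words that belong to each category."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {cat: 0.0 for cat in LEXICON}
    counts = Counter()
    for word in words:
        for cat, vocab in LEXICON.items():
            if word in vocab:
                counts[cat] += 1
    return {cat: 100.0 * counts[cat] / len(words) for cat in LEXICON}


print(category_percentages("We love this cafe, but the parking is bad."))
```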

Ranking the Authority of Name Aliases for Email Users
Shanghai, China, November 04-November 06
ISBN: 978-0-7695-4559-2
Yin Meijuan, Wang Qingxian, Chen Shuming, Liu Xiaonan, Luo Xiangyang
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MINES.2011.39

Identifying information that reveals the real identity of users from emails is an important topic in email mining. This paper focuses on the problem of identifying the most authoritative aliases of email users. Based on the facts that different aliases of the same user occupy different positions in the user's email communication relations and that affinal aliases of the same user have similar linguistic morphology, we propose a novel method that ranks alias authority and obtains the authoritative aliases of an email user through email communication relation analysis and morphologically similar alias clustering. Experimental results show that the proposed approach can efficiently find the authoritative aliases of a user and can be applied to many email applications, such as identity identification, social relation discovery, and finding important people.

Index Terms: Email mining, Alias authority ranking, Email communication relation analysis, Morphologically similar alias clustering

Citation: Yin Meijuan, Wang Qingxian, Chen Shuming, Liu Xiaonan, Luo Xiangyang, "Ranking the Authority of Name Aliases for Email Users," mines, pp.425-430, 2011 Third International Conference on Multimedia Information Networking and Security, 2011
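The method has two ingredients: morphology-based alias clustering and communication-based ranking. Both can be illustrated together in the sketch below. Aliases are grouped when their normalized strings are similar according to `difflib.SequenceMatcher`, with an assumed threshold of 0.6. Members of each group are then ranked by how many distinct correspondents used that alias. The observations, the normalization, the threshold and the authority score are all hypothetical simplifications of the paper's method.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical (correspondent address, alias used for the user) observations.
OBSERVED = [
    ("a@x.org", "Zhang Wei"), ("b@x.org", "Zhang Wei"), ("c@x.org", "Zhang Wei"),
    ("a@x.org", "Wei Zhang"), ("d@x.org", "zhangw"), ("b@x.org", "Li Na"),
]


def normalize(alias: str) -> str:
    """Lowercase and sort name tokens so 'Wei Zhang' and 'Zhang Wei' compare equal."""
    return "".join(sorted(alias.lower().split()))


def similar(a: str, b: str, threshold: float = 0.6) -> bool:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def cluster(aliases: list) -> list:
    """Greedy single-link clustering: join the first group with a similar member."""
    groups = []
    for alias in aliases:
        for group in groups:
            if any(similar(alias, member) for member in group):
                group.append(alias)
                break
        else:
            groups.append([alias])
    return groups


def rank(observed) -> list:
    """Rank aliases inside each cluster by their number of distinct correspondents."""
    users = defaultdict(set)
    for sender, alias in observed:
        users[alias].add(sender)
    return [sorted(g, key=lambda a: len(users[a]), reverse=True) for g in cluster(list(users))]


print(rank(OBSERVED))
```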

Towards Automatic Discovery of co-authorship Networks in the Brazilian Academic Areas
Stockholm, Sweden, December 05-December 08
ISBN: 978-0-7695-4598-1
Jesús P. Mena-Chalco, Roberto M. Cesar Junior
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/eScienceW.2011.31

In Brazil, the individual curricula vitae of academic researchers, which are mainly composed of professional information and scientific productions, are managed in a single software platform called Lattes. Currently, the information gathered from this platform is typically used to evaluate, analyze, and document the scientific productions of Brazilian research groups. Although Lattes curricula contain semi-structured information, analyzing them for medium and large groups is a time-consuming and highly error-prone task. In this paper, we describe an extension of scriptLattes (an open-source knowledge extraction system for the Lattes platform) that analyzes individual Lattes curricula and automatically discovers large-scale co-authorship networks for any academic area. Given a knowledge domain (academic area), the system automatically identifies the researchers associated with that area, extracts each researcher's list of scientific productions, discretized by type and publication year, and, for each paper, identifies the co-authors registered in the Lattes platform. The system can also generate different types of networks, which may be used to study the characteristics of academic areas at large scale. In particular, we explored the node degree and AuthorRank measures for each identified researcher. Finally, we confirm through experiments that the system provides a simple way to generate different co-authorship networks. To the best of our knowledge, this is the first study to examine large-scale co-authorship networks for any Brazilian academic area.

Index Terms: knowledge extraction, co-authorship networks, academic areas

Citation: Jesús P. Mena-Chalco, Roberto M. Cesar Junior, "Towards Automatic Discovery of co-authorship Networks in the Brazilian Academic Areas," esciencew, pp.53-60, 2011 IEEE Seventh International Conference on e-Science Workshops, 2011
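Once publication lists with identified co-authors are available, building the network and computing node degree is straightforward, as sketched below. The researcher IDs and papers are hypothetical, and an optional year filter mimics the per-year discretization. Edge weights count co-authored papers. Only degree is computed here; AuthorRank, a weighted PageRank-style measure, is omitted.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical papers: each lists the Lattes IDs of its registered co-authors.
PAPERS = [
    {"year": 2009, "type": "journal", "authors": ["R1", "R2", "R3"]},
    {"year": 2010, "type": "conference", "authors": ["R1", "R2"]},
    {"year": 2011, "type": "journal", "authors": ["R3", "R4"]},
]


def coauthorship_graph(papers, year_range=None):
    """Undirected weighted graph: edge weight = number of co-authored papers."""
    weights = defaultdict(int)
    for paper in papers:
        if year_range and not (year_range[0] <= paper["year"] <= year_range[1]):
            continue
        for a, b in combinations(sorted(set(paper["authors"])), 2):
            weights[(a, b)] += 1
    return dict(weights)


def degree(graph):
    """Number of distinct co-authors of each researcher."""
    deg = defaultdict(int)
    for a, b in graph:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


g = coauthorship_graph(PAPERS)
print(g)
print(degree(g))  # {'R1': 2, 'R2': 2, 'R3': 3, 'R4': 1}
```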

Extracting Academic Information from Conference Web Pages
Boca Raton, Florida, USA, November 07-November 09
ISBN: 978-0-7695-4596-7
Peng Wang, Yue You, Baowen Xu, Jianyu Zhao
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICTAI.2011.164

Conference Web pages are the main platforms for sharing conference information and organizing conference events. To discover academic knowledge from such Web pages for building academic ontologies or social networks, academic information must be extracted from them. This paper proposes an approach to extract academic information from conference Web pages. First, Web pages are segmented into text blocks by analyzing their visual features and DOM structure. Then a Bayesian network is used to classify these text blocks into predefined categories, and the quality of the initial classification results is improved by post-processing. Finally, the academic information is extracted from the classified text blocks. Our experimental results on real-world datasets show that the proposed method is highly effective and efficient for extracting academic information from conference Web pages, achieving an average precision of 90% and an average recall of 89%.

Index Terms: Web Information Extraction, Visual Feature, DOM structure, Bayes Network

Citation: Peng Wang, Yue You, Baowen Xu, Jianyu Zhao, "Extracting Academic Information from Conference Web Pages," ictai, pp.952-959, 2011 IEEE 23rd International Conference on Tools with Artificial Intelligence, 2011
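The classification step assigns each text block to a predefined category. The sketch below illustrates that step with a swapped-in technique: a multinomial naive Bayes classifier over block words, with add-one smoothing. Naive Bayes is the simplest Bayesian network, in which the words depend only on the class. The categories and training blocks are hypothetical. The visual and DOM features and the post-processing described in the paper are not modeled.

```python
import math
import re
from collections import Counter, defaultdict


def words(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, blocks, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for block, label in zip(blocks, labels):
            self.word_counts[label].update(words(block))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, block):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for w in words(block):
                score += math.log((counts[w] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best, best_score = label, score
        return best


train_blocks = [
    "Paper submission deadline: March 1", "Notification of acceptance: April 15",
    "Program Committee Chair: Jane Doe, University of X", "General Chair: John Roe",
    "Keynote speaker: Prof. Alan Smith", "Invited talk by Dr. Mary Lee",
]
train_labels = ["dates", "dates", "committee", "committee", "speaker", "speaker"]
model = NaiveBayes().fit(train_blocks, train_labels)
print(model.predict("Camera-ready deadline: May 10"))  # dates
```

Working in log space avoids numerical underflow when long blocks multiply many small probabilities.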

Frequent Itemsets Mining in Network Traffic Data
2012 Fifth International Conference on Intelligent Computation Technology and Automation
Zhangjiajie, Hunan, China, January 12-January 14
ISBN: 978-0-7695-4637-7
Xin Li, Xuefeng Zheng, Jingchun Li, Shaojie Wang
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICICTA.2012.105

Many projects have tried to analyze the structure and dynamics of application overlay networks on the Internet using packet analysis and network flow data. While such analysis is essential for a variety of network management and security tasks, it is difficult on many networks: either the volume of data is so large that packet inspection becomes intractable, or privacy concerns forbid packet capture and require network flows to be dissociated from users' actual IP addresses. In this paper, an algorithm for privacy-preserving itemset mining is proposed. On the one hand, only maximal itemsets are considered, which greatly reduces the number of itemsets. On the other hand, the intermediate mining results are encrypted for security. Experimental results show that the proposed algorithm is both accurate and efficient.

Index Terms: data mining, frequent itemset, privacy preserving, network traffic data

Citation: Xin Li, Xuefeng Zheng, Jingchun Li, Shaojie Wang, "Frequent Itemsets Mining in Network Traffic Data," icicta, pp.394-397, 2012 Fifth International Conference on Intelligent Computation Technology and Automation, 2012
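The sketch below shows why keeping only maximal itemsets shrinks the output. It enumerates frequent itemsets level by level in the Apriori style. It then keeps only those itemsets that have no frequent proper superset. The flow records, their item names and the support threshold are hypothetical. The paper's encryption of intermediate results is not shown.

```python
from itertools import combinations

# Hypothetical flow records reduced to item sets (protocol, port, size bucket).
FLOWS = [
    {"tcp", "port443", "large"},
    {"tcp", "port443", "small"},
    {"tcp", "port443", "large"},
    {"udp", "port53", "small"},
    {"udp", "port53", "small"},
]


def frequent_itemsets(transactions, min_support):
    """Apriori: grow candidates one item at a time, keeping those meeting min_support."""
    level = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    frequent = {}
    while level:
        counts = {c: sum(c <= t for t in transactions) for c in level}
        survivors = [c for c, n in counts.items() if n >= min_support]
        frequent.update((c, counts[c]) for c in survivors)
        # Join step: union pairs of survivors that differ by exactly one item.
        candidates = {a | b for a, b in combinations(survivors, 2) if len(a | b) == len(a) + 1}
        # Prune step: every subset one item smaller must itself be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, len(c) - 1))]
    return frequent


def maximal(frequent):
    """Keep only itemsets that have no frequent proper superset."""
    return {s: n for s, n in frequent.items() if not any(s < t for t in frequent)}


freq = frequent_itemsets(FLOWS, min_support=2)
print(len(freq), "frequent vs", len(maximal(freq)), "maximal")  # 14 frequent vs 2 maximal
```

Every frequent itemset is a subset of some maximal one, so the maximal sets summarize the result compactly. Their subsets' exact supports are not retained, however.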

The World in a Nutshell: Concise Range Queries
January 2011 (vol. 23 no. 1) pp. 139-154
Ke Yi, Hong Kong University of Science and Technology, Hong Kong
Xiang Lian, Hong Kong University of Science and Technology, Hong Kong
Feifei Li, Florida State University, Tallahassee, FL
Lei Chen, Hong Kong University of Science and Technology, Hong Kong
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.35

With advances in wireless communication technology, it is quite common for people to view maps or obtain related services on handheld devices such as mobile phones and PDAs. Range queries, one of the most commonly used tools, are often posed by users to retrieve the information they need from a spatial database. However, due to the limited communication bandwidth and hardware power of handheld devices, displaying all the results of a range query on a handheld device is neither communication-efficient nor informative to the users, simply because a range query often returns too many results. In view of this problem, we propose computing a concise representation of the range query results, of a specified size and with minimal information loss, and returning it to the user. Such a concise range query not only reduces communication costs but also offers better usability, providing an opportunity for interactive exploration. The usefulness of concise range queries is confirmed by comparing them with other possible alternatives, such as sampling and clustering. Unfortunately, we prove that finding the optimal representation with minimum information loss is an NP-hard problem. Therefore, we propose several effective and nontrivial algorithms to find a good approximate result. Extensive experiments on real-world data demonstrate the effectiveness and efficiency of the proposed techniques.

[1] X. Lin, Y. Yuan, Q. Zhang, and Y. Zhang, "Selecting Stars: The k Most Representative Skyline Operator," Proc. Int'l Conf. Data Eng. (ICDE), 2007.[2] C. Jermaine, S. Arumugam, A. Pol, and A. Dobra, "Scalable Approximate Query Processing with the dbo Engine," Proc. ACM SIGMOD, 2007.[3] G. Ghinita, P. Karras, P. Kalnis, and N. Mamoulis, "Fast Data Anonymization with Low Information Loss," Proc. Int'l Conf. Very Large Data Bases (VLDB), 2007.[4] G. Aggarwal, T. Feder, K. Kenthapadi, S. Khuller, R. Panigrahy, D. Thomas, and A. Zhu, "Achieving Anonymity via Clustering," Proc. Symp. Principles of Database Systems (PODS), 2006.[5] J. Xu, W. Wang, J. Pei, X. Wang, B. Shi, and A.W.-C. Fu, "Utility-Based Anonymization Using Local Recoding," Proc. ACM SIGKDD, 2006.[6] C. Böhm, C. Faloutsos, J.-Y. Pan, and C. Plant, "RIC: Parameter-Free Noise-Robust Clustering," ACM Trans. Knowledge Discovery from Data, vol. 1, no. 3, pp. 10-1-10-28, 2007.[7] R.T. Ng and J. Han, "Efficient and Effective Clustering Methods for Spatial Data Mining," Proc. Int'l Conf. Very Large Data Bases (VLDB), 1994.[8] D. Lichtenstein, "Planar Formulae and Their Uses," SIAM J. Computing, vol. 11, no. 2, pp. 329-343, 1982.[9] R. Tamassia and I.G. Tollis, "Planar Grid Embedding in Linear Time," IEEE Trans. Circuits and Systems, vol. 36, no. 9, pp. 1230-1234, Sept. 1989.[10] H.V. Jagadish, B.C. Ooi, K.-L. Tan, C. Yu, and R. Zhang, "iDistance: An Adaptive B+-Tree Based Indexing Method for Nearest Neighbor Search," ACM Trans. Database Systems, vol. 30, no. 2, pp. 364-397, 2005.[11] H. Samet, The Design and Analysis of Spatial Data Structures. Addison-Wesley Longman Publishing Co., Inc., 1990.[12] B. Moon, H.v. Jagadish, C. Faloutsos, and J.H. Saltz, "Analysis of the Clustering Properties of the Hilbert Space-Filling Curve," IEEE Trans. Knowledge and Data Eng., vol. 13, no. 1, pp. 124-141, Jan. 2001.[13] A. Guttman, "R-Trees: A Dynamic Index Structure for Spatial Searching," Proc. ACM SIGMOD, 1984.[14] N. 
Beckmann, H.P. Kriegel, R. Schneider, and B. Seeger, "The R$^{\ast}$ -Tree: An Efficient and Robust Access Method for Points and Rectangles," Proc. ACM SIGMOD, 1990.[15] T. Zhang, R. Ramakrishnan, and M. Livny, "BIRCH: An Efficient Data Clustering Method for Very Large Databases," Proc. ACM SIGMOD, 1996.[16] V. Ganti, R. Ramakrishnan, J. Gehrke, and A. Powell, "Clustering Large Datasets in Arbitrary Metric Spaces," Proc. Int'l Conf. Data Eng. (ICDE), 1999.[17] K. Mouratidis, D. Papadias, and S. Papadimitriou, "Tree-Based Partition Querying: A Methodology for Computing Medoids in Large Spatial Datasets," VLDB J., vol. 17, no. 4, pp. 923-945, 2008.

Page 10: i Eee Papers

[18] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise," Proc. Int'l Conf. Knowledge Discovery and Data Mining (KDD), 1996.[19] M.L. Yiu and N. Mamoulis, "Clustering Objects on a Spatial Network," Proc. ACM SIGMOD, 2004.[20] C.S. Jensen, D. Lin, B.C. Ooi, and R. Zhang, "Effective Density Queries on Continuously Moving Objects," Proc. Int'l Conf. Data Eng. (ICDE), 2006.[21] C.R. Palmer and C. Faloutsos, "Density Biased Sampling: An Improved Method for Data Mining and Clustering," Proc. ACM SIGMOD, 2000.[22] K. Yi, X. Lian, F. Li, and L. Chen, "The World in a Nutshell: Concise Range Queries," Proc. Int'l Conf. Data Eng. (ICDE), 2009.[23] P.K. Agarwal, L. Arge, and J. Erickson, "Indexing Moving Points," Proc. Symp. Principles of Database Systems (PODS), 2000.[24] Y. Tao, D. Papadias, and J. Sun, "The TPR∗-Tree: An Optimized Spatio-Temporal Access Method for Predictive Queries," Proc. Int'l Conf. Very Large Data Bases (VLDB), 2003.[25] N. Dalvi and D. Suciu, "Efficient Query Evaluation on Probabilistic Databases," Proc. Int'l Conf. Very Large Data Bases (VLDB), 2004.[26] R. Cheng, D. Kalashnikov, and S. Prabhakar, "Evaluating Probabilistic Queries over Imprecise Data," Proc. ACM SIGMOD, 2003.[27] A.D. Sarma, O. Benjelloun, A. Halevy, and J. Widom, "Working Models for Uncertain Data," Proc. Int'l Conf. Data Eng. (ICDE), 2006.[28] Y.E. Ioannidis and V. Poosala, "Balancing Histogram Optimality and Practicality for Query Result Size Estimation," Proc. ACM SIGMOD, 1995.[29] S. Acharya, V. Poosala, and S. Ramaswamy, "Selectivity Estimation in Spatial Databases," Proc. ACM SIGMOD, 1999.[30] H.V. Jagadish, N. Koudas, S. Muthukrishnan, V. Poosala, K.C. Sevcik, and T. Suel, "Optimal Histograms with Quality Guarantees," Proc. Int'l Conf. Very Large Data Bases (VLDB), 1998.[31] T. Brinkhoff, "A Framework for Generating Network-Based Moving Objects," Geoinformatica, vol. 6, pp. 153-180, 2002.

Index Terms: Spatial databases, range queries, algorithms.

Citation: Ke Yi, Xiang Lian, Feifei Li, Lei Chen, "The World in a Nutshell: Concise Range Queries," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 1, pp. 139-154, Jan. 2011, doi:10.1109/TKDE.2010.35
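A concise answer replaces the full result set with k groups, each reported as a bounding box plus a point count. For illustration, the sketch below assumes an information-loss measure equal to the sum, over groups, of count × (box width + box height). It also uses a deliberately naive heuristic: sort by x and cut into k roughly equal slices. This is not one of the paper's approximation algorithms. Because the optimal problem is NP-hard, some heuristic of this kind is needed in practice.

```python
def range_query(points, box):
    """Return the points inside the axis-aligned box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return [(x, y) for x, y in points if x1 <= x <= x2 and y1 <= y <= y2]


def concise(points, k):
    """Naive heuristic: split x-sorted points into k slices, report (box, count) each."""
    pts = sorted(points)
    if not pts:
        return []
    size = -(-len(pts) // k)  # ceiling division
    result = []
    for i in range(0, len(pts), size):
        group = pts[i:i + size]
        xs, ys = [p[0] for p in group], [p[1] for p in group]
        result.append(((min(xs), min(ys), max(xs), max(ys)), len(group)))
    return result


def information_loss(representation):
    """Assumed loss: each point is blurred over its box's width plus height."""
    return sum(n * ((x2 - x1) + (y2 - y1)) for (x1, y1, x2, y2), n in representation)


pois = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 9), (8, 9), (20, 20)]
answer = range_query(pois, (0, 0, 10, 10))
summary = concise(answer, k=2)
print(summary, information_loss(summary))  # [((1, 1, 2, 2), 3), ((8, 8, 9, 9), 3)] 12
```

Sending two (box, count) pairs instead of six points is the communication saving the abstract describes. A larger k lowers the loss at the cost of a longer answer.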

A Pattern Mining Approach to Sensor-Based Human Activity Recognition
September 2011 (vol. 23 no. 9) pp. 1359-1372
Tao Gu, University of Southern Denmark, Odense
Liang Wang, Nanjing University, Nanjing
Zhanqing Wu, Nanjing University, Nanjing
Xianping Tao, Nanjing University, Nanjing
Jian Lu, Nanjing University, Nanjing
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.184

Recognizing human activities from sensor readings has recently attracted much research interest in pervasive computing due to its potential in many applications, such as assistive living and healthcare. This task is particularly challenging because, in real life, human activities are performed not only in a simple (i.e., sequential) manner but also in a complex (i.e., interleaved or concurrent) manner. Little work has addressed such complex situations. Existing models of interleaved and concurrent activities are typically learning-based, and such models lack flexibility in real life because activities can be interleaved and performed concurrently in many different ways. In this paper, we propose a novel pattern mining approach to recognize sequential, interleaved, and concurrent activities in a unified framework. We exploit the Emerging Pattern, a discriminative pattern that describes significant changes between classes of data, to identify sensor features for classifying activities. Unlike existing learning-based approaches, which require different training data sets for building activity models, our activity models are built upon the sequential activity trace only and can be applied to recognize both simple and complex activities. We conduct our empirical studies by collecting real-world traces, evaluating the performance of our algorithm, and comparing it with static and temporal models. Our results demonstrate that, with a time slice of 15 seconds, we achieve an accuracy of 90.96 percent for sequential activity, 88.1 percent for interleaved activity, and 82.53 percent for concurrent activity.
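An Emerging Pattern is an itemset whose support rises sharply from one class to another. The usual measure is the growth rate: support in the target class divided by support in the other classes, which is infinite when the itemset never occurs outside the target class. The sketch below computes growth rates for sensor-feature itemsets. The sensor names, the two activities, the minimum growth rate of 2 and the size limit of two items are hypothetical choices for illustration.

```python
import math
from itertools import combinations

# Hypothetical time slices: the set of sensor features active in each slice.
SLICES = {
    "make_coffee": [
        {"kettle", "cup", "kitchen", "tap"}, {"kettle", "coffee_jar", "kitchen"},
        {"cup", "coffee_jar", "kitchen"},
    ],
    "brush_teeth": [
        {"toothbrush", "tap", "bathroom"}, {"toothbrush", "cup", "bathroom"},
        {"tap", "cup", "bathroom"},
    ],
}


def support(itemset, slices):
    return sum(itemset <= s for s in slices) / len(slices)


def growth_rate(itemset, target, other):
    """Support ratio target/other; infinite if the itemset is absent from `other`."""
    s_t, s_o = support(itemset, target), support(itemset, other)
    if s_o == 0:
        return math.inf if s_t > 0 else 0.0
    return s_t / s_o


def emerging_patterns(cls, min_growth=2.0, max_size=2):
    """Itemsets (up to max_size items) whose growth rate into class `cls` >= min_growth."""
    target = SLICES[cls]
    other = [s for c, group in SLICES.items() if c != cls for s in group]
    items = sorted({i for s in target for i in s})
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            gr = growth_rate(frozenset(combo), target, other)
            if gr >= min_growth:
                result[frozenset(combo)] = gr
    return result


for pattern, gr in emerging_patterns("brush_teeth").items():
    print(sorted(pattern), gr)
```

A shared feature such as `cup` has a growth rate of 1 and is discarded. `tap` has a growth rate of 2, so it just qualifies as weak evidence for brushing teeth.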

Ogasawara, B. Rao, S.J. Russell, and J. Weber, "Automatic Symbolic Traffic Scene Analysis Using Belief Networks," Proc. Nat'l Conf. Artificial Intelligence, 1994.[40] A.A. Efros, A.C. Berg, G. Mori, and J. Malik, "Recognizing Action at a Distance," Proc. IEEE Int'l Conf. Computer Vision, pp. 726-733, 2003.[41] E. Shechtman and M. Irani, "Space-Time Behavior Based Correlation," Proc. IEEE CS Conf. Computer Vision and Pattern Recognition, pp. 405-412, 2005.[42] H. Kautz, "A Formal Theory of Plan Recognition," PhD dissertation, Univ. of Rochester, 1987.[43] G.Z. Dong and J.Y. Li, "Efficient Mining of Emerging Patterns: Discovering Trends and Differences," Proc. ACM Int'l Conf. Knowledge Discovery and Data Mining, pp. 43-52, Aug. 1999.[44] G.Z. Dong, X. Zhang, L. Wong, and J.Y. Li, "CAEP: Classification by Aggregating Emerging Patterns," Proc. Second Int'l Conf. Discovery Science, Dec. 1999.[45] J.Y. Li, H.Q. Liu, J.R. Downing, A.E. Yeoh, and L. Wong, "Simple Rules Underlying Gene Expression Profiles of More Than Six Subtypes of Acute Lymphoblastic Leukemia (ALL) Patients," Bioinformatics, vol. 19, no. 1, p. 71-78, 2003.[46] J.Y. Li, H.Q. Liu, S.K. Ng, and L. Wong, "Discovery of Significant Rules for Classifying Cancer Diagnosis Data," Bioinformatics, vol. 19, pp. ii93-ii102, 2003.

Page 14: i Eee Papers

[47] J.Y. Li, G.M. Liu, and L. Wong, "Mining Statistically Important Equivalence Classes and Delta-Discriminative Emerging Patterns," Proc. ACM Int'l Conf. Knowledge Discovery and Data Mining, pp. 430-439, 2007.[48] U. Fayyad and K. Irani, "Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning," Proc. Int'l Joint Conf. Artificial Intelligence, 1993.

Index Terms:

Human activity recognition, pattern analysis, emerging pattern, classifier design and evaluation.

Citation:

Tao Gu, Liang Wang, Zhanqing Wu, Xianping Tao, Jian Lu, "A Pattern Mining Approach to Sensor-Based Human Activity Recognition," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 9, pp. 1359-1372, Sept. 2011, doi:10.1109/TKDE.2010.184

Publication 2011 Issue No. 7 - July Abstract - A Web Search Engine-Based Approach to Measure Semantic Similarity between Words

A Web Search Engine-Based Approach to Measure Semantic Similarity between Words

July 2011 (vol. 23 no. 7)

pp. 977-990

Danushka Bollegala, The University of Tokyo, Tokyo

Yutaka Matsuo, The University of Tokyo, Tokyo

Mitsuru Ishizuka, The University of Tokyo, Tokyo

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.172

Measuring the semantic similarity between words is an important component in various tasks on the web, such as relation extraction, community mining, document clustering, and automatic metadata extraction. Despite the usefulness of semantic similarity measures in these applications, accurately measuring semantic similarity between two words (or entities) remains a challenging task. We propose an empirical method to estimate semantic similarity using page counts and text snippets retrieved from a web search engine for two words. Specifically, we define various word co-occurrence measures using page counts and integrate those with lexical patterns extracted from text snippets. To identify the numerous semantic relations that exist between two given words, we propose a novel pattern extraction algorithm and a pattern clustering algorithm. The optimal combination of page counts-based co-occurrence measures and lexical pattern clusters is learned using support vector machines. The proposed method outperforms various baselines and previously proposed web-based semantic similarity measures on three benchmark data sets, showing a high correlation with human ratings. Moreover, the proposed method significantly improves the accuracy in a community mining task.
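
The abstract does not spell out the page-count measures. A minimal sketch, assuming the counts H(P), H(Q), and H(P AND Q) for a word pair have already been obtained from a search engine, could compute Jaccard-, Dice-, and PMI-style co-occurrence scores as below. The function names, the noise threshold `c`, and the index size `n` are illustrative assumptions rather than values taken from the paper, and no search-engine API is called.

```python
import math

# Hypothetical helpers: h_p = H(P), h_q = H(Q), h_pq = H(P AND Q) are page
# counts assumed to be fetched beforehand; nothing here queries a search engine.

def web_jaccard(h_p, h_q, h_pq, c=5):
    """Jaccard-style score; co-occurrence counts <= c are treated as noise."""
    if h_pq <= c:
        return 0.0
    return h_pq / (h_p + h_q - h_pq)


def web_dice(h_p, h_q, h_pq, c=5):
    """Dice-style score: 2 * H(P AND Q) / (H(P) + H(Q))."""
    if h_pq <= c:
        return 0.0
    return 2 * h_pq / (h_p + h_q)


def web_pmi(h_p, h_q, h_pq, n=1e10, c=5):
    """Pointwise mutual information; n is an assumed number of indexed pages."""
    if h_pq <= c:
        return 0.0
    return math.log2((h_pq / n) / ((h_p / n) * (h_q / n)))
```

In the approach the abstract describes, scores of this kind would be combined with lexical-pattern features before a support vector machine learns the final similarity; that learning step is not shown here.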

[1] A. Kilgarriff, "Googleology Is Bad Science," Computational Linguistics, vol. 33, pp. 147-151, 2007.
[2] M. Sahami and T. Heilman, "A Web-Based Kernel Function for Measuring the Similarity of Short Text Snippets," Proc. 15th Int'l World Wide Web Conf., 2006.
[3] D. Bollegala, Y. Matsuo, and M. Ishizuka, "Disambiguating Personal Names on the Web Using Automatically Extracted Key Phrases," Proc. 17th European Conf. Artificial Intelligence, pp. 553-557, 2006.
[4] H. Chen, M. Lin, and Y. Wei, "Novel Association Measures Using Web Search with Double Checking," Proc. 21st Int'l Conf. Computational Linguistics and 44th Ann. Meeting of the Assoc. for Computational Linguistics (COLING/ACL '06), pp. 1009-1016, 2006.
[5] M. Hearst, "Automatic Acquisition of Hyponyms from Large Text Corpora," Proc. 14th Conf. Computational Linguistics (COLING), pp. 539-545, 1992.
[6] M. Pasca, D. Lin, J. Bigham, A. Lifchits, and A. Jain, "Organizing and Searching the World Wide Web of Facts - Step One: The One-Million Fact Extraction Challenge," Proc. Nat'l Conf. Artificial Intelligence (AAAI '06), 2006.
[7] R. Rada, H. Mili, E. Bicknell, and M. Blettner, "Development and Application of a Metric on Semantic Nets," IEEE Trans. Systems, Man and Cybernetics, vol. 19, no. 1, pp. 17-30, Jan./Feb. 1989.
[8] P. Resnik, "Using Information Content to Evaluate Semantic Similarity in a Taxonomy," Proc. 14th Int'l Joint Conf. Artificial Intelligence, 1995.
[9] D. McLean, Y. Li, and Z.A. Bandar, "An Approach for Measuring Semantic Similarity between Words Using Multiple Information Sources," IEEE Trans. Knowledge and Data Eng., vol. 15, no. 4, pp. 871-882, July/Aug. 2003.
[10] G. Miller and W. Charles, "Contextual Correlates of Semantic Similarity," Language and Cognitive Processes, vol. 6, no. 1, pp. 1-28, 1991.
[11] D. Lin, "An Information-Theoretic Definition of Similarity," Proc. 15th Int'l Conf. Machine Learning (ICML), pp. 296-304, 1998.
[12] R. Cilibrasi and P. Vitanyi, "The Google Similarity Distance," IEEE Trans. Knowledge and Data Eng., vol. 19, no. 3, pp. 370-383, Mar. 2007.
[13] M. Li, X. Chen, X. Li, B. Ma, and P. Vitanyi, "The Similarity Metric," IEEE Trans. Information Theory, vol. 50, no. 12, pp. 3250-3264, Dec. 2004.
[14] P. Resnik, "Semantic Similarity in a Taxonomy: An Information Based Measure and Its Application to Problems of Ambiguity in Natural Language," J. Artificial Intelligence Research, vol. 11, pp. 95-130, 1999.
[15] R. Rosenfeld, "A Maximum Entropy Approach to Adaptive Statistical Modelling," Computer Speech and Language, vol. 10, pp. 187-228, 1996.
[16] D. Lin, "Automatic Retrieval and Clustering of Similar Words," Proc. 17th Int'l Conf. Computational Linguistics (COLING), pp. 768-774, 1998.
[17] J. Curran, "Ensemble Methods for Automatic Thesaurus Extraction," Proc. ACL-02 Conf. Empirical Methods in Natural Language Processing (EMNLP), 2002.
[18] C. Buckley, G. Salton, J. Allan, and A. Singhal, "Automatic Query Expansion Using SMART: TREC 3," Proc. Third Text REtrieval Conf., pp. 69-80, 1994.
[19] V. Vapnik, Statistical Learning Theory. Wiley, 1998.
[20] K. Church and P. Hanks, "Word Association Norms, Mutual Information and Lexicography," Computational Linguistics, vol. 16, pp. 22-29, 1990.
[21] Z. Bar-Yossef and M. Gurevich, "Random Sampling from a Search Engine's Index," Proc. 15th Int'l World Wide Web Conf., 2006.
[22] F. Keller and M. Lapata, "Using the Web to Obtain Frequencies for Unseen Bigrams," Computational Linguistics, vol. 29, no. 3, pp. 459-484, 2003.
[23] M. Lapata and F. Keller, "Web-Based Models for Natural Language Processing," ACM Trans. Speech and Language Processing, vol. 2, no. 1, pp. 1-31, 2005.
[24] R. Snow, D. Jurafsky, and A. Ng, "Learning Syntactic Patterns for Automatic Hypernym Discovery," Proc. Advances in Neural Information Processing Systems (NIPS), pp. 1297-1304, 2005.
[25] M. Berland and E. Charniak, "Finding Parts in Very Large Corpora," Proc. Ann. Meeting of the Assoc. for Computational Linguistics on Computational Linguistics (ACL '99), pp. 57-64, 1999.
[26] D. Ravichandran and E. Hovy, "Learning Surface Text Patterns for a Question Answering System," Proc. Ann. Meeting on Assoc. for Computational Linguistics (ACL '02), pp. 41-47, 2002.
[27] R. Bhagat and D. Ravichandran, "Large Scale Acquisition of Paraphrases for Learning Surface Patterns," Proc. Assoc. for Computational Linguistics: Human Language Technologies (ACL '08: HLT), pp. 674-682, 2008.
[28] J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M. Hsu, "Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach," IEEE Trans. Knowledge and Data Eng., vol. 16, no. 11, pp. 1424-1440, Nov. 2004.
[29] Z. Harris, "Distributional Structure," Word, vol. 10, pp. 146-162, 1954.
[30] J. Platt, "Probabilistic Outputs for Support Vector Machines and Comparison to Regularized Likelihood Methods," Advances in Large Margin Classifiers, pp. 61-74, MIT Press, 2000.
[31] P. Gill, W. Murray, and M. Wright, Practical Optimization. Academic Press, 1981.
[32] H. Rubenstein and J. Goodenough, "Contextual Correlates of Synonymy," Comm. ACM, vol. 8, pp. 627-633, 1965.
[33] L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin, "Placing Search in Context: The Concept Revisited," ACM Trans. Information Systems, vol. 20, pp. 116-131, 2002.
[34] D. Bollegala, Y. Matsuo, and M. Ishizuka, "Measuring Semantic Similarity between Words Using Web Search Engines," Proc. Int'l Conf. World Wide Web (WWW '07), pp. 757-766, 2007.
[35] M. Strube and S.P. Ponzetto, "Wikirelate! Computing Semantic Relatedness Using Wikipedia," Proc. Nat'l Conf. Artificial Intelligence (AAAI '06), pp. 1419-1424, 2006.
[36] A. Gledson and J. Keane, "Using Web-Search Results to Measure Word-Group Similarity," Proc. Int'l Conf. Computational Linguistics (COLING '08), pp. 281-288, 2008.
[37] Z. Wu and M. Palmer, "Verb Semantics and Lexical Selection," Proc. Ann. Meeting on Assoc. for Computational Linguistics (ACL '94), pp. 133-138, 1994.
[38] C. Leacock and M. Chodorow, "Combining Local Context and Wordnet Similarity for Word Sense Disambiguation," WordNet: An Electronic Lexical Database, vol. 49, pp. 265-283, MIT Press, 1998.
[39] J. Jiang and D. Conrath, "Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy," Proc. Int'l Conf. Research in Computational Linguistics (ROCLING X), 1997.
[40] M. Jarmasz, "Roget's Thesaurus as a Lexical Resource for Natural Language Processing," technical report, Univ. of Ottawa, 2003.
[41] V. Schickel-Zuber and B. Faltings, "OSS: A Semantic Similarity Function Based on Hierarchical Ontologies," Proc. Int'l Joint Conf. Artificial Intelligence (IJCAI '07), pp. 551-556, 2007.
[42] E. Agirre, E. Alfonseca, K. Hall, J. Kravalova, M. Pasca, and A. Soroa, "A Study on Similarity and Relatedness Using Distributional and Wordnet-Based Approaches," Proc. Human Language Technologies: The 2009 Ann. Conf. North Am. Chapter of the Assoc. for Computational Linguistics (NAACL-HLT '09), 2009.
[43] G. Hirst and D. St-Onge, "Lexical Chains as Representations of Context for the Detection and Correction of Malapropisms," WordNet: An Electronic Lexical Database, pp. 305-332, MIT Press, 1998.
[44] T. Hughes and D. Ramage, "Lexical Semantic Relatedness with Random Graph Walks," Proc. Joint Conf. Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL '07), pp. 581-589, 2007.
[45] E. Gabrilovich and S. Markovitch, "Computing Semantic Relatedness Using Wikipedia-Based Explicit Semantic Analysis," Proc. Int'l Joint Conf. Artificial Intelligence (IJCAI '07), pp. 1606-1611, 2007.
[46] Y. Matsuo, J. Mori, M. Hamasaki, K. Ishida, T. Nishimura, H. Takeda, K. Hasida, and M. Ishizuka, "Polyphonet: An Advanced Social Network Extraction System," Proc. 15th Int'l World Wide Web Conf., 2006.
[47] A. Bagga and B. Baldwin, "Entity-Based Cross Document Coreferencing Using the Vector Space Model," Proc. 36th Ann. Meeting of the Assoc. for Computational Linguistics and 17th Int'l Conf. Computational Linguistics (COLING-ACL), pp. 79-85, 1998.

Index Terms:

Web mining, information extraction, web text analysis.

Citation:

Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka, "A Web Search Engine-Based Approach to Measure Semantic Similarity between Words," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 977-990, July 2011, doi:10.1109/TKDE.2010.172

Publication 2011 Issue No. 8 - August Abstract - Efficient and Accurate Discovery of Patterns in Sequence Data Sets

Efficient and Accurate Discovery of Patterns in Sequence Data Sets

August 2011 (vol. 23 no. 8)

pp. 1154-1168

Avrilia Floratou, University of Wisconsin, Madison

Sandeep Tata, IBM Almaden, San Jose

Jignesh M. Patel, University of Wisconsin, Madison

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2011.69

Existing sequence mining algorithms mostly focus on mining for subsequences. However, a large class of applications, such as biological DNA and protein motif mining, require efficient mining of “approximate” patterns that are contiguous. The few existing algorithms that can be applied to such contiguous approximate pattern mining have drawbacks such as poor scalability, lack of guarantees in finding the pattern, and difficulty in adapting to other applications. In this paper, we present a new algorithm called FLexible and Accurate Motif DEtector (FLAME). FLAME is a flexible suffix-tree-based algorithm that can be used to find frequent patterns with a variety of definitions of motif (pattern) models. It is also accurate, as it always finds the pattern if it exists. Using both real and synthetic data sets, we demonstrate that FLAME is fast, scalable, and outperforms existing algorithms on a variety of performance metrics. In addition, based on FLAME, we also address a more general problem, named extended structured motif extraction, which allows mining frequent combinations of motifs under relaxed constraints.
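
To make the notion of a frequent approximate contiguous pattern concrete, the sketch below assumes a simple (l, d, q) motif model: a length-l string is reported if, in at least q input sequences, some contiguous window matches it with at most d mismatches (Hamming distance). It enumerates all |alphabet|^l candidates by brute force, so its cost grows exponentially in l. It is a naive baseline illustrating the problem definition, not FLAME's suffix-tree algorithm, and the function names are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_approx(motif, seq, d):
    """True if some contiguous window of seq matches motif with <= d mismatches."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def find_motifs(seqs, l, d, q, alphabet="ACGT"):
    """Brute force: try every length-l string over the alphabet and keep those
    that occur approximately (<= d mismatches) in at least q sequences."""
    result = []
    for chars in product(alphabet, repeat=l):
        motif = "".join(chars)
        support = sum(occurs_approx(motif, s, d) for s in seqs)
        if support >= q:
            result.append(motif)
    return result
```

For example, with l = 3, d = 0, and q = 2, the sequences ACGTTT, TTACGA, and GGGGAC share only the motif ACG. Allowing d > 0 mismatches is what makes the pattern "approximate" and quickly inflates the number of candidates a naive search must check.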

[1] M.O. Dayhoff, R.M. Schwartz, and B. Orcutt, "A Model for Evolutionary Changes in Proteins," Atlas of Protein Sequence and Structure, vol. 5, pp. 345-352, Nat'l Biomedical Research Foundation, 1978.
[2] S. Henikoff and J. Henikoff, "Amino Acid Substitution Matrices from Protein Blocks," Proc. Nat'l Academy of Sciences USA, vol. 89, no. 22, pp. 10915-10919, 1992.
[3] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules," Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 487-499, 1994.
[4] R. Agrawal and R. Srikant, "Mining Sequential Patterns," Proc. 11th IEEE Int'l Conf. Data Eng. (ICDE), pp. 3-14, 1995.
[5] M.J. Zaki, "SPADE: An Efficient Algorithm for Mining Frequent Sequences," Machine Learning, vol. 42, nos. 1/2, pp. 31-60, 2001.
[6] J. Wang and J. Han, "BIDE: Efficient Mining of Frequent Closed Sequences," Proc. 20th IEEE Int'l Conf. Data Eng. (ICDE), pp. 79-90, 2004.
[7] X. Yan, J. Han, and R. Afshar, "CloSpan: Mining Closed Sequential Patterns in Large Datasets," Proc. SIAM Int'l Conf. Data Mining (SDM), 2003.
[8] J. Yang, W. Wang, P.S. Yu, and J. Han, "Mining Long Sequential Patterns in a Noisy Environment," Proc. ACM SIGMOD, pp. 406-417, 2002.
[9] S. Sinha and M. Tompa, "YMF: A Program for Discovery of Novel Transcription Factor Binding Sites by Statistical Overrepresentation," Nucleic Acids Research, vol. 31, no. 13, pp. 3586-3588, 2003.
[10] G. Pavesi, P. Mereghetti, G. Mauri, and G. Pesole, "Weeder Web: Discovery of Transcription Factor Binding Sites in a Set of Sequences From Co-Regulated Genes," Nucleic Acids Research, vol. 32, pp. W199-W203, 2004.
[11] E. Eskin and P.A. Pevzner, "Finding Composite Regulatory Patterns in DNA Sequences," Proc. 10th Int'l Conf. Intelligent Systems for Molecular Biology (ISMB), pp. S354-S363, 2002.
[12] J. Buhler and M. Tompa, "Finding Motifs Using Random Projections," J. Computational Biology, vol. 9, no. 2, pp. 225-242, 2002.
[13] G. Das, K.-I. Lin, H. Mannila, G. Renganathan, and P. Smyth, "Rule Discovery from Time Series," Proc. Int'l Conf. Knowledge Discovery and Data Mining (KDD), pp. 16-22, 1998.
[14] S. Hoppner, "Discovery of Temporal Patterns—Learning Rules about the Qualitative Behaviour of Time Series," Proc. Fifth European Conf. Principles and Practice of Knowledge Discovery in Databases, pp. 192-203, 2001.
[15] P. Patel, E. Keogh, J. Lin, and S. Lonardi, "Mining Motifs in Massive Time Series Databases," Proc. IEEE Int'l Conf. Data Mining (ICDM), pp. 370-377, 2002.
[16] H. Wu, B. Salzberg, G.C. Sharp, S.B. Jiang, H. Shirato, and D. Kaeli, "Subsequence Matching on Structured Time Series Data," Proc. ACM SIGMOD, pp. 682-693, 2005.
[17] M.J. Zaki, "Sequence Mining in Categorical Domains: Incorporating Constraints," Proc. Ninth Int'l Conf. Information and Knowledge Management (CIKM), pp. 442-429, 2000.
[18] B.Y.-C. Chiu, E.J. Keogh, and S. Lonardi, "Probabilistic Discovery of Time Series Motifs," Proc. Int'l Conf. Knowledge Discovery and Data Mining (KDD), pp. 493-498, 2003.
[19] W. Wang and J. Yang, Mining Sequential Patterns from Large Data Sets, vol. 28, Springer-Verlag, 2005.
[20] M. Das and H.K. Dai, "A Survey of DNA Motif Finding Algorithms," BMC Bioinformatics, vol. 8, pp. S21-S33, 2007.
[21] G.K. Sandve and F. Drabløs, "A Survey of Motif Discovery Methods in an Integrated Framework," Biology Direct, vol. 1, pp. 11-26, 2006.
[22] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M. Hsu, "PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth," Proc. IEEE Int'l Conf. Data Eng. (ICDE), pp. 215-224, 2001.
[23] J. Pei, J. Han, and W. Wang, "Mining Sequential Patterns with Constraints in Large Databases," Proc. 11th Int'l Conf. Information and Knowledge Management (CIKM), pp. 18-25, 2002.
[24] A. Brazma, I. Jonassen, I. Eidhammer, and D. Gilbert, "Approaches to the Automatic Discovery of Patterns in Biosequences," J. Computational Biology, vol. 5, pp. 279-305, 1998.
[25] L. Marsan and M.-F. Sagot, "Algorithms for Extracting Structured Motifs Using a Suffix Tree with Application to Promoter and Regulatory Site Consensus Identification," J. Computational Biology, vol. 7, nos. 3/4, pp. 345-360, 2000.
[26] F. Zhu, X. Yan, J. Han, and P.S. Yu, "Efficient Discovery of Frequent Approximate Sequential Patterns," Proc. Seventh IEEE Int'l Conf. Data Mining (ICDM), 2007.
[27] S. Rajasekaran, S. Balla, C.-H. Huang, V. Thapar, M.R. Gryk, M.W. Maciejewski, and M.R. Schiller, "Exact Algorithms for Motif Search," Proc. Asia-Pacific Bioinformatics Conf. (APBC), pp. 239-248, 2005.
[28] J. Davila, S. Balla, and S. Rajasekaran, "Fast and Practical Algorithms for Planted (l, d) Motif Search," IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 4, no. 4, pp. 544-552, Oct.-Dec. 2007.
[29] J. Davila, S. Balla, and S. Rajasekaran, "Space and Time Efficient Algorithms for Planted Motif Search," Proc. Int'l Conf. Computational Science, pp. 822-829, 2006.
[30] S. Rajasekaran, S. Balla, and C.-H. Huang, "Exact Algorithms for Planted Motif Challenge Problems," Proc. Asia-Pacific Bioinformatics Conf. (APBC), pp. 249-259, 2005.
[31] T.L. Bailey and C. Elkan, "Unsupervised Learning of Multiple Motifs in Biopolymers Using EM," Machine Learning, vol. 21, nos. 1/2, pp. 51-80, 1995.
[32] W. Thompson, E.C. Rouchka, and C.E. Lawrence, "Gibbs Recursive Sampler: Finding Transcription Factor Binding Sites," Nucleic Acids Research, vol. 31, no. 13, pp. 3580-3585, 2003.
[33] G. Narasimhan, C. Bu, Y. Gao, X. Wang, N. Xu, and K. Mathee, "Mining Protein Sequences for Motifs," J. Computational Biology, vol. 9, no. 5, pp. 707-720, 2002.
[34] I. Rigoutsos and A. Floratos, "Motif Discovery without Alignment or Enumeration (Extended Abstract)," Proc. Second Ann. Int'l Conf. Computational Molecular Biology (RECOMB), pp. 221-227, 1998.
[35] M. Tompa et al., "Assessing Computational Tools for the Discovery of Transcription Factor Binding Sites," Nature Biotechnology, vol. 23, pp. 137-144, 2005.
[36] A.W.-C. Fu, E.J. Keogh, L.Y.H. Lau, and C.A. Ratanamahatana, "Scaling and Time Warping in Time Series Querying," Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 649-660, 2005.
[37] M. Vlachos, G. Kollios, and D. Gunopulos, "Discovering Similar Multidimensional Trajectories," Proc. 18th IEEE Int'l Conf. Data Eng. (ICDE), pp. 673-684, 2002.
[38] L. Chen, M. Tamer Ozsu, and V. Oria, "Robust and Fast Similarity Search for Moving Object Trajectories," Proc. ACM SIGMOD, pp. 491-502, 2005.
[39] Y. Zhu and D. Shasha, "Warping Indexes with Envelope Transforms for Query by Humming," Proc. ACM SIGMOD, pp. 181-192, 2003.
[40] A. Udechukwu, K. Barker, and R. Alhajj, "Discovering all Frequent Trends in Time Series," Proc. Winter Int'l Symp. Information and Comm. Technologies, vol. 58, pp. 1-6, 2004.
[41] Y. Zhang and M.J. Zaki, "SMOTIF: Efficient Structured Pattern and Profile Motif Search," Algorithms for Molecular Biology, vol. 1, pp. 22-45, 2006.
[42] G. Navarro and M. Raffinot, "Fast and Simple Character Classes and Bounded Gaps Pattern Matching, with Applications to Protein Searching," J. Computational Biology, vol. 10, no. 6, pp. 903-923, 2003.
[43] A. Policriti, N. Vitacolonna, M. Morgante, and A. Zuccolo, "Structured Motifs Search," Proc. Eighth Ann. Int'l Conf. Research in Computational Molecular Biology (RECOMB), pp. 133-139, 2004.
[44] A.M. Carvalho, A.T. Freitas, A.L. Oliveira, and M.-F. Sagot, "Efficient Extraction of Structured Motifs Using Box-Links," Proc. Int'l Symp. String Processing and Information Retrieval (SPIRE), pp. 267-268, 2004.
[45] A.M. Carvalho, A.T. Freitas, A.L. Oliveira, and M.-F. Sagot, "A Highly Scalable Algorithm for the Extraction of Cis-Regulatory Regions," Proc. Asia-Pacific Bioinformatics Conf. (APBC), pp. 273-282, 2005.
[46] A.M. Carvalho et al., "An Efficient Algorithm for the Identification of Structured Motifs in DNA Promoter Sequences," IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 3, no. 2, pp. 126-140, Apr.-June 2006.
[47] N. Pisanti, A.M. Carvalho, L. Marsan, and M.-F. Sagot, "Risotto: Fast Extraction of Motifs with Mismatches," Proc. Seventh Latin Am. Theoretical Informatics Symp. (LATIN), pp. 757-768, 2006.
[48] Y. Zhang and M.J. Zaki, "EXMOTIF: Efficient Structured Motif Extraction," Algorithms for Molecular Biology, vol. 1, pp. 21-38, 2006.
[49] F. Fassetti, G. Greco, and G. Terracina, "Mining Loosely Structured Motifs from Biological Data," IEEE Trans. Knowledge and Data Eng., vol. 20, no. 11, pp. 1472-1489, Nov. 2008.
[50] L. DS, "Transcription Factors: An Overview," Int'l J. Biochemistry and Cell Biology, vol. 29, no. 12, pp. 1305-1312, 1997.
[51] I. Jonassen, J.F. Collins, and D.G. Higgins, "Finding Flexible Patterns in Unaligned Protein Sequences," Protein Science, vol. 4, no. 8, pp. 1587-1595, 1995.
[52] "Data Sets from Analysis of Financial Time Series," http://www.gsb.uchicago.edu/fac/ruey.tsay/teachingfts/, 2010.
[53] R.S. Tsay, Analysis of Financial Time Series, first ed., Wiley-Interscience, Oct. 2001.
[54] P.A. Pevzner and S.-H. Sze, "Combinatorial Approaches to Finding Subtle Signals in DNA Sequences," Proc. Eighth Int'l Conf. Intelligent Systems for Molecular Biology (ISMB), pp. 269-278, 2000.
[55] "cSPADE Source Code," http://www.cs.rpi.edu/zakisoftware/, 2010.
[56] "CloSpan Source Code," http://illimine.cs.uiuc.edu/, 2011.
[57] "YMF Source Code," http://bio.cs.washington.edusoftware. html, 2010.
[58] "Weeder Source Code," http://www.pesolelab.it/Toolind.php, 2010.
[59] "Random Projections Source Code," http://www.cse.wustl.edu/jbuhlerpgt/, 2010.
[60] S. Tata, R.A. Hankins, and J.M. Patel, "Practical Suffix Tree Construction," Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 36-47, 2004.
[61] A. Gionis, P. Indyk, and R. Motwani, "Similarity Search in High Dimensions via Hashing," Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 518-529, 1999.
[62] T.L. Bailey and C. Elkan, "Fitting a Mixture Model by Expectation Maximization to Discover Motifs in Biopolymers," Proc. Second Int'l Conf. Intelligent Systems for Molecular Biology (ISMB), pp. 28-36, 1994.
[63] "TRANSFAC," http://www.gene-regulation.com/pub databases.html, 2011.

Index Terms:

Motif, sequence mining, suffix tree.

Citation:

Avrilia Floratou, Sandeep Tata, Jignesh M. Patel, "Efficient and Accurate Discovery of Patterns in Sequence Data Sets," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 8, pp. 1154-1168, Aug. 2011, doi:10.1109/TKDE.2011.69

Publication 2011 Issue No. 1 - January Abstract - Data Leakage Detection

Data Leakage Detection

January 2011 (vol. 23 no. 1)

pp. 51-63

Panagiotis Papadimitriou, Stanford University, Stanford

Hector Garcia-Molina, Stanford University, Stanford

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.100

We study the following problem: A data distributor has given sensitive data to a set of supposedly trusted agents (third parties). Some of the data are leaked and found in an unauthorized place (e.g., on the web or somebody's laptop). The distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means. We propose data allocation strategies (across the agents) that improve the probability of identifying leakages. These methods do not rely on alterations of the released data (e.g., watermarks). In some cases, we can also inject “realistic but fake” data records to further improve our chances of detecting leakage and identifying the guilty party.
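
The abstract does not state how the likelihood of guilt is assessed. One simple model, assumed here purely for illustration: each leaked object was either obtained independently by the target with probability p, or was leaked by exactly one of the agents that received it, each recipient being equally likely, and objects leak independently of one another. An agent i is then innocent only if it leaked none of the leaked objects it holds, giving Pr{agent i guilty} = 1 - prod over leaked objects t held by i of (1 - (1 - p) / |V_t|), where V_t is the set of agents that received t. The function `guilt_probability` and its parameters are hypothetical.

```python
def guilt_probability(leaked, allocations, agent, p):
    """Estimate the probability that `agent` leaked at least one object.

    leaked: set of leaked object ids.
    allocations: dict mapping agent name -> set of object ids it received.
    p: assumed probability that the target obtained an object independently.
    Assumption: each leaked object not obtained independently was leaked by
    exactly one of its recipients, chosen uniformly at random.
    """
    prob_innocent = 1.0
    for obj in leaked & allocations[agent]:
        # |V_t|: how many agents were given this object.
        recipients = sum(obj in objs for objs in allocations.values())
        prob_innocent *= 1.0 - (1.0 - p) / recipients
    return 1.0 - prob_innocent
```

With agents U1 = {t1, t2} and U2 = {t1, t3}, leaked set {t1, t2}, and p = 0.5, the sketch gives 0.625 for U1 and 0.25 for U2: object t2 was given only to U1, so it points to U1 more strongly than the shared object t1 does. This is the kind of separation between agents that the allocation strategies in the abstract aim to increase.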

[1] R. Agrawal and J. Kiernan, "Watermarking Relational Databases," Proc. 28th Int'l Conf. Very Large Data Bases (VLDB '02), VLDB Endowment, pp. 155-166, 2002.
[2] P. Bonatti, S.D.C. di Vimercati, and P. Samarati, "An Algebra for Composing Access Control Policies," ACM Trans. Information and System Security, vol. 5, no. 1, pp. 1-35, 2002.
[3] P. Buneman, S. Khanna, and W.C. Tan, "Why and Where: A Characterization of Data Provenance," Proc. Eighth Int'l Conf. Database Theory (ICDT '01), J.V. den Bussche and V. Vianu, eds., pp. 316-330, Jan. 2001.
[4] P. Buneman and W.-C. Tan, "Provenance in Databases," Proc. ACM SIGMOD, pp. 1171-1173, 2007.
[5] Y. Cui and J. Widom, "Lineage Tracing for General Data Warehouse Transformations," The VLDB J., vol. 12, pp. 41-58, 2003.
[6] S. Czerwinski, R. Fromm, and T. Hodes, "Digital Music Distribution and Audio Watermarking," http://www.scientificcommons. org43025658, 2007.
[7] F. Guo, J. Wang, Z. Zhang, X. Ye, and D. Li, "An Improved Algorithm to Watermark Numeric Relational Data," Information Security Applications, pp. 138-149, Springer, 2006.
[8] F. Hartung and B. Girod, "Watermarking of Uncompressed and Compressed Video," Signal Processing, vol. 66, no. 3, pp. 283-301, 1998.
[9] S. Jajodia, P. Samarati, M.L. Sapino, and V.S. Subrahmanian, "Flexible Support for Multiple Access Control Policies," ACM Trans. Database Systems, vol. 26, no. 2, pp. 214-260, 2001.
[10] Y. Li, V. Swarup, and S. Jajodia, "Fingerprinting Relational Databases: Schemes and Specialties," IEEE Trans. Dependable and Secure Computing, vol. 2, no. 1, pp. 34-45, Jan.-Mar. 2005.
[11] B. Mungamuru and H. Garcia-Molina, "Privacy, Preservation and Performance: The 3 P's of Distributed Data Management," technical report, Stanford Univ., 2008.
[12] V.N. Murty, "Counting the Integer Solutions of a Linear Equation with Unit Coefficients," Math. Magazine, vol. 54, no. 2, pp. 79-81, 1981.
[13] S.U. Nabar, B. Marthi, K. Kenthapadi, N. Mishra, and R. Motwani, "Towards Robustness in Query Auditing," Proc. 32nd Int'l Conf. Very Large Data Bases (VLDB '06), VLDB Endowment, pp. 151-162, 2006.
[14] P. Papadimitriou and H. Garcia-Molina, "Data Leakage Detection," technical report, Stanford Univ., 2008.
[15] P.M. Pardalos and S.A. Vavasis, "Quadratic Programming with One Negative Eigenvalue Is NP-Hard," J. Global Optimization, vol. 1, no. 1, pp. 15-22, 1991.
[16] J.J.K.O. Ruanaidh, W.J. Dowling, and F.M. Boland, "Watermarking Digital Images for Copyright Protection," IEE Proc. Vision, Signal and Image Processing, vol. 143, no. 4, pp. 250-256, 1996.
[17] R. Sion, M. Atallah, and S. Prabhakar, "Rights Protection for Relational Data," Proc. ACM SIGMOD, pp. 98-109, 2003.
[18] L. Sweeney, "Achieving K-Anonymity Privacy Protection Using Generalization and Suppression," http://en.scientificcommons. org43196131, 2002.

Index Terms:

Allocation strategies, data leakage, data privacy, fake records, leakage model.

Citation:

Panagiotis Papadimitriou, Hector Garcia-Molina, "Data Leakage Detection," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 1, pp. 51-63, Jan. 2011, doi:10.1109/TKDE.2010.100

Ranking Spatial Data by Quality Preferences

March 2011 (vol. 23 no. 3)

pp. 433-446

Man Lung Yiu, Hong Kong Polytechnic University, Hong Kong

Hua Lu, Aalborg University, Aalborg

Nikos Mamoulis, University of Hong Kong, Hong Kong

Michail Vaitis, University of the Aegean, Mytilene

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.119

A spatial preference query ranks objects based on the qualities of features in their spatial neighborhood. For example, using a real estate agency database of flats for lease, a customer may want to rank the flats with respect to the appropriateness of their location, defined after aggregating the qualities of other features (e.g., restaurants, cafes, hospital, market, etc.) within their spatial neighborhood. Such a neighborhood concept can be specified by the user via different functions. It can be an explicit circular region within a given distance from the flat. Another intuitive definition is to assign higher weights to the features based on their proximity to the flat. In this paper, we formally define spatial preference queries and propose appropriate indexing techniques and search algorithms for them. Extensive evaluation of our methods on both real and synthetic data reveals that an optimized branch-and-bound solution is efficient and robust with respect to different parameters.
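The two neighborhood definitions above (a circular range, or proximity-based weighting) can be illustrated with a brute-force Python sketch. It assumes that every feature carries a quality score, that an object's score for one feature type is the best quality found in its neighborhood, and that the per-type scores are summed; the proximity variant down-weights a feature of quality q at distance d by 2^(-d/eps). These scoring choices and the names `range_score`, `influence_score` and `top_k` are illustrative assumptions. The sketch scans every feature and does not reproduce the paper's indexing techniques or its branch-and-bound algorithm.

```python
import heapq
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Feature = Tuple[Point, float]  # (location, quality score); the quality scale is an assumption


def range_score(p: Point, features: List[Feature], eps: float) -> float:
    """Best quality among features within distance eps of p (0.0 if none)."""
    return max((q for loc, q in features if math.dist(p, loc) <= eps), default=0.0)


def influence_score(p: Point, features: List[Feature], eps: float) -> float:
    """Best quality after the assumed distance decay q * 2^(-d/eps)."""
    return max((q * 2 ** (-math.dist(p, loc) / eps) for loc, q in features), default=0.0)


def top_k(objects: Dict[str, Point], feature_sets: Dict[str, List[Feature]],
          eps: float, k: int, mode: str = "range") -> List[Tuple[str, float]]:
    """Brute-force top-k: score every object by summing its per-feature-type scores."""
    score_fn = range_score if mode == "range" else influence_score
    scored = [(name, sum(score_fn(p, feats, eps) for feats in feature_sets.values()))
              for name, p in objects.items()]
    return heapq.nlargest(k, scored, key=lambda item: item[1])
```

For example, with flats as `objects` and restaurants and cafes as two feature sets, `top_k(flats, features, eps=1.5, k=1)` returns the flat whose nearby restaurant and cafe have the highest combined quality. Replacing the linear scans with spatial indexes over the feature sets is where an efficient solution would differ from this sketch.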

[1] M.L. Yiu, X. Dai, N. Mamoulis, and M. Vaitis, "Top-k Spatial Preference Queries," Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2007.
[2] N. Bruno, L. Gravano, and A. Marian, "Evaluating Top-k Queries over Web-Accessible Databases," Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2002.
[3] A. Guttman, "R-Trees: A Dynamic Index Structure for Spatial Searching," Proc. ACM SIGMOD, 1984.
[4] G.R. Hjaltason and H. Samet, "Distance Browsing in Spatial Databases," ACM Trans. Database Systems, vol. 24, no. 2, pp. 265-318, 1999.
[5] R. Weber, H.-J. Schek, and S. Blott, "A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces," Proc. Int'l Conf. Very Large Data Bases (VLDB), 1998.
[6] K.S. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft, "When is 'Nearest Neighbor' Meaningful?" Proc. Seventh Int'l Conf. Database Theory (ICDT), 1999.
[7] R. Fagin, A. Lotem, and M. Naor, "Optimal Aggregation Algorithms for Middleware," Proc. Int'l Symp. Principles of Database Systems (PODS), 2001.
[8] I.F. Ilyas, W.G. Aref, and A. Elmagarmid, "Supporting Top-k Join Queries in Relational Databases," Proc. 29th Int'l Conf. Very Large Data Bases (VLDB), 2003.
[9] N. Mamoulis, M.L. Yiu, K.H. Cheng, and D.W. Cheung, "Efficient Top-k Aggregation of Ranked Inputs," ACM Trans. Database Systems, vol. 32, no. 3, p. 19, 2007.
[10] D. Papadias, P. Kalnis, J. Zhang, and Y. Tao, "Efficient OLAP Operations in Spatial Data Warehouses," Proc. Int'l Symp. Spatial and Temporal Databases (SSTD), 2001.
[11] S. Hong, B. Moon, and S. Lee, "Efficient Execution of Range Top-k Queries in Aggregate R-Trees," IEICE Trans. Information and Systems, vol. 88-D, no. 11, pp. 2544-2554, 2005.
[12] T. Xia, D. Zhang, E. Kanoulas, and Y. Du, "On Computing Top-t Most Influential Spatial Sites," Proc. 31st Int'l Conf. Very Large Data Bases (VLDB), 2005.
[13] Y. Du, D. Zhang, and T. Xia, "The Optimal-Location Query," Proc. Int'l Symp. Spatial and Temporal Databases (SSTD), 2005.
[14] D. Zhang, Y. Du, T. Xia, and Y. Tao, "Progressive Computation of the Min-Dist Optimal-Location Query," Proc. 32nd Int'l Conf. Very Large Data Bases (VLDB), 2006.
[15] Y. Chen and J.M. Patel, "Efficient Evaluation of All-Nearest-Neighbor Queries," Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2007.
[16] P.G.Y. Kumar and R. Janardan, "Efficient Algorithms for Reverse Proximity Query Problems," Proc. 16th ACM Int'l Conf. Advances in Geographic Information Systems (GIS), 2008.
[17] M.L. Yiu, P. Karras, and N. Mamoulis, "Ring-Constrained Join: Deriving Fair Middleman Locations from Pointsets via a Geometric Constraint," Proc. 11th Int'l Conf. Extending Database Technology (EDBT), 2008.
[18] M.L. Yiu, N. Mamoulis, and P. Karras, "Common Influence Join: A Natural Join Operation for Spatial Pointsets," Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2008.
[19] Y.-Y. Chen, T. Suel, and A. Markowetz, "Efficient Query Processing in Geographic Web Search Engines," Proc. ACM SIGMOD, 2006.
[20] V.S. Sengar, T. Joshi, J. Joy, S. Prakash, and K. Toyama, "Robust Location Search from Text Queries," Proc. 15th Ann. ACM Int'l Symp. Advances in Geographic Information Systems (GIS), 2007.
[21] S. Berchtold, C. Boehm, D. Keim, and H. Kriegel, "A Cost Model for Nearest Neighbor Search in High-Dimensional Data Space," Proc. ACM Symp. Principles of Database Systems (PODS), 1997.
[22] E. Dellis, B. Seeger, and A. Vlachou, "Nearest Neighbor Search on Vertically Partitioned High-Dimensional Data," Proc. Seventh Int'l Conf. Data Warehousing and Knowledge Discovery (DaWaK), pp. 243-253, 2005.
[23] N. Mamoulis and D. Papadias, "Multiway Spatial Joins," ACM Trans. Database Systems, vol. 26, no. 4, pp. 424-475, 2001.
[24] A. Hinneburg and D.A. Keim, "An Efficient Approach to Clustering in Large Multimedia Databases with Noise," Proc. Fourth Int'l Conf. Knowledge Discovery and Data Mining (KDD), 1998.

Index Terms:

Query processing, spatial databases.

Citation:

Man Lung Yiu, Hua Lu, Nikos Mamoulis, Michail Vaitis, "Ranking Spatial Data by Quality Preferences," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 3, pp. 433-446, Mar. 2011, doi:10.1109/TKDE.2010.119

Energy Time Series Forecasting Based on Pattern Sequence Similarity

August 2011 (vol. 23 no. 8)

pp. 1230-1243

Francisco Martínez-Álvarez, Pablo de Olavide University, Seville

Alicia Troncoso, Pablo de Olavide University, Seville

José C. Riquelme, Pablo de Olavide University, Seville

Jesús S. Aguilar-Ruiz, Pablo de Olavide University, Seville

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.227

This paper presents a new approach to forecasting the behavior of time series based on the similarity of pattern sequences. First, clustering techniques are used to group and label the samples from a data set. The prediction of a data point is then obtained as follows: first, the pattern sequence preceding the day to be predicted is extracted. Then, this sequence is searched for in the historical data, and the prediction is calculated by averaging all the samples immediately following the matched sequence. The main novelty is that only the labels associated with each pattern are considered to forecast the future behavior of the time series, avoiding the use of real values of the time series until the last step of the prediction process. Results from several energy time series are reported, and the performance of the proposed method is compared to that of recently published techniques, showing a remarkable improvement in the prediction.
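The prediction step described above can be sketched in Python, assuming the days have already been clustered (for instance with k-means) so that each day has an integer label and a fixed-length profile such as its hourly values. The function name `psf_forecast` and the rule of shrinking the window when the label sequence never recurs are assumptions made for this illustration, not details stated in the abstract.

```python
from typing import List, Sequence


def psf_forecast(labels: Sequence[int], days: Sequence[Sequence[float]],
                 window: int) -> List[float]:
    """Forecast the day after the last one in `days` by matching label sequences.

    labels[i] is the cluster label of days[i]; days[i] is that day's profile
    (for example, 24 hourly prices). Labels are assumed to be precomputed.
    """
    if len(labels) != len(days):
        raise ValueError("labels and days must have the same length")
    n = len(labels)
    # Try the requested window first; shrink it if the sequence never recurs (assumed rule).
    for w in range(min(window, n - 1), 0, -1):
        target = list(labels[n - w:])  # labels of the w days just before the forecast day
        # For every earlier occurrence of the target sequence, record the day that followed it.
        followers = [j + w for j in range(n - w) if list(labels[j:j + w]) == target]
        if followers:
            width = len(days[0])
            # Only now are real values used: average the following days position by position.
            return [sum(days[f][h] for f in followers) / len(followers) for h in range(width)]
    raise ValueError("no matching label sequence found in the history")
```

Matching is done purely on labels; the day profiles are touched only in the final averaging step, which mirrors the property the abstract highlights.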

[1] A. Abraham and B. Nath, "A Neuro-Fuzzy Approach for Forecasting Electricity Demand in Victoria," Applied Soft Computing J., vol. 1, no. 2, pp. 127-138, 2001.
[2] S.K. Aggarwal, L.M. Saini, and A. Kumar, "Electricity Price Forecasting in Deregulated Markets: A Review and Evaluation," Int'l J. Electrical Power and Energy Systems, vol. 31, no. 1, pp. 13-22, 2009.
[3] S.K. Aggarwal, L.M. Saini, and A. Kumar, "Price Forecasting Using Wavelet Transform and LSE Based Mixed Model in Australian Electricity Market," Int'l J. Energy Sector Management, vol. 2, no. 4, pp. 521-546, 2008.
[4] N. Amjady, "Day-Ahead Price Forecasting of Electricity Markets by a New Fuzzy Neural Network," IEEE Trans. Power Systems, vol. 21, no. 2, pp. 887-896, May 2006.
[5] J.P.S. Catalao, S.J.P.S. Mariano, V.M.F. Mendes, and L.A.F.M. Ferreira, "Short-Term Electricity Prices Forecasting in a Competitive Market: A Neural Network Approach," Electric Power Systems Research, vol. 77, pp. 1297-1304, 2007.
[6] J. Chen, S.J. Deng, and X. Huo, "Electricity Price Curve Modeling and Forecasting by Manifold Learning," IEEE Trans. Power Systems, vol. 23, no. 3, pp. 877-888, Aug. 2008.
[7] A.J. Conejo, M.A. Plazas, R. Espínola, and B. Molina, "Day-Ahead Electricity Price Forecasting Using the Wavelet Transform and ARIMA Models," IEEE Trans. Power Systems, vol. 20, no. 2, pp. 1035-1042, May 2005.
[8] R. Cottet and M. Smith, "Bayesian Modeling and Forecasting of Intraday Electricity Load," J. Am. Statistical Assoc., vol. 98, no. 464, pp. 839-849, 2003.
[9] D.L. Davies and D.W. Bouldin, "A Cluster Separation Measure," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. PAMI-1, no. 2, pp. 224-227, Apr. 1979.
[10] J. Dunn, "Well Separated Clusters and Optimal Fuzzy Partitions," J. Cybernetics, vol. 4, pp. 95-104, 1974.
[11] M. El-Telbany and F. El-Karmi, "Short-Term Forecasting of Jordanian Electricity Demand Using Particle Swarm Optimization," Electric Power Systems Research, vol. 78, pp. 425-433, 2008.
[12] S. Fan, C. Mao, J. Zhang, and L. Chen, "Forecasting Electricity Demand by Hybrid Machine Learning Model," Lecture Notes in Computer Science, vol. 4233, pp. 952-963, 2006.
[13] E.A. Feinberg and D. Genethliou, Applied Mathematics for Restructured Electric Power Systems. Springer, 2005.
[14] R.C. García, J. Contreras, M. van Akkeren, and J.B. García, "A GARCH Forecasting Model to Predict Day-Ahead Electricity Prices," IEEE Trans. Power Systems, vol. 20, no. 2, pp. 867-874, May 2005.
[15] C. García-Martos, J. Rodríguez, and M.J. Sánchez, "Mixed Models for Short-Run Forecasting of Electricity Prices: Application for the Spanish Market," IEEE Trans. Power Systems, vol. 22, no. 2, pp. 544-552, May 2007.
[16] L. Kaufman and P.J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, 1990.
[17] R. Kohavi, "A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection," Proc. Int'l Joint Conf. Artificial Intelligence, pp. 1137-1143, 1995.
[18] G. Li, C.C. Liu, C. Mattson, and J. Lawarrée, "Day-Ahead Electricity Price Forecasting in a Grid Environment," IEEE Trans. Power Systems, vol. 22, no. 1, pp. 266-274, Feb. 2007.
[19] Australia's Nat'l Electricity Market, http://www.nemmco.com.au, 2011.
[20] F. Martínez-Álvarez, A. Troncoso, J.C. Riquelme, and J.M. Riquelme, "Partitioning-Clustering Techniques Applied to the Electricity Price Time Series," Lecture Notes in Computer Science, vol. 4881, pp. 990-999, 2007.
[21] F. Martínez-Álvarez, A. Troncoso, J.C. Riquelme, and J.S. Aguilar Ruiz, "LBF: A Labeled-Based Forecasting Algorithm and Its Application to Electricity Price Time Series," Proc. Eighth IEEE Int'l Conf. Data Mining, pp. 453-461, 2008.
[22] K. Metaxiotis, A. Kagiannas, D. Askounis, and J. Psarras, "Artificial Intelligence in Short Term Electric Load Forecasting: A State-of-the-Art Survey for the Researcher," Energy Conversion and Management, vol. 44, pp. 1525-1534, 2003.
[23] Z. Mohamed and P. Bodger, "Forecasting Electricity Consumption in New Zealand Using Economic and Demographic Variables," Energy, vol. 30, pp. 1833-1843, 2005.
[24] F.J. Nogales and A.J. Conejo, "Electricity Price Forecasting through Transfer Function Models," J. Operational Research Soc., vol. 57, pp. 350-356, 2006.
[25] Spanish Electricity Price Market Operator, http://www.omel.es, 2011.
[26] The New York Independent System Operator, http://www.nyiso.com, 2011.
[27] S. Pezzulli, P. Frederic, S. Majithia, S. Sabbagh, E. Black, R. Sutton, and D. Stephenson, "The Seasonal Forecast of Electricity Demand: A Hierarchical Bayesian Model with Climatological Weather Generator," Applied Stochastic Models in Business and Industry, vol. 22, pp. 113-125, 2006.
[28] N.M. Pindoria, S.N. Singh, and S.K. Singh, "An Adaptive Wavelet Neural Network-Based Energy Price Forecasting in Electricity Markets," IEEE Trans. Power Systems, vol. 23, no. 3, pp. 1423-1432, Aug. 2008.
[29] M.A. Plazas, A.J. Conejo, and F.J. Prieto, "Multimarket Optimal Bidding for a Power Producer," IEEE Trans. Power Systems, vol. 20, no. 4, pp. 2041-2050, Nov. 2005.
[30] J.M. Riquelme, J.L. Martínez, A. Gómez, and D. Cros, "Load Pattern Recognition and Load Forecasting by Artificial Neural Networks," Int'l J. Power and Energy Systems, vol. 22, no. 1, pp. 74-79, 2002.
[31] L.F. Sugianto and X.B. Lu, "Demand Forecasting in the Deregulated Market: A Bibliography Survey," Proc. Australasian Univ. Power Eng. Conf., pp. 1-6, 2002.
[32] J.W. Taylor, L.M. de Menezes, and P.E. McSharry, "A Comparison of Univariate Methods for Forecasting Electricity Demand Up to a Day Ahead," Int'l J. Forecasting, vol. 22, pp. 1-16, 2006.
[33] A. Troncoso, J.C. Riquelme, J.M. Riquelme, J.L. Martínez, and A. Gómez, "Electricity Market Price Forecasting Based on Weighted Nearest Neighbours Techniques," IEEE Trans. Power Systems, vol. 22, no. 3, pp. 1294-1301, Aug. 2007.
[34] A. Troncoso, J.M. Riquelme, J.C. Riquelme, A. Gómez, and J.L. Martínez, "Time-Series Prediction: Application to the Short Term Electric Energy Demand," Lecture Notes in Artificial Intelligence, vol. 3040, pp. 577-586, 2004.
[35] J. Wang and L. Wang, "A New Method for Short-Term Electricity Load Forecasting," Trans. Inst. of Measurement and Control, vol. 30, no. 3, pp. 331-344, 2008.
[36] X. Wang and M. Meng, "Forecasting Electricity Demand Using Grey-Markov Model," Proc. Seventh Int'l Conf. Machine Learning and Cybernetics, pp. 1244-1248, 2008.
[37] R. Weron, Modeling and Forecasting Electricity Loads and Prices. Wiley, 2006.
[38] R. Weron and A. Misiorek, "Forecasting Spot Electricity Prices: A Comparison of Parametric and Semiparametric Time Series Models," Int'l J. Forecasting, vol. 24, pp. 744-763, 2008.
[39] R. Xu and D.C. Wunsch II, "Survey of Clustering Algorithms," IEEE Trans. Neural Networks, vol. 16, no. 3, pp. 645-678, May 2005.
[40] Z. Xu, Z.Y. Dong, and W. Liu, Neural Networks Applications in Information Technology and Web Engineering. Borneo Publishing, 2005.
[41] J.H. Zhao, Z.Y. Dong, X. Li, and K.P. Wong, "A Framework for Electricity Price Spike Analysis with Advanced Data Mining Methods," IEEE Trans. Power Systems, vol. 22, no. 1, pp. 376-385, Feb. 2007.

Index Terms:

Time series, forecasting, patterns.

Citation:

Francisco Martínez-Álvarez, Alicia Troncoso, José C. Riquelme, Jesús S. Aguilar-Ruiz, "Energy Time Series Forecasting Based on Pattern Sequence Similarity," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 8, pp. 1230-1243, Aug. 2011, doi:10.1109/TKDE.2010.227

Integration of the HL7 Standard in a Multiagent System to Support Personalized Access to e-Health Services

August 2011 (vol. 23 no. 8)

pp. 1244-1260

Pasquale De Meo, University Mediterranea de Reggio Calabria, Reggio Calabria

Giovanni Quattrone, University Mediterranea de Reggio Calabria, Reggio Calabria

Domenico Ursino, University Mediterranea de Reggio Calabria, Reggio Calabria

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.174

In this paper, we present a multiagent system to support patients in searching for healthcare services in an e-health scenario. The proposed system is HL7-aware in that it represents both patient and service information according to the directives of HL7, the information management standard adopted in the medical context. Our system builds a profile for each patient and uses it to detect Healthcare Service Providers delivering e-health services potentially capable of satisfying his needs. To perform this search, it can exploit three different algorithms: the first, called PPB, uses only information stored in the patient profile; the second, called DS-PPB, considers both information stored in the patient profile and similarities among the e-health services delivered by the involved providers; the third, called AB, relies on A*, a popular search algorithm in Artificial Intelligence. Our system also builds a social network of patients; once a patient submits a query and retrieves a set of services relevant to him, our system applies a spreading activation technique on this social network to find other patients who may benefit from these services.
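Spreading activation over the patient social network can be sketched generically in Python: activation starts at the querying patient and flows along weighted edges, losing energy at each hop until it drops below a threshold. The adjacency-dict encoding, the `decay` and `threshold` parameters and the function name are hypothetical; the abstract does not say how the authors weight edges or stop propagation. Edge weights are assumed to lie in [0, 1] with `decay` below 1, which keeps activation from growing around cycles and guarantees termination.

```python
from collections import deque
from typing import Dict


def spread_activation(graph: Dict[str, Dict[str, float]], source: str,
                      decay: float = 0.8, threshold: float = 0.1) -> Dict[str, float]:
    """Return the patients reached from `source` with their final activation level.

    graph[u][v] is the (assumed) strength of the tie between patients u and v.
    """
    activation = {source: 1.0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour, weight in graph.get(node, {}).items():
            energy = activation[node] * weight * decay
            # Fire a neighbour only if the energy is significant and beats what it already holds.
            if energy >= threshold and energy > activation.get(neighbour, 0.0):
                activation[neighbour] = energy
                queue.append(neighbour)
    activation.pop(source)  # the querying patient is not a recommendation target
    return activation
```

The patients returned, ranked by activation, are the candidates to whom the services retrieved by the original query could be suggested.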

[1] Health Level Seven (HL7), http://www.hl7.org, 2011.
[2] Logical Observation Identifiers Names and Codes (LOINC), http://www.regenstrief.org/loinc/, 2011.
[3] Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), http://www.snomed.org, 2011.
[4] O. Baujard, V. Baujard, S. Aurel, C. Boyer, and R.D. Appel, "MARVIN, Multi-Agent Softbot to Retrieve Multilingual Medical Information on the Web," Medical Informatics, vol. 23, no. 3, pp. 187-191, 1998.
[5] N.J. Belkin, D. Kelly, G. Kim, J.Y. Kim, H.J. Lee, G. Muresan, M.C. Tang, X.J. Yuan, and C. Cool, "Query Length in Interactive Information Retrieval," Proc. ACM SIGIR, pp. 205-212, 2003.
[6] L. Braun, F. Wiesman, H.J. van den Herik, A. Hasman, and E. Korsten, "Towards Patient-Related Information Needs," Int'l J. Medical Informatics, vol. 76, nos. 2/3, pp. 246-251, 2007.
[7] L. Braun, F. Wiesman, J. van den Herik, and A. Hasman, "Agent Support in Medical Information Retrieval," Proc. Int'l Joint Conf. Artificial Intelligence (IJCAI) Workshop Agents Applied in Health Care, pp. 16-25, 2005.
[8] C. Cáceres, A. Fernández, S. Ossowski, and M. Vasirani, "Agent-Based Semantic Service Discovery for Healthcare: An Organizational Approach," IEEE Intelligent Systems, vol. 21, no. 6, pp. 11-20, Nov./Dec. 2006.
[9] A. Cesta and D. D'Aloisi, "Building Interfaces as Personal Agents: A Case Study," ACM SIGCHI Bulletin, vol. 28, no. 3, pp. 108-113, 1996.
[10] W.W. Chu, V. Liu, W. Mao, and Q. Zou, "KMeX: A Knowledge-Based Approach for Retrieving Scenario-Specific Medical Text Documents," Biomedical Information Technology, D. Feng, ed., chapter 14, pp. 307-342, Elsevier Academic Press, 2008.
[11] M. Eichelberg, T. Aden, J. Riesmeier, A. Dogac, and G.B. Laleci, "A Survey and Analysis of Electronic Healthcare Record Standards," ACM Computing Surveys, vol. 37, no. 4, pp. 277-315, 2005.
[12] European Commission, "Reliable Health Information at the Click of a Mouse European Commission Launches New Health Portal," technical report, http://europa.eu/rapid/pressReleasesAction.do?reference=IP/06597&format=HTML&aged=0&language=EN&guiLanguage=en, 2006.
[13] L. Francisco-Revilla and F.M. Shipman III, "Adaptive Medical Information Delivery Combining User, Task and Situation Models," Proc. ACM Int'l Conf. Intelligent User Interfaces (IUI '00), pp. 94-97, 2000.
[14] M. Hay, G. Miklau, D. Jensen, D.F. Towsley, and P. Weis, "Resisting Structural Re-Identification in Anonymized Social Networks," Proc. VLDB Endowment, vol. 1, no. 1, pp. 102-114, 2008.
[15] A.L. Houston, H. Chen, B.R. Schatz, S.M. Hubbard, R.R. Sewell, and T.D. Ng, "Exploring the Use of Concept Spaces to Improve Medical Information Retrieval," Decision Support Systems, vol. 30, no. 2, pp. 171-186, 2000.
[16] D. Isern, D. Sánchez, and A. Moreno, "HeCaSe2: A Multi-Agent Ontology-Driven Guideline Enactment Engine," Proc. Int'l Central and Eastern European Conf. Multi-Agent Systems (CEEMAS '07), pp. 322-324, 2007.
[17] D. Isern, D. Sánchez, A. Moreno, and A. Valls, "HeCaSe: An Agent-Based System to Provide Personalised Medical Services," Proc. Workshop Agentes Inteligentes en el Tercer Milenio en X Conf. Asoc. Espanola Para la Inteligencia Artificial (CAEPIA '03), 2003.
[18] D. Isern, D. Sánchez, A. Moreno, and A. Valls, "HeCaSe: Provision of Secure Personalised Medical Services," Proc. European Workshop Multi-Agent Systems (EUMAS '03), 2003.
[19] D. Isern, A. Valls, and A. Moreno, "Using Aggregation Operators to Personalize Agent-Based Medical Services," Proc. Int'l Knowledge-Based Intelligent Information and Eng. Systems (KES '06), part II, pp. 1256-1263, 2006.
[20] G. Koutrika and Y. Ioannidis, "Personalized Queries under a Generalized Preference Model," Proc. IEEE Int'l Conf. Data Eng. (ICDE '05), pp. 841-852, 2005.
[21] Z. Liu and W.W. Chu, "Knowledge-Based Query Expansion to Support Scenario-Specific Retrieval of Medical Free Text," Information Retrieval, vol. 10, no. 2, pp. 173-202, 2007.
[22] K.R. McKeown, N. Elhadad, and V. Hatzivassiloglou, "Leveraging a Common Representation for Personalized Search and Summarization in a Medical Digital Library," Proc. ACM/IEEE Joint Conf. Digital Libraries (JCDL '03), pp. 159-170, 2003.
[23] A. Moreno, A. Valls, D. Isern, and D. Sánchez, "Applying Agent Technology to Healthcare: The GruSMA Experience," IEEE Intelligent Systems, vol. 21, no. 6, pp. 63-67, Nov./Dec. 2006.
[24] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Prentice Hall, 2002.
[25] L. Sweeney, "K-Anonymity: A Model for Protecting Privacy," Int'l J. Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 5, pp. 557-570, 2002.
[26] A. Valls, A. Moreno, and D. Sánchez, "A Multi-Criteria Decision Aid Agent Applied to the Selection of the Best Receiver in a Transplant," Proc. Int'l Conf. Enterprise Information Systems (ICEIS '02), vol. 1, pp. 431-438, 2002.
[27] S. Walczak, "A Multiagent Architecture for Developing Medical Information Retrieval Agents," J. Medical Systems, vol. 27, no. 5, pp. 479-498, 2003.

Index Terms:

Intelligent agents, multiagent systems, healthcare, human-centered computing, knowledge personalization and customization, HL7, personalized search of services.

Citation:

Pasquale De Meo, Giovanni Quattrone, Domenico Ursino, "Integration of the HL7 Standard in a Multiagent System to Support Personalized Access to e-Health Services," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 8, pp. 1244-1260, Aug. 2011, doi:10.1109/TKDE.2010.174

On Producing High and Early Result Throughput in Multijoin Query Plans

December 2011 (vol. 23 no. 12)

pp. 1888-1902

Justin J. Levandoski, University of Minnesota, Minneapolis

Mohamed E. Khalefa, University of Minnesota, Minneapolis

Mohamed F. Mokbel, University of Minnesota, Minneapolis

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.182

This paper introduces an efficient framework for producing high and early result throughput in multijoin query plans. While most previous research focuses on optimizing for cases involving a single join operator, this work takes a radical step by addressing query plans with multiple join operators. The proposed framework consists of two main methods: a flush algorithm and an operator state manager. The framework assumes a symmetric hash join, a common method for producing early results, when processing incoming data. In this way, our methods can be applied to a group of previous join operators (optimized for single-join queries) when taking part in multijoin query plans. Specifically, our framework can be applied by 1) employing a new flushing policy that writes in-memory data to disk once the memory allotment is exhausted, in a way that helps increase the probability of producing early result throughput in multijoin queries, and 2) employing a state manager that adaptively switches operators in the plan between joining in-memory data and disk-resident data in order to improve early result throughput. Extensive experimental results show that the proposed methods outperform the state-of-the-art join operators optimized for both single and multijoin query plans.
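The symmetric hash join that the framework builds on can be sketched in a few lines of Python: each arriving tuple is inserted into the hash table of its own input and immediately probed against the table of the other input, so a result is emitted as soon as both matching tuples have arrived rather than after one input has been fully read. This minimal in-memory sketch assumes an interleaved stream of `("L", row)` / `("R", row)` pairs; it deliberately omits the paper's flushing policy and state manager, which come into play once the memory allotment is exhausted.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Row = Tuple[Any, ...]


def symmetric_hash_join(stream: Iterable[Tuple[str, Row]],
                        left_key: Callable[[Row], Any],
                        right_key: Callable[[Row], Any]) -> Iterator[Tuple[Row, Row]]:
    """Join tuples as they arrive; `stream` yields ("L", row) or ("R", row)."""
    tables: Dict[str, Dict[Any, List[Row]]] = {"L": defaultdict(list), "R": defaultdict(list)}
    for side, row in stream:
        if side == "L":
            key = left_key(row)
            tables["L"][key].append(row)            # build: remember the tuple for later probes
            for match in tables["R"].get(key, []):  # probe: join with right tuples seen so far
                yield (row, match)
        else:
            key = right_key(row)
            tables["R"][key].append(row)
            for match in tables["L"].get(key, []):
                yield (match, row)
```

Because the function is a generator, a consumer receives the first joined pair right after the second matching tuple arrives; in a multijoin plan these early outputs feed the next operator immediately, which is the behavior the flushing policy tries to preserve when memory runs out.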

Index Terms:

Database management, systems, query processing.

Citation:

Justin J. Levandoski, Mohamed E. Khalefa, Mohamed F. Mokbel, "On Producing High and Early Result Throughput in Multijoin Query Plans," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 12, pp. 1888-1902, Dec. 2011, doi:10.1109/TKDE.2010.182

2012 Fifth International Conference on Intelligent Computation Technology and Automation

The Application of Web Data Mining in the Electronic Commerce

Zhangjiajie, Hunan, China

January 12-January 14

ISBN: 978-0-7695-4637-7

Weigang Zuo

Qingyi Hua

Weigang Zuo

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICICTA.2012.90

With the growth of information on the Internet, more and more electronic data are appearing. How, then, can we immediately discover useful knowledge and improve the utilization rate of information without getting lost in the sea of information? Data mining offers a new way of dealing with this problem. This paper sets forth the sources of web data mining in e-commerce, its process flow, and some techniques used in web data mining. Finally, it analyzes the functions of web data mining in e-commerce.

Index Terms:

web data mining, electronic commerce, clustering algorithm

Citation:

Weigang Zuo, Qingyi Hua, Weigang Zuo, "The Application of Web Data Mining in the Electronic Commerce," icicta, pp. 337-339, 2012 Fifth International Conference on Intelligent Computation Technology and Automation, 2012