Fuzzy Clustering: An Approach for Mining Usage Profiles from Web
Ms. Archana N. Boob, Prof. D. M. Dakhane
Sipna’s COET, Amravati; Asst. Prof., Sipna’s COET, Amravati
[email protected]
Abstract
Web usage mining is the application of data mining technology to the data in Web server log files. It can discover the browsing patterns of users and correlations between Web pages. Web usage mining supports Web site design, personalization services, and business decision making. Web mining applies data mining, artificial intelligence, chart technology, and related techniques to Web data, traces users' visiting characteristics, and then extracts their usage patterns. In this paper, we present an approach to cluster Web site users into different groups. By using fuzzy clustering, we enable the generation of overlapping clusters that can capture the uncertainty in Web users’ navigation behaviour.
Key-Words: Web mining, clustering, fuzzy clustering, personalization.
I. Introduction
Recent years have been characterized by exponential growth both in the number of Web applications available online and in the number of their users. This growth has generated huge quantities of data related to user interactions with Web sites, stored by the servers in user access log files. At the same time, the degree of personalization that a Web site is able to offer in presenting its services to users is an important attribute contributing to the site’s success. Hence, the need for a Web site that understands the interests of its users is becoming a fundamental issue. If properly exploited, log files can reveal useful information about user preferences. Therefore data mining, understood as the process of knowledge discovery from large databases, has naturally found application to Web data, leading to so-called Web Mining [1], [2], [3], [4].
Three principal areas can be identified in Web Mining: Web Content Mining, which focuses on the information available in Web pages; Web Structure Mining, which searches for information resources in the structure of Web sites; and Web Usage Mining, which deals with knowledge extraction from server log files in order to derive useful patterns of user access. Recently, several research activities have investigated Web Usage Mining techniques in particular, and many works have been published on these topics [4], [5], [6], [7], [8], [9], [10]. A variety of traditional machine learning methods have been used for pattern discovery in Web Usage Mining [11], [12]. Among these, unsupervised methods, especially clustering, seem to be the most appropriate for grouping users with common browsing behaviour. A wide range of applications can benefit from the knowledge discovered by the clustering process, from real-time content personalization to dynamic link suggestion. In choosing a clustering method for Web Usage Mining, one important requirement is the possibility of obtaining overlapping clusters, so that a user can belong to more than one group. To deal with the ambiguity and uncertainty underlying Web interaction data, fuzzy reasoning appears to be an effective tool. In this work, we use fuzzy clustering to categorize user sessions in order to derive, from Web log data, groups of users that exhibit similar access patterns. The obtained clusters can be exploited to implement different personalization functions, such as dynamic suggestion of links to Web pages deemed interesting for the user.
II. Related Work
Clustering is the process of discovering groups of objects such that objects belonging to the same group are similar in a certain manner and objects belonging to different groups are dissimilar. There are many algorithms in the literature that deal with the problem of clustering a large number of objects.
The different algorithms can be classified according to several aspects. One of the key aspects, which also determines other features of the algorithm, is the basic approach of the clustering algorithm. The aim of the partition-based algorithms is to decompose the set of objects into a set of disjoint clusters, where the number of resulting clusters is predefined by the user. The algorithm uses an iterative method and, based on a distance measure, updates the cluster of each object. This is repeated until no further changes can be made. The most representative partition-based clustering
algorithms are k-means and k-medoid. The
advantage of the partition-based algorithms is
that they use an iterative way to create the
clusters; the drawback is that the number of
clusters has to be determined in advance and only
spherical shapes can be determined as clusters.
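The iterative update used by partition-based algorithms can be illustrated with a minimal k-means sketch. It is only an illustration under stated assumptions: the function names `sq_dist` and `kmeans`, the squared Euclidean distance, and the use of the first k objects as initial centres are choices made here, not details given in the paper.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (assumed distance measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    # Assumption: the first k objects serve as initial centres.
    centres = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assign every object to its nearest centre.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centres[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no object changed cluster: stop iterating
        labels = new_labels
        # Recompute each centre as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centres
```

Note that `k` must be supplied by the caller, which is exactly the drawback mentioned above: the number of clusters has to be fixed in advance.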
Hierarchical algorithms provide a
hierarchical grouping of the objects. There exist
two approaches, the bottom-up and the top-down
approach. In the bottom-up approach, at the
beginning of the algorithm each object represents a
different cluster, and at the end all objects belong
to the same cluster. In the top-down method, at the
start of the algorithm all objects belong to the same
cluster, which is split until each object constitutes
a different cluster. A key aspect of this kind of
algorithm is the definition of the distance
measures between the objects and between the
clusters. The drawback of the hierarchical
algorithms is that after an object is assigned to a
given cluster, the assignment cannot be modified
later. Furthermore, as in the partition-based case,
only spherical clusters can be obtained. The
advantage of the hierarchical algorithms is that the
validation indices (correlation, inconsistency
measure), which can be defined on the clusters, can
be used for determining the number of clusters.
Density-based algorithms start by
searching for core objects, and they grow the
clusters from these cores by searching for
objects that lie in a neighbourhood within a given
radius of an object. The advantage of this type of
algorithm is that it can detect clusters of
arbitrary shape and can filter out noise.
Grid-based algorithms use a hierarchical
grid structure to decompose the object space into
a finite number of cells. For each cell, statistical
information about the objects is stored, and the
clustering is performed on these cells. The
advantage of this approach is the fast processing
time, which is in general independent of the number
of data objects.
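The idea of storing per-cell statistics can be sketched as follows. This is a simplified illustration, not the hierarchical grid structure itself: it uses a single flat grid level, and the names `cell_statistics` and `dense_cells`, the stored statistics (count and mean), and the `min_count` threshold are assumptions made for the example.

```python
from collections import defaultdict
from math import floor


def cell_statistics(points, cell_size):
    """Map each object to a grid cell and keep count and mean per cell."""
    counts = defaultdict(int)
    sums = {}
    for p in points:
        # Cell index along each dimension (assumed uniform cell size).
        key = tuple(floor(x / cell_size) for x in p)
        counts[key] += 1
        prev = sums.get(key, [0.0] * len(p))
        sums[key] = [s + x for s, x in zip(prev, p)]
    return {
        key: {"count": counts[key],
              "mean": [s / counts[key] for s in sums[key]]}
        for key in counts
    }


def dense_cells(stats, min_count):
    """Cells holding at least min_count objects (hypothetical threshold)."""
    return sorted(key for key, s in stats.items() if s["count"] >= min_count)
```

Once the statistics are built, later steps operate on the cells rather than on the individual objects, which is why the processing time depends mainly on the number of cells.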
Fuzzy clustering supposes that no hard
clusters exist on the set of objects, so that one
object can be assigned to more than one cluster. The
best known fuzzy clustering algorithm is fuzzy
c-means (FCM).
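To make the notion of overlapping assignment concrete, the following is a minimal sketch of the standard fuzzy c-means updates: each object receives a membership degree in every cluster, and the degrees of one object sum to 1. The function names, the fuzzifier value `m = 2`, the fixed number of iterations, and the caller-supplied initial centres are assumptions of this sketch; the paper does not specify them.

```python
def memberships(points, centres, m=2.0):
    """Return u[j][i]: degree to which object j belongs to cluster i."""
    power = 2.0 / (m - 1.0)
    result = []
    for p in points:
        dists = [
            sum((x - c) ** 2 for x, c in zip(p, centre)) ** 0.5
            for centre in centres
        ]
        if any(d == 0.0 for d in dists):
            # Object coincides with a centre: give it full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            result.append([h / sum(hits) for h in hits])
            continue
        result.append(
            [1.0 / sum((di / dk) ** power for dk in dists) for di in dists]
        )
    return result


def update_centres(points, u, m=2.0):
    """Each centre is the mean of all objects weighted by u ** m."""
    dim = len(points[0])
    centres = []
    for i in range(len(u[0])):
        weights = [row[i] ** m for row in u]
        total = sum(weights)
        centres.append(
            [sum(w * p[d] for w, p in zip(weights, points)) / total
             for d in range(dim)]
        )
    return centres


def fuzzy_c_means(points, init_centres, m=2.0, n_iter=50):
    centres = [list(c) for c in init_centres]
    for _ in range(n_iter):
        u = memberships(points, centres, m)
        centres = update_centres(points, u, m)
    return memberships(points, centres, m), centres
```

For example, with one-dimensional objects at 0, 1, 5, 9 and 10 and two clusters, the object at 5 lies midway between both groups and ends up with roughly equal membership in each, which a hard partition could not express.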
III. Analysis of Problem
With the explosive growth of information
sources available on the World Wide Web and the
rapidly increasing pace of adoption of Internet
commerce, the Internet has evolved into a gold mine
that contains or dynamically generates information
that is beneficial to e-businesses. A Web site is the
most direct link a company has to its current and
potential customers. Companies can study
visitors’ activities through Web analysis and find
patterns in the visitors’ behaviour. Web usage
patterns can be directly applied to efficiently
manage activities related to e-Business, e-CRM,
e-Services, e-Education, e-Newspapers, and
e-Government. With the large number of companies
using the Internet to distribute and collect
information, knowledge discovery on the Web has
become an important research area.
Applications such as Customer Relationship
Management (CRM) can use data from within and
outside an organization to build an understanding
of its customers on an individual basis or on a
group basis, for example by forming customer
profiles. Improved knowledge about customers’
preferences and needs forms the basis for effective
CRM. For better business, it is important to retain
the loyalty of existing customers and to attract new
ones. Automated data mining or knowledge
discovery techniques can be used to discover Web
user profiles. These techniques can automatically
extract frequent access patterns, in the form of
mass user profiles, from the history of previous
user click streams stored in Web log files. Although
there have been considerable advances in Web usage
mining, there have been no detailed studies
presenting a fully integrated approach to mining a
real Web site that addresses issues such as evolving
profiles, dynamic content, and the availability of a
taxonomy or database in addition to Web logs.
IV. Proposed Work
The general scheme of the proposed approach for
mining usage profiles using fuzzy clustering is
outlined below:
Web Log Data
When users visit a Web site, the Web server
stores information about their accesses in a log
file. Each record of a log file represents a page
request issued by a Web user. In particular, it
typically contains the following information: the
user’s IP address, the date and time of the access,
the URL of the requested page, the request protocol,
and a code indicating the status of the request.
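As an illustration of how such a record can be read, the sketch below parses one line into the fields listed above. It assumes the server writes the widely used Common Log Format; the paper does not name a specific log format, and the names `LOG_PATTERN` and `parse_record` as well as the sample line are hypothetical.

```python
import re
from datetime import datetime

# Assumption: records follow the Common Log Format, e.g.
# 192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) \S+'
)


def parse_record(line):
    """Return a dict with ip, time, method, url, protocol, status, or None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # line does not look like a log record
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    return record
```

Lines that do not match the pattern yield `None`, so malformed records can be skipped before any further processing.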
Usage Pre-processing
The aim of the pre-processing step is to
identify user sessions starting from the information
contained in the access log file.
Data pre-processing involves two main steps: data
cleaning and user session identification.
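The two steps can be sketched as follows. The concrete rules are assumptions made for illustration, since the text above does not state them: cleaning here drops failed requests and requests for embedded resources such as images, a user is identified by IP address, and a new session starts after 30 minutes of inactivity. The names `clean`, `identify_sessions`, and `IGNORED_SUFFIXES` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: requests for these resource types are not page views.
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")


def clean(records):
    """Data cleaning: keep successful requests for actual pages."""
    return [
        r for r in records
        if 200 <= r["status"] < 300
        and not r["url"].lower().endswith(IGNORED_SUFFIXES)
    ]


def identify_sessions(records, timeout=timedelta(minutes=30)):
    """Session identification: split each user's requests on long gaps."""
    by_user = defaultdict(list)
    for r in sorted(records, key=lambda r: r["time"]):
        by_user[r["ip"]].append(r)  # assumption: one IP address = one user
    sessions = []
    for ip, reqs in by_user.items():
        current = [reqs[0]]
        for prev, nxt in zip(reqs, reqs[1:]):
            if nxt["time"] - prev["time"] > timeout:
                sessions.append((ip, [r["url"] for r in current]))
                current = []
            current.append(nxt)
        sessions.append((ip, [r["url"] for r in current]))
    return sessions
```

Each resulting session is a pair of user identifier and ordered list of visited URLs; each record is expected to be a dict with the keys `ip`, `time`, `url`, and `status`.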
Archana N. Boob et al., Int. J. Computer Technology & Applications, Vol 3 (1), 329-331