The Web is perhaps the single largest data source in the world. Because of its heterogeneity and lack of structure, mining and integration are challenging tasks. Much of web mining is about data/information extraction from semi-structured objects and free text, and the integration of the extracted data/information.

Dec 20, 2015

Problems Faced by Information Users:

• Finding relevant information

• Personalization of the information available on the web

• Learning about customers or individual users

What Is Web Mining?

Web mining is the use of data mining techniques to automatically discover and extract information from web documents and services.

The World Wide Web (WWW, or simply the Web) is becoming a complex universe. Deriving something valuable out of it is the purpose of web mining.

Three subcategories:

• Web Content Mining

• Web Structure Mining

• Web Usage Mining

Web Content Mining

• Web content mining refers to the discovery of useful information from web content.

• Here, content refers to the text, audio, video, etc. that websites hold.

• The data can be unstructured, semi-structured, or structured.
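
As a rough illustration of extraction from semi-structured content, the sketch below pulls the visible text and the hyperlinks out of one HTML page using Python's standard `html.parser` module. The class name `TextAndLinkExtractor`, the helper `extract_content`, and the sample page are assumptions made for this example only.

```python
from html.parser import HTMLParser


class TextAndLinkExtractor(HTMLParser):
    """Collects visible text fragments and hyperlink targets from one page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []   # visible text fragments, in document order
        self.links = []        # href values of <a> tags
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        stripped = data.strip()
        if stripped and self._skip_depth == 0:
            self.text_parts.append(stripped)


def extract_content(html):
    """Return (visible_text, links) for one HTML document."""
    parser = TextAndLinkExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


# Hypothetical page used only to exercise the extractor.
SAMPLE_PAGE = """
<html><head><title>Shop</title><style>p {color: red}</style></head>
<body><h1>Laptops</h1><p>Price: <b>499</b> USD</p>
<a href="/laptops/2">Next page</a></body></html>
"""

text, links = extract_content(SAMPLE_PAGE)
print(text)
print(links)
```

The extracted text could then feed ordinary text mining steps, while the collected links are the raw material for structure mining.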

Example of Web Content Mining

• A typical Google, Yahoo, or Microsoft Bing search, together with the page of result links it returns, is an example of content mining: useful information is extracted from web content to answer the query.

Web Structure Mining

• Web structure mining is done at the hyperlink level. This kind of mining tries to discover the model underlying the link structure of the web.

• A relevant example is Google’s PageRank.
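
To make the idea of mining link structure concrete, here is a minimal sketch of PageRank computed by power iteration on a toy link graph. The graph, the damping factor of 0.85, the iteration count, and the function name `pagerank` are assumptions chosen for illustration; this is not a description of how Google computes its rankings in practice.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(graph)
    for targets in graph.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not graph.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in graph.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page site: every other page links back to "home".
TOY_WEB = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home", "about"],
    "orphan": ["home"],
}

ranks = pagerank(TOY_WEB)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:10s} {score:.3f}")
```

In this toy graph "home" collects links from every other page and ends up with the highest score, while "orphan", which nothing links to, keeps only the baseline share.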

Example of Web Structure Mining

HITS and PageRank are applications of web structure mining.
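
HITS assigns every page two scores: a hub score (it points to good authorities) and an authority score (good hubs point to it). The sketch below is a simplified version of that mutual-reinforcement iteration on a toy graph; the graph, the iteration count, and the function name `hits` are illustrative assumptions.

```python
import math


def hits(graph, iterations=50):
    """Simplified HITS over a dict {page: [pages it links to]}.

    Returns (hub_scores, authority_scores), each normalized to unit length.
    """
    pages = set(graph)
    for targets in graph.values():
        pages.update(targets)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking to the page.
        auth = {p: 0.0 for p in pages}
        for page, targets in graph.items():
            for target in targets:
                auth[target] += hub[page]
        # Hub score: sum of the authority scores of the pages it links to.
        hub = {p: sum(auth[t] for t in graph.get(p, [])) for p in pages}
        # Normalize both vectors so the scores stay bounded.
        auth_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        hub_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / auth_norm for p, v in auth.items()}
        hub = {p: v / hub_norm for p, v in hub.items()}
    return hub, auth


# Hypothetical graph: two link collections pointing at three content pages.
TOY_LINKS = {
    "portal": ["a", "b", "c"],
    "blog": ["a", "b"],
}

hub_scores, authority_scores = hits(TOY_LINKS)
print(max(hub_scores, key=hub_scores.get))
print(sorted(authority_scores.items(), key=lambda item: -item[1]))
```

Here "portal" links to the most authoritative pages and so becomes the strongest hub, while "a" and "b", which both hubs point to, outrank "c" as authorities.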

Web Usage Mining

• Web usage mining works with logged records of page accesses, such as when each page was requested. The world’s largest portals, such as Yahoo and MSN, need a lot of insight into the behavior of their users during web visits. Without such usage reports, it would be difficult to structure their monetization efforts. Usage mining therefore has a direct impact on businesses.

Example of Web Usage Mining

• Usage data can reveal a feature of the website that visitors use frequently, which we may then want to enhance and make more prominent so that it appeals to even more users.

• By understanding how visitors move through the site and how they browse, you can better meet their preferences and needs and make your website more popular.

Web Usage Mining Tools

• Web usage mining tools can provide valuable information about user navigation patterns by analyzing server and client logs. By processing this data with simple statistics or more complex data mining techniques, we can identify trends and patterns in activity on the Web.
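
The "simple statistics" end of that spectrum can be sketched in a few lines: parse each server log line and count successful requests per page. The example assumes Apache-style "combined" access log lines; the regular expression, the function names `parse_line` and `page_hits`, and the sample log entries are assumptions for illustration, and real logs may use a different layout.

```python
import re
from collections import Counter

# Assumed layout: Apache-style "combined" access log lines.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def page_hits(lines):
    """Count successful (2xx) requests per URL, skipping unparseable lines."""
    hits = Counter()
    for line in lines:
        record = parse_line(line)
        if record and record["status"].startswith("2"):
            hits[record["url"]] += 1
    return hits


# Hypothetical log entries, including one failed request and one malformed line.
SAMPLE_LOG = [
    '10.0.0.1 - - [20/Dec/2015:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1043 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [20/Dec/2015:10:00:20 +0000] "GET /products.html HTTP/1.1" 200 2310 "/index.html" "Mozilla/5.0"',
    '10.0.0.2 - - [20/Dec/2015:10:01:00 +0000] "GET /index.html HTTP/1.1" 200 1043 "-" "Opera/9.80"',
    '10.0.0.2 - - [20/Dec/2015:10:01:30 +0000] "GET /missing.html HTTP/1.1" 404 210 "/index.html" "Opera/9.80"',
    'not a log line',
]

hit_counts = page_hits(SAMPLE_LOG)
print(hit_counts.most_common())
```

The same parsed records (host, time, URL, referrer, user agent) are the input that more complex techniques, such as session and path analysis, would start from.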

Speedtracer

• Uses the referrer page and the URL of the requested page to construct the traversal path.

• Each identified user session is mapped into a transaction.

• Data mining techniques are applied in order to discover the most frequent user traversal paths and the most frequently visited groups of pages.
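
The steps above can be sketched roughly as follows. This is a simplified illustration of building traversal paths from (referrer, URL) pairs and counting the most frequent ones, not the tool's actual algorithm; the function `build_paths`, the path-extension rule, and the sample sessions are assumptions.

```python
from collections import Counter


def build_paths(requests):
    """Turn time-ordered (referrer, url) pairs of one session into traversal paths.

    A request whose referrer is the last page of an existing path extends that
    path; any other request starts a new path, seeded with the referrer when
    one is known.
    """
    paths = []
    for referrer, url in requests:
        for path in reversed(paths):  # prefer the most recently started path
            if path[-1] == referrer:
                path.append(url)
                break
        else:
            paths.append([url] if referrer in (None, "-") else [referrer, url])
    return [tuple(path) for path in paths]


# Hypothetical sessions; "-" means the request carried no referrer.
SESSIONS = [
    [("-", "/home"), ("/home", "/products"), ("/products", "/cart")],
    [("-", "/home"), ("/home", "/products"), ("/home", "/about")],
    [("-", "/home"), ("/home", "/products"), ("/products", "/cart")],
]

# Each session becomes one transaction: the list of paths it contains.
transactions = [build_paths(session) for session in SESSIONS]

# Count how often each complete traversal path occurs across all transactions.
path_counts = Counter(path for transaction in transactions for path in transaction)
print(path_counts.most_common(1))
```

In the second session the visitor goes back to "/home" and branches to "/about", which the sketch records as a separate path rather than an extension of the first one.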

Advantages

• Identifies individual user sessions.

• Does not require “cookies” or user registration for session identification.

• User privacy is protected.
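
One common way to identify sessions without cookies or registration is a heuristic over fields already present in the server log. The sketch below groups requests by host address and user agent and starts a new session after a period of inactivity. This heuristic, the 30-minute timeout, the function name `identify_sessions`, and the sample records are assumptions for illustration and may differ from what the tool actually does.

```python
from datetime import datetime, timedelta


def identify_sessions(records, timeout=timedelta(minutes=30)):
    """Group (host, agent, timestamp, url) records into sessions.

    Requests from the same host and user agent belong to one session until
    the gap between two consecutive requests exceeds `timeout`.
    """
    sessions = []
    current = {}    # (host, agent) -> that visitor's most recent session
    last_seen = {}  # (host, agent) -> timestamp of the previous request
    for host, agent, timestamp, url in sorted(records, key=lambda r: r[2]):
        key = (host, agent)
        if key not in last_seen or timestamp - last_seen[key] > timeout:
            current[key] = {"visitor": key, "pages": []}
            sessions.append(current[key])
        current[key]["pages"].append(url)
        last_seen[key] = timestamp
    return sessions


# Hypothetical records: two browsers behind one address, plus a late return visit.
START = datetime(2015, 12, 20, 10, 0)
RECORDS = [
    ("10.0.0.1", "Firefox", START, "/home"),
    ("10.0.0.1", "Firefox", START + timedelta(minutes=5), "/products"),
    ("10.0.0.1", "Chrome", START + timedelta(minutes=6), "/home"),
    ("10.0.0.1", "Firefox", START + timedelta(hours=2), "/home"),
]

user_sessions = identify_sessions(RECORDS)
for session in user_sessions:
    print(session["visitor"], session["pages"])
```

Because only the address and the browser string are used, no personal identifier has to be stored, at the cost of occasionally merging or splitting visitors incorrectly.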

WUM (Web Utilization Miner)

• Employs an innovative technique for the discovery of navigation patterns over an aggregated materialized view of the web log.

• It comprises two modules:

  • Aggregation Service, which prepares the web log data for mining, and

  • MINT-Processor, which does the mining.
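
The split between an aggregation step and a mining step can be illustrated with a small sketch: sessions are first merged into a prefix tree with visit counts, and that aggregated view is then searched for frequent navigation paths. This is a simplified stand-in, not WUM's actual data structure or the MINT query language; the functions `build_aggregate_tree` and `frequent_paths`, the `min_support` parameter, and the sample sessions are assumptions.

```python
def build_aggregate_tree(sessions):
    """Aggregation step: merge sessions into a prefix tree.

    Each node records how many sessions passed through it.
    """
    root = {"count": 0, "children": {}}
    for session in sessions:
        root["count"] += 1
        node = root
        for page in session:
            node = node["children"].setdefault(page, {"count": 0, "children": {}})
            node["count"] += 1
    return root


def frequent_paths(tree, min_support):
    """Mining step: return every path prefix followed by at least min_support sessions."""
    results = []

    def walk(node, prefix):
        for page, child in node["children"].items():
            if child["count"] >= min_support:
                path = prefix + (page,)
                results.append((path, child["count"]))
                walk(child, path)

    walk(tree, ())
    return results


# Hypothetical sessions, each an ordered list of visited pages.
SAMPLE_SESSIONS = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
]

aggregate = build_aggregate_tree(SAMPLE_SESSIONS)
patterns = frequent_paths(aggregate, min_support=2)
print(patterns)
```

Mining the aggregated tree instead of the raw log means each shared prefix is stored and counted once, no matter how many sessions follow it.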

Thank you!