What is web link mining?

Post on 25-Feb-2016

Mike Thelwall, Statistical Cybermetrics Research Group, University of Wolverhampton, UK

Virtual Knowledge Studio (VKS) Information Studies

1. Definition and scope

Link analysis is:
  mapping and measuring hyperlink networks for collections of web pages or sites
  a flexible toolkit of methods and software rather than a field or single technique

A new source of information about:
  relationships between people, organisations and information, via the web
  the impact of information and ideas

Used in: media studies, information science, politics, marketing, sociology

Link Analysis: Motivation

Individual hyperlinks reflect concrete creation reasons, such as connections between web page contents or creators
Counts of large numbers of hyperlinks may reflect wider underlying social processes
Links may reflect phenomena that have previously been difficult to study, e.g.:
  informal scholarly communication
  informal news discussions
  friendship patterns
  “amateur” politics

But link patterns vary by context…

Commercial web sites tend not to link much
Academic and government web sites link more
Disciplinary differences: e.g., History Web use is very low, Chemistry is very high
Individual projects/resources can have an enormous impact upon web sites
  E.g., Arts web sites are often for specific exhibitions or for digital media projects
Links often not frequent enough to reliably reveal underlying patterns

Link Type Definitions

Inlink – a hyperlink to a web page from anywhere
Site inlink – a hyperlink to a web page from a different web site
Outlink – a hyperlink from a web page to any other
Site outlink – a hyperlink from a web page to a page in a different site

[Diagram: pages A and B]
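The four link types defined above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any tool named in these slides. It assumes that links are held as (source URL, target URL) pairs and that a "site" can be approximated by the URL's hostname; the helper names and the example.com-style URLs are hypothetical.

```python
from urllib.parse import urlsplit


def site_of(url):
    """Hypothetical site key: the lower-cased hostname of the URL."""
    return (urlsplit(url).hostname or "").lower()


def inlinks(links, page):
    """All links pointing to `page`, from anywhere."""
    return [(s, t) for s, t in links if t == page]


def site_inlinks(links, page):
    """Links pointing to `page` from a different web site."""
    return [(s, t) for s, t in inlinks(links, page) if site_of(s) != site_of(t)]


def outlinks(links, page):
    """All links from `page` to any other page."""
    return [(s, t) for s, t in links if s == page]


def site_outlinks(links, page):
    """Links from `page` to a page in a different web site."""
    return [(s, t) for s, t in outlinks(links, page) if site_of(s) != site_of(t)]


# Hypothetical page-level links (source, target).
links = [
    ("http://a.example/index.html", "http://b.example/page.html"),
    ("http://b.example/home.html", "http://b.example/page.html"),
    ("http://b.example/page.html", "http://c.example/"),
]

page = "http://b.example/page.html"
print(len(inlinks(links, page)), len(site_inlinks(links, page)))
print(len(outlinks(links, page)), len(site_outlinks(links, page)))
```

Treating the hostname as the site is only one possible choice; a real study might group pages by domain name or by organisation instead.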

Indirect link types - colinks

Useful when direct links rare
Indirect connection
Co-inlinks: B and C co-inlinked
Co-outlinks: D and E co-outlinked

[Diagram: pages A to F, illustrating co-inlinked and co-outlinked pages]

Lennart Björneborn’s terminology
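As a rough sketch of the colink idea, the Python below finds co-inlinked and co-outlinked pairs in a list of links. The edge list is an assumption about what the slide's diagram shows (A linking to both B and C, and D and E both linking to F); the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def co_inlinked_pairs(links):
    """Pairs of pages that are both linked to by at least one common page."""
    targets_by_source = defaultdict(set)
    for source, target in links:
        targets_by_source[source].add(target)
    pairs = set()
    for targets in targets_by_source.values():
        pairs.update(combinations(sorted(targets), 2))
    return pairs


def co_outlinked_pairs(links):
    """Pairs of pages that both link to at least one common page."""
    sources_by_target = defaultdict(set)
    for source, target in links:
        sources_by_target[target].add(source)
    pairs = set()
    for sources in sources_by_target.values():
        pairs.update(combinations(sorted(sources), 2))
    return pairs


# Assumed reading of the diagram: A -> B, A -> C, D -> F, E -> F.
links = [("A", "B"), ("A", "C"), ("D", "F"), ("E", "F")]

print(co_inlinked_pairs(links))   # B and C share the inlinking page A
print(co_outlinked_pairs(links))  # D and E share the outlink target F
```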

What to count?

Links between individual pages
Links between entire web sites
  Site A links to site B if any page in site A links to any page in site B

[Diagram: sites A and B]
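The site-level counting rule can be illustrated by collapsing page-level links into site-level links. This is a sketch under the assumption that a site is identified by its hostname; `site_of`, `site_links` and the URLs are hypothetical.

```python
from urllib.parse import urlsplit


def site_of(url):
    """Hypothetical site key: the lower-cased hostname of the URL."""
    return (urlsplit(url).hostname or "").lower()


def site_links(page_links):
    """Site A links to site B if any page in A links to any page in B.

    Links within the same site are ignored, and multiple page-level
    links between the same two sites count only once.
    """
    result = set()
    for source, target in page_links:
        a, b = site_of(source), site_of(target)
        if a and b and a != b:
            result.add((a, b))
    return result


# Hypothetical page-level links.
page_links = [
    ("http://a.example/1.html", "http://b.example/x.html"),
    ("http://a.example/2.html", "http://b.example/y.html"),
    ("http://a.example/1.html", "http://a.example/2.html"),
]

print(site_links(page_links))
```

Counting each site pair once reduces the influence of a single page or template that repeats the same link many times.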

2. Link Networks – Methods

Draw a network diagram
  LexiURL Searcher, Issue Crawler, SocSciBot (web networks)
  Pajek, UCINET, NetMiner (generic networks)
  About 10-50 sites/pages is recommended
  Diagrams should reveal patterns in the data

Social Network Analysis statistics
  E.g., density, degree centrality
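For illustration, the two statistics mentioned above can be computed directly from a list of site-level links. The sketch assumes a directed network without self-links and uses the usual normalisations (density = links / (n(n-1)), in-degree centrality = inlinking sites / (n-1)); the packages listed above may define or report these differently. The node names and links are hypothetical.

```python
def density(nodes, edges):
    """Share of possible directed links (no self-links) that are present."""
    n = len(nodes)
    if n < 2:
        return 0.0
    return len(set(edges)) / (n * (n - 1))


def in_degree_centrality(nodes, edges):
    """For each node, the fraction of other nodes that link to it."""
    n = len(nodes)
    counts = {node: 0 for node in nodes}
    for _source, target in set(edges):
        counts[target] += 1
    if n < 2:
        return {node: 0.0 for node in nodes}
    return {node: count / (n - 1) for node, count in counts.items()}


# Hypothetical network of four sites.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("C", "B"), ("D", "B"), ("B", "A")]

print(density(nodes, edges))
print(in_degree_centrality(nodes, edges))
```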

Direct link networks

Start with list of web sites (or pages)
Build from many linkdomain:A site:B Yahoo searches
  Powerful and free way to scan the entire web for links!
  Returns pages in web site B that link to web site A
  Can be automated with LexiURL Searcher
Or use SocSciBot to crawl web sites and get links
e.g., linkdomain:ox.ac.uk site:pku.edu.cn
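A small sketch of how the many searches for a direct link network could be generated from a list of sites. It only constructs query strings in the linkdomain:A site:B form quoted above; it does not contact any search engine, and no search API is assumed. The function name is hypothetical.

```python
from itertools import permutations


def direct_link_queries(domains):
    """Map (source site, target site) to the query for links source -> target.

    Per the slide, linkdomain:A site:B returns pages in site B that link
    to site A, so the source site goes after site: and the target site
    goes after linkdomain:.
    """
    return {
        (source, target): f"linkdomain:{target} site:{source}"
        for source, target in permutations(domains, 2)
    }


domains = ["ox.ac.uk", "pku.edu.cn"]
for (source, target), query in direct_link_queries(domains).items():
    print(f"{source} -> {target}: {query}")
```

A list of n sites yields n(n-1) queries, one per ordered pair, which is why automation is useful even for a network of 10-50 sites.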

Direct links example: Top ASEAN universities network (with Han Woo Park)

[Network diagram: arrows represent > 100 links; unconnected universities removed]

Co-inlink networks

Start with a list of web sites or pages
Build from many linkdomain:A linkdomain:B -site:A -site:B Yahoo searches
  can be automated in LexiURL Searcher
Suitable for commercial or competitive web sites that do not interlink
  normally better than direct link diagrams
A web environment (co-inlink) network for a single web site
  finds web sites that link to it
  picks the top 50 web sites linked to by these web sites
  draws a co-inlink diagram of these web sites
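The co-inlink query format and the three web environment steps can be sketched as follows. This is an illustration under stated assumptions, not a description of how LexiURL Searcher works internally: the link data is supplied in memory as a mapping from each linking site to the set of sites it links to, and all names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_inlink_query(a, b):
    """Query string for pages outside A and B that link to both."""
    return f"linkdomain:{a} linkdomain:{b} -site:{a} -site:{b}"


def web_environment(targets_by_linker, top_n=50):
    """Co-inlink counts among the most linked-to sites.

    targets_by_linker: {linking site: set of sites it links to}, where the
    linking sites are those found to link to the site being studied.
    Returns a Counter mapping (site, site) pairs to the number of linking
    sites that link to both.
    """
    popularity = Counter()
    for targets in targets_by_linker.values():
        popularity.update(set(targets))
    top_sites = {site for site, _ in popularity.most_common(top_n)}

    co_inlinks = Counter()
    for targets in targets_by_linker.values():
        shared = sorted(set(targets) & top_sites)
        co_inlinks.update(combinations(shared, 2))
    return co_inlinks


# Hypothetical data: three linking sites and the sites they link to.
targets_by_linker = {
    "x.example": {"p.example", "q.example", "r.example"},
    "y.example": {"p.example", "q.example"},
    "z.example": {"p.example", "s.example"},
}

print(co_inlink_query("p.example", "q.example"))
print(web_environment(targets_by_linker, top_n=2))
```

The resulting pair counts could then serve as line weights in a network diagram drawn with one of the tools listed earlier.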

Indirect links example: The web environment of ZigZagMag

Another example – no patterns but interesting

3. Link Impact - Methods

Inlink counts often used as an impact/visibility indicator
  Impact = “The effect or impression of one thing on another”, “to have an effect” *
Compare links to web sites to assess which site/organisation has the most online impact

* http://www.thefreedictionary.com/impact, definition 3

Link Impact Reports

Standardised comparative analysis of the link impact of web sites
Example audit: http://cybermetrics.wlv.ac.uk/audit/101/
Similar reports can be created for non-link impact (citation impact): http://cybermetrics.wlv.ac.uk/audit/books/

Total impact example

Impact spread example

4. Tools

E.g., …

Links to UK universities against their research productivity

The reason for the strong correlation is the quantity of Web publication, not its quality
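One way such a relationship between link counts and another indicator could be checked is a rank correlation. The slides do not say which coefficient was used; Spearman's rank correlation is chosen here purely for illustration, and the numbers in the example are hypothetical toy values, not results from the study.

```python
from math import sqrt


def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: inlink counts and a productivity score per site.
inlink_counts = [120, 4500, 800, 15000, 60]
productivity = [2.1, 5.0, 3.2, 6.4, 1.5]

print(spearman(inlink_counts, productivity))
```

Rank correlation is a common choice for link counts because they tend to be highly skewed, but it still cannot distinguish quantity of web publication from quality, which is the caution made above.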

5. Statistical analyses…

More statistical analyses…

Universities tend to link to neighbours

6. Content analysis

Content analysis of random sample of links recommended to get context
Example of usefulness of content analysis results:
  90% of links between UK university sites relate to scholarly activity
  But less than 1% are equivalent to citations
Link counts do not measure research but are a natural by-product of scholarly activity
  Use link counts to track (an aspect of) communication
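The sampling step and the tallying of results might be sketched as below. The classification itself is done by a human coder; the code only draws a reproducible random sample and turns the assigned codes into percentages. The function names, category labels and links are hypothetical.

```python
import random
from collections import Counter


def sample_links(links, k, seed=0):
    """Draw up to k distinct links at random, reproducibly for a given seed."""
    pool = sorted(set(links))
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))


def category_shares(codes):
    """Percentage of sampled links assigned to each category."""
    counts = Counter(codes)
    total = sum(counts.values())
    return {category: 100 * count / total for category, count in counts.items()}


# Hypothetical links (source, target) and hypothetical manual codes.
links = [(f"http://a.example/{i}", f"http://b.example/{i}") for i in range(200)]
sample = sample_links(links, k=20, seed=42)
print(len(sample))

codes = ["scholarly"] * 9 + ["other"]
print(category_shares(codes))
```

Fixing the seed lets a second coder classify exactly the same sample, which helps when checking agreement between coders.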

7. Summary

Link networks
  To investigate relationship patterns within collections of web sites
Link impact
  Compare impact of web sites using inlinks
Methods
  Toolkit of visual and statistical methods
  Specialist software like LexiURL Searcher & Issue Crawler
Use to investigate web phenomena or offline phenomena reflected online in web sites

Books

Thelwall, M. (2009). Introduction to webometrics: Quantitative web research for the social sciences. New York: Morgan & Claypool.
Rogers, R. (2005). Information politics on the Web. Massachusetts: MIT Press.
Thelwall, M. (2004). Link analysis: An information science approach. San Diego: Academic Press.

http://lexiurl.wlv.ac.uk
http://webometrics.wlv.ac.uk
http://www.issuecrawler.net
