Search Engine PageRank Demystification
Posted on 24-Apr-2015
By Rajanagan R, Web Analyst
Search Engines
What Is a Search Engine?
A Search Engine is an information retrieval system designed
to help find information stored on a computer system, such
as on the World Wide Web.
It is a web search tool that automatically visits websites (using
crawlers), records and indexes them in its database, and
generates results based on a user's search criteria.
Unlike web directories, which are maintained by human
editors, search engines operate algorithmically or use a
mixture of algorithmic and human input.
History of Search Engines
1990: First tool for searching the Internet: Archie. Alan Emtage, student at McGill University in Montreal. Objective: index FTP archives, allowing people to find specific files.
1993: First web robot: World Wide Web Wanderer. Matthew Gray, physics student at MIT. Objective: track all pages on the web to monitor the growth of the web.
1994: First search engine: WebCrawler. Brian Pinkerton, CS student at the University of Washington. Objective: download web pages and store the links in a keyword-searchable database.
1994: Jerry and David's Guide to the World Wide Web. Jerry Yang and David Filo, Stanford University. Objective: crawl for web pages and organize them by content into hierarchies. Later renamed Yahoo (Yet Another Hierarchical Officious Oracle).
1994-97: Infoseek, AltaVista, Excite, Lycos, LookSmart (meta engine). Ranking based on content and structure.
1998: Google (Sergey Brin and Larry Page, CS students at Stanford University). Ranking based on content, structure, and value.
How a Search Engine Works
Step 1: Crawling
What does a crawler look at?
[Example slide: a website as seen by a crawler, captioned "This is what I look at in a website!"]
Step 2: Indexing
[Example slide: an indexed database]
Step 3: Query Processing
Step 4: Ranking
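The four steps can be illustrated with a toy, in-memory Python sketch. Everything in it is hypothetical: the small `WEB` dictionary stands in for real websites, crawling is a breadth-first walk over its links, indexing builds an inverted index (word to set of URLs), query processing keeps the pages that contain every query word, and ranking simply orders matches by how many crawled pages link to them. Real search engines are far more elaborate.

```python
from collections import deque

# Hypothetical miniature "web": URL -> (page text, outgoing links)
WEB = {
    "url1": ("all about eggs and omelettes", ["url2", "url3"]),
    "url2": ("eggs for breakfast", ["url1"]),
    "url3": ("eggo waffles", ["url1", "url4"]),
    "url4": ("ego and the mind", []),
}

def crawl(seed):
    """Step 1: visit pages breadth-first by following links from a seed URL."""
    seen, queue = {seed}, deque([seed])
    while queue:
        url = queue.popleft()
        for link in WEB[url][1]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

def build_index(urls):
    """Step 2: build an inverted index mapping each word to the URLs containing it."""
    index = {}
    for url in urls:
        for word in WEB[url][0].split():
            index.setdefault(word, set()).add(url)
    return index

def search(query, index, urls):
    """Steps 3 and 4: find pages containing every query word, then rank them."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    # Toy ranking signal: how many crawled pages link to each match.
    backlinks = {u: sum(u in WEB[src][1] for src in urls) for u in matches}
    return sorted(matches, key=lambda u: (-backlinks[u], u))

pages = crawl("url1")
index = build_index(pages)
print(search("eggs", index, pages))  # ['url1', 'url2']
```

Both `url1` and `url2` contain "eggs"; `url1` ranks first because two pages link to it, versus one for `url2`.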
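Overall Functioning of Search Engines
[Diagram: a crawler fetches pages (URL1 to URL4) from the Web, and an indexer stores them in the database. When your browser sends the query "Eggs?", the search engine matches it against the database (Eggs 90%, Eggo 81%, Ego 40%, Huh? 10%) and, in a fraction of a second, returns the SERP, topped by "All About Eggs".]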
What Is PageRank?
Google PageRank Algorithm
The backbone of Google's technology, developed by Larry Page and Sergey Brin in 1998.
It ranks a page based on the number and importance of the other pages that link to it.
It is calculated from the quality and number of backlinks, and it contributes to where a page appears in the SERP listing.
The Google Toolbar shows PageRank as a value on a scale from 0 to 10 (see www.toolbar.google.com). However, this is only a rough guide, not the actual PR. Nevertheless, it can be a good indication for SEO practitioners of whether a website is moving in the right (or wrong) direction.
Definition of PageRank
PageRank was proposed to measure the relative importance of web pages. It is a method for computing a ranking for every web page based on the link graph of the web.
We assume:
T1...Tn: the pages that link to page A (i.e., its citations or backlinks).
d: a damping factor, which can be set between 0 and 1; it is usually set to d = 0.85.
C(A): the number of links going out of page A, i.e., its outgoing links.
The PageRank of a page A is then given by:
PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
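Note: With this form of the formula, when every page has at least one outgoing link, the PageRanks of all web pages sum to the number of pages, so their average is one (dividing each value by the number of pages gives a probability distribution over web pages).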
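To make the formula concrete, the minimal Python sketch below evaluates it once for a single page. The function name `pagerank_of` and the input numbers are hypothetical; each inbound page Ti is passed as a pair (PR(Ti), C(Ti)).

```python
def pagerank_of(inbound, d=0.85):
    """Evaluate PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)) once.

    inbound: list of (PR(Ti), C(Ti)) pairs, one per page Ti that links to A.
    """
    return (1 - d) + d * sum(pr / c for pr, c in inbound)

# Hypothetical example: A is linked from T1 (PR 1.0, 2 outgoing links)
# and from T2 (PR 0.5, 1 outgoing link): 0.15 + 0.85 * (0.5 + 0.5) = 1.0
print(pagerank_of([(1.0, 2), (0.5, 1)]))
```

A page with no inbound links gets only the minimum value, 1 - d = 0.15.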
Calculating Page Rank
The PR of each page depends on the PR of the pages pointing to it. But we won't know what PR those pages have until the pages pointing to them have their PR calculated, and so on.
Calculating PR seems impossible! But there is a solution. Here we go!
PageRank can be calculated using a simple iterative algorithm, and it corresponds to the principal eigenvector of the normalized link matrix of the web.
This means we can calculate a page's PR without knowing the final PR values of the other pages. What we need to do: remember each value we calculate, and repeat the calculations many times until the numbers stop changing much.
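The procedure above (start from a guess, remember each value, and repeat until the numbers settle) can be sketched in Python as follows. This is an illustrative sketch, not Google's implementation; the function name, the tolerance `tol`, the iteration cap, and the three-page example graph (A links to B and C, B links to C, C links to A) are all assumptions. It expects a non-empty link graph.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """Iteratively compute PR(A) = (1 - d) + d * sum(PR(T) / C(T)) for every page.

    links: dict mapping each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    out_count = {p: len(links.get(p, [])) for p in pages}
    incoming = {p: [] for p in pages}
    for src, targets in links.items():
        for dst in targets:
            incoming[dst].append(src)

    pr = {p: 1.0 for p in pages}  # initial guess: every page starts at 1.0
    for _ in range(max_iter):
        # Recompute every page from the values remembered in the previous pass.
        new_pr = {
            p: (1 - d) + d * sum(pr[t] / out_count[t] for t in incoming[p])
            for p in pages
        }
        change = max(abs(new_pr[p] - pr[p]) for p in pages)
        pr = new_pr
        if change < tol:  # stop once the numbers stop changing much
            break
    return pr

ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
# A 1.163
# B 0.644
# C 1.192
```

Because every page in the example has at least one outgoing link, the three values sum to 3, so their average is 1.0.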
Simple Hierarchy
Two pages, A and B, link to each other, so each page has one outgoing link, i.e. C(A) = 1 and C(B) = 1.
We don't know the PR of the pages yet, so let's assume each has PR = 1.00, with d = 0.85.
PR(A) = (1 - d) + d(PR(B)/1)
PR(B) = (1 - d) + d(PR(A)/1)
i.e. PR(A) = 0.15 + 0.85 * 1 = 1
PR(B) = 0.15 + 0.85 * 1 = 1
We started out with a lucky guess! The numbers aren't changing at all.
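The lucky guess is not required. This short Python sketch repeats the same two-page calculation from a deliberately poor, hypothetical starting guess of 0 for both pages. Since both pages always hold the same value x, each pass computes x = 0.15 + 0.85x, so after k passes PR = 1 - 0.85^k, and the values climb to 1.0 anyway.

```python
d = 0.85
pr_a, pr_b = 0.0, 0.0  # hypothetical "unlucky" starting guess instead of 1.00
history = []
for _ in range(200):
    # Update both pages simultaneously from the previous pass's values (C = 1).
    pr_a, pr_b = (1 - d) + d * pr_b / 1, (1 - d) + d * pr_a / 1
    history.append(pr_a)

print([round(x, 4) for x in history[:3]])  # [0.15, 0.2775, 0.3859]
print(round(pr_a, 6), round(pr_b, 6))      # 1.0 1.0
```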
Complex Hierarchy
Average PR: 3.03 / 8 ≈ 0.379; PR loss: 8 - (0.92 + 0.41 + 0.41 + 0.41 + 0.22 + 0.22 + 0.22 + 0.22) = 8 - 3.03 = 4.97
Complex Hierarchy with Avg PR = 1.0000
Average PR: 1.0000; PR loss: 8 - (3.35 + 1.1 + 1.1 + 1.1 + 0.34 + 0.34 + 0.34 + 0.34) ≈ 0 (the values are rounded to two decimals, so they sum to about 8.01)
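The averages and losses for the two hierarchies can be checked with a few lines of Python. Here PR loss is read as the number of pages (8) minus the sum of their PRs, which is what the second slide's figures imply; the lists hold the rounded values shown on the slides. The first set sums to 3.03 (average about 0.379, loss 4.97); the second sums to about 8.01, so its loss of roughly -0.01 is only rounding in the two-decimal values.

```python
# Rounded PR values read off the two "Complex Hierarchy" slides (8 pages each).
before = [0.92, 0.41, 0.41, 0.41, 0.22, 0.22, 0.22, 0.22]
after = [3.35, 1.1, 1.1, 1.1, 0.34, 0.34, 0.34, 0.34]

for name, values in (("before", before), ("after", after)):
    total = sum(values)
    average = total / len(values)
    # PR "lost": 8 pages could hold 8 units of PR in total.
    loss = len(values) - total
    print(f"{name}: total={total:.2f} average={average:.3f} loss={loss:.2f}")
```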
Final Observations:
No matter how many pages your site has, your average PR will be 1.0 at best. But a hierarchical layout can strongly concentrate votes, and therefore PR.
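This observation can be illustrated on a small, hypothetical seven-page site with a Python sketch of the iterative calculation. In the leaky layout the three leaf pages have no outgoing links, so their PR is never passed on and the average falls well below 1.0. When the leaves link back to the home page, nothing leaks, the average returns to 1.0, and the hierarchy concentrates PR on the home page. The page names and the fixed 200 passes are assumptions chosen for illustration.

```python
def pagerank(links, d=0.85, iterations=200):
    """Run a fixed number of passes of PR(A) = (1 - d) + d * sum(PR(T) / C(T))."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {p: 1.0 for p in pages}  # initial guess
    for _ in range(iterations):
        # Each pass uses only the previous pass's values.
        pr = {
            p: (1 - d) + d * sum(pr[src] / len(links[src])
                                 for src in links if p in links[src])
            for p in pages
        }
    return pr

# Hypothetical site: home -> a, b, c; each of those -> one leaf page.
leaky = {"home": ["a", "b", "c"], "a": ["x"], "b": ["y"], "c": ["z"],
         "x": [], "y": [], "z": []}
# Same site, but every leaf page links back to the home page.
closed = dict(leaky, x=["home"], y=["home"], z=["home"])

for name, site in (("leaky", leaky), ("closed", closed)):
    pr = pagerank(site)
    average = sum(pr.values()) / len(pr)
    print(f"{name}: average PR = {average:.3f}, home PR = {pr['home']:.2f}")
# leaky: average PR = 0.238, home PR = 0.15
# closed: average PR = 1.000, home PR = 2.22
```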
PageRank is, in fact, very simple (apart from one scary-looking formula). But when a simple calculation is applied hundreds (or billions) of times over, the results can seem complicated.
PageRank is also only part of the story of which results appear high up in a Google listing. Google also pays attention to the text in a link's anchor when deciding the relevance of a target page, perhaps even more than to the page's PR.
PageRank is still part of the listings story, though, so as a good designer it is worth your while to make sure you understand it correctly.
References
The PageRank paper by Google's founders Sergey Brin and Lawrence Page: http://www-db.stanford.edu/~backrub/google.html
Chris Ridings' "PageRank Explained" paper (as of April 2002): http://web.archive.org/web/*/http://www.goodlookingcooking.co.uk/PageRank.pdf
An excellent discussion by Douglas W. Jones: http://www.cs.uiowa.edu/~jones/cards/chad.html
http://www.sirgroane.net/google-page-rank/
http://www.youtube.com/watch?feature=player_embedded&v=h3Jup5R1MGY#!
http://www.searchnerd.com/pagerank/
Thank you!
Questions? Reach me at rajanagan.tpgit@gmail.com