SEO Masterclass
Rand Fishkin, CEO & Co-founder, SEOmoz
Webit, Sofia October 2010
Topics for the Masterclass
• Correlation analysis of search results
• Changes from Google Instant
• Information architecture & navigation structure
• Overcoming Twitter’s cannibalization of the link graph
• Making Analytics Actionable
• New Research: Topic Modeling in the Search Results
Use Statistical Analysis to Answer
Important SEO Questions
Correlation ≠ Causation
The more I wear suits, the more I speak on panels.
Therefore: wearing suits causes me to speak on panels.
Understanding Correlation Significance
[Chart: correlation scale; labels: No Correlation, Exact Match Domain, Perfect Correlation]
Most of our data for search rankings falls in this region (which we’d expect given algorithms w/ 200+ ranking factors)
Question #1:
How to Best “Optimize” a Site for
Search Engine Rankings
Methodology
• 11,351 SERPs via Google AdWords Suggest
• 1st Page Only (usually ~10 results per page)
• Correlations are w/ Higher Position on Page 1
• Controlled for SERPs Where All (or None) of the
Results Matched the Metric
Methodology
Looking for elements that higher-ranking
pages have that lower-ranking ones do not
NOT looking at raw counts of how many
pages featured a given element
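The methodology above (correlate a feature with higher position inside each SERP, skip SERPs where all or none of the results match, then average) can be sketched as follows. This is a minimal illustration, not SEOmoz’s actual code: it assumes Spearman’s rank correlation (the slides only say “correlation”), and it reads the “Highest Stderr” figures on the following charts as the standard error of the mean correlation. The function names are hypothetical.

```python
import math
import statistics

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of average ranks (ties share a rank)."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0.0] * len(values)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1  # average 1-based rank for this tie group
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r

    rx, ry = ranks(xs), ranks(ys)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def mean_serp_correlation(serps):
    """serps: one list of feature values per SERP, ordered from position 1 down.

    Returns (mean rho, standard error of the mean) over SERPs where the feature varies.
    """
    rhos = []
    for values in serps:
        if len(set(values)) < 2:
            continue  # all (or none) of the results match the metric: skip this SERP
        positions = range(1, len(values) + 1)
        # Negate position so a positive rho means "more of the feature ~ ranks higher".
        rhos.append(spearman(values, [-p for p in positions]))
    if not rhos:
        raise ValueError("no SERP has variation in this feature")
    mean = statistics.fmean(rhos)
    stderr = statistics.stdev(rhos) / math.sqrt(len(rhos)) if len(rhos) > 1 else float("nan")
    return mean, stderr

# Hypothetical 1/0 flags (e.g. "exact match domain") for three tiny SERPs.
serps = [[1, 1, 0, 0], [0, 1, 0, 1], [1, 1, 1, 1]]  # the last SERP is skipped
print(mean_serp_correlation(serps))  # mean ≈ 0.224, stderr ≈ 0.671
```

Averaging per-SERP correlations (rather than pooling all results) keeps each query’s ranking comparison separate, which matches the slide’s focus on what higher-ranking pages have that lower ones on the same page do not.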
[Chart: correlation with higher rankings for: Contains All Query Terms in Domain Name; Exact Match Hyphenated Domain; Exact Match Domain]
Highest Stderr = 0.0241804
Our Interpretation
• Exact match domains remain powerful in both
engines (anchor text could be a factor, too)
• Hyphenated versions are less powerful, though
more frequent in Bing (G: 271 vs. B: 890)
• Just having keywords in the domain name has
substantial positive correlation
Highest Stderr = 0.00350211
[Chart: correlation with higher rankings for: KWs in Body; KWs in Alt Attribute; KWs in H1 Tag; KWs in URL; KWs in Title]
Our Interpretation
• The Alt attribute of images is interesting
• Putting KWs in URLs is likely a best practice
• Everyone optimizes titles (G: 11,115 vs. B: 11,143).
Differentiating here is hard.
• (Simplistic) on-page optimization isn’t a huge factor
Highest Stderr = 0.0269818
[Chart: correlation with higher rankings by TLD: .gov, .edu, .info, .net, .org, .com]
Our Interpretation
• More reason to believe Google when they
say .gov, .info and .edu are not special-cased
• The .org TLD extension is surprising – do they earn
more links? Less spam? More non-commercial?
• Don’t forget about branding/user behavior: .com is
still probably a very good thing (at least own it)
Highest Stderr = 0.0033353
[Chart: correlation with higher rankings for: Content Length (tokens in body); URL Length (chars.); Domain Name Length (chars.)]
Our Interpretation
• Shorter URLs are likely a best practice
(especially on Bing)
• Long domains may not be ideal, but aren’t awful
• Raw content length seems marginal in correlation
Question #2:
What Kind of Links Matter & How
Should We Evaluate Links?
Highest Stderr = 0.00335677
[Chart: correlation with higher rankings for: # of Linking Root Domains to URL; # of Links to URL]
Our Interpretation
• Links are likely still a major part of the algorithms
• Bing may be slightly more naïve in their usage of
link data than Google, but better than before
• Diversity of link sources remains more important
than raw link quantity
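The difference between “# of Links to URL” and “# of Linking Root Domains to URL” is easy to see in a small sketch. The root-domain rule here is a deliberate simplification (last two hostname labels); real link indexes use the Public Suffix List, so treat `root_domain` and `link_counts` as hypothetical helpers.

```python
from urllib.parse import urlsplit

def root_domain(url):
    """Naive root domain: the last two labels of the hostname.

    Simplifying assumption: hosts such as example.co.uk are grouped incorrectly;
    production tools consult the Public Suffix List instead.
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])

def link_counts(linking_urls):
    """Return (# of links, # of linking root domains) for one target URL."""
    return len(linking_urls), len({root_domain(u) for u in linking_urls})

links = [  # hypothetical pages linking to one URL
    "http://blog.example.com/post-1",
    "http://www.example.com/post-2",
    "http://example.com/post-3",
    "https://other-site.org/review",
]
print(link_counts(links))  # (4, 2)
```

Four links but only two root domains: under the “diversity matters more” reading, the second number is the one to grow.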
Highest Stderr = 0.00415058
[Chart: correlation with higher rankings for: # of Links w/ exact match anchor text; # of linking root domains w/ exact match anchor text]
Our Interpretation
• Many anchor text links from the same domain likely
don’t add much value
• Anchor text links from diverse domains, however,
appears highly correlated
• Bing and Google are relatively similar in evaluating
these metrics
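A sketch of the two anchor-text metrics, under stated assumptions: “exact match” is taken to ignore case and repeated whitespace, and root domains are approximated by the last two hostname labels. The function names and sample links are hypothetical.

```python
from urllib.parse import urlsplit

def normalize(text):
    # Assumption: "exact match" ignores case and repeated whitespace.
    return " ".join(text.lower().split())

def exact_match_anchor_counts(links, query):
    """links: (source_url, anchor_text) pairs pointing at one page.

    Returns (# of links w/ exact match anchor text,
             # of linking root domains w/ exact match anchor text).
    """
    matching_sources = [src for src, anchor in links if normalize(anchor) == normalize(query)]
    # Naive root domain (last two hostname labels), as a simplification.
    root_domains = {".".join((urlsplit(src).hostname or "").split(".")[-2:])
                    for src in matching_sources}
    return len(matching_sources), len(root_domains)

links = [
    ("http://a-blog.com/1", "SEO Tools"),
    ("http://a-blog.com/2", "seo tools"),
    ("http://www.a-blog.com/3", "SEO  tools"),
    ("http://b-forum.net/t/9", "SEO Tools"),
    ("http://c-news.org/x", "click here"),
]
print(exact_match_anchor_counts(links, "seo tools"))  # (4, 2)
```

Three of the four matching links come from one site, which is the pattern the first bullet above suggests adds little value.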
Correlation of Page-Level Link Valuation Metrics
Our Interpretation
• PageRank (and similar algorithms) are not particularly
representative of rankings (but are somewhat correlated)
• Linking domains are likely a better metric than raw links
• Page Authority is reasonably good, but has a way to go
Correlation of Domain-Level Link Valuation Metrics
Our Interpretation
• No single domain valuation metric is especially well
correlated with rankings
• Rankings of individual pages may be more
disparate than we typically think re: “domain authority”
• Overall, we’re still very naïve when it comes to
understanding how links influence search rankings
Question #3:
How Does Google Instant Change
Keyword Demand / SEO?
http://www.readwriteweb.com/archives/report_google_search_box_in_firefox_accounts_for_9.php
Are Most Users Seeing/Using Google Instant?
Methodology: Keyword Referral Search Data
• Look at keywords sending traffic via analytics
• Group keywords by length (number of words)
• Analyze shifts in demand across the keywords that
brought visits to the site
• Compare the period prior to Google Instant
with the period directly after
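The referral-data methodology can be sketched as: bucket keywords by word count, compute each bucket’s share of search visits before and after the launch, and compare. The bucket cutoff (“5+”), function names and sample numbers are all hypothetical; the studies cited next may have grouped keywords differently.

```python
from collections import Counter

def share_by_word_count(visits_by_keyword, max_bucket=5):
    """Fraction of search visits per keyword-length bucket ("1", "2", ..., "5+")."""
    totals = Counter()
    for keyword, visits in visits_by_keyword.items():
        n = len(keyword.split())
        bucket = str(n) if n < max_bucket else f"{max_bucket}+"
        totals[bucket] += visits
    all_visits = sum(totals.values())
    return {bucket: count / all_visits for bucket, count in totals.items()}

def demand_shift(before, after):
    """Change in each bucket's share of visits, in percentage points."""
    b, a = share_by_word_count(before), share_by_word_count(after)
    return {k: round(100 * (a.get(k, 0.0) - b.get(k, 0.0)), 2) for k in set(a) | set(b)}

# Hypothetical keyword referrals for the periods before and after Google Instant.
before = {"seo": 50, "seo tools": 30, "best seo tools": 20}
after = {"seo": 40, "seo tools": 40, "best seo tools": 20}
print(demand_shift(before, after))
```

Comparing shares rather than raw visit counts keeps seasonal or overall traffic changes from masquerading as a shift in keyword length.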
http://www.mecmanchester.co.uk/blog/google-instant-data-after-12-days.html
Via MEC Manchester (UK): 5 Sites, 4 Verticals, 10K+ Keywords
Via Distilled Consulting (UK): 11 Sites, Various Sizes (3.5K – 75K weekly visits), 75K+ Keywords
http://www.distilled.co.uk/blog/seo/impact-of-google-instant/
Via Conductor: Multiple sites, 880K visits, 10Ks of keywords
http://blog.conductor.com/2010/09/what%E2%80%99s-been-the-impact-of-google-instant-on-searcher-behavior-so-far-not-much/
Interesting Takeaways
• Google Instant seems not to have shifted keyword
demand by much (if at all)
• Google “suggest” has been out for a long time
already; users are likely accustomed to this feature
• The “long tail” may get longer/shorter over time, but
Instant seems less responsible than other factors
Goals of Successful Information
Architecture
Semantically Logical Structure
Minimize Click-Depth
Maximize Usability of Navigation
Tips for Semantically Useful
Navigation
Initially Design without Keyword Research
Add in Keyword Research Based Modifications
Validate Architecture/Path with Non-SEOs
Tips for Minimal Click-Depth
Imitate the Ideal Nav Pyramid
Broad Linking at Top Levels
Editorial Categorization > User-Defined
HACK: Multi-Level HTML Sitemap
Tips for Usable Navigation
Obvious Navigation Elements
Naming Conventions that Match Intent
User & Usability Testing
Avoiding Common “Big Site”
Problems
Duplicate Content Issues
Rel Canonical Tags
Google Webmaster Tools
SEOmoz Web App
Scraping & Re-Publishing
Employ Absolute URLs
Absolute: <a href="http://www.seomoz.org/blog">anchor</a>
Relative: <a href="../blog">anchor</a>
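Why absolute URLs help against scraping: a browser or crawler resolves a relative `href` against whatever page it appears on. The sketch below uses Python’s standard `urljoin` to show where the same markup points on the original post and on a scraped copy; both page paths and the scraper domain are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical URLs: the original post and a scraper's republished copy of it.
original_page = "http://www.seomoz.org/ugc/some-post"
scraped_copy = "http://scraper.example/copied/some-post"

relative_href = "../blog"
absolute_href = "http://www.seomoz.org/blog"

print(urljoin(original_page, relative_href))  # http://www.seomoz.org/blog
print(urljoin(scraped_copy, relative_href))   # http://scraper.example/blog
print(urljoin(scraped_copy, absolute_href))   # http://www.seomoz.org/blog
```

On the scraped copy, the relative link now points at the scraper’s own site, while the absolute link still sends visitors (and link value) back to the original.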
C&D (Cease & Desist) vs. Large, Credible Orgs that Scrape
Don’t Go Overboard w/ Bot Blocking
Incomplete Indexation
Track Referrals, not Site: Commands
Check Page “Types” that Don’t Receive Traffic
XML Sitemaps
Content Syndication
RSS Feeds
Twitter for Indexation
“Search Results” in the SERPs
Create Category “Landing” Pages
Remove Obvious Traces of “Search” on Landing Pages
Thin Content Issues
Bolster w/ UGC
Employ Scalable Content Production
Keep “Thin” Pages Out of the SERPs
Faceted Navigation
Rel Canonical Can Help
Use AJAX to Reload Pages
Watch Out for Google Crawling JavaScript
Offer Facets Only to Logged-In / Cookied Users
Logged-In = 345 / Googlebot = 141
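One way “Rel Canonical Can Help” with faceted navigation: strip the facet parameters from each filtered URL and point the page’s canonical tag at the result. This is a minimal sketch; the parameter names in `FACET_PARAMS` and the shop URL are hypothetical, and which parameters count as facets is a site-specific decision.

```python
import html
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical facet parameter names; choose these per site.
FACET_PARAMS = {"color", "size", "sort", "price"}

def canonical_url(url):
    """Strip facet parameters (and any fragment) so every filtered view of a
    category points at a single canonical URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in FACET_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def canonical_link_tag(url):
    """Build the rel=canonical element for the <head> of a faceted page."""
    return f'<link rel="canonical" href="{html.escape(canonical_url(url), quote=True)}" />'

print(canonical_link_tag("http://shop.example/shoes?color=red&page=2&sort=price"))
# <link rel="canonical" href="http://shop.example/shoes?page=2" />
```

Keeping non-facet parameters such as `page` is a judgment call in this sketch; dropping them too would collapse paginated listings into one canonical page.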
Overcoming Twitter’s
Cannibalization of the Link Graph
Way Back in 2007
Interesting content, blog posts & linkbait earned LOTS of links
Fast Forward to 2010
Not so many links (in comparison)
But tons of social sharing (and tweets)
Are Pages Linking Out Less?
Via Linkscape’s web index
How Do We Earn Traditional Web
Links (the kind search engines love)?
Tactic #1:
Embeddable Content
Infographics
http://royal.pingdom.com/2010/02/24/google-facts-and-figures-massive-infographic/
+1,085 links from
356 root domains
Badges
http://www.zillow.com/webtools/badges/
Value-Add Widgets
Tactic #2:
Reference Material
Research & Data
http://www.time.com/time/health/article/0,8599,2014332,00.html
Awards + Rankings
http://www.moneycrashers.com/top-personal-finance-blogs/
Citation-Worthy Explanations
http://www.seomoz.org/blog/googles-algorithm-pretty-charts-math-stuff
Tactic #3:
Syndicated Content
Niches where content is low-supply/high-demand
http://perfectmarket.com/vault_index_summer_2010_infographic
Identify sites that already syndicate
content from someone else
Tactic #4:
Stick to Niches w/o Twitter Adoption
Find sectors where traditional
blogs/forums dominate conversation
http://www.eng-tips.com/
Many of these “old-school” sites have
followed external links (but don’t abuse these)
Note the nofollow
highlighting
Some “Web 2.0” Ones, Too
Impressive domain and
page level metrics
Tactic #5:
Friends, Partners, Customers &
Vendors
Friends + Family
Partners
Customers
http://www.seomoz.org/blog/headsmacking-tip-1-link-requests-in-order-confirmation-emails
Vendors
http://burton.kontain.com/evogear/
Tactic #6:
Twitter May Take Your Tweets, But
They’ll Never Take Your Content!
Turn Your Tweets Into Content
http://twournal.com/home
Turn Tweets from industry leaders into
content, too (this entices them to share)
http://www.seomoz.org/blog/bing-vs-google-prominence-of-ranking-elements
Tactic #7:
Can’t Beat ‘em? Join ‘em!
Twitter is (almost certainly) influencing (at least some) rankings
Lots of tweets, virtually no links,
but remarkable rankings
Twitter can send lots of direct traffic
You need to target the right Tweeters
http://twitaholic.com/top100/followers/
The Ones Who Send Real Traffic
http://mashable.com/2009/07/07/twitter-clickthrough-rate/
Use the “Rank,” Not the “Grade”
http://twittergrader.com
Takeaways
• # of Followers DEFINITELY isn’t everything
• Tweeting heavily may not necessarily hurt the attention
people pay to your tweets
• Klout score probably isn’t great for predicting the reach of
an individual’s tweets/messages
• TwitterGrader Rank may be useful for predicting the
traffic you’d get from a tweeter
• Avg. CTR across 254 shared links = 1.17% (don’t feel
bad if only 1 in 100 followers clicks a link)
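The slide does not say exactly how the 1.17% average was computed. A minimal sketch, assuming CTR for a shared link means clicks divided by the sharer’s follower count, shows two plausible ways to average across links; the sample numbers are hypothetical.

```python
def average_ctr(shared_links):
    """shared_links: (clicks, followers) pairs, one per shared link.

    Mean of the per-link click-through rates (each link weighted equally).
    """
    ctrs = [clicks / followers for clicks, followers in shared_links if followers > 0]
    return sum(ctrs) / len(ctrs)

def pooled_ctr(shared_links):
    """Alternative: total clicks / total followers (large accounts weigh more)."""
    return sum(c for c, _ in shared_links) / sum(f for _, f in shared_links)

sample = [(12, 1_000), (150, 20_000), (3, 500)]  # hypothetical (clicks, followers)
print(f"{average_ctr(sample):.2%} vs. {pooled_ctr(sample):.2%}")
```

The two figures diverge when big accounts get proportionally fewer clicks, which is consistent with the first takeaway that follower count isn’t everything.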
Making Analytics Actionable
To Make Analytics Actionable, Always Ask:
#1 – “Why am I measuring this?”
#2 – “What would I do if results were different?”
# of Visits Per Search Engine Over Time
Action: Measure against search engine market shares & volume to determine whether you’re making positive strides
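A minimal sketch of “measure against search engine market shares”: divide each engine’s share of your search visits by its share of the search market. The market-share figures below are placeholders, not real data; plug in current numbers for your region.

```python
def share_vs_market(visits_by_engine, market_share):
    """Ratio of each engine's share of your search visits to its market share.

    Above 1.0: the engine sends more than its "fair share" of your search traffic.
    Below 1.0: a possible gap in rankings or indexation on that engine.
    """
    total = sum(visits_by_engine.values())
    return {engine: (visits / total) / market_share[engine]
            for engine, visits in visits_by_engine.items()}

visits = {"google": 900, "bing": 100}       # hypothetical monthly search visits
market = {"google": 0.8, "bing": 0.2}       # placeholder market shares
print(share_vs_market(visits, market))
```

Tracking this ratio over time separates real progress on one engine from that engine simply gaining or losing market share.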
# Pages Getting Search Referrals Over Time
Measure this number on a weekly/monthly basis
Action: Discover if indexation is an issue worth effort
This number sucks. Learn more about why at: www.seomoz.org/blog/indexation-for-seo-real-numbers-in-5-easy-steps
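Counting pages that receive search referrals each week can be done from a raw analytics export. This is a sketch under an assumed row format of `(visit_date, landing_page, is_search_referral)`; the function name and sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date

def pages_with_search_referrals(visits):
    """visits: iterable of (visit_date, landing_page, is_search_referral) rows.

    Returns {(ISO year, ISO week): # of distinct landing pages that received
    at least one search-engine referral that week}.
    """
    pages_by_week = defaultdict(set)
    for visit_date, landing_page, is_search in visits:
        if is_search:
            iso_year, iso_week, _ = visit_date.isocalendar()
            pages_by_week[(iso_year, iso_week)].add(landing_page)
    return {week: len(pages) for week, pages in pages_by_week.items()}

visits = [  # hypothetical analytics export
    (date(2010, 10, 4), "/blog/post-a", True),
    (date(2010, 10, 5), "/blog/post-a", True),
    (date(2010, 10, 6), "/tools/rank-checker", True),
    (date(2010, 10, 6), "/about", False),  # not a search referral: ignored
    (date(2010, 10, 12), "/blog/post-a", True),
]
print(pages_with_search_referrals(visits))
```

Comparing this weekly count with the number of pages you publish gives a rough sense of how much of the site is actually earning search traffic.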
# of Keywords Sending Traffic from a Search Engine over Time
Action: Determine if content additions are accretive and what drives growth/shrinkage in search traffic
Did rankings fall? Or is demand down?
Search Referral Analytics
# of Visits per Keyword
Action: Analyze top traffic drivers from a value perspective, check rankings for potential easy wins & get answers if traffic dips
“SEO Tools” is a big win and we could rank higher
First-Time vs. Returning Visits per Keyword
The keyword “SEO” leans toward first-time visits
Action: Determine value of reaching new visitors vs. converting branded users (focus efforts on the more valuable one)
This metric speaks to business strategy about converting existing fans vs. reaching new customer segments
Distribution of Keyword Referrals
Action: Discover strengths vs. opportunities (60-70% of traffic is typically in the long tail and it converts better)
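A simple way to estimate how much search traffic sits in the long tail is to rank keywords by visits and measure the share falling outside a “head” cutoff. Where the head ends is an assumption; the `head_size` parameter, function name and sample data are hypothetical.

```python
def long_tail_share(visits_by_keyword, head_size=10):
    """Fraction of search visits from keywords outside the top `head_size`.

    The cutoff between "head" and "tail" is an assumption; pick one that fits your site.
    """
    counts = sorted(visits_by_keyword.values(), reverse=True)
    total = sum(counts)
    return sum(counts[head_size:]) / total

keywords = {"seo": 50, "seo tools": 30, "link checker": 10,
            "free seo audit tool": 5, "seo tools for blogs": 5}  # hypothetical
print(long_tail_share(keywords, head_size=2))
```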
Keyword Rankings
www.seomoz.org/rank-tracker
Action: Know if traffic spikes/dropoffs are from rankings, indexation or search demand shifts
Rankings and Traffic both Dropped
Page Two Rankings
Referrals from Page 2
Action: Identify low hanging fruit that can be optimized quickly
Could totally 301 this to www.opensiteexplorer.org
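Finding page-two “low hanging fruit” amounts to joining rank data with referral data. This sketch assumes 10 results per page (positions 11-20 are page two) and hypothetical input dicts; it sorts candidates by the visits they already send, since those are the keywords with proven demand.

```python
def page_two_opportunities(rankings, referrals):
    """rankings: {keyword: position}; referrals: {keyword: visits}.

    Keywords ranking on page two (positions 11-20, assuming 10 results per page),
    sorted by the referrals they already send, highest first.
    """
    page_two = [kw for kw, pos in rankings.items() if 11 <= pos <= 20]
    return sorted(page_two, key=lambda kw: referrals.get(kw, 0), reverse=True)

rankings = {"seo tools": 12, "seo": 3, "link checker": 18, "rank tracker": 25}
referrals = {"seo tools": 40, "link checker": 90}
print(page_two_opportunities(rankings, referrals))  # ['link checker', 'seo tools']
```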
Engagement Analytics
Time on Site
Action: Compare to ROI metrics; if they correlate, improve on keywords/landing pages with low time on site
Average “upgrade to PRO” visitor spent a whopping 44 minutes on SEOmoz!
# of Page Views
Action: Depending on your metrics, a “sweet spot” of pages browsed often dictates a conversion event – optimize towards it
Average “upgrade to PRO” visitor visits 12X the pages of an average visitor
Repeat Visit Ratio
Action: Find what content/activities/referrers send engaged traffic and copy those while improving subpar pages
Sharing/Linking Activity
“Sharing Activity” Conversions
Action: Find patterns/sources that predict sharing activities (both content and CTAs) and make them testable conversion events
GA allows you to set custom actions as “goals,” then filter, monitor and improve on these metrics
Latent Conversion Tracking
Removing Last-Click Attribution
Full Path Analysis
Initial Referrer
www.seomoz.org/blog/how-to-get-past-last-touch-attribution-with-google-analytics
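Moving past last-click attribution can be illustrated by crediting the same conversions two ways: to the last referrer (what default reports show) and to the initial referrer. This is a generic sketch over a hypothetical list of touchpoint paths, not the Google Analytics setup described in the linked post.

```python
def attribute_conversions(paths, model="last"):
    """paths: one list of referrers per converting visitor, oldest touch first.

    Credits each conversion to the first or last referrer in the visitor's path.
    """
    credit = {}
    for path in paths:
        if not path:
            continue
        source = path[0] if model == "first" else path[-1]
        credit[source] = credit.get(source, 0) + 1
    return credit

paths = [  # hypothetical converting visitors
    ["organic", "direct"],
    ["twitter", "organic", "direct"],
    ["organic"],
]
print(attribute_conversions(paths, "last"))   # {'direct': 2, 'organic': 1}
print(attribute_conversions(paths, "first"))  # {'organic': 2, 'twitter': 1}
```

Under last-click, “direct” looks like the top converter; crediting the initial referrer shows that organic search started most of these paths.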
ROI Analytics
Lifetime Customer Value
Cost of Acquisition
Return on Investment
ROI = CLTV - CAC
No. No. No.
Yes. Yes. Yes.
Always Be Asking “What’s the ROI?”
Get the ROI for every category (and subset)
New Research: Topic Modeling in the
Search Results
Methodology: LDA (Latent Dirichlet Allocation)
• Build an LDA model based on the English-language
Wikipedia dataset (8mil+ pages)
• Generate scores for top 10 rankings across
several thousand search results
• Look at correlation of search rankings with
scores (in process)
Simplified LDA Formula
Chance a word is there because of a topic is proportional to:
(Number of times the document already uses that topic) ×
(Number of times that word has been assigned to that topic)
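The simplified formula above multiplies two counts. In the standard collapsed Gibbs sampler for LDA, the same product appears with smoothing hyperparameters (alpha, beta) added and the word count normalized by the topic’s size. The sketch below computes those per-topic probabilities for one word; the function name, count layout and default hyperparameters are illustrative assumptions, not the settings behind SEOmoz’s LDA tool.

```python
def topic_probabilities(doc_topic_counts, topic_word_counts, word, vocab_size,
                        alpha=0.1, beta=0.01):
    """Probability that `word` in this document belongs to each topic.

    doc_topic_counts[t]: words in this document currently assigned to topic t.
    topic_word_counts[t]: {word: count} of words assigned to topic t corpus-wide.
    (A full sampler first removes the current token's own assignment from both.)
    """
    weights = []
    for t, n_dt in enumerate(doc_topic_counts):
        n_tw = topic_word_counts[t].get(word, 0)
        n_t = sum(topic_word_counts[t].values())
        # (how much the document uses topic t) x (how often the word is in topic t)
        weights.append((n_dt + alpha) * (n_tw + beta) / (n_t + vocab_size * beta))
    total = sum(weights)
    return [w / total for w in weights]

doc_topics = [5, 0]                                      # hypothetical counts
topic_words = [{"seo": 3, "link": 1}, {"seo": 1, "pizza": 9}]
print(topic_probabilities(doc_topics, topic_words, "seo", vocab_size=3))
```

With alpha and beta set to zero and the denominator dropped, this reduces to the slide’s product of counts; the smoothing keeps topics the document hasn’t used yet from getting a probability of exactly zero.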
Tool to Test it Out
http://www.seomoz.org/labs/lda
We might need to work on the “relevance”
of our content
Interesting Takeaways
• There may be more to “on-page” optimization than
just using target keywords in the right places / ways
• Search engines keep saying “make relevant content”
– perhaps we can get more scientific and precise about
what “relevant” means
• Our LDA topic modeling work is still in its infancy.
Expect more data, correlations, etc. in weeks to come.
Q+A
Rand Fishkin, CEO & Co-Founder, SEOmoz
• Twitter: @randfish
• Blog: www.seomoz.org/blog