London | 18–21 February | Double trouble: DC issues - Diagnosis & causes | Kristjan Mar Hauksson, Nordic eMarketing, Director Internet Marketing, @optimizeyourweb
Transcript
Page 1: Duplicate Content Issues

London| 18–21 February

Double trouble
DC issues - Diagnosis & causes

Kristjan Mar Hauksson
Nordic eMarketing
Director Internet Marketing
@optimizeyourweb

Page 2: Duplicate Content Issues

London| 18–21 February 2013 | #SESLON

“They ALL have some degree of duplicate content problems. Every single site I have ever analyzed does!”

Mikkel DeMib

Page 3: Duplicate Content Issues

“Duplicate content is in most cases due to the way CMSs are set up … or we might have a team of lazy content writers on our hands.”

Page 4: Duplicate Content Issues

“Understand your content management system: Make sure you're familiar with how content is displayed on your website. Blogs, forums, and related systems often show the same content in multiple formats.”

Page 5: Duplicate Content Issues

Diagnosis & Causes

Page 6: Duplicate Content Issues

A couple of easy-to-use “tools”

• virante.org/seo-tools/duplicate-content

• Xenu

• Zoom Search Engine

• Google (Search, Webmaster Tools, etc..)

• Manual testing

• Screaming Frog

More on: support.google.com/webmasters/bin/answer.py?hl=en&answer=66359

Page 7: Duplicate Content Issues

Using the site: command

• site:yoursite.com

• This should show you which pages of your site Google has crawled and indexed

• Sanity-check the result count: does this site really have 46,800 products and categories?

Page 8: Duplicate Content Issues

Another simple way to identify DC is to search for your own content

• Look at the content you have on your site, take something distinctive like a news headline, and Google it

• This will in most cases show you how Google is crawling your site and what it finds
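
The two manual checks above (the site: command and Googling a headline) can be scripted as plain query builders. This is only a minimal sketch: the function names are made up for illustration, the headline and domain are placeholders, and the optional `-site:` exclusion is an addition not mentioned on the slides. It builds URLs to open in a browser and does not fetch or parse results.

```python
from typing import Optional
from urllib.parse import urlencode


def site_query(domain: str) -> str:
    """Query that lists the pages a search engine has indexed for a domain."""
    return f"site:{domain}"


def exact_phrase_query(phrase: str, exclude_domain: Optional[str] = None) -> str:
    """Quote the phrase so the engine looks for that exact wording.

    exclude_domain (an assumption, not from the slides) hides your own
    site so that only copies hosted elsewhere remain.
    """
    query = f'"{phrase}"'
    if exclude_domain:
        query += f" -site:{exclude_domain}"
    return query


def search_url(query: str) -> str:
    """Turn a query string into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(search_url(site_query("yoursite.com")))
    print(search_url(exact_phrase_query("Sample news headline", "yoursite.com")))
```

Open the printed URLs by hand and compare the result counts with the number of pages you expect the site to have.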

Page 9: Duplicate Content Issues

Sample content leak (figure labels: .dk, .se, .no, .fr, .co.uk)

Page 10: Duplicate Content Issues

Page 11: Duplicate Content Issues

Page 12: Duplicate Content Issues

Using Xenu

• If the site allows crawling, you can use Xenu to crawl it and then look at the information that comes out

• Sort it and behold …
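
One way to "arrange" a crawl report is to group URLs by page title, so that several addresses sharing a title stand out. The sketch below assumes a tab-separated export with `Address` and `Title` columns. Those column names are an assumption, so adjust them to whatever your crawler actually writes.

```python
import csv
import io
from collections import defaultdict


def duplicate_titles(tsv_text, url_col="Address", title_col="Title"):
    """Group crawled URLs by page title and keep titles seen more than once.

    url_col and title_col are assumed column names for the export file.
    """
    groups = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Normalize case and whitespace so trivial differences do not hide duplicates.
        title = (row.get(title_col) or "").strip().lower()
        if title:
            groups[title].append(row[url_col])
    return {title: urls for title, urls in groups.items() if len(urls) > 1}


if __name__ == "__main__":
    sample = (
        "Address\tTitle\n"
        "http://example.com/\tHome\n"
        "http://example.com/index.html\tHome\n"
        "http://example.com/about\tAbout us\n"
    )
    for title, urls in duplicate_titles(sample).items():
        print(title, urls)
```

Identical titles are only a hint. Open the grouped URLs to confirm that the page bodies match as well.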

Page 13: Duplicate Content Issues

Using Copyscape

• Copyscape was originally created to find “stolen” copy, but it also works great for finding DC
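
To get a feel for what a copy-detection check measures, two texts can be compared by their overlapping word sequences. The sketch below uses word shingles and Jaccard similarity. It is a generic illustration and makes no claim about how Copyscape works internally; the shingle size of 3 is an arbitrary assumption.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return set()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(text_a, text_b, size=3):
    """Similarity from 0.0 (no shared shingles) to 1.0 (identical shingle sets)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    original = "Duplicate content is usually caused by the way the CMS is set up"
    copy = "Duplicate content is usually caused by the way the CMS is configured"
    print(round(jaccard(original, copy), 2))
```

A score close to 1.0 suggests the two pages are near-duplicates. Where to draw the line is a judgment call for each site.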

Page 14: Duplicate Content Issues

Content ownership

• Websites are often developed on a DEV URL. In many cases it is left open, although it is only meant for collaboration between developers and site owners. Then somebody shares the URL via Google Mail, or a subdomain finder sniffs it out, and the dev copy gets indexed. After that, content ownership can be an issue… for a long time.

Page 15: Duplicate Content Issues

Image Plagiarism

• A search for ‘ritzy bryan’ gives 895,000 results

• When you click Images, 5 of the top 9 are the photographer's

• But the top two are not on his website

• Click on the image

• Click ‘Image details’ and you get lots of similar images

• Scroll down and you get lots of plagiarizing websites

Page 16: Duplicate Content Issues

Diagnosis & Causes

Page 17: Duplicate Content Issues

Frequent causes when starting a new site

• Firstly, make sure that your dev server is under lock and key, and close it when you are done

• If you are using something like a news or product module across multiple sites, make sure that the ownership is clear

• Not all duplicate content originates on your own site. Scrapers can give you hell!

• Report plagiarism to Google as soon as you find it and take ownership.

Page 18: Duplicate Content Issues

301, 404 – Default or not default and ….

• 404s that are not 404s: things can go a bit crazy if error pages are not set up properly, for example on large e-commerce sites

• WWW, Non-WWW & Default pages

• Query strings and session IDs

• Template content

• Boilerplate repetition, publishing stubs & similar content

• User generated duplicate (replica) content
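
Several of the causes listed above (WWW vs. non-WWW, default pages, query strings and session IDs) come down to many URLs pointing at one page. The sketch below collapses such variants to a single form so that crawled URLs can be grouped. The session parameter names and default page names are assumptions, and stripping "www." is only valid if both hosts really serve the same content.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names; extend these to match the CMS in question.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}
DEFAULT_PAGES = {"index.html", "index.htm", "index.php", "default.aspx"}


def normalize(url):
    """Collapse common URL variants of the same page into one form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Treat /dir/index.html and /dir/ as the same page.
    head, _, last = parts.path.rpartition("/")
    path = head + "/" if last.lower() in DEFAULT_PAGES else parts.path
    if not path:
        path = "/"
    # Drop session parameters and sort the rest so parameter order does not matter.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    )
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(query), ""))


def group_variants(urls):
    """Map each normalized URL to the crawled variants that collapse to it."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


if __name__ == "__main__":
    crawled = [
        "http://www.example.com/index.html?PHPSESSID=abc",
        "http://example.com/",
        "http://example.com/shop/?b=2&a=1",
    ]
    print(group_variants(crawled))
```

Every group with more than one variant is a candidate for a 301 redirect or a single preferred URL.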

Page 19: Duplicate Content Issues

The mother of all checklists ;-)

• Take everything that is a likely cause, create a checklist, and go through the items one by one to make sure they are in order

• This is all common-sense stuff and there is so much information online. You should not have to make the same mistakes as those before you….

• Know your CMS before you start implementing it!

Page 20: Duplicate Content Issues

Thank you