A Guide to Log Analysis with Big Query

Jan 24, 2018

Dominic Woodman
Page 1: A Guide to Log Analysis with Big Query

2009

Page 4: A Guide to Log Analysis with Big Query

God it’s bad.

Page 16: A Guide to Log Analysis with Big Query

-$1.5 Billion

Page 18: A Guide to Log Analysis with Big Query

Why hasn’t Google seen the changes on my page?

Page 19: A Guide to Log Analysis with Big Query

How should I prioritise errors in Search Console?

Page 20: A Guide to Log Analysis with Big Query

Are my canonicals being respected?

Page 21: A Guide to Log Analysis with Big Query

Does Google think this page is important?

Page 28: A Guide to Log Analysis with Big Query

PART 1: THE WHY

What can you do with logs?

PART 2: THE HOW

Getting logs

Analysing Logs

Processing Logs

Page 31: A Guide to Log Analysis with Big Query

What does a log look like?

123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "GET /my_homepage HTTP/1.1" 200 2262 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

IP Address: 123.65.150.10

Timestamp: [23/Aug/2010:03:50:59 +0000]

Request type: GET

Homepage (the page requested): /my_homepage

Protocol: HTTP/1.1

Status Code: 200

Size of the page (in bytes): 2262

User Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
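
The fields labelled on these slides can be pulled out of a log line with a regular expression. The Python below is a minimal sketch, assuming the logs follow the common "combined" access-log layout that the example line appears to use. The pattern, the function name parse_log_line and the field names are illustrative choices, not something taken from the talk.

```python
import re
from datetime import datetime

# Pattern for the "combined" access-log layout (an assumption about the log format).
# The two "\S+" groups after the IP skip the "- -" identity/user fields.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request_type>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status_code>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the text fields that are really dates or numbers.
    fields["timestamp"] = datetime.strptime(fields["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status_code"] = int(fields["status_code"])
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


EXAMPLE_LINE = (
    '123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] '
    '"GET /my_homepage HTTP/1.1" 200 2262 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

parsed = parse_log_line(EXAMPLE_LINE)
print(parsed["ip"], parsed["request_type"], parsed["path"], parsed["status_code"])
```

Lines that do not match return None rather than raising, so a real log file with the occasional malformed line can still be processed.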

Page 40: A Guide to Log Analysis with Big Query

5 things

Page 41: A Guide to Log Analysis with Big Query

1. Diagnose crawling & indexation issues

Page 44: A Guide to Log Analysis with Big Query

[Chart: Five folders Googlebot crawled the most. Axis: Number of requests]

Page 46: A Guide to Log Analysis with Big Query

[Chart: % of organic sessions vs % of crawl budget. Series: Sessions, Crawl budget]

Page 47: A Guide to Log Analysis with Big Query

2. Prioritisation

Page 49: A Guide to Log Analysis with Big Query

example.com/article

Page 50: A Guide to Log Analysis with Big Query

Prioritizing

1

Full Print

Page 51: A Guide to Log Analysis with Big Query

example.com/article/full

Page 52: A Guide to Log Analysis with Big Query

example.com/article/print

Page 53: A Guide to Log Analysis with Big Query

Prioritizing

2

Page 54: A Guide to Log Analysis with Big Query

example.com/article/pdf

Page 55: A Guide to Log Analysis with Big Query

Prioritizing

3

Page 56: A Guide to Log Analysis with Big Query

Prioritizing

1

Full Print

Page 57: A Guide to Log Analysis with Big Query

3. Spot bugs & view site health

Page 58: A Guide to Log Analysis with Big Query

Delayed errors with a limit of 1000

Page 60: A Guide to Log Analysis with Big Query

4. How important does Google think parts of your site are?

Page 61: A Guide to Log Analysis with Big Query

My SEO was as bad as my design

Page 62: A Guide to Log Analysis with Big Query

But at least my hair was better

Page 63: A Guide to Log Analysis with Big Query

teflsearch.com

Page 64: A Guide to Log Analysis with Big Query

teflsearch.com/job-results

Page 65: A Guide to Log Analysis with Big Query

teflsearch.com/job-results/country/china

Page 66: A Guide to Log Analysis with Big Query

teflsearch.com/jobadvert3455

Page 67: A Guide to Log Analysis with Big Query

Average number of times Googlebot crawled a template

Page 68: A Guide to Log Analysis with Big Query

1. teflsearch.com

2. teflsearch.com/job-results

3. teflsearch.com/job-results/country/china

4. teflsearch.com/job-advert3455

Page 70: A Guide to Log Analysis with Big Query

teflsearch.com/job-results

Page 71: A Guide to Log Analysis with Big Query

Average number of times Googlebot crawled a template

35%

Page 72: A Guide to Log Analysis with Big Query

5. How fresh does Google think your content is?

Page 73: A Guide to Log Analysis with Big Query

bit.ly/moz-fresh

Page 74: A Guide to Log Analysis with Big Query

Average number of times a page template is crawled by Googlebot

Page 75: A Guide to Log Analysis with Big Query

● Improve our internal linking

● Build trust with last modified date in sitemap

Page 81: A Guide to Log Analysis with Big Query

Talk to a developer and ask for information

Page 82: A Guide to Log Analysis with Big Query

Are all the logs in one place?

Page 83: A Guide to Log Analysis with Big Query

Hi x

I’m {x} from {y}. We’ve been asked to do some log analysis to better understand how Google is behaving on the website, and I was hoping you could help with some questions about the log set-up (as well as with getting the logs!).

What we’d ideally like is 3-6 months of historical logs for the website. Our goal is to look at all the different pages search engines are crawling on our website and discover where they’re spending their time, the status code errors they’re finding, etc.

There are also some things that are really helpful for us to know when getting logs.

Do the logs have any personal information in?

We’re only interested in the various search crawler bots like Google and Bing. We don’t need any logs from users, so any logs with emails, telephone numbers, etc. can be removed.

Do you have any sort of caching which would create separate sets of logs?

Is there anything like Varnish running on the server, or a CDN, which might create logs in a different location to the rest of your server? If so, we will need those logs as well as those from the server. (Although we’re only concerned about a CDN if it’s caching pages or serving from the same hostname; if you’re just using Cloudflare, for example, to cache external images, then we don’t need it.)

Are there any sub parts of your site which log to a different place?

Have you got anything like an embedded WordPress blog which logs to a different location? If so, we’ll need those logs as well.

Do you log hostname?

It’s really useful for us to be able to see hostname in the logs. By default a lot of common server logging set-ups don’t log hostname, so if it’s not turned on, then it would be very useful to have that turned on now for any future analysis.

Is there anything else we should know?

Best,

{x}

Email for a developer

Page 84: A Guide to Log Analysis with Big Query

So we might have something that looks like this

Page 89: A Guide to Log Analysis with Big Query

BigQuery

Page 92: A Guide to Log Analysis with Big Query

Google’s online database for data analysis.

Page 93: A Guide to Log Analysis with Big Query

What do we want from analysing our logs?

1. Ask powerful questions

2. Repeatable

3. Scalable

4. Combine with crawl data

5. Easy to set up

6. Easy to learn

Page 99: A Guide to Log Analysis with Big Query

9,000,000 rows of data for 2 months.

400 - 800 queries

Page 102: A Guide to Log Analysis with Big Query

Format the logs so we can import them into BigQuery

Separate the Googlebot logs from all the other logs

Page 103: A Guide to Log Analysis with Big Query

Screaming Frog Log Analyser

Code something

Page 104: A Guide to Log Analysis with Big Query

Screaming Frog Log Analyser

Page 106: A Guide to Log Analysis with Big Query

Code something

Page 107: A Guide to Log Analysis with Big Query

bit.ly/logs-code
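
For the "code something" route, the two processing goals (separate the Googlebot logs, then format them for import) can be sketched in a few lines of Python. This is an illustrative sketch, not the script behind the link above. It assumes "combined"-format logs, identifies Googlebot by user agent only, and writes a CSV with hypothetical column names. The second sample line is made up to stand in for an ordinary visitor.

```python
import csv
import io
import re
from datetime import datetime, timezone

# Assumed "combined" log layout; the referrer and protocol are matched but not kept.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request_type>\S+) (?P<path>\S+) [^"]+" '
    r'(?P<status_code>\d{3}) (?P<size>\d+|-) '
    r'"[^"]*" "(?P<user_agent>[^"]*)"'
)

# Hypothetical column names for the table we would load.
CSV_COLUMNS = ["timestamp", "ip", "request_type", "path", "status_code", "size", "user_agent"]


def googlebot_rows(lines):
    """Yield one dict per log line whose user agent mentions Googlebot."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines in an unexpected layout
        row = match.groupdict()
        if "googlebot" not in row["user_agent"].lower():
            continue  # not a (claimed) Googlebot request
        parsed_time = datetime.strptime(row["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
        # Rewrite the timestamp as UTC "YYYY-MM-DD HH:MM:SS" text.
        row["timestamp"] = parsed_time.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
        yield {column: row[column] for column in CSV_COLUMNS}


def write_csv(rows, out_file):
    """Write the rows with a header line to any file-like object."""
    writer = csv.DictWriter(out_file, fieldnames=CSV_COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


SAMPLE_LINES = [
    '123.65.150.10 - - [23/Aug/2010:03:50:59 +0000] "GET /my_homepage HTTP/1.1" 200 2262 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    # Made-up browser request, only here to show that it is filtered out.
    '98.76.54.32 - - [23/Aug/2010:03:51:02 +0000] "GET /my_homepage HTTP/1.1" 200 2262 "-" '
    '"Mozilla/5.0 (Windows NT 6.1) Firefox/3.6"',
]

buffer = io.StringIO()
write_csv(googlebot_rows(SAMPLE_LINES), buffer)
print(buffer.getvalue())
```

User agents can be spoofed, so a production version would probably also verify that the requesting IP really belongs to Google. That check is left out of this sketch.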

Page 109: A Guide to Log Analysis with Big Query

Our data in BQ

Page 110: A Guide to Log Analysis with Big Query

We make sure we got what we wanted

Page 111: A Guide to Log Analysis with Big Query

THE QUESTION: What is the total number of requests Googlebot makes each day to our site?

Page 112: A Guide to Log Analysis with Big Query

Our first SQL query

SELECT timestamp
FROM [mydata.log_analysis]

Page 114: A Guide to Log Analysis with Big Query

Our first SQL query

SELECT DATE(timestamp)
FROM [mydata.log_analysis]

Page 116: A Guide to Log Analysis with Big Query

Our first SQL query

SELECT DATE(timestamp) AS date
FROM [mydata.log_analysis]

Page 118: A Guide to Log Analysis with Big Query

Our first SQL query

SELECT DATE(timestamp) AS date, COUNT(*)
FROM [mydata.log_analysis]

Page 119: A Guide to Log Analysis with Big Query

Our first SQL query

SELECT DATE(timestamp) AS date, COUNT(*)
FROM [mydata.log_analysis]
GROUP BY date

Page 120: A Guide to Log Analysis with Big Query

Our first SQL query

SELECT DATE(timestamp) AS date, COUNT(*) AS number_of_requests
FROM [mydata.log_analysis]
GROUP BY date
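
To see what this query returns without a BigQuery project, the same logic can be dry-run locally. The sketch below uses Python's built-in sqlite3 module as a stand-in. SQLite is not BigQuery, so the table is named log_analysis without the dataset prefix or square brackets, and the three rows are invented sample data.

```python
import sqlite3

# In-memory SQLite database standing in for the BigQuery table (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_analysis (timestamp TEXT, path TEXT, status_code INTEGER)")

# Invented sample rows: two requests on one day, one on the next.
conn.executemany(
    "INSERT INTO log_analysis VALUES (?, ?, ?)",
    [
        ("2010-08-23 03:50:59", "/my_homepage", 200),
        ("2010-08-23 11:15:00", "/my_homepage", 200),
        ("2010-08-24 02:00:00", "/my_homepage", 200),
    ],
)

# Same shape as the query on the slide: strip the time, then count rows per date.
DAILY_REQUESTS_QUERY = """
SELECT DATE(timestamp) AS date, COUNT(*) AS number_of_requests
FROM log_analysis
GROUP BY date
ORDER BY date
"""

daily_requests = conn.execute(DAILY_REQUESTS_QUERY).fetchall()
for date, number_of_requests in daily_requests:
    print(date, number_of_requests)
```

The ORDER BY is an addition so the output is in date order. The query on the slide does not specify one.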

Page 122: A Guide to Log Analysis with Big Query

[Chart: Comparing logs to GSC crawl volume. Axis: Number of requests]

Page 123: A Guide to Log Analysis with Big Query

Run queries

Find something weird

Go look at crawl & website

Page 124: A Guide to Log Analysis with Big Query

Our data in BQ

Page 125: A Guide to Log Analysis with Big Query

1. Diagnose crawling & indexation issues

Page 126: A Guide to Log Analysis with Big Query

2. Prioritisation

Page 127: A Guide to Log Analysis with Big Query

3. Spot bugs & view site health

Page 128: A Guide to Log Analysis with Big Query

4. How important does Google think parts of your site are?

Page 129: A Guide to Log Analysis with Big Query

5. How fresh does Google think your content is?

Page 130: A Guide to Log Analysis with Big Query

1. Diagnose crawling & indexation issues

4. How important does Google think parts of your site are?

Page 131: A Guide to Log Analysis with Big Query

What are the top 20 URLs crawled by Google over our logs?

Page 132: A Guide to Log Analysis with Big Query

Login is my top crawled page and then search?

Page 133: A Guide to Log Analysis with Big Query

What are the top 20 page_path_1 folders crawled by Google over our logs?
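
One way this question could be expressed is sketched below, again using Python's sqlite3 as a local stand-in for BigQuery. It assumes page_path_1 holds the first folder of the URL path, derived here by a hypothetical helper called first_folder. The sample paths are illustrative, and the table layout is an assumption rather than the one used in the talk.

```python
import sqlite3


def first_folder(path):
    """Return the first folder of a URL path, ignoring any query string."""
    path_without_query = path.split("?", 1)[0]
    return path_without_query.strip("/").split("/")[0]


# Illustrative Googlebot request paths.
SAMPLE_PATHS = [
    "/job-results/country/china",
    "/job-results/country/japan",
    "/job-results",
    "/login",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_analysis (path TEXT, page_path_1 TEXT)")
conn.executemany(
    "INSERT INTO log_analysis VALUES (?, ?)",
    [(path, first_folder(path)) for path in SAMPLE_PATHS],
)

# Count requests per first-level folder and keep the 20 largest.
TOP_FOLDERS_QUERY = """
SELECT page_path_1, COUNT(*) AS number_of_requests
FROM log_analysis
GROUP BY page_path_1
ORDER BY number_of_requests DESC, page_path_1
LIMIT 20
"""

top_folders = conn.execute(TOP_FOLDERS_QUERY).fetchall()
for folder, number_of_requests in top_folders:
    print(folder, number_of_requests)
```

Swapping page_path_1 for path in the query gives the "top 20 URLs" version of the same question.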

Page 134: A Guide to Log Analysis with Big Query

Location folders are taking more than 70% of my budget

Page 135: A Guide to Log Analysis with Big Query

Getting data by the day

Page      Number of Googlebot Requests
page1     200,000
page2     120,000

Page 136: A Guide to Log Analysis with Big Query

Number of Googlebot requests day by day

Page 137: A Guide to Log Analysis with Big Query

3. Spot bugs & view site health

Page 138: A Guide to Log Analysis with Big Query

How many of each status code does Google find per day over our logs?
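
A possible shape for this query, sketched with Python's sqlite3 as a local stand-in for BigQuery. The column name status_code and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_analysis (timestamp TEXT, path TEXT, status_code INTEGER)")

# Invented sample rows covering two days and two status codes.
conn.executemany(
    "INSERT INTO log_analysis VALUES (?, ?, ?)",
    [
        ("2010-08-23 03:50:59", "/my_homepage", 200),
        ("2010-08-23 04:10:00", "/old-page", 404),
        ("2010-08-23 05:00:00", "/my_homepage", 200),
        ("2010-08-24 01:00:00", "/old-page", 404),
    ],
)

# Group by both the day and the status code to get one count per pair.
STATUS_BY_DAY_QUERY = """
SELECT DATE(timestamp) AS date, status_code, COUNT(*) AS number_of_requests
FROM log_analysis
GROUP BY date, status_code
ORDER BY date, status_code
"""

status_by_day = conn.execute(STATUS_BY_DAY_QUERY).fetchall()
for date, status_code, number_of_requests in status_by_day:
    print(date, status_code, number_of_requests)
```

Plotting one line per status code from this output makes a sudden rise in 404s or 500s easy to spot.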

Page 139: A Guide to Log Analysis with Big Query

Number of Googlebot requests day by day

Page 140: A Guide to Log Analysis with Big Query

What are the most requested 404 URLs by Googlebot over the past 30 days?
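
A sketch of how this could be asked, using Python's sqlite3 as a local stand-in for BigQuery. To keep the example repeatable, the "past 30 days" window is measured back from a fixed reference date passed in as a parameter rather than from today. The paths, dates and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_analysis (timestamp TEXT, path TEXT, status_code INTEGER)")

# Invented sample rows; the June request falls outside the 30-day window.
conn.executemany(
    "INSERT INTO log_analysis VALUES (?, ?, ?)",
    [
        ("2010-08-23 03:50:59", "/missing-snippet.js", 404),
        ("2010-08-25 09:00:00", "/missing-snippet.js", 404),
        ("2010-08-20 12:00:00", "/old-page", 404),
        ("2010-06-01 12:00:00", "/ancient-page", 404),
        ("2010-08-23 04:00:00", "/my_homepage", 200),
    ],
)

# Keep only 404s on or after (reference date - 30 days), then rank URLs by request count.
TOP_404_QUERY = """
SELECT path, COUNT(*) AS number_of_requests
FROM log_analysis
WHERE status_code = 404
  AND timestamp >= DATE(?, '-30 days')
GROUP BY path
ORDER BY number_of_requests DESC, path
LIMIT 20
"""

REFERENCE_DATE = "2010-08-31"  # stand-in for "today"
top_404_urls = conn.execute(TOP_404_QUERY, (REFERENCE_DATE,)).fetchall()
for path, number_of_requests in top_404_urls:
    print(path, number_of_requests)
```

The date arithmetic here is SQLite syntax. BigQuery has its own date functions, so that line would need rewriting before running against the real table.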

Page 141: A Guide to Log Analysis with Big Query

Boy does it want that ad-tech snippet

Page 142: A Guide to Log Analysis with Big Query

5. How fresh does Google think your content is?

Page 143: A Guide to Log Analysis with Big Query

How many times on average is each page in a page template crawled a day?
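
The slide does not define the average precisely, so the sketch below picks one reading: total requests for a template, divided by the number of distinct pages seen in that template, divided by the number of days in the logs. It uses Python's sqlite3 as a local stand-in for BigQuery and assumes a template column has already been added during processing. The column name and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_analysis (timestamp TEXT, path TEXT, template TEXT)")

# Invented sample rows spanning two days.
conn.executemany(
    "INSERT INTO log_analysis VALUES (?, ?, ?)",
    [
        ("2010-08-23 01:00:00", "/", "homepage"),
        ("2010-08-23 09:00:00", "/", "homepage"),
        ("2010-08-24 01:00:00", "/", "homepage"),
        ("2010-08-24 09:00:00", "/", "homepage"),
        ("2010-08-23 02:00:00", "/jobadvert1", "job_advert"),
        ("2010-08-24 02:00:00", "/jobadvert2", "job_advert"),
    ],
)

# requests / distinct pages in the template / distinct days in the whole log.
AVG_CRAWLS_QUERY = """
SELECT
  template,
  COUNT(*) * 1.0
    / COUNT(DISTINCT path)
    / (SELECT COUNT(DISTINCT DATE(timestamp)) FROM log_analysis)
    AS avg_daily_crawls_per_page
FROM log_analysis
GROUP BY template
ORDER BY template
"""

avg_crawls = conn.execute(AVG_CRAWLS_QUERY).fetchall()
for template, avg_daily_crawls_per_page in avg_crawls:
    print(template, avg_daily_crawls_per_page)
```

Pages that Googlebot never requested do not appear in the logs at all, so this figure only averages over pages that were crawled at least once. Joining against a full list of URLs would be needed to include the rest.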

Page 144: A Guide to Log Analysis with Big Query

Average number of times a page template is crawled by Googlebot

Page 155: A Guide to Log Analysis with Big Query

How long does it take for a page to be discovered after being published?

What are the top 20 combinations of page_path_1 & page_path_2 folders crawled by Google over the time period of our logs?

Which pages have requests from Googlebot but don’t appear in our crawl?

What are the top non-canonical pages being crawled?

Which are the most crawled parameters on the website?

How often are the most visited parameters crawled each day?

Which directories have the most 301 & 404 status codes?

Which pages are crawled with parameters and without parameters?

Which pages are only partly downloaded?

How many hits does each section get, when the sections are classified in an external dataset?

What percentage of a directory was crawled over the past 30 days?

What is the total number of requests across two different time periods?
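
Several of these questions combine the logs with a second dataset. As one example, "which pages have requests from Googlebot but don’t appear in our crawl?" can be sketched as a join that keeps only the unmatched log rows. The code below uses Python's sqlite3 as a local stand-in for BigQuery. The crawl table, its path column and all of the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_analysis (path TEXT)")
conn.execute("CREATE TABLE crawl (path TEXT)")

# Invented Googlebot requests taken from the logs.
conn.executemany(
    "INSERT INTO log_analysis VALUES (?)",
    [
        ("/job-results",),
        ("/job-results?sort=date",),
        ("/job-results?sort=date",),
        ("/old-page",),
    ],
)

# Invented list of URLs found by our own site crawl.
conn.executemany("INSERT INTO crawl VALUES (?)", [("/job-results",)])

# LEFT JOIN keeps every log row; rows with no crawl match have crawl.path set to NULL.
NOT_IN_CRAWL_QUERY = """
SELECT logs.path, COUNT(*) AS number_of_requests
FROM log_analysis AS logs
LEFT JOIN crawl ON crawl.path = logs.path
WHERE crawl.path IS NULL
GROUP BY logs.path
ORDER BY number_of_requests DESC, logs.path
"""

not_in_crawl = conn.execute(NOT_IN_CRAWL_QUERY).fetchall()
for path, number_of_requests in not_in_crawl:
    print(path, number_of_requests)
```

The join only works if both datasets write URLs the same way (for example, both as paths without the hostname), so some normalisation may be needed first.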

Page 156: A Guide to Log Analysis with Big Query

That’s a lot of questions

Page 157: A Guide to Log Analysis with Big Query

bit.ly/logs-resource

Page 161: A Guide to Log Analysis with Big Query

In Summary

Page 162: A Guide to Log Analysis with Big Query

This is the thing you’re probably not doing

Page 166: A Guide to Log Analysis with Big Query

bit.ly/logs-resource

@dom_woodman
