Optimizing Unstructured Data

Jan 07, 2017

Transcript
Page 1: Optimizing Unstructured Data

Optimizing Unstructured Data

Page 2: Optimizing Unstructured Data

@ajkohn

@SEMpdx

#SearchFest

Page 3: Optimizing Unstructured Data

My name is AJ Kohn

Page 4: Optimizing Unstructured Data

Blind Five Year Old Since 2007

Page 5: Optimizing Unstructured Data

Making the complex simple

Page 6: Optimizing Unstructured Data

Semantic Search

Page 7: Optimizing Unstructured Data

We have a problem

Page 8: Optimizing Unstructured Data

Ugh, as if!

Page 9: Optimizing Unstructured Data
Page 10: Optimizing Unstructured Data

WHAT?!

Page 11: Optimizing Unstructured Data
Page 12: Optimizing Unstructured Data

WHAT?!

Page 13: Optimizing Unstructured Data
Page 14: Optimizing Unstructured Data

Semantic search is about understanding meaning

Page 15: Optimizing Unstructured Data

OKAY!

Page 16: Optimizing Unstructured Data

OKAY!

Page 17: Optimizing Unstructured Data

Context

Page 18: Optimizing Unstructured Data

Context matters

Page 19: Optimizing Unstructured Data

Context matters

Page 20: Optimizing Unstructured Data

Natural Language Processing

Page 21: Optimizing Unstructured Data

Coreference Resolution

Finding all expressions that refer to the same entity in a text

Part of Speech (POS) Tagging

Assign a part of speech to each word in a text
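
A toy sketch of both tasks may help make them concrete. Everything here is an assumption for illustration: the tiny `LEXICON`, the tag names, and the "most recent noun" rule are hypothetical stand-ins, and real systems use trained models rather than lookups.

```python
# Toy lexicon: word -> part of speech. Illustrative only.
LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "ball": "NOUN",
    "chased": "VERB",
    "red": "ADJ",
    "it": "PRON",
}


def pos_tag(text):
    """Return (word, tag) pairs; words missing from the lexicon get 'X'."""
    tokens = text.lower().replace(".", " ").split()
    return [(token, LEXICON.get(token, "X")) for token in tokens]


def link_pronouns(tagged):
    """Naive coreference: link each pronoun to the most recent noun."""
    links = []
    last_noun = None
    for index, (word, tag) in enumerate(tagged):
        if tag == "NOUN":
            last_noun = word
        elif tag == "PRON" and last_noun is not None:
            links.append((index, word, last_noun))
    return links


tagged = pos_tag("The dog chased the red ball. It bounced.")
pronoun_links = link_pronouns(tagged)
```

The naive rule links "it" to "ball" simply because "ball" is the nearest preceding noun, which is exactly the kind of guess a real coreference model tries to improve on.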

Page 22: Optimizing Unstructured Data

The word “quiet” isn’t spelled wrong, but Google knew that I probably meant to write “quite awesome” instead
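
One way to picture this kind of context-aware correction is a lookup of how often each candidate word appears before the next word. This is only a minimal sketch: the counts in `BIGRAM_COUNTS` are invented, `CONFUSION_SETS` is a hypothetical list of easily confused words, and nothing here reflects how Google actually does it.

```python
from collections import Counter

# Invented bigram counts standing in for patterns learned from prior data.
BIGRAM_COUNTS = Counter({
    ("quite", "awesome"): 120,
    ("quiet", "awesome"): 1,
    ("quiet", "room"): 80,
    ("quite", "room"): 2,
})

# Hypothetical groups of correctly spelled words that are often mixed up.
CONFUSION_SETS = [{"quiet", "quite"}]


def suggest(word, next_word):
    """Pick the confusable word seen most often before next_word."""
    for candidates in CONFUSION_SETS:
        if word in candidates:
            return max(
                sorted(candidates),
                key=lambda candidate: BIGRAM_COUNTS[(candidate, next_word)],
            )
    return word  # not a known confusable word, so leave it alone
```

With these made-up counts, `suggest("quiet", "awesome")` returns `"quite"`, while `suggest("quite", "room")` returns `"quiet"`: the same pair of words resolves differently depending on context.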

Page 23: Optimizing Unstructured Data

Machine learning

Page 24: Optimizing Unstructured Data

Making predictions based on patterns and rules from prior data

Page 25: Optimizing Unstructured Data
Page 26: Optimizing Unstructured Data

Google is better at getting meaning from text because of access to more data

Page 27: Optimizing Unstructured Data
Page 28: Optimizing Unstructured Data

Entities

Page 29: Optimizing Unstructured Data

Letters and Words

Page 30: Optimizing Unstructured Data

Things

Page 31: Optimizing Unstructured Data

“New York”
hasPopulation: 8.046 Million
hasPointsofInterest: Empire State Building
    hasAddress: 350 5th Avenue
    hasHeight: 1,250 feet
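
The attributes on this slide can be read as subject, predicate, object triples. The sketch below stores them that way, assuming the address and height belong to the Empire State Building rather than to New York. The helper names `build_graph` and `follow` are hypothetical.

```python
from collections import defaultdict

# Triples taken from the slide; attaching the address and height to the
# Empire State Building is an assumption about how the slide was laid out.
TRIPLES = [
    ("New York", "hasPopulation", "8.046 Million"),
    ("New York", "hasPointsofInterest", "Empire State Building"),
    ("Empire State Building", "hasAddress", "350 5th Avenue"),
    ("Empire State Building", "hasHeight", "1,250 feet"),
]


def build_graph(triples):
    """Index triples as {subject: {predicate: object}}."""
    graph = defaultdict(dict)
    for subject, predicate, obj in triples:
        graph[subject][predicate] = obj
    return graph


def follow(graph, start, *predicates):
    """Walk from an entity along a chain of predicates."""
    node = start
    for predicate in predicates:
        node = graph[node][predicate]
    return node


graph = build_graph(TRIPLES)
height = follow(graph, "New York", "hasPointsofInterest", "hasHeight")
```

Here `height` ends up as `"1,250 feet"`: the value of one entity's attribute is itself an entity with attributes of its own, which is what turns a list of facts into a graph.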

Page 32: Optimizing Unstructured Data

The Knowledge Graph

Page 33: Optimizing Unstructured Data

Connections and relationships between entities and documents

Page 34: Optimizing Unstructured Data

Named Entity Recognition (NER)

Page 35: Optimizing Unstructured Data

One size doesn’t fit all

Page 36: Optimizing Unstructured Data

Context-Dependent Fine-Grained Entity Type Tagging

Page 37: Optimizing Unstructured Data

Not just any entities but salient entities

Page 38: Optimizing Unstructured Data

66 entities on a page and fewer than 5% are salient

http://bit.ly/bigdealentities
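
To make "salient" a little more concrete, here is a hand-rolled scoring sketch. It is not a trained model: the three features (in the title, how early the first mention is, how many mentions) and their weights are assumptions chosen for illustration, where a real system would learn them from labeled data.

```python
def salience_scores(title, body, entities):
    """Score entities with hand-picked heuristic features and weights."""
    title_lower = title.lower()
    body_lower = body.lower()
    scores = {}
    for entity in entities:
        name = entity.lower()
        mentions = body_lower.count(name)
        first = body_lower.find(name)
        # Earlier first mention -> closer to 1.0; never mentioned -> 0.0
        if first < 0:
            earliness = 0.0
        else:
            earliness = 1.0 - first / max(len(body_lower), 1)
        in_title = 1.0 if name in title_lower else 0.0
        # Arbitrary illustrative weights.
        scores[entity] = 2.0 * in_title + 1.0 * earliness + 0.5 * mentions
    return scores


scores = salience_scores(
    title="Empire State Building tickets",
    body=(
        "The Empire State Building is in New York. "
        "The Empire State Building opens daily."
    ),
    entities=["Empire State Building", "New York", "Chicago"],
)
```

In this example the building outscores the city because it sits in the title, appears first, and is mentioned twice. An entity that never appears scores zero.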

Page 39: Optimizing Unstructured Data

How do you train a machine learning model to identify salient entities?

Page 40: Optimizing Unstructured Data

Hello McFly!

Page 41: Optimizing Unstructured Data

Word up

Page 42: Optimizing Unstructured Data

Word to your mother

Page 43: Optimizing Unstructured Data

Words

Page 44: Optimizing Unstructured Data

“Keywords don’t matter anymore”

Page 45: Optimizing Unstructured Data

Ice Bear cried, but just inside

Page 46: Optimizing Unstructured Data

I love structured data but optimizing unstructured data is far more powerful

Page 47: Optimizing Unstructured Data

Text on the page is more important now

Page 48: Optimizing Unstructured Data

Words = Entities ^ Context ^ Meaning

Page 49: Optimizing Unstructured Data

We can turn unstructured content into structured data

Page 50: Optimizing Unstructured Data

How much do you trust Google?

Page 51: Optimizing Unstructured Data

Stop writing for people and start writing for search engines

http://bit.ly/focusedwriting

Page 52: Optimizing Unstructured Data
Page 53: Optimizing Unstructured Data

28%

Page 54: Optimizing Unstructured Data

Most users don’t read but skim and scan instead

http://bit.ly/usersdontread

Page 55: Optimizing Unstructured Data

First you looked here

Then here

Page 56: Optimizing Unstructured Data

A penny for a paragraph return

Page 57: Optimizing Unstructured Data

Mirroring

Page 58: Optimizing Unstructured Data

Not only do we mirror body language we seek it out when searching

Page 59: Optimizing Unstructured Data
Page 60: Optimizing Unstructured Data

Keyword rich text and subheads allow users to resume reading at any time

Page 61: Optimizing Unstructured Data

Keyword is not a four letter word

Page 62: Optimizing Unstructured Data

Better to you query syntax call it

Page 63: Optimizing Unstructured Data

But what about user delight?

Page 64: Optimizing Unstructured Data

Could you not

Page 65: Optimizing Unstructured Data

Task Completion > Aesthetics

Page 66: Optimizing Unstructured Data

Our job is to reduce friction

Page 67: Optimizing Unstructured Data

After writing your content go back and find where you can replace pronouns with nouns

Remember that readers won’t often ‘see’ these nouns but will use them as visual signposts

Page 68: Optimizing Unstructured Data

“It’s such a gorgeous work of art”

“Lobster and Cat is a beautiful painting”

ArtworkType: painting

ArtworkTitle: Lobster and Cat

hasArtist: Pablo Picasso
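
A quick way to apply the pronoun advice is to flag pronouns in a draft and review each one by hand. The sketch below assumes a short, hypothetical `PRONOUNS` list. It only finds candidates and leaves the choice of replacement noun to the writer.

```python
import re

# Short illustrative list; extend it to suit your own content.
PRONOUNS = {"it", "its", "they", "them", "their", "he", "she", "him", "her", "his"}


def flag_pronouns(text):
    """Return (character offset, word) for each pronoun found in text."""
    return [
        (match.start(), match.group(0))
        for match in re.finditer(r"[A-Za-z]+", text)
        if match.group(0).lower() in PRONOUNS
    ]


vague = flag_pronouns("It's such a gorgeous work of art")
specific = flag_pronouns("Lobster and Cat is a beautiful painting")
```

The first sentence yields one flag, the "It" at offset 0, and the second yields none, matching the before and after on the slide.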

Page 69: Optimizing Unstructured Data
Page 70: Optimizing Unstructured Data

Intent

Page 71: Optimizing Unstructured Data

Google may better understand the meaning of my query but do they know why I’m searching?

Page 72: Optimizing Unstructured Data

Why are they really searching?

Page 73: Optimizing Unstructured Data

Why are they really searching?

Common Problems with the Eureka 4870

Eureka 4870 Troubleshooting Tips

Local Vacuum Cleaner Repair Shops

Eureka 4870 Replacement Parts

Guide to Buying a New Vacuum Cleaner

Page 74: Optimizing Unstructured Data

Why are they really searching?

Common Problems with the Eureka 4870

Eureka 4870 Troubleshooting Tips

Local Vacuum Cleaner Repair Shops

Eureka 4870 Replacement Parts

Guide to Buying a New Vacuum Cleaner

Page 75: Optimizing Unstructured Data

Our job is to decode the intent from the query syntax

http://bit.ly/aggregatingintent
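
As a rough sketch of decoding intent from query syntax, modifiers in the query can be mapped to likely intents. The phrases and intent labels in `INTENT_RULES` are hypothetical. In practice you would build them from your own keyword research.

```python
# Hypothetical (modifier phrase, intent label) pairs.
INTENT_RULES = [
    ("replacement parts", "buy parts"),
    ("troubleshooting", "fix it myself"),
    ("problems", "diagnose a fault"),
    ("repair", "find a repair service"),
    ("best", "compare products"),
]


def decode_intent(query):
    """Return every intent whose modifier phrase appears in the query."""
    query_lower = query.lower()
    matches = [intent for phrase, intent in INTENT_RULES if phrase in query_lower]
    return matches or ["unknown"]
```

For example, `decode_intent("Eureka 4870 troubleshooting")` returns `["fix it myself"]`, while the bare query `"Eureka 4870"` returns `["unknown"]`, a sign that the page may need to serve several intents at once.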

Page 76: Optimizing Unstructured Data

Target the keyword

Optimize the intent

Page 77: Optimizing Unstructured Data

What are we really talking about?

Page 78: Optimizing Unstructured Data

This is a factbox triggered by entities and the Knowledge Graph

Page 79: Optimizing Unstructured Data

This answerbox is triggered by semi-structured data

Page 80: Optimizing Unstructured Data

This answerbox is triggered by specific patterns of text

Page 81: Optimizing Unstructured Data

Answerbox triggered by patterns of text and specific understanding

Page 82: Optimizing Unstructured Data

Answerbox triggered by patterns of text and specific understanding

Page 83: Optimizing Unstructured Data

Answerbox triggered by patterns of text and semi-structured data

Page 84: Optimizing Unstructured Data

Answerbox triggered by patterns of text and specific understanding

Page 85: Optimizing Unstructured Data

Game's the same, just got more fierce

Page 86: Optimizing Unstructured Data

Skate to where the puck is going to be, not to where it has been

Page 87: Optimizing Unstructured Data

The Link Graph

Page 88: Optimizing Unstructured Data

The Link Graph + Scored Entities

[Diagram: link graph annotated with the entity labels <entity A>, <entity B>, <entity C> and <entity D>]

Page 89: Optimizing Unstructured Data

Entity authority could flow through links similar to anchor text
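
Purely as a thought experiment, the idea might look like this: each page carries scores for the entities it is about, and a link passes a share of those scores to the page it points at. The function name, the `damping` value and the even split across outgoing links are all assumptions. Nothing here describes a confirmed ranking mechanism.

```python
def propagate_entity_scores(links, entity_scores, damping=0.5):
    """Run one propagation step of entity scores along outgoing links.

    links: {page: [pages it links to]}
    entity_scores: {page: {entity: score}}
    """
    # Start from a copy so the input scores are left unchanged.
    result = {page: dict(scores) for page, scores in entity_scores.items()}
    for source, targets in links.items():
        if not targets:
            continue
        for entity, score in entity_scores.get(source, {}).items():
            # Split the damped score evenly across outgoing links.
            share = damping * score / len(targets)
            for target in targets:
                page_scores = result.setdefault(target, {})
                page_scores[entity] = page_scores.get(entity, 0.0) + share
    return result


links = {"a": ["b", "c"], "b": ["c"], "c": []}
entity_scores = {"a": {"entity A": 1.0}, "b": {"entity B": 0.4}, "c": {}}
flowed = propagate_entity_scores(links, entity_scores)
```

After one step, page "c" has picked up a score for both entities without mentioning either, in much the same way anchor text can tell a search engine what a linked page is about.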

Page 90: Optimizing Unstructured Data

TL;DL

Page 91: Optimizing Unstructured Data

We can help Google to find structure, entities and meaning in our content

The easier we make it, the more likely we are to satisfy robots and humans

Page 92: Optimizing Unstructured Data

AJ Kohn

Owner, Blind Five Year Old

www.blindfiveyearold.com

@ajkohn