Sunz2013 annelies tjetjep

Oct 19, 2014

Page 1: Sunz2013 annelies tjetjep
Page 2: Sunz2013 annelies tjetjep

ANALYTICS TO COMBAT GROWTH IN UNSTRUCTURED TEXT DATA

ANNELIES TJETJEP

BUSINESS SOLUTION MANAGER, ANALYTICS

21ST FEBRUARY 2013

Page 3: Sunz2013 annelies tjetjep

TEXT ANALYTICS EXPLORATION, CATEGORISATION, SENTIMENT ANALYSIS & INSIGHT

• Analytics in a World of Big Data

• What is Text Analytics?

• The SAS® Text Analytics Suite

• Text Mining in Action

• SAS® Social Media Analytics

• Questions?

Page 4: Sunz2013 annelies tjetjep

Where is the cat?

Page 5: Sunz2013 annelies tjetjep

ANALYTICS IN A WORLD OF BIG DATA

BREAKDOWN OF DATA USAGE

We put nearly all of the data that is of real value to good use: 22%

We probably leverage about half of our valuable data: 53%

Vast quantities of useful data go untapped: 24%

Source: Economist Intelligence Unit 2011 Report, Sponsored by SAS, 2011

Page 6: Sunz2013 annelies tjetjep

ANALYTICS IN A WORLD OF BIG DATA

BREAKDOWN OF DATA COLLECTION & ANALYSIS

Structured data (tables, records)

Semi-structured data (XML and similar standards)

Complex data (hierarchical or legacy sources)

Event data (messages, usually in real time)

Unstructured data (human language, audio, video)

Web logs and click streams

Social media data (blogs, tweets, social networks)

Other

Spatial data (long/lat coordinates, GPS output)

Machine-generated data (sensors, RFID, devices)

Scientific data (astronomy, genomes, physics)

Based on 450 responses from 109 respondents who report practicing Big Data analytics; 4.1 responses per respondent on average.

Source: TDWI Big Data Analytics Report, 4th Quarter 2011, Philip Russom

Page 7: Sunz2013 annelies tjetjep

WHAT IS TEXT ANALYTICS?

HONG KONG EFFICIENCY UNIT

The 1823 Call Centre of the Hong Kong government's Efficiency Unit acts as a single point of contact for handling public inquiries and complaints on behalf of many government departments.

1823 operates round-the-clock, including Sundays and public holidays. Each year, it answers about 2.65 million calls and 98,000 e-mails, including inquiries, suggestions and complaints.

Page 8: Sunz2013 annelies tjetjep

1823 HONG KONG EFFICIENCY UNIT

PUBLIC

BUSINESS ISSUE

Develop a Complaint Intelligence System that uncovers the trends, patterns and relationships inherent in the complaints

RESULTS

Hong Kong ICT Awards 2009

Grand Award Best Public Service Application (Transformations)

“By decoding the 'messages' through statistical and root-cause analyses of complaints data, the government can better understand the voice of the people, and help government departments improve service delivery, make informed decisions and develop smart strategies. This in turn helps boost public satisfaction with the government, and build a quality city.”

- Efficiency Unit’s Assistant Director, W. F. Yuk

Page 9: Sunz2013 annelies tjetjep

TRIBUNE COMPANY

MEDIA AND PUBLISHING

American news organization, reaching more than 80% of US households

BUSINESS ISSUE

Needed to quickly and accurately define and categorize online content relevant to readership

RESULTS

• Better ad targeting and increased ad revenue

“The news hits so fast that you have to be changing things very quickly. You have to be aware of what you're writing about and the content that you're tagging it to. If an indexing mistake happens, you have to change it very quickly because reputations are at stake.”

- Keith DeWeese, Director of Information Semantics Management

Page 10: Sunz2013 annelies tjetjep

THE SAS® TEXT ANALYTICS SUITE

BOTH BRAINS OF THE EQUATION

Natural Language Processing

• Taxonomic classification

• Entity and concept extraction

• Sentiment identification

• Contextual and pattern recognition

• Linguistically-based classification models

Statistical Analysis

• Singular value decomposition

• Flat and hierarchical clustering

• Word relationship strength profiling

• Dominant word pairs identification

• Algorithmically-based predictive models
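To make the statistical side more concrete, singular value decomposition can be illustrated on a tiny term-by-document count matrix. The following is only a minimal sketch, not how SAS Text Miner computes it: it uses plain power iteration to approximate just the leading singular value and vectors, and the sample documents and function names are hypothetical.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Build a term-by-document count matrix (rows = terms, columns = documents)."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for c in counts] for term in vocab]


def leading_singular_triplet(A, iterations=200):
    """Approximate the largest singular value of A and its singular vectors
    by power iteration on A^T A (assumes a non-negative count matrix)."""
    n_rows, n_cols = len(A), len(A[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # uniform unit start vector
    for _ in range(iterations):
        u = [sum(A[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]  # u = A v
        w = [sum(A[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]  # w = A^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all-zero matrix: nothing to iterate on
            break
        v = [x / norm for x in w]
    u = [sum(A[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma > 0.0:
        u = [x / sigma for x in u]
    return sigma, u, v


# Hypothetical sample messages
docs = ["late flight late bag", "late flight", "friendly staff", "friendly helpful staff"]
vocab, A = term_document_matrix(docs)
sigma, u, v = leading_singular_triplet(A)

# Terms with the largest weight in the leading left singular vector
top_terms = sorted(zip(vocab, u), key=lambda pair: -abs(pair[1]))[:3]
print(sigma, top_terms)
```

A full decomposition would keep several singular vectors, so that each document can be represented by a handful of numeric dimensions instead of raw word counts.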

Page 11: Sunz2013 annelies tjetjep

THE SAS® TEXT ANALYTICS SUITE

• Content Categorization

• Text Mining

• Sentiment Analysis

Page 12: Sunz2013 annelies tjetjep

EXPLORING & DISCOVERING INSIGHTS

SAS® TEXT MINER

1. Input text messages – e.g. Twitter data, reports, email, news, forum messages

2. Parse & explore Text Data – break down text and explore relationships of key concepts such as persons, places, organizations…

3. Discover Topics – cluster documents of similar content and describe them with important key words
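The parse and discover-topics steps can be sketched in a few lines. This is an illustrative stand-in, not the algorithm SAS Text Miner uses: it swaps in a simple greedy clustering based on Jaccard word overlap, and the stop-word list, threshold, sample messages and function names are all assumptions.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "again", "and", "at", "my", "the", "was", "were"}


def parse(text):
    """Break a message into lower-case content words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def jaccard(a, b):
    """Share of distinct words that two word lists have in common."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def discover_topics(docs, threshold=0.4):
    """Greedy one-pass clustering: a document joins the first cluster whose
    seed document is similar enough, otherwise it starts a new cluster."""
    parsed = [parse(d) for d in docs]
    clusters = []  # lists of document indices; the first index is the seed
    for i, words in enumerate(parsed):
        for cluster in clusters:
            if jaccard(words, parsed[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return parsed, clusters


def key_words(parsed, cluster, n=3):
    """Describe a cluster by its n most frequent content words."""
    counts = Counter(w for i in cluster for w in parsed[i])
    return [w for w, _ in counts.most_common(n)]


docs = [
    "flight delayed again at the airport",
    "my flight was delayed at the gate",
    "great hotel room and friendly staff",
    "hotel staff were friendly and helpful",
]
parsed, clusters = discover_topics(docs)
for cluster in clusters:
    print(cluster, key_words(parsed, cluster))
```

With these four messages the two flight complaints end up in one cluster and the two hotel reviews in another, each summarised by its most frequent words.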

Page 13: Sunz2013 annelies tjetjep

DISCOVERING PATTERNS FOR MODELLING

SAS® TEXT MINER

1. Input text messages – e.g. Twitter data, reports, email, news, forum messages

Customer data

2. Parse Text Data and Discover Topics – Break down text into structured data, group messages of similar content

3. Predictive Modeling with text data – text data input into models may provide reliable info to predict outcome & behavior

Predict customers that are likely to accept the offer…
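As a rough illustration of predictive modelling with text as input, the sketch below trains a small multinomial naive Bayes classifier to guess whether a customer is likely to accept an offer. Naive Bayes is an assumption chosen for brevity, not a model named on the slides, and the training messages and labels are invented.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Fit a multinomial naive Bayes model on (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of messages
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the most probable label, using log probabilities and add-one smoothing."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)  # class prior
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical customer messages with known outcomes
examples = [
    ("love the new plan great value", "accept"),
    ("great offer happy to upgrade", "accept"),
    ("too expensive not interested", "decline"),
    ("cancel my account not happy", "decline"),
]
model = train(examples)
print(predict(model, "great value upgrade"))
print(predict(model, "not interested cancel"))
```

In practice the text-derived features would sit alongside structured customer data in the same model rather than replace it.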

Page 14: Sunz2013 annelies tjetjep

TAXONOMIES

Hotel Brand

Categories and sub-categories: Related terms, phrases, linguistic logic

Service: Check-in, Check Out, Staff, Concierge, etc.

Accommodations: Bed, shower, TV, room art, lighting, technology, etc.

Amenities: Fitness, pools, spa, parking, etc.

Food and Bev: Pool bar, restaurant, room service, etc.

Experience: Nightlife, ambience, relaxation, romantic, etc.

Gaming: Slots, tables, tournaments, etc.

Website: Navigation, ease of reservations, etc.

Page 15: Sunz2013 annelies tjetjep

CONTENT CATEGORISATION

SAS® ENTERPRISE CONTENT CATEGORIZATION

Categorization Taxonomy

Topic = Organized Crime

1. Input text content – e.g. Twitter data, reports, email, news, forum messages

2. Parse content through categorization taxonomy – match and score messages/documents to relevant categories

3. Output Results – e.g. each message/document is now associated with detailed category/subcategories

Results are indexed or fed into existing systems for search & analysis
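The match-and-score step can be pictured as a lookup of taxonomy terms in each message. The sketch below is a simplified assumption, not SAS Enterprise Content Categorization's rule language: it reuses the hotel taxonomy terms from the earlier slide, scores a category by how many of its terms appear, and the function name and sample message are hypothetical.

```python
import re

# Category -> related terms, taken from the hotel taxonomy slide
TAXONOMY = {
    "Service": ["check-in", "check out", "staff", "concierge"],
    "Accommodations": ["bed", "shower", "tv", "room art", "lighting", "technology"],
    "Amenities": ["fitness", "pools", "spa", "parking"],
    "Food and Bev": ["pool bar", "restaurant", "room service"],
    "Experience": ["nightlife", "ambience", "relaxation", "romantic"],
    "Gaming": ["slots", "tables", "tournaments"],
    "Website": ["navigation", "ease of reservations"],
}


def categorise(message):
    """Return (category, score) pairs for every category with at least one
    matching term, highest score first. Score = number of distinct terms matched."""
    text = message.lower()
    scores = {}
    for category, terms in TAXONOMY.items():
        # Word boundaries stop "bed" from matching inside longer words
        hits = sum(1 for t in terms if re.search(r"\b" + re.escape(t) + r"\b", text))
        if hits:
            scores[category] = hits
    return sorted(scores.items(), key=lambda item: -item[1])


message = "Check-in was slow and the staff ignored us, but the spa was lovely."
print(categorise(message))  # [('Service', 2), ('Amenities', 1)]
```

A production taxonomy would typically add weights, synonyms and linguistic rules per category; the plain term count here only shows the shape of the output.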

Page 16: Sunz2013 annelies tjetjep

CONCEPT EXTRACTION

SAS® ENTERPRISE CONTENT CATEGORIZATION

Concept Taxonomy

1. Input text content – e.g. Twitter data, reports, email, news, forum messages

2. Parse content through concepts taxonomy – match messages/documents to extract concepts

3. Output Results – e.g. each message/document is now associated with a list of extracted concepts

Results are indexed or fed into existing systems for search & analysis

Concepts

• Locations – kitchen…

• Persons – John…

• Dates – Monday…

• Weapons – knife…
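A toy version of concept extraction, using the four concept types listed above, might look like the following. It is a sketch under stated assumptions: the word lists contain only the examples from the slide, weekday names stand in for dates, and the function name and sample sentence are made up.

```python
import re

# Concept -> known terms (only the examples given on the slide)
GAZETTEER = {
    "Locations": {"kitchen"},
    "Persons": {"john"},
    "Weapons": {"knife"},
}

# Dates are matched by pattern instead of a word list; weekdays only in this sketch
WEEKDAY = re.compile(
    r"\b(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\b", re.IGNORECASE
)


def extract_concepts(text):
    """Return a dict mapping each concept type to the terms found in the text."""
    found = {}
    for token in re.findall(r"[A-Za-z]+", text):
        for concept, terms in GAZETTEER.items():
            if token.lower() in terms:
                found.setdefault(concept, []).append(token)
    dates = WEEKDAY.findall(text)
    if dates:
        found["Dates"] = dates
    return found


print(extract_concepts("On Monday, John was seen in the kitchen holding a knife."))
```

Each message then carries a list of extracted concepts that can be indexed or passed to downstream systems.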

Page 17: Sunz2013 annelies tjetjep

SENTIMENT EXTRACTION

SAS® SENTIMENT ANALYSIS

Sentiment Taxonomy

1. Input text content – e.g. Twitter data, reports, email, news, forum messages

2. Parse messages through Sentiment taxonomy – match and score messages, and their details, for sentiment polarity (e.g. message is 80% positive)

3. Output Results – e.g. each message/document and characteristics within the document are now associated with a sentiment polarity score

[Figure: example messages labelled “This is positive” or “This is negative”]

Results are indexed or fed into existing systems for search & analysis

4. Sentiment Reports – Results are easily analyzed against time period and/or product features, drillable to see exact message
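One simple way to arrive at a figure such as "80% positive" is to count sentiment-bearing words. The sketch below assumes a tiny hand-made lexicon and ignores negation, intensity and context, all of which a real sentiment taxonomy would handle; the word lists, function name and sample message are hypothetical.

```python
import re

# Hypothetical sentiment lexicon
POSITIVE = {"excellent", "friendly", "great", "lovely"}
NEGATIVE = {"dirty", "noisy", "rude", "slow"}


def polarity(message):
    """Return the share of sentiment-bearing words that are positive (0.0 to 1.0),
    or None when the message contains no words from the lexicon."""
    words = re.findall(r"[a-z]+", message.lower())
    positive = sum(w in POSITIVE for w in words)
    negative = sum(w in NEGATIVE for w in words)
    if positive + negative == 0:
        return None
    return positive / (positive + negative)


message = "Great room, friendly staff, lovely pool and excellent food, but slow check-in."
print(polarity(message))  # 0.8, i.e. the message is 80% positive
```

Scoring each sentence or product feature separately, instead of the whole message, would give the per-characteristic scores described in step 3.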

Page 18: Sunz2013 annelies tjetjep

WHOLE BRAIN PROCESS INTERACTIONS

A CALL CENTRE EXAMPLE

Taxonomies

Initial taxonomy

Exploration of linkages

Topic categorisation

Predictive Modelling

Caller1234: i called them with a little issue that i had on my car repair, and the original representative blind transferred me over to the second representative that i spoke to, so when i got to the second rep (John?), he had no idea who i am, what my account was, what were the reasons that i was calling. i had to re-explain myself completely.

Concepts:

Call reason: car repair

Unhappy reasons: blind transfer; re-explain

Other related staff: John

Classification

Sentiment

Page 19: Sunz2013 annelies tjetjep

TEXT MINING IN ACTION

Page 20: Sunz2013 annelies tjetjep

TEXT MINING IN SAS® ENTERPRISE MINER

Page 21: Sunz2013 annelies tjetjep

TEXT MINING PARSING

Page 22: Sunz2013 annelies tjetjep

TEXT MINING SYNONYMS & CONCEPT LINKING

Page 23: Sunz2013 annelies tjetjep

TEXT MINING SYNONYMS & CONCEPT LINKING

Page 24: Sunz2013 annelies tjetjep

TEXT MINING SYNONYMS & CONCEPT LINKING

Page 25: Sunz2013 annelies tjetjep

TEXT MINING CLUSTERING

Page 26: Sunz2013 annelies tjetjep

TEXT MINING CLUSTERING

Page 27: Sunz2013 annelies tjetjep

TEXT MINING PREDICTIVE MODELLING

Page 28: Sunz2013 annelies tjetjep

SOCIAL MEDIA ANALYTICS

A QUICK LOOK

Page 29: Sunz2013 annelies tjetjep

Social Media is everywhere – it’s not just Facebook and Twitter.

• Your customers are there talking about your brand.

• What are customers saying about you and what impact could that have on your business?

Sources: The Conversation: Brian Solis and Jess3

Page 30: Sunz2013 annelies tjetjep

POWER SHIFT

THE EMPOWERED CONSUMER

COMPANIES CONSUMERS

Page 31: Sunz2013 annelies tjetjep

SOLUTION FRAMEWORK

Data Mining

Correlation & Forecasting

Text Mining

Natural Language Processing

Taxonomies

Influence & Engagement

Customizable Sentiment Analysis

Text Clusters & Segments

Collect

Clean

Integrate

Organize

Sample Online Sources

Classify & Segment

Mine & Forecast

Web Crawling

Data Stores

Blog Data

Web Data

Online Reviews

Media Data

Call logs

Survey Data

Listen, Engage, & Leverage

SAS Media Portal

SAS Conversation Center

SAS Media Workbench

iPad & Android apps

Page 32: Sunz2013 annelies tjetjep

TEXT ANALYTICS

ANALYTICS TO COMBAT GROWTH IN UNSTRUCTURED TEXT DATA

• Data is “BIG” and growing

• Most data is in unstructured or semi-structured format

• Need for smarter ways of mining data: automation & analytics

• Need for whole-brained analysis of textual information

• SAS provides an end-to-end text analytics suite

• Power is now in the hands of the consumer

Page 33: Sunz2013 annelies tjetjep

QUESTIONS?

Page 34: Sunz2013 annelies tjetjep

THANK YOU

[email protected]

Page 35: Sunz2013 annelies tjetjep