Text Analytics Applied. Seth Grimes, Alta Plana Corporation, @sethgrimes. 2nd LIDER roadmapping workshop – Madrid, May 8, 2014

Text Analytics Applied (LIDER roadmapping presentation)

Aug 11, 2014


Data & Analytics

Seth Grimes

Presentation to the May 8 2014 LIDER roadmapping workshop in Madrid
Transcript
Page 1: Text Analytics Applied (LIDER roadmapping presentation)

Text Analytics Applied

Seth Grimes
Alta Plana Corporation

@sethgrimes

2nd LIDER roadmapping workshop – Madrid
May 8, 2014

Page 2: Text Analytics Applied (LIDER roadmapping presentation)

Text Analytics Applied

2nd LIDER workshop

2

“Organizations embracing text analytics all report having an epiphany moment when they suddenly knew more than before.” -- Philip Russom, the Data Warehousing Institute, 2007

http://tdwi.org/articles/2007/05/09-what-works/bi-search-and-text-analytics.aspx

Page 3: Text Analytics Applied (LIDER roadmapping presentation)


Page 4: Text Analytics Applied (LIDER roadmapping presentation)

Document input and processing

Knowledge handling is key

Desk Set (1957): Computer engineer Richard Sumner (Spencer Tracy), television network librarian Bunny Watson (Katharine Hepburn), and the "electronic brain" EMERAC.

Hans Peter Luhn, “A Business Intelligence System,” IBM Journal, October 1958

Page 5: Text Analytics Applied (LIDER roadmapping presentation)


Statistics and semantics

Text analytics involves statistical characterization and semantic understanding of text-derived features –
• Named entities: people, companies, places, etc.
• Pattern-based entities: e-mail addresses, phone numbers, etc.
• Concepts: abstractions of entities.
• Facts and relationships.
• Events.
• Concrete and abstract attributes (e.g., “expensive” & “comfortable”) including measure-value pairs.
• Subjectivity in the forms of opinions, sentiments, and emotions: attitudinal data.
– applied to business ends.
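Of the feature types listed on this slide, pattern-based entities (e-mail addresses, phone numbers) are the simplest to illustrate. The following is a minimal Python sketch, not a production extractor: the regular expressions, the function name `extract_pattern_entities`, and the sample sentence are illustrative assumptions, and real e-mail and phone formats vary far more than these patterns allow.

```python
import re

# Deliberately simple, illustrative patterns (assumptions, not a standard).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Matches numbers shaped like 555-123-4567 or (555) 123-4567 only.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[ .-]?\d{3}[ .-]\d{4}\b")


def extract_pattern_entities(text):
    """Return a dict mapping entity type to the list of matches found."""
    return {
        "email": EMAIL_RE.findall(text),
        "phone": PHONE_RE.findall(text),
    }


# Hypothetical sample text for demonstration.
sample = "Contact jane.doe@example.com or call 555-123-4567 for details."
entities = extract_pattern_entities(sample)
print(entities)
```

Named entities, concepts, and sentiment generally need statistical or linguistic models rather than fixed patterns, which is why they are listed separately on the slide.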

Page 6: Text Analytics Applied (LIDER roadmapping presentation)


Sources

It’s a truism that 80% of enterprise-relevant information originates in “unstructured” form:
• E-mail and messages.
• Web pages, online news & blogs, forum postings, and other social media.
• Contact-center notes and transcripts.
• Surveys, feedback forms, warranty claims.
• Scientific literature, books, legal documents....

Non-text “unstructured” content?
• Images
• Audio including speech
• Video

Value derives from patterns.

Page 7: Text Analytics Applied (LIDER roadmapping presentation)


Value

What do we do with information online, on-social, and in the enterprise?
1. Post/Publish, Manage, and Archive.
2. Index and Search.
3. Categorize and Classify according to metadata & contents.
4. Extract and Analyze.

Page 8: Text Analytics Applied (LIDER roadmapping presentation)


Semantics, analytics, and IR

Text analytics generates semantics to bridge search, BI, and applications, enabling next-generation information systems.

[Diagram: overlapping areas labeled Search, BI/Big Data, and Applications. Region labels:]
• Search based applications (search + text + apps)
• Information access (search + analytics)
• Synthesis (text + BI)/(big data)
• Text analytics (inner circle)
• Semantic search (search + text)
• NextGen CRM, EFM, MR, marketing, apps…

Page 9: Text Analytics Applied (LIDER roadmapping presentation)

New York Times, September 8, 1957

Page 10: Text Analytics Applied (LIDER roadmapping presentation)


http://open.blogs.nytimes.com/2012/02/16/rnews-is-here-and-this-is-what-it-means/

<div itemscope itemtype="http://schema.org/Organization">
  <span itemprop="name">Google.org (GOOG)</span>
  Contact Details:
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    Main address:
    <span itemprop="streetAddress">38 avenue de l'Opera</span>
    <span itemprop="postalCode">F-75002</span>
    <span itemprop="addressLocality">Paris, France</span>
    ,
  </div>
  Tel:<span itemprop="telephone">( 33 1) 42 68 53 00 </span>,
  Fax:<span itemprop="faxNumber">( 33 1) 42 68 53 01 </span>,
  E-mail: <span itemprop="email">secretariat(at)google.org</span>
</div>

http://schema.org/Organization

Structure matters

http://img.freebase.com/api/trans/raw/m/02dtnzv

http://www.cambridgesemantics.com/semantic-university/semantic-search-and-the-semantic-web
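The schema.org markup on this slide shows why structure matters: once properties are labeled, a program can read them without any language analysis. The following is a rough Python sketch using only the standard library's `html.parser`. The class name `ItempropParser` is hypothetical, the snippet is a shortened copy of the slide's example, and the logic handles only leaf elements carrying `itemprop`; it is not a complete microdata parser.

```python
from html.parser import HTMLParser


class ItempropParser(HTMLParser):
    """Collect the text of leaf elements that carry an itemprop attribute."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current = None  # itemprop name currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Skip nested items (itemscope); only leaf properties are captured.
        if "itemprop" in attrs and "itemscope" not in attrs:
            self._current = attrs["itemprop"]
            self.properties[self._current] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.properties[self._current] += data

    def handle_endtag(self, tag):
        # Leaf elements contain no child tags, so any end tag closes them.
        self._current = None


snippet = """
<div itemscope itemtype="http://schema.org/Organization">
  <span itemprop="name">Google.org (GOOG)</span>
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">38 avenue de l'Opera</span>
    <span itemprop="postalCode">F-75002</span>
  </div>
  <span itemprop="email">secretariat(at)google.org</span>
</div>
"""

parser = ItempropParser()
parser.feed(snippet)
props = {name: value.strip() for name, value in parser.properties.items()}
print(props)
```

This sketch flattens the nested PostalAddress item into the same dictionary; a fuller treatment would keep the item hierarchy.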

Page 11: Text Analytics Applied (LIDER roadmapping presentation)


Exploratory analysis, synthesis

Decisive Analytics
http://www.dac.us/

Page 12: Text Analytics Applied (LIDER roadmapping presentation)


http://www.geeklawblog.com/2011/12/lexis-advance-platform-launch-two.html

A big data analytics architecture (example)

Page 13: Text Analytics Applied (LIDER roadmapping presentation)


Applications

Synthesis is cool, but let’s take a step back…

Text analytics has applications in:
• Intelligence & law enforcement.
• Life sciences & clinical medicine.
• Media & publishing including social-media analysis and contextual advertising.
• Competitive intelligence.
• Voice of the Customer: CRM, product management & marketing.
• Public administration & policy.
• Legal, tax & regulatory (LTR) including compliance.
• Recruiting.

Page 14: Text Analytics Applied (LIDER roadmapping presentation)


Sentiment analysis

A specialization, of relevance to:
• Brand/reputation management.
• Customer experience management (CEM).
• Competitive intelligence.
• Survey analysis (EFM).
• Market research.
• Product design/quality.
• Trend spotting.

Page 15: Text Analytics Applied (LIDER roadmapping presentation)


http://altaplana.com/TA2014

Page 16: Text Analytics Applied (LIDER roadmapping presentation)


What are your primary applications where text comes into play?

[Bar chart, horizontal axis 0% to 45%. Categories and values paired in the order extracted:]
• Military/national security/intelligence: 5%
• Law enforcement: 6%
• Intellectual property/patent analysis: 8%
• Financial services/capital markets: 9%
• Product/service design, quality assurance, or warranty claims: 10%
• Other: 11%
• Insurance, risk management, or fraud: 13%
• E-discovery: 14%
• Life sciences or clinical medicine: 15%
• Online commerce including shopping, price intelligence, reviews: 16%
• Content management or publishing: 25%
• Customer /CRM: 27%
• Search, information access, or Question Answering: 29%
• Competitive intelligence: 33%
• Brand/product/reputation management: 38%
• Research (not listed): 38%
• Voice of the Customer / Customer Experience Management: 39%

Page 17: Text Analytics Applied (LIDER roadmapping presentation)


Voice of the Customer

Text analytics is applied to improve customer service and boost satisfaction and loyalty.

Analyze customer interactions and opinions –
• E-mail, contact-center notes, survey responses.
• Forum & blog posting and other social media.
– to –
• Address customer product & service issues.
• Improve quality.
• Manage brand & reputation.

Assessment of qualitative information from text helps users –
• Gain feedback on interactions.
• Assess customer value.
• Understand root causes.
• Mine data for measures such as churn likelihood.

Page 18: Text Analytics Applied (LIDER roadmapping presentation)


Online commerce

Text analytics is applied for marketing, search optimization, competitive intelligence.

Analyze social media and enterprise feedback to understand the Voice of the Market:
• Opportunities
• Threats
• Trends

Categorize product and service offerings for on-site search and faceted navigation and to enrich content delivery.

Annotate pages to enhance Web-search findability, ranking.

Scrape competitor sites for offers and pricing.

Analyze social and news media for competitive information.
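Categorizing product offerings for faceted navigation can be sketched with a small keyword taxonomy. This is only one possible approach, shown under stated assumptions: the `TAXONOMY` contents, the facet names, and the function `assign_facets` are hypothetical, and a real catalog would likely need stemming, synonyms, or a trained classifier.

```python
import re

# Hypothetical taxonomy: facet name -> trigger terms (illustrative only).
TAXONOMY = {
    "footwear": {"shoe", "shoes", "boot", "boots", "sneaker", "sneakers"},
    "outerwear": {"jacket", "coat", "parka"},
    "waterproof": {"waterproof", "rainproof"},
}


def assign_facets(description):
    """Return the sorted facets whose trigger terms occur in the description."""
    tokens = set(re.findall(r"[a-z]+", description.lower()))
    return sorted(facet for facet, terms in TAXONOMY.items() if tokens & terms)


# Hypothetical product descriptions for demonstration.
facets_boots = assign_facets("Waterproof hiking boots with leather upper")
facets_shirt = assign_facets("Cotton t-shirt")
print(facets_boots, facets_shirt)
```

The assigned facets could then feed on-site search filters; the same lookup idea extends to custom dictionaries and extraction rules.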

Page 19: Text Analytics Applied (LIDER roadmapping presentation)


E-Discovery and compliance

Text analytics is applied for compliance, fraud and risk, and e-discovery.

Regulatory mandates and corporate practices dictate –
• Monitoring corporate communications
• Managing electronically stored information for production in event of litigation

Sources include e-mail (!!), news, social media

Risk avoidance and fraud detection are key to effective decision making
• Text analytics mines critical data from unstructured sources
• Integrated text-transactional analytics provides rich insights

Page 20: Text Analytics Applied (LIDER roadmapping presentation)


What textual information are you analyzing or do you plan to analyze?

[Bar chart, horizontal axis 0% to 50%. Category labels as extracted, in chart order:]
• insurance claims or underwriting notes
• video or animated images
• photographs or other graphical images
• field/intelligence reports
• patent/IP filings
• text messages/instant messages/SMS
• Web-site feedback
• chat
• contact-center notes or transcripts
• online reviews
• Facebook postings
• customer/market surveys
• news articles
• Twitter, Sina Weibo, or other microblogs

Bar values as extracted, in chart order: 5%, 5%, 5%, 5%, 7%, 9%, 11%, 11%, 12%, 12%, 12%, 13%, 16%, 19%, 20%, 20%, 22%, 26%, 31%, 31%, 32%, 36%, 37%, 38%, 42%, 43%, 46%

Page 21: Text Analytics Applied (LIDER roadmapping presentation)


What textual information are you analyzing or do you plan to analyze?

[Bar chart, horizontal axis 0% to 70%. Legend: 2014, 2011, 2009. Categories and values paired in the order extracted:]
• Web-site feedback: 16%
• social media not listed above: 19%
• chat: 20%
• employee surveys: 20%
• contact-center notes or transcripts: 22%
• e-mail and correspondence: 26%
• online reviews: 31%
• scientific or technical literature: 31%
• Facebook postings: 32%
• on-line forums: 36%
• customer/market surveys: 37%
• comments on blogs and articles: 38%
• news articles: 42%
• blogs (long form) including Tumblr: 43%
• Twitter, Sina Weibo, or other microblogs: 46%

Page 22: Text Analytics Applied (LIDER roadmapping presentation)


Do you currently need (or expect to need) to extract or analyze...

[Stacked bar chart, horizontal axis 0% to 100%. Categories and values paired in the order extracted:]
• Events: Current 33%, Expect 21%
• Semantic annotations: Current 31%, Expect 24%
• Other entities – phone numbers, part/product numbers, e-mail & street addresses, etc.: Current 34%, Expect 23%
• Metadata such as document author, publication date, title, headers, etc.: Current 47%, Expect 23%
• Concepts, that is, abstract groups of entities: Current 51%, Expect 28%
• Named entities – people, companies, geographic locations, brands, ticker symbols, etc.: Current 56%, Expect 25%
• Relationships and/or facts: Current 47%, Expect 33%
• Sentiment, opinions, attitudes, emotions, perceptions, intent: Current 54%, Expect 28%
• Topics and themes: Current 66%, Expect 22%

Page 23: Text Analytics Applied (LIDER roadmapping presentation)


What is important in a solution?

[Bar chart, horizontal axis 0% to 70%. Category labels as extracted, in chart order:]
• export to Semantic Web formats (RDF, OWL, microformats, etc.)
• media monitoring/analysis interface
• supports data fusion / unified analytics
• BI (business intelligence) integration
• big data capabilities, e.g., via Hadoop/MapReduce
• open source
• sentiment scoring
• low cost
• document classification
• ability to use specialized dictionaries, taxonomies, ontologies, or extraction rules

Bar values as extracted, in chart order: 16%, 18%, 22%, 25%, 28%, 30%, 32%, 33%, 33%, 36%, 37%, 40%, 41%, 43%, 44%, 45%, 53%, 53%, 54%, 64%

Page 24: Text Analytics Applied (LIDER roadmapping presentation)


Non-English language support?

[Bar chart with two series, Current and Within 2 years; horizontal axis -10% to 60%. Language labels as extracted:]
• Arabic
• Chinese
• French
• Greek
• Italian
• Korean
• Portuguese
• Scandinavian or Baltic
• Turkish or Turkic
• Other Arabic script (including Urdu, Pashto, Farsi, Dari)
• Other European or Slavic/Cyrillic

Page 25: Text Analytics Applied (LIDER roadmapping presentation)


Software & platform options

Text-analytics options may be grouped in general classes.
• Installed text-analysis application, whether desktop or server or deployed in-database.
• Data mining workbench.
• Hosted.
• Programming tool.
• As-a-service, via an application programming interface (API).
• Code library or component of a business/vertical application, for instance for CRM, e-discovery, search.

Text analytics is frequently embedded in search or other end-user applications.

The slides that follow next will present leading options in each category except Hosted…

Page 26: Text Analytics Applied (LIDER roadmapping presentation)


User decision criteria

Primary considerations include –

Adaptation or specialization: To a business or cultural domain, language, information type (e.g., text, speech, images) & source (e.g., Twitter, e-mail, online news).

By-user customization possibilities: For instance, via custom taxonomies, rules, lexicons.

Sentiment resolution: Aggregate, message, or feature level. (What features? Topics, coreferenced entities?)

What sentiment? Valence & what else? Emotion? Intent?

Outputs: E.g., annotated text, models, indicators, dashboards, exploratory data interfaces.

Usage mode: As-a-service (API), installed, or hosted/cloud.

Capacity: Volume, performance, throughput, latency.

Cost.
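The sentiment resolution criterion above distinguishes aggregate from message-level scoring. The difference can be sketched in Python with a toy word list. Everything here is a hypothetical illustration: `LEXICON`, the scoring functions, and the sample messages are assumptions, not any vendor's method, and feature-level resolution (sentiment per topic or entity) is not shown.

```python
# Hypothetical toy lexicon: word -> valence. Real systems use far richer resources.
LEXICON = {
    "great": 1, "love": 1, "comfortable": 1,
    "expensive": -1, "slow": -1, "broken": -1,
}


def message_score(message):
    """Message-level resolution: sum the valence of known words in one message."""
    words = message.lower().replace(",", " ").replace(".", " ").split()
    return sum(LEXICON.get(word, 0) for word in words)


def aggregate_score(messages):
    """Aggregate resolution: mean of the message-level scores in a collection."""
    scores = [message_score(m) for m in messages]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical customer messages for demonstration.
messages = [
    "Great seats, very comfortable.",
    "Too expensive and the service was slow.",
    "I love it.",
]
per_message = [message_score(m) for m in messages]
overall = aggregate_score(messages)
print(per_message, overall)
```

The aggregate figure hides the strongly negative second message, which is one reason the resolution level matters when comparing tools.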

Page 27: Text Analytics Applied (LIDER roadmapping presentation)


Linked Data Links?

Page 28: Text Analytics Applied (LIDER roadmapping presentation)

Text Analytics Applied

Seth Grimes
Alta Plana Corporation

@sethgrimes

2nd LIDER roadmapping workshop – Madrid
May 8, 2014