Moving from Unstructured Data into Structured Meanings and Data Stories Marshall Sponder WebMetricsGuru INC for NYU ITP CAMP 6-22-12

Unstructured data to structured meaning for nyu itp camp - 6-22-12 ms

Sep 13, 2014



My presentation to NYU ITP Camp on 6-22-12.
Transcript
Page 1

Moving from Unstructured Data into Structured Meanings and Data Stories

Marshall Sponder, WebMetricsGuru INC, for

NYU ITP CAMP, 6-22-12

Page 2

Marshall Sponder is the CEO/Founder of WebMetricsGuru Inc., a social solution design, social media analytics, web data analysis and SEO/SEM practice focusing on cutting-edge market research and social media trend analysis.

He is the author of "Social Media Analytics: Effective Tools for Building, Interpreting, and Using Metrics", published by McGraw-Hill, 2012.

Marshall also teaches Social Media Analytics and Art at Rutgers University and UC Irvine Extension, and is a frequent speaker at analytics conferences internationally and in the United States.

Introduction: about me, besides being an ITP Camper…

Page 3

Geo-located check-in data somehow became much harder to capture accurately, across the board, via listening platforms after last summer (though it was always fragmentary at best).

Sysomos Map Query: 4sq.com/

Page 4

Last year I did some data crunching using Radian6 and 4SQ check-ins, and found the context / story much easier to get via geo-local data.

My finding is that adding “dimensions” to the social data provides the “context” that is often missing, because social data is largely unstructured.

I was also able to look at “influencers” by the venues they habitually visited and by their Twitter following.
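Here is a minimal Python sketch of the “influencers by venue” idea. It assumes the check-ins have already been exported as (user, venue) pairs and that a follower count is available for each user. The sample records, the `min_visits=2` threshold for a “habitual” venue, and the function names are all hypothetical. This is not the Radian6 or foursquare API.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: (user, venue) check-ins and Twitter follower counts.
checkins = [
    ("ana", "Cafe A"), ("ana", "Cafe A"), ("ana", "Museum B"),
    ("ben", "Cafe A"), ("ben", "Gym C"), ("ben", "Gym C"), ("ben", "Gym C"),
    ("cy", "Museum B"), ("cy", "Cafe A"), ("cy", "Cafe A"),
]
followers = {"ana": 1200, "ben": 300, "cy": 5000}

def habitual_venues(checkins, min_visits=2):
    """Map each user to the venues they checked into at least min_visits times."""
    counts = defaultdict(Counter)
    for user, venue in checkins:
        counts[user][venue] += 1
    return {
        user: sorted(v for v, n in c.items() if n >= min_visits)
        for user, c in counts.items()
    }

def venue_influencers(checkins, followers, min_visits=2):
    """For each venue, list its habitual visitors ranked by follower count (highest first)."""
    by_venue = defaultdict(list)
    for user, venues in habitual_venues(checkins, min_visits).items():
        for venue in venues:
            by_venue[venue].append((followers.get(user, 0), user))
    return {
        venue: [user for _, user in sorted(pairs, reverse=True)]
        for venue, pairs in by_venue.items()
    }

print(venue_influencers(checkins, followers))
```

Keeping the “habitual” threshold separate from the ranking lets you tune one without touching the other.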

Page 5
Page 6

Hitting people through multiple channels creates “meaning” and déjà vu.

NY Lottery online site

1. Pandora

2. NY Subway Train

Having cookies tracked across sites is probably doing something similar, but the idea is that awareness, relevance and meaning are “created” by repetition across varied channels within a certain time frame.

Page 7

Finding / Creating Meaning / Creating the Story Using Social Media

Moment of Receptivity

At the moment of receptivity, your message, argument, or proposition has a chance of being received and acted on. The story you create will be a mixture of what you wish to create and what your recipients make of it (how they process it).

Page 8

Some Types of Data (not all-inclusive)

Printed / Written Data (ledgers, lists)

Learning/ Training Data (Education/University)

Financial, manufacturing, Legal and Legislative Data

Large and medium business/marketing silo data (business intelligence)

Search Engine and Web Analytics Data (structured and Page Based)

Social Media, Video, Audio and Geo Local (Mobile) Data

Big Data, including machine generated data (and Big Analytics)

Offline Data (verbal, observed) recorded or non-recorded

Unstructured Data

Structured Data

More work is required for unstructured data.

Page 9

How much Unstructured Data is there?

Page 10

Types of Unstructured Data (some of these types you deal with at ITP Camp)

Examples of Structured Data (businesses, governments and educational institutions have a lot of experience with this kind of data):
- Databases
- XML data
- Data warehouses
- Enterprise systems (CRM, ERP, etc.)

Examples of Unstructured Data (no one has cornered this type of data yet; everyone is struggling with it):
- Excel spreadsheets (one can argue this point, as Excel can have structure too)
- Word documents
- Email messages
- RSS feeds
- Audio files
- Video files
- Social media data (tweets, posts, photos, likes, shares, Near Field, geo-fencing)
- Mobile data (check-ins, SMS, etc.)

Page 11

Add a Plan to Help Structure the Data

• Identity (who are you identifying?)
  – You: a business, non-profit, government, an official, an industry, etc.
  – Someone else (depends on how you want to define this)

• What (are you measuring?)
  – Check-ins, mentions, posts, clicks, pulse data, etc.
  – Visits, page views, unique visitors, etc. (perhaps a specific audience type)
  – Behavioral and attitudinal data: much harder (some ITP experiments seem to go here)

• Where (are you monitoring? do you have enough data?)
  – Social media channels, location (where), venue type, situation/mindset

• When (is it happening?)
  – Real time / asynchronous
  – Seasonal, specific event, or time not defined

• Why (is it happening?)
  – Exploratory (you don’t know and are trying to find out)
  – You know why, but you want to know how much, be more tactical, or effect specific changes
  – Business goals? Art goals? Effecting changes?
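One way to keep these questions from staying abstract is to record the plan as data and check which questions are still unanswered. The sketch below assumes a hypothetical `MeasurementPlan` record with one free-text field per question. The field names are illustrative and do not come from any platform.

```python
from dataclasses import dataclass, fields

@dataclass
class MeasurementPlan:
    """Hypothetical record: one field per plan question; empty means 'not decided yet'."""
    identity: str = ""  # who are you identifying? (you, or someone else)
    what: str = ""      # what are you measuring? (check-ins, mentions, visits...)
    where: str = ""     # where are you monitoring? (channels, venue type...)
    when: str = ""      # when is it happening? (real time, seasonal, event...)
    why: str = ""       # why? (exploratory, tactical, business or art goals)

def missing_parts(plan: MeasurementPlan) -> list[str]:
    """Return the names of plan questions that are still blank, in question order."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]

plan = MeasurementPlan(
    identity="a museum",
    what="check-ins and mentions",
    where="Twitter and Foursquare, venue type: museum",
    why="exploratory",
)
print(missing_parts(plan))  # ['when']
```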

Page 12

It would be nice if social data could interface with something like Isadora. I don’t know of anything that does that yet, or what the metaphors for the data would be (a good direction, though, if someone wants to take that on).

I suspect such an interface would lend itself to “Big Data”

Page 13

Maybe this taxonomy would work (http://behaviorgrid.org/), but you have to code the verbatims manually, unless you can program machine learning to do it for you (it won’t be that accurate, though).

Page 14

What Platforms Provide (not a complete list by any means)

• Geolocation: location, venue type, other friends
• Social: sentiment, volume, age/gender, some attempt at topic (usually inadequate), text analytics
• Web Analytics: visits, page views, visitors (unique/new), cookies, location, pathing (on site only), correlation tools, search keywords, links (referrers), e-commerce tracking (on site only)
• Audience Measurement: via ad exchanges, online panels, demographics, psychographics, and geo-demographics
• Census and governmental data
• Financial data: Wall Street
• Market Research (traditional): forums, polls, opinion analysis based on sampling (e.g., political polls)
• Market Research (new): Big Data, trying to find hidden patterns (e.g., people who fix their roofs have fewer car accidents and get cheaper car insurance, stuff like that)

Page 15

For ITP: Suggested Taxonomy
• Author / Artist / Subject / Activity
• Place (location)
• Time (timeframe)
• Type of Action / Behavior
• Persona (this will have to be defined)
• Medium (e.g., GSM/Mobile, Projection, etc.)
• Subject / Area
• Purpose (recreational, exploratory, consciousness raising, etc.)
• Etc. (these need to be further defined)
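Below is a minimal sketch of coding observations against a subset of this taxonomy so they can be counted by facet. The `TaggedItem` record, its fields, and the sample items are hypothetical. A real project would need to define each facet first, especially persona.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class TaggedItem:
    """Hypothetical observation coded against a subset of the suggested ITP taxonomy."""
    author: str
    place: str
    time: str
    action: str
    medium: str
    purpose: str

items = [
    TaggedItem("artist1", "ITP floor", "evening", "interact", "projection", "exploratory"),
    TaggedItem("artist2", "Washington Sq", "afternoon", "check-in", "GSM/mobile", "recreational"),
    TaggedItem("artist1", "ITP floor", "evening", "post", "GSM/mobile", "exploratory"),
]

def facet_counts(items, facet: str) -> Counter:
    """Count items per value of one taxonomy facet (e.g. 'medium' or 'purpose')."""
    return Counter(getattr(item, facet) for item in items)

print(facet_counts(items, "medium"))
```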

Page 16

Two-Tiered Segmentation

Visitor types (visitor record):
• Discount Shopper
• High-Ticket Buyer
• Prestige Giver
• SMB Shopper
• Brand Loyalist

Visit types:
• Product Directed
• Category Directed
• Early Stage Research

Merging customer and visit type segmentation creates a two-tiered segmentation framework that becomes the core of our data model.
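The sketch below illustrates the two-tier idea. Every visit carries both a visitor (customer) segment and a visit type, and the two-tier segments are the cells of their cross product. The visit log and its dictionary keys are hypothetical. The segment labels come from the slide.

```python
from collections import Counter

# Hypothetical visit log: tier 1 = who the visitor is, tier 2 = what this visit is for.
visits = [
    {"visitor": "v1", "visitor_type": "Discount Shopper", "visit_type": "Product Directed"},
    {"visitor": "v2", "visitor_type": "Brand Loyalist", "visit_type": "Category Directed"},
    {"visitor": "v1", "visitor_type": "Discount Shopper", "visit_type": "Early Stage Research"},
    {"visitor": "v3", "visitor_type": "Discount Shopper", "visit_type": "Product Directed"},
]

def two_tier_segments(visits):
    """Count visits per (visitor type, visit type) cell of the two-tier framework."""
    return Counter((v["visitor_type"], v["visit_type"]) for v in visits)

for (visitor_type, visit_type), n in sorted(two_tier_segments(visits).items()):
    print(f"{visitor_type} / {visit_type}: {n}")
```

One visitor (v1 here) can appear in several cells across visits. That is what lets the same customer be analyzed differently depending on the intent of each visit.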

Page 17

The Medium Is the Message (McLuhan)
• How you measure and view affects what you see and what you find.

Many ITP experiments are directly impacted by the method or medium being used. In emerging media especially, because of its “unstructured” nature, tools shape the data (and the insights).

• What tools or platforms are you using? What tools or platforms can you use (are at your disposal)?

• What is your budget for the tools, platforms and people?
• Are you in control of the measurement process yourself, or are you depending on others to execute it for you?
• Do you have a framework to put all this data in? That’s pretty important.

IMPLICATION: Choice of tool or platform profoundly shapes the results of your experiment or project

Page 18

What’s the Use Case? Pick One (or add a new use case)

Type / Behavioral

Page 19

Use Cases from a Tools/Platform Perspective

• Consumer Research: listening for insights; rich categorization; NLP (machine learning); social media coverage*
• PR Monitoring & Support: listening and engagement; traditional media coverage*; influencer identification; topic categorization
• Social Campaigns: automating the engagement; workflow; operational metrics; low latency

Care of Gary Angel, Semphonic.com

Page 20

Full Service

Examples:

Source: Semphonic.com

Not so much an issue for ITP, but many organizations end up buying the same data from multiple vendors, over and over (something to avoid, if you can).

Page 21

Goal(s):
Audience (among):
Location:
Timing:
Vehicle, how you’re going to do it (through / with):
Venues, where you’re going to do it:
Message, call(s) to action (ask fans and customers to):
Product / Service / Program (regarding our):
Metrics/KPIs (where success will be judged by):

In some cases, a high-level plan (similar to a one-minute pitch) might help add structure and meaning to what you’re going to try to do (even here at ITP Camp, or at ITP in general).
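The fill-in-the-blanks plan can be treated as a record that renders into a one-minute pitch. This sketch assumes a hypothetical `PitchPlan` record. The order in which the connecting phrases (among, through/with, ask fans and customers to, regarding our, where success will be judged by) are joined is a guess at how the template reads. The usage example paraphrases the student’s George Enescu plan.

```python
from dataclasses import dataclass

@dataclass
class PitchPlan:
    """Hypothetical record holding one value per plan field."""
    goal: str
    audience: str
    location: str
    timing: str
    vehicle: str
    venues: str
    message: str
    program: str
    metrics: str

def one_minute_pitch(p: PitchPlan) -> str:
    """Join the plan fields with the template's connecting phrases (order assumed)."""
    return (
        f"{p.goal} among {p.audience}, at {p.location} over {p.timing}, "
        f"through/with {p.vehicle} on {p.venues}. "
        f"Ask fans and customers to {p.message}, regarding our {p.program}, "
        f"where success will be judged by {p.metrics}."
    )

pitch = one_minute_pitch(PitchPlan(
    goal="Salvage the reputation of George Enescu",
    audience="classical music institutions, enthusiasts, and musicians",
    location="GeorgeEnescu.com",
    timing="a 6-month campaign period",
    vehicle="online videos, podcasts, and musicological research",
    venues="a personal blog, radio stations, and YouTube",
    message="enjoy and celebrate Enescu's art",
    program="program promoting musicians who explore Enescu's work",
    metrics="new website visitors and YouTube statistics",
))
print(pitch)
```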

Page 22

Example of a Student’s Goal: Resurrecting George Enescu’s Work

Goal: Salvage the reputation of the Romanian 20th-century composer George Enescu
Audience (among): Classical music institutions, enthusiasts, and musicians alike
Location: Ideally GeorgeEnescu.com
Timing: A 6-month campaign period
Vehicle (through / with): Online videos, online networking, podcasts, musicological research
Venues: Personal blog, radio stations, YouTube, musicological conferences, etc.
Message (ask fans and customers to …): Enescu’s art ought to be enjoyed and celebrated as the work of a deserving 20th-century master
Program (regarding the): Program to promote the musicians and orchestras who wish to explore Enescu’s work
Metrics/KPIs (success will be judged by): New visitors to website, YouTube statistics, popularity on Google Trends, new business connections and partnerships

Page 23

And what is a Plan, Anyway?

Page 24


We’re drinking from the social media fire hose

Massive amounts of data to process and make sense of. But … we don’t need to boil the ocean!

Page 25

New Solutions Lie in …
• Adding additional dimensions to the data (e.g., time, place)
• Adding custom taxonomies, lexicons and data mashups (helps, if done well and cleanly)
• Customizing the source data feeds
• Customizing data extraction from pages crawled
• Defining what your goals are
• Defining what, when, where and how you’re going to accomplish your goals
• Defining your Key Performance Indicators that tell you if you hit or missed your goal targets
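The first two items, adding dimensions such as time and place and adding a custom lexicon, can be sketched as an enrichment step applied to each raw post before analysis. The sketch assumes posts arrive as dictionaries with `text`, an ISO-8601 timestamp `ts`, and an optional `venue_id`, plus a venue lookup table. The lexicon, venue record and field names are hypothetical.

```python
from datetime import datetime

# Hypothetical custom lexicon: term -> topic label.
LEXICON = {"latte": "coffee", "frappuccino": "coffee", "line": "service", "staff": "service"}

def enrich(post: dict, venues: dict) -> dict:
    """Return a copy of the post with time, place and lexicon-topic dimensions added.

    post: {"text": str, "ts": ISO-8601 str, "venue_id": str or None}
    venues: venue_id -> {"name": ..., "type": ..., "city": ...} (assumed lookup table)
    """
    ts = datetime.fromisoformat(post["ts"])
    venue = venues.get(post.get("venue_id"), {})
    words = {w.strip(".,!?").lower() for w in post["text"].split()}
    return {
        **post,
        "hour": ts.hour,
        "weekday": ts.strftime("%A"),
        "venue_type": venue.get("type", "unknown"),
        "city": venue.get("city", "unknown"),
        "topics": sorted({LEXICON[w] for w in words if w in LEXICON}),
    }

venues = {"v42": {"name": "Cafe on 4th", "type": "coffee shop", "city": "New York"}}
post = {"text": "The line here is endless, but the latte is worth it!",
        "ts": "2012-06-22T08:15:00", "venue_id": "v42"}
print(enrich(post, venues))
```

A post without a known venue still gets its time dimensions, with `"unknown"` for the place fields. That keeps the gap visible instead of silently dropping the post.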

Page 26

Internet Abundant with Predictive Signals

Page 27

Beyond Listening: Reinventing Social Media Monitoring

If a status update reaches a social network but no one sees it, does it exist?

Page 28

Are people using the wrong solutions to determine what people are saying?

Page 29

@listening as a use case … Why Bother??

“The problem with social is that there is so much data: there are 40 or 50 data points that you can measure, and you have to figure out whether they are important. Some of those measurements are fundamentally not important.”

Page 30

Pain!!

“I’m drowning in data and documents from the internet but I need actionable insights”

• Broad listening across the internet

• Focused on keyword matches

– Mentions of:
  • Brand name “Starbucks”
  • Product names “Frappuccino”

• Produces valuable insights, but is exploratory in nature; as a result, it cannot answer tactical questions and is not scalable.

Blogs

Social Networks

News Sites

Trade Sites

Forums

Press

Page 31

Problems we all face with Social
• No process
• Success undefined
• 90% unstructured
• Time consuming and hard to scale, esp. at the beginning

Page 32

“Lens” approach using Boolean queries and saved datasets doesn’t seem to work very well

Page 33

http://www.youtube.com/watch?v=4Y-SVxnVOv8

"housing solution"~2 AND "rhode island" AND "foreclosure", "road home program"~3 AND "foreclosure", "home loan modification"~4 AND "foreclosure", "jobless rate"~3 AND "foreclosure", "bankrupcy" AND "foreclosure" AND "housing" AND "obama", "rhode island housing"~3 AND "forclosure", "foreclosure prevention funds"~5, "bank foreclosures rhode island"~4 AND "obama", "selling house"~4 AND "foreclosure" AND "obama", "hardest hit fund"~4, "national foreclosure mitigation"~6, "homeowner stability initiative"~5 AND "obama", "roadhome program"~2, "hud homes rhode island"~3 AND "obama", "foreclosure settlement"~4 AND "25 billion"~2 AND "obama", "fannie mae freddie mac"~10 AND "foreclosure", "keeping people in their homes"~4

Radian6 Query on Foreclosures in Rhode Island

Monitoring has become too complex
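The Radian6 query above combines quoted phrases, `~N` proximity, `AND`, and commas, which are assumed here to act as OR between clauses. Below is a minimal Python sketch of evaluating such a query against a single post under that reading of the syntax. The `near` helper ignores word order, so it only approximates Lucene-style proximity. This is not how Radian6 itself evaluates queries.

```python
import re
from itertools import product

TOKEN = re.compile(r"[a-z0-9']+")

def tokens(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return TOKEN.findall(text.lower())

def near(doc: list[str], phrase: str, slop: int) -> bool:
    """True if every word of `phrase` occurs within a span of len(words) - 1 + slop
    positions. Order is ignored: an approximation of "a b"~N proximity."""
    words = tokens(phrase)
    positions = [[i for i, t in enumerate(doc) if t == w] for w in words]
    if not words or any(not p for p in positions):
        return False
    limit = len(words) - 1 + slop
    return any(max(c) - min(c) <= limit for c in product(*positions))

def matches(text: str, query: list[list[tuple[str, int]]]) -> bool:
    """query is an OR-list (the commas) of AND-clauses; each term is (phrase, slop)."""
    doc = tokens(text)
    return any(all(near(doc, p, s) for p, s in clause) for clause in query)

# A few clauses transcribed from the Rhode Island query above.
QUERY = [
    [("hardest hit fund", 4)],
    [("home loan modification", 4), ("foreclosure", 0)],
    [("foreclosure settlement", 4), ("25 billion", 2), ("obama", 0)],
]

print(matches("Obama touts the 25 billion foreclosure settlement", QUERY))  # True
print(matches("Home loan modification programs are slow", QUERY))  # False
```

Even this toy version shows why such queries get hard to maintain. Every clause mixes phrasing, distance and co-occurrence choices that have to be tuned by hand.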

Page 34

And we don’t get our “Pie in the Sky”

Page 35
Page 36

Recorded Future

Page 37

Web is Loaded with Events

Silicon Valley executives head to Vail, Colo. next week for the annual Pacific Crest Technology Leadership Forum

The carrier may select partners to set up a new carrier as early as next month

“2010 is the year when Iran will kick out Islam. Ya Ahura we will.”

“... Dr Sarkar says the new facility will be operational by March 2014...”

Drought and malnutrition hinder next year’s development plans in Yemen...

“...opposition organizers plan to meet on Thursday to protest...”

“Excited to see Mubarak speak this weekend...”

“According to TechCrunch China’s new 4G network will be deployed by mid-2010”

“Strange new Russian worm set to unleash botnet on 4/1/2012...”

Page 38

Recorded Future Architecture

70,000 Real-time Sources

3+ Billion Time-tagged Facts

100,000 future events/day

Page 39
Page 40

Baghdad Next Week - Google

Page 41

Baghdad Next Week – Recorded Future

Page 42

Baghdad Next Week – Recorded Future

Page 43

Mobile and Tablets: Next Three Years
Huge market segments still emerging

• Over 75% of businesses plan on deploying tablets by 2013
• Revolutionizing health care delivery, on-site and mobile
• Disrupting software engineering and user expectations

Page 44

VenueLabs: Geo-Location Analytics
Actionable intelligence gleaned from LBS instead of exploratory insights from SMM

Traditional insights: topic, sentiment, influence, engagement

“They mess up here A LOT! If I wasn’t in a rush nor a coffee addict I would go somewhere else!”

New insights: location, date/time, staff working, managers, local context, unit sales, nearby competitors

Page 45

VenueLabs solves Local Data Gap Example

They mess up here A LOT! If I wasn’t in a rush nor a coffee addict I would go somewhere else!

70% data is UV

Page 46

Verified the Local Data Gap

The text of the verbatims doesn’t help much, since we can’t tell where this was actually taking place without following the short URL and creating a context, which the software today usually isn’t able to do.

Page 47

New York Art Instance - VenueLabs

Page 48

Most active Museums?

Page 49

Local Data Analytics of Museums: adding location automatically makes info more actionable (context)

Facebook & Twitter

Page 50

Smoking Cessation Phases
Patient Journey Stage 1: Behavioral: Cold Turkey
Patient Journey Stage 2: Behavioral: Other
Patient Journey Stage 3: Over The Counter
Patient Journey Stage 4: RX

Side Effects of Smoking Cessation

Stage 1: Craving comfort food, nicotine, fear of weight gain, quitting and wondering how friends and family will view the decision.

Stage 2: Respondents are actively seeking to quit smoking but fear the physical and psychological side effects.

Stage 3: Respondents are settling on a treatment option with as few side effects as possible, or going cold turkey.

Stage 4: Smoking cessation is the main choice at this stage (based on our listening), but many respondents are having problems staying on the regimen due to side effects.

Choosing the Right Medication for Smoking Cessation

Stage 1: Respondents are looking for a way to stop smoking but are confused by the options and are asking for advice.

Stage 2: Patients are suffering the side effects associated with Nicorette, nicotine patches, smoking cessation or cold turkey.

Stage 3: Smoking cessation decisions are complicated by product bans on e-cigarettes and smoking cessation in some communities and occupations.

Stage 4: Smoking cessation side effects are the main issue respondents have, with men appearing to do better with the treatment than women. Negative press over the side effects of smoking cessation is upsetting, making many rethink their decision to stop smoking.

Available Options for Smoking Cessation Are Confusing

Stage 1: Respondents are seeking guidance on all the available treatment options and deciding which one(s) to try.

Stage 2: The overwhelming choice of smoking cessation treatment is hypnosis, with the second most popular treatment being Nicorette and then smoking cessation.

Stage 3: Electronic cigarettes, followed by Nicorette gum, are the most popular treatments according to our listening reports.

Stage 4: Patients struggle with the side effects of the smoking cessation treatment itself. Some patients complete treatment successfully, but others do not and are dissatisfied with their progress.

Getting Advice on the Right Treatment Options for Smoking Cessation

Stage 1: Online respondents are going on blogs, Twitter and forums looking for people who have experience taking drugs for smoking cessation, so they can get information on the right approach to take.

Stage 2: Many side effects associated with each treatment are evident, and respondents are grappling with which choice to make, often going with hypnosis first.

Stage 3: Patients have tried treatments and are sharing their experiences, struggles and successes with smoking cessation.

Stage 4: Just about all the information on smoking cessation is negative. That does not stop many patients from taking the drug, but many stop once they experience side effects.

Page 51

Defining Key Words
Patient Journey Stage 1: Behavioral: Cold Turkey
Patient Journey Stage 2: Behavioral: Other
Patient Journey Stage 3: Over The Counter
Patient Journey Stage 4: RX

"quit smoking"

"smoking cessation"

"cold turkey" AND "smoking" AND "quit on my own"

OR

"jotharmcoe", "smoking cessation", "mytimetoquit.com", "smoking cessation.com", "let’s quit together now", "get quit clinic", "qut-quit.com"

"counseling" AND "smoking"

"cutting back" AND "smoking"

"nicotine free-cigarettes"

"homeopathic remedies" AND "quit smoking" AND "stop smoking"

"hypnosis" AND "smoking" AND "-weed"

"embarrassed to" AND "doctor" AND "smoking"

"quit line" AND "smoking"

"herbal remedies" AND "quit smoking" AND "stop smoking"

"support group" AND "smoking"

"snus" AND "smoking"

"nicotrol" (1 mention)

"e-cigarette" AND "-buy" AND "-buying"

"stop smoking gum"

"nasal spray" AND "smoking" (1 mention)

"nicoderm"

"lozenge" AND "smoking"

"patch" AND "smoking"

"inhaler" AND "smoking"

"Nicolette" (strongest keyword)

"nicotine replacement therapy" AND "smoking"

"smoking cessation"

"buproban" AND "-order"

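The keyword definitions above are Boolean rules, one set per stage of the patient journey. Below is a sketch of turning a few of them into a stage classifier for individual posts. Which keyword belongs to which stage is assumed for illustration, because the slide’s column layout did not survive. The `-word` exclusions are treated as NOT terms. `RULES`, `has` and `stages_for` are hypothetical names.

```python
import re

# Hypothetical rules: stage -> list of (terms that must all appear, terms that must not).
# The stage assignment of each keyword is an illustrative assumption.
RULES = {
    "Stage 1: Behavioral: Cold Turkey": [({"cold turkey", "smoking"}, set())],
    "Stage 2: Behavioral: Other": [
        ({"hypnosis", "smoking"}, {"weed"}),
        ({"support group", "smoking"}, set()),
    ],
    "Stage 3: Over The Counter": [
        ({"patch", "smoking"}, set()),
        ({"nicoderm"}, set()),
        ({"e-cigarette"}, {"buy", "buying"}),
    ],
    "Stage 4: RX": [({"buproban"}, {"order"})],
}

def has(term: str, text: str) -> bool:
    """Case-insensitive whole-word (or whole-phrase) match."""
    return re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) is not None

def stages_for(text: str) -> list[str]:
    """Return every stage with a matching rule; a post can land in several stages."""
    return [
        stage
        for stage, rules in RULES.items()
        if any(
            all(has(t, text) for t in need) and not any(has(t, text) for t in avoid)
            for need, avoid in rules
        )
    ]

print(stages_for("Day 3 of quitting smoking cold turkey"))
print(stages_for("Trying hypnosis to stop smoking, plus the patch"))
```

Whole-word matching keeps "patch" from firing on "dispatch". It also means plurals such as "patches" need their own terms.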
Page 52

Location of Conversation for Each Stage of the Patient Journey

[Figure: four panels: Stage 1 (Behavioral: Cold Turkey), Stage 2 (Behavioral: Other), Stage 3 (Over The Counter), Stage 4 (RX)]

Page 53

Putting Targeting into Action: RI Primary = Create the Story (Be the Story)?

RI District 1

80,120? Over 50

Members of Facebook over 49 years old in RI District 1 = 102,920 (caveat: there are a few zip-codes in both districts)

1. Hit potential voters in District 1 with issue-targeted sponsored stories for AWARENESS only (expect few, if any, clicks).

2. Blanket zip codes with mailings (the post office now does this).

3. Use VenueLabs to sift check-in data and find voters; cross-link to the voting list when possible.

4. Categorize (data mining: persuadable?)
5. Reach out / community management, etc.
6. Set up tracking (e.g., Campalyst, next slide)

Page 54

Connecting Engagement To Conversions campalyst.com

Page 55

Google Social Reports cannot connect the dots to ROI (yet), though Campalyst can.

Page 56


Not enough information – Google cannot connect the dots back to the original post that generated the referral, but Campalyst does.

Page 57

“Campalyst ties cause and effect for Twitter and Facebook better than any other platform I’ve yet seen.” Marshall Sponder, WebMetricsGuru.com

Page 58

Campalyst can also find the brand advocates that generate the most traction and engagement for a brand or website.

Page 59

Summary
• The future of analytics is with actionable data.

• Actionable data comes from adding contextual information and metadata in meaningful ways related to your business or organizational goals.

• You need a plan (the right one) to execute, together with the metrics, audience, timing, venue, program/vehicle and KPIs, to succeed with analytics of any kind.

Page 60

Examples of Platforms (you can play with these later)

Radian6 Basic

Sysomos Map

Brandwatch

Campalyst

Infinigraph

6dgree

Netbase

PeekAnalytics

Traackr

mPact

Venuelabs

Page 61

Marshall Sponder, WebMetricsGuru INC.

www.webmetricsguru.com
www.smabook.com

@webmetricsguru @smanalyticsbook
