audience selection April09 - web-docs.stern.nyu.edu/old_web/emplibrary/Provost...

© Provost 2009

Audience Selection for On-line Brand Advertising: Privacy-friendly Social Network Targeting

Foster Provost

with Brian Dalessandro, Rod Hook, Xiaohan Zhang, Alan Murray

This work conducted while the authors were at Media6°, Inc.

Privacy on-line?

• What information can firms know about you based on your on-line browsing behavior?

  – Who you are?
  – Your demographics?
  – Psychographics?
  – Income?
  – Your medical conditions?
  – Where you live and your kids’ names?
  – Every page that you visit?

Privacy on-line? (cont.)

• However, much or all of the free stuff we like on the internet is there because of (the promise of) advertising dollars.

• Advertising dollars come in direct proportion to the ability to find an appropriate audience for the message.

• Deciding whether you’re appropriate is aided by auxiliary data

• Where would we like firms to operate on the spectrum between the two unacceptable extremes:

“We can do whatever we want with whatever data we can get our hands on.”

“You can’t do anything with MY data!”

• Are there points between the extremes that give us acceptable tradeoffs between “privacy” and efficacy?

• What about anonymization/de-identification?

“We can do whatever we want with whatever data we can get our hands on.”

“You can’t do anything with MY data!”

(cf. Duncan JSM 2002)

On-line Brand Advertising

• Brand Advertising
  – goal: to deliver brand message to selected audience
  – key: selecting audience
    • example strategy (traditional): find audience based on published content (TV shows, magazines) or location (billboards, etc.)

• On-line Brand Advertising
  – contrast with “direct marketing” on-line advertising
    • for BA, goal is not necessarily clicks or on-line conversions
  – traditional strategy applies: find audience based on published content
    • premium display slots or remnants (e.g., on espn.com, etc.)
    • contextual targeting (e.g., Google AdSense)
  – alternative strategy: identify audience members and target anywhere on the web (e.g., bid for them on ad exchanges)
    • behavioral targeting
    • social network targeting

• Non-premium display ad market predicted to grow significantly faster than the rest of on-line advertising (e.g., sponsored search, premium display, contextual)

  – (Coolbrith 2007)
  – largely due to the stabilization of the technical ad-serving infrastructure based on the consolidation into a small number of ad networks (e.g., Doubleclick, RightMedia)

• There is evidence that display brand advertising increases purchases (on-line and off-), and improves search advertising as well (Comscore 2008, Atlas Institute 2007, Fayyad personal communication, Klaassen 2009)

Hill, Provost, and Volinsky. “Network-based Marketing: Identifying likely adopters via consumer networks.” Statistical Science 21 (2) 256–276, 2006.

Prior work:

Social network targeting (Hill et al. ‘06)

• Define Social Network Targeting --> cross between viral marketing and traditional
  – target “network neighbors” of existing customers
  – based on direct communication between consumers
  – this could expand “virally” through the network without any word-of-mouth advocacy, or could take advantage of it.

• Example application:
  – Product: new communications service
  – Firm with long experience with targeted marketing
  – Sophisticated segmentation models based on data, experience, and intuition
    • e.g., demographic, geographic, loyalty data
    • e.g., intuition regarding the types of customers known or thought to have affinity for this type of service

• Results: tremendous lift in response rate (2-5x)

A snippet from an actual network including “bad guys”

• nodes are people
• links are communications
• red nodes are fraudsters

these two bad guys are well connected

Dialed-digit detector (Fawcett & P., 1997)

Communities of Interest (Cortes et al. 2001)

Our contribution

1. A privacy-friendly technique for selecting and targeting brand audiences:

  – by finding social neighbors of existing “brand actors”
  – using no PII at all
  – based on visits to social-networking pages (and other UGC)
  – ironically, a 3rd-party ad network can provide greater privacy

2. A method for evaluating on-line brand audiences
  – based on density of “brand actors”
  – following the ideas from hold-out evaluation for predictive modeling

3. A demonstration that the network neighbor audiences indeed have strong brand affinity

UGC = user-generated content (non-professional)

brand actors = browsers having taken an action associated with brand affinity

From bipartite content-affinity network to quasi-social network (cartoon)

[Figure labels: content visited (UGC pages); browsers; “social” network among browsers]
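The projection pictured in the cartoon can be sketched in a few lines of Python. This is only an illustration: the `visits` mapping, the function name `project_to_quasi_social`, and the rule that a single co-visited page is enough to link two browsers are assumptions made for the example, not details given in the slides.

```python
from collections import defaultdict
from itertools import combinations


def project_to_quasi_social(visits):
    """Link two browsers whenever they visited at least one common content piece.

    visits: dict mapping an anonymous browser id to the set of anonymous
    content ids it was observed visiting (hypothetical input format).
    """
    # Invert the bipartite graph: content id -> browsers that visited it.
    visitors = defaultdict(set)
    for browser, pages in visits.items():
        for page in pages:
            visitors[page].add(browser)

    # Every pair of co-visitors becomes an edge among browsers.
    neighbors = defaultdict(set)
    for browsers in visitors.values():
        for a, b in combinations(sorted(browsers), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


toy_visits = {1: {"p", "q"}, 2: {"q"}, 3: {"r"}}
graph = project_to_quasi_social(toy_visits)  # browsers 1 and 2 share page "q"
```

Browser 3 shares no content with anyone, so it does not appear in the projected network.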

Audience selection in a nutshell

• Advertiser indicates action showing brand affinity
  – visiting loyalty page, signing in to account, purchasing, visiting home page, etc.

• Collect brand action takers as seed nodes

– call the set of seed nodes B+

• Identify the set (N) of network neighbors of B+

• Rank N based on “brand proximity” to B+

• Choose audience A as the top-ranked members of N

Note: This can be done without saving any PII: only random numbers for the browser and for the content

brand proximity: a measure of similarity/distance between a node b and the set B+

similar in spirit to point-to-cluster distances
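The selection steps above can be sketched as follows. This is a minimal illustration under stated assumptions: browsers and content are plain integer ids (standing in for the random numbers mentioned in the note), the function name `select_audience` is hypothetical, and the count of content shared with the seed set is used as a stand-in for brand proximity.

```python
def select_audience(visits, seeds, k):
    """Return the k network neighbors of the seed set B+ with highest proximity."""
    # Content touched by at least one seed node (brand action taker).
    seed_content = set()
    for s in seeds:
        seed_content |= visits[s]

    # N: non-seed browsers that share at least one content piece with B+.
    scores = {}
    for browser, pages in visits.items():
        if browser in seeds:
            continue
        shared = len(pages & seed_content)
        if shared > 0:
            scores[browser] = shared  # stand-in proximity: count of shared content

    # Rank N by proximity (ties broken by id for determinism) and keep the top k.
    ranked = sorted(scores, key=lambda b: (-scores[b], b))
    return ranked[:k]


visits = {101: {1, 2}, 102: {2, 3}, 103: {1, 2, 3}, 104: {9}}
audience = select_audience(visits, seeds={101}, k=1)
```

Nothing in the sketch needs to know who a browser is or what a content piece contains; only the co-occurrence of opaque ids is used.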

Brand proximity measures

• POSCNT
  – number of unique content pieces connecting browser to B+

• MATL
  – maximum number of content pieces through which paths connect browser to some particular action taker (i.e., seed node in B+)

• minEUD
  – minimum Euclidean distance of normalized content vector to a seed node

• maxCos
  – maximum cosine similarity to a seed node

• ATODD
  – “odds” of a neighbor being an action taker (i.e., seed node in B+).

[Figure: example network showing the seed set B+ and nodes A, B, C, D, E]
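Four of these measures can be illustrated on toy data. The sketch below assumes binary visit vectors (a browser either visited a content piece or not) normalized to unit length; neither assumption is stated on the slide. ATODD is left out because the slide does not define it precisely enough to implement. The function name `brand_proximity` and the input format are hypothetical.

```python
import math


def brand_proximity(browser, seeds, visits):
    """Return POSCNT, MATL, minEUD and maxCos for one browser (toy version)."""
    mine = visits[browser]
    overlaps = {s: len(mine & visits[s]) for s in seeds}

    # POSCNT: unique content pieces connecting the browser to any seed.
    seed_content = set().union(*(visits[s] for s in seeds))
    poscnt = len(mine & seed_content)

    # MATL: largest overlap with one particular seed.
    matl = max(overlaps.values())

    # maxCos: for binary visit vectors, cosine = overlap / sqrt(|a| * |b|).
    max_cos = max(
        overlaps[s] / math.sqrt(len(mine) * len(visits[s])) for s in seeds
    )

    # minEUD: with unit-length vectors, distance^2 = 2 - 2 * cosine,
    # so the nearest seed is the one with the largest cosine.
    min_eud = math.sqrt(max(0.0, 2.0 - 2.0 * max_cos))

    return {"POSCNT": poscnt, "MATL": matl, "minEUD": min_eud, "maxCos": max_cos}


visits = {"s1": {1, 2}, "s2": {3}, "b": {1, 2, 3}}
scores = brand_proximity("b", seeds={"s1", "s2"}, visits=visits)
```

In this toy case browser `b` reaches the seed set through three content pieces (POSCNT) but shares at most two with any single seed (MATL), which shows how the two counts can differ.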

Evaluating brand audiences

• Define two time periods: t1 and t2
  – all decisions on selecting audience A are made during t1
  – t2 is disjoint from and subsequent to t1

• Collect brand action takers in t2 (disjoint from seeds)
  – call this set: B2+

• To evaluate an audience A we can compute the future density of brand actors:

  $$\frac{|A \cap B_2^{+}|}{|A|}$$

• We would like to know how well our brand proximity measures rank future brand actors, so we can compute the area under the ROC curve (AUC) for any measure
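A small worked example may help fix the two evaluation quantities: the future density of brand actors, |A ∩ B2+| / |A|, and the AUC of a proximity ranking. The data and function names below are invented for illustration. Lift is taken here to be the audience density divided by the density over all ranked neighbors, which is an assumption about the baseline, and the AUC is computed in its pairwise (Mann-Whitney) form.

```python
def density(audience, future_actors):
    """Fraction of the audience that took a brand action in t2."""
    audience = set(audience)
    return len(audience & future_actors) / len(audience)


def auc(scores, future_actors):
    """Probability that a random future brand actor outranks a random non-actor.

    Pairwise (Mann-Whitney) form of the AUC; ties count one half.
    """
    pos = [s for b, s in scores.items() if b in future_actors]
    neg = [s for b, s in scores.items() if b not in future_actors]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = {"a": 3, "b": 2, "c": 1, "d": 0}   # proximity assigned during t1
b2_plus = {"a", "c"}                        # brand actors observed in t2
top = density(["a"], b2_plus)               # density of the top-ranked audience
base = density(scores, b2_plus)             # density over all ranked neighbors
lift = top / base
ranking_auc = auc(scores, b2_plus)
```

Because the scores are fixed in t1 and the actors are observed only in t2, the calculation mirrors hold-out evaluation for a predictive model.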

Our study: Social Network Data

(from a working ad network)

• a sample of about 10 million anonymized browsers
• all of their observed visits to social networking content over 90 days (from several of the largest SN sites)

• bipartite graph:
  – 10^7 x 10^8 with ~2.5 x 10^8 non-zero entries

• quasi-social network:
  – 10^7 nodes with 20-40 neighbors each (on average)

• Resultant audiences per brand
  – on average ~100K seed nodes
  – total network neighbor audience pool: 2-4 million
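A quick back-of-the-envelope check shows how sparse this bipartite graph is. The sketch reads the dimensions as 10^7 browsers by 10^8 content pieces with about 2.5 x 10^8 non-zero entries (consistent with the sample of about 10 million browsers); the variable names are illustrative only.

```python
browsers = 10**7          # rows of the bipartite matrix
content = 10**8           # columns
nonzero = 2.5 * 10**8     # observed browser-content visits

visits_per_browser = nonzero / browsers   # average distinct content pieces per browser
fill = nonzero / (browsers * content)     # fraction of matrix cells that are non-zero
```

Under this reading each browser touches about 25 content pieces on average, and only a tiny fraction of the matrix is filled, so a sparse representation is the natural choice.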

Our study: Brand Data

More than a dozen well-known brands, separated into two groups:

Group 1:
  – Four brands where no advertising was done during experimental period (Hotel A, Modeling Agency, Credit Report, Auto Insurance)
  – Plus a fifth “brand” comprising a sought-after demographic group (Parenting)

Group 2:
  – 10 brands where some advertising was done during the experimental period
    • Apparel: HipHop, Voip A&B, Airline, Hotel B, Electronics A&B, Apparel: Athletic, Cell Phone, Apparel: Women’s
  – advertising uniform across network neighbors
  – advertising does not lead directly to brand action

Lift in brand actor density

[For the top-10%, ATODD was usually the best]

Lift in brand actor density: top-10 NNs vs entire NN set

In-vivo tests

Brand | Impressions of PSAs to top ranked | Impressions of PSAs to RON | Organic conversion lift
Electronic A | 67 | 53,347 | 5.89
Apparel: Athletic | 26,161 | 266,661 | 6.06
Apparel: Hiphop | 5,757 | 223,509 | 64.65

We selected a small set of high-ranking network neighbors for three group-2 brands. In production we showed them only public service announcements (PSAs). We did the same (with the same campaign parameters) for a “run of network” campaign (bid on everyone).

We obtained the conversion rates from the ad exchange (here, “organic” conversion).

Social vs. Quasi-Social

…

Brand | F-AUC on all B | F-AUC on N only
Hotel A | 0.96 | 0.79
Modeling Agency | 0.98 | 0.84
Credit Report | 0.93 | 0.79
Parenting | 0.94 | 0.80
Auto Insurance | 0.97 | 0.81
15 Brand Average | 0.96 | 0.81

The quasi-social network embeds a friends network?

• estimate each browser’s home page based on techniques analogous to author id based on citations (Hill & Provost, 2003)

• estimate “friends” to be those who visit each other’s home page

• do brand proximity measures rank brand actors’ friends highly?

Airline

One more test

For one brand (Cell Phone) we asked Quantcast.com for demographic profiles of the seed nodes and their network neighbors:

Demographic | Seeds | Neighbors
Gender | Female | Female
Ethnicity | Hispanic | Hispanic
Age | Young | Young
Income | Low | Low
Education | No College | No College

Summary of findings

• We can build high brand-affinity audiences by selecting the (quasi-) social network neighbors of existing brand actors, identified via co-visitation of social-networking content.

• These neighbors take brand actions at a higher rate organically, as well as after being targeted by ads, and the highly ranked neighbors do especially well.

• We can learn better models by combining evidence from the individual brand proximity measures.

• The quasi-social network likely embeds a social network.

Main contributions

• To our knowledge this is the first published work on data mining for on-line brand advertising, and we show that it can be effective.

• In particular, we devise a privacy-friendly method for targeting social-network neighbors – which is in contrast to prior work.

• We introduce a general framework for evaluating on-line brand advertising – which should be useful far beyond social network-neighbor targeting.

• We provide a striking contrast to pessimistic views of the value of social networking sites for advertising (e.g., Clemons et al. 2007).

On being “privacy-friendly”

Primary concern: people (say employees) would have access to sensitive, harmful, or embarrassing information about you.

Secondary (but nonetheless important) concern: due to data breaches these data would become public.

The privacy-friendly technique:
• has no need to ever keep PII
• does not keep information about content either
• does not use user-supplied SN profiles (in contrast to..)

--> thus, addresses both concerns

• plus, seems to be relatively safe from reidentification techniques
