BlackHat Analytics 2: Detect & avoid Dark Tracking
#BlackhatAnalytics @philpearce
Web Analytics Exchange mentor
750 GA questions answered
Tracking Protection Group (DNT)
Welcome
Phil Pearce
PPC, Privacy and Analytics
[email protected]/in/philpearce
Summary
1. Definition
2. Example Techniques
3. Classifications
4. Penalties
5. Industry issues
6. Group/Class action wars
7. Big Data
8. Look at the future
9. Questions
#BlackhatAnalytics #emetrics @philpearce
A long time ago... in a Google universe far, far away...
Define: Blackhat Analytics
“0” results
If you do this search now...
Define: Blackhat Analytics
Me
It turns out...
...I know more than Google ;)
Hypothesis
At some point in the future "BlackHat Analytics" or “Faking Conversions” might become more widespread. Because...
1. WA is becoming more important for business decision making.
2. Automatic performance-based PPC bid management systems are becoming more widely used.
3. Increase in online competitiveness & more revenue at stake.
Definition
Intentional act of distorting, deleting, unethically using, or hijacking WA data using technical or
legal loopholes; with the goal of making financial gains, or obtaining a competitive advantage.
Phil Pearce 2009
Evil tracking from pre-2010
Referral backlink log spam (deprecated SEO technique)
Ad behavioural targeting (Interest Based Stalking)
Remarketing Ads (Return Visitor Stalking) - Star Wars stalker
Safari 3rd party POST cookie (Preference bypassing)
NEW “Headless Browser” spam
Flash cookie respawn (Zombie Cookies)
Visited links CSS hack (History Sniffing)
GA log spam (Spider visit loading JS)
EverCookie (all of the above+)
Super evil: EverCookie
The EverCookie was so difficult to delete:
even NSA considered using it!
Source: http://www.slideshare.net/jonbonachon/tor-stinks
DECLASSIFIED
But they decided
they did not need it ;)
Examples from USA
Classification
Intent: Accidental vs. Malicious
Target: Own website vs. Competitor's website
Purpose of data collection: Same purpose vs. Different purpose
Scale: Niche vs. Mass effect
Impact: Data unaffected vs. GA account deletion
Bad/Unreliable Measure Data
Classifications
Malintent
Cashback cookies (e.g Quidco)
Flash Cookie
Flash Cookie Respawn
EverCookie
CSS history sniffing
Speed checking robots
Google Wifi incident
Hostname spam
Google (not provided)
Phone call logs
App error logs
Fake conversions
Referral log spam
Unintentional or Accidental
Good/Accurate Measure Data
Updates
Less accidental data mistakes
More good/reliable measure data
Speed checking robots
Hostname spam
Google Wifi incident
If nasty tracking code is installed - Who is liable?
Liability for Privacy & Security
Is the agency liable?
No. Local laws say... the Website Owner is responsible (not the Agency or Vendor)
BUT the agency is responsible for:
• Upholding professional standards (e.g. GACP status)
• A pro-active client relationship
Why do people still do this bad stuff?
The Lure of the Dark side is too strong!
It’s all about the money! €€€
Affiliate networks looking to increase CPA and attract new affiliates.
Online News website looking to retain users & sell stories (e.g. NYT)
Banner networks looking to improve CPM & reduce cookie deletion rates and overcome keywords “not provided”.
Sustained CPC bidding wars
Big data
But there is a disturbance in the task force...
Meet the new Matt Cutts ...
Google Privacy “Red” team soon to be hired in 2013 following FTC settlement.
Mission: to discover and prioritize subtle, unusual, and emergent privacy & security flaws
https://www.google.com/about/jobs/locations/mountain-view/engineering/systems/data-privacy-engineer-privacy-red-team-mountain-view.html
Hired WebSpam fighter to Force quality improvements in 2000.
http://www.mattcutts.com/blog/about-me/
NEW “Red team” leader
Matt Cutts
“Internal” Imperial Bureau Security
New Google Product Manager of Privacy & information security
F@#K - GA account deleted!
You will not collect any data that personally identifies an individual, such as a full name, email address or billing information, or other data which can be reasonably linked to such information by Google.
You must post a Privacy Policy which provides notice that your use of cookies is to collect traffic data.
You must not circumvent any privacy features (e.g, an opt-out) that are part of GA.
www.google.com/analytics/terms/us.html
Why can’t GA just remove the bad PII data?
• Free WA packages are unable to remove PII without deleting whole GA accounts!
• Raw logs are only stored for ~30 days
• Right to be forgotten was introduced after GA was designed (although this might be possible with Universal, which is user-centric, not visitor-centric)
“Sensitive” data also is an issue
http://en.wikipedia.org/wiki/Personal_identifier#Examples_of_PID
Don’t use userIDs that contain PII…
R2D2 (random userID)
KennyBaker (Full Name used for userID)
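A minimal sketch of the random userID idea, assuming a Node.js environment. The helper names `newRandomUserId` and `isOpaqueUserId` are hypothetical, and the 32-character hex format is an illustrative choice, not something GA requires.

```javascript
const nodeCrypto = require("crypto");

// Hypothetical helper: returns 32 random hex characters that carry no name or email.
function newRandomUserId() {
  return nodeCrypto.randomBytes(16).toString("hex");
}

// Hypothetical guard: accepts only IDs in the opaque format produced above,
// so a value such as "KennyBaker" is rejected before it reaches analytics.
function isOpaqueUserId(id) {
  return typeof id === "string" && /^[0-9a-f]{32}$/.test(id);
}

console.log(isOpaqueUserId(newRandomUserId())); // true
console.log(isOpaqueUserId("KennyBaker"));      // false
```

The mapping between the random ID and the real person would stay in your own backend, never in the analytics tool.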
www.yoursite.com
[email protected]
https://support.google.com/adwords/answer/8206?contact=1&rd=1
site:competitor.com inurl:"utm_content * gmail.com"
http://www.google.de/#q=inurl:de+inurl:utm_content+*+gmail+-blog&pws=0&num=100&filter=0&as_qdr=all
e.g. www.snsanalytics.com/xXiSy9?type=track_iframe&utm_medium=FacebookPage&utm_campaign=InfoFüred&utm_source=yuppi.hu&[email protected]
Example 1: Accidental PII
Solution/Counter-measure for Accidental PII
Add exclude parameters to GWT:
email, mail, utm_source, utm_medium, utm_campaign, utm_content, utm_keyword, _ga
Or use temporary robots.txt fix:
User-agent: *
Disallow: /*utm_medium=email
Disallow: /*gmail.com
Noarchive: /*utm_medium=email
Noarchive: /*gmail.com
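The same idea can be applied before a URL is ever sent to an analytics tool. The sketch below is one possible approach, not a GA feature: `scrubUrl`, `PII_PARAMS` and `EMAIL_LIKE` are hypothetical names, and the email pattern is a deliberately loose heuristic. It drops any query parameter named `email` or `mail`, plus any parameter whose value looks like an email address.

```javascript
// Parameter names assumed to carry PII (taken from the start of the exclude list).
const PII_PARAMS = new Set(["email", "mail"]);

// Loose heuristic for "looks like an email address"; not a full validator.
const EMAIL_LIKE = /[^\s@]+@[^\s@]+\.[a-z]{2,}/i;

// Returns the URL with PII-looking query parameters removed.
function scrubUrl(rawUrl) {
  const url = new URL(rawUrl);
  // Copy the entries first so that deleting does not disturb the iteration.
  for (const [name, value] of Array.from(url.searchParams.entries())) {
    if (PII_PARAMS.has(name.toLowerCase()) || EMAIL_LIKE.test(value)) {
      url.searchParams.delete(name);
    }
  }
  return url.toString();
}

console.log(
  scrubUrl("http://www.yoursite.com/thanks?utm_medium=email&utm_content=jane%40example.com&id=7")
);
```

Here `utm_medium=email` is kept, because only its value is the word "email", while the `utm_content` parameter carrying an address is removed.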
Legal Disclaimer: The purpose of this example is to demonstrate a hole in all Analytics platforms, and how to patch this hole. It is used for TESTING purposes ONLY.
By reading this example you agree NOT to use this on a live website, and agree that I (Phil Pearce) am NOT liable for any damage that a website owner may suffer arising out of this example & tool.
If you are in any doubt, please seek the advice of the Google legal team www.google.com/contact/ or your local legal counsel BEFORE testing.
Note: This issue was raised on the GACP private discussion forum 6 months ago, prior to this event.
Disclaimer
Example 2: Do you recognise this number?
-9,223,372,036,854,775,807
It is about minus 9.2 quintillion, close to the limit of a signed 64-bit “Big Integer”
Intentional data damage
WARNING: Don’t try this at home!
javascript:_gaq.push(['_setAccount','UA- xxxxxx-1'],['_addTrans','8148350','affiliation','-9223372036854775807','-9223372036854775807','0.00','-','-','-'],['_addItem','SKU00001','8148350','BIG refund','-','-9223372036854775807','1'],['_trackTrans']);
http://www.google-analytics.com/__utm.gif?utmwv=5.4.6&utms=44&utmn=393079074&utmhn=domain.com&utmt=tran&utmtid=8148350&utmtst=affiliation&utmtto=-9223372036854775807&utmttx=-9223372036854775807&utmtsp=0.00&utmtci=-&utmtrg=-&utmtco=-&utmcs=UTF-8&utmsr=1366x768&utmvp=1366x550&utmsc=24-bit&utmul=en-us&utmje=1&utmfl=11.9 r900&utmdt=TITLE&utmhid=509485053&utmr=-&utmp=/&utmht=1385061484294&utmac=UA-XXXXX-1&utmcc=__utma=251194116.2116214072.1385060410.1385060410.1385060410.1; __utmz=251194116.1385060410.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none);&utmu=qjAL~
Solution/Counter-measure for intentional data damage
Tool to manually fix… bit.ly/bigintegerfix
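A complementary audit step, sketched under the assumption that you can export transaction rows (for example from a report or your own order database): flag any total outside a plausible range. The bounds, the row shape `{ id, total }` and the function names are all hypothetical.

```javascript
// Hypothetical bounds; tune these to your own catalogue and refund policy.
const MIN_TOTAL = 0;
const MAX_TOTAL = 100000;

// True only for a finite number inside the expected range.
function isPlausibleTotal(raw) {
  if (raw === null || raw === undefined) return false;
  if (typeof raw === "string" && raw.trim() === "") return false;
  const value = Number(raw);
  return Number.isFinite(value) && value >= MIN_TOTAL && value <= MAX_TOTAL;
}

// Returns the rows whose total needs manual review.
function flagSuspectTransactions(rows) {
  return rows.filter((row) => !isPlausibleTotal(row.total));
}

console.log(
  flagSuspectTransactions([
    { id: "1001", total: "49.99" },
    { id: "8148350", total: "-9223372036854775807" },
  ])
);
```

This only detects damage after the fact; it does not stop a hit being sent straight to the collection endpoint.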
Fine calculator
Fine = (No. of users affected * Scale of badness * Size of brand) less (Website risk assessment + Vendor privacy self-certification)
Sony €320K fine by ICO for email & password breach.
Adobe password breach expected to be £ALOT more!
http://www.ico.gov.uk/news/latest_news/2013/ico-news-release-2013.aspx
http://www.youtube.com/watch?v=2vZHg2F4u5Q
Here is a Fine example
Breach notification
http://en.wikipedia.org/wiki/Data_breach
http://www.symantec.com/content/de/de/about/downloads/press/2010_annual_study.pdf
PII data sucked out from exposed servers!
Companies must notify the DPA within a reasonable amount of time, but are not (currently) obligated to notify the public!
Consumers VS Advertiser
But there is still an Imbalance in the force
Because…
• Maturity in Advertising sector
• User data allows better Ad targeting = €
• MORE data better targeting = €€
Data is power
We do'na
the datacapt
Rise of the Big Data Empire
Data Greed &
Fear of losing existing user data
Dark motivations:
Triggered…
Group/Class Action Wars
Note: a “Class” is a collective of users (e.g. “South Bohemian Mothers group” vs Temelin nuclear power plant)
Define: Class Action Prosecutors represent the users.
Like affiliates (i.e. revenue motivated), but with larger resources & cleverer
For example….
US Class Action Prosecutor: Like bounty hunters, but more… sophisticated!
BIG class-action fines in US
Do class action lawsuits exist in Europe or are they only in US?
Question…
Class Action Prosecutors: also now active in UK!
e.g. Google UK vs Olswang Class Action (Safari 3rd party cookie bypassing on iOS)
First ever UK “group action” vs Google UK in Feb 2013, claiming 10m Safari users affected
www.googlelawsuit.co.uk and www.facebook.com/SafariUsersAgainstGooglesSecretTracking
UK test case, could set precedent for EU class-action cases!
Successful class action raids in US…
Settlement funds split 50:50 between users and Class Action Lawyers. Previous settlements were 70:30, thus a smaller % cut for Class Action Lawyers, but a huge number of users in the claim.
€13 million hit
€13 per user
€7.5 million
W3C republic – A new hope for Truce
Must be UNSET by default
DNT user signal
Browsers ignore the W3C consensus on DNT
Firefox: Talks about a blockade of 3rd party cookies
MS: Windows 8 IE10 rolls out DNT=1 ON by default, rather than UNSET!
Firefox lost battle: Too many false positives
Firefox says its Han’s are tied for a few months on 3rd party cookies
Dark Side too powerful ;)
MS IE10 DNT=1 browser signal
ON by default…
http://www.ypolicyblog.com/policyblog/2012/10/26/dnt/
http://www.admonsters.com/article/apache-ignores-ie10-dnt-signal
…IE10 DNT signal grounded
…Both Apache & Yahoo threaten to ignore DNT=1 from IE10…
Allow “Good” cookies
Alternative Cookie Clearinghouse proposed (like stopbad malware list)
Block “Bad” cookies
2 years reign!
Infighting & disunity between Advertisers & Privacy Advocates.
Definition of Tracking (DNT) still not defined!
http://www.theregister.co.uk/2013/11/05/do_not_track_w3c_ads_privacy/
W3C republic
Group “almost” disbanded
Peter Swire – Chief resigns
Jonathan Mayer – Firefox resigns
Digital Advertising Alliance – leaves group!
Old W3C republic
Key member: Thomas Roessler joins Google!
Imperial
Durnt, durnt, durnt… durnt, dan ner!
New Imperial Advertising Principles: AdChoices proposed as replacement for W3C’s DNT
Source: http://www.adweek.com/news/technology/daa-convene-new-do-not-track-group-updated-153023
Privacy in the Universe restored!
Users have choice & freedom within the Global Imperial Empire
But… The secret arms race
The Dark Star
Also affiliate networks start building Device Signature conversion tracking tools:
“We (tradedoubler.com) are looking at options such as device recognition, using non-personally identifiable information that is freely available from a user’s device. Using advanced matching algorithms a single device can be recognized at the point of impression/click and conversion without the use of cookies.” http://www.tradedoubler.com/uk-en/blog/firefox-22-cookies/ [Jun 2013]
BIG Data Centre with ability to process:
1. Device Signature tracking
2. UserID respawn
3. Custom Remarketing
Belgium advanced scanner study (by KU Leuven University)
But…
Resulted in…
Secret Device Signatures tracking plans detected!
War for Anonymity (aka War of Shadows)
Browsers (excluding Chrome) secretly move to anonymise device signatures
So that all customised devices/extensions look the same!
Thus… destroying any shadow tracking
Facebook (Borg) & Google (Empire) counter attack…
Use Force-browser power to set DNT=0 (Do Target Me) when user signs into service (messenger/gmail)
Prism Tracker
Unexpected “Snow den monster”
Enforcers/regulators get a boost of user support
Ed
Headless Browser robotic crawlers causing havoc in GA data!
Impossible to differentiate from a real user!
www.webmasterworld.com/search_engine_spiders/4619880.htm
http://nodejsmodules.org/new/tags/spider
Examples of Headless Browsers:
• Zombie.js
• Phantom.js
• HtmlUnit
Definition: A headless browser is a web browser WITHOUT a user interface.
Authenticated/logged-in user tracking might be the only way to exclude Headless Browser tracking!
Polarisation
Dark get darker (e.g. IE fav icon 3rd party cookies bypassing browser hole/exploit)
White get whiter (e.g. duckduckgo.com, ixquick.com & mezzobit.com increase in usage)
Return of the Jedi Strike
2015 invasion of Privacy officers
Forced 5% global revenue power (max €100 million)
University Research divisions expand use of Taint Droids
Note: Anti-TaintDroid link: http://gsbabil.github.io/AntiTaintDroid/
source: bringyourownit.com/2014/04/09/eu-data-protection-reform-the-100-million-euro-fine/
& www.bbc.co.uk/news/technology-25825690
$ Fines/Lawsuits
Low Chance of Blackhat
Detection
High Chance of Blackhat
Detection
Balance of Power
Ad Revenue $
Browsers Neutral (in the middle)
Google Data Empire
Facebook Borg
Class Action Prosecutors
Jedi Enforcers
…HAS CAUSED USER CONFUSION
& A MUDDLE
Because…
LITTLE MISS INFORMATION
Data Dealer video
http://www.youtube.com/watch?v=x2eCAgQ1DTo&list=PL45AABD8BB96D3785&index=7
So… Are we the bad guys?
In the eyes of the user… YES!!
…How do WE prevent big corporations
(and niche bad players) misusing user data/power?
With Great Data comes Great responsibility
Industry needs to govern & enforce itself!
Look to the future…
That means YOU need to agree not to break the analytics code of honour
AND make sure no one else abuses the system!
Good Bad
Report anything that looks a bit “Grey”
Standards & Self regulation
• Vendor built-in privacy & misuse protection
• Adwords & Adsense ToS levels
• Affiliate network guidelines
• WAA Code of Conduct
• GA qualified individual
• GAP certified partner
• WAA Certified Ethical Analyst
• Risk assessment / Compliance audit
• Third party reviews & compliance automated monitoring
Please look out for U.i.O
User Intent Override
Is this a User Intent Override?
UIO?
ONE exception…(false U.i.O sighting)
Track me!
If the user… reads the tracking message & they still say… YES, track me!
Then it’s not UiO
Just a Quantitative self-tracking agreement
Need for Industry standards and Honey pots / seeds tests.
Forced Training & Accreditation (e.g. Certified Analyst or MOWA member)
Google Adwords privacy cpc tax and Google organic SERP ranking bonus (SSL as ranking signal is a start)
Fixes (GA profile filters):
Hostname include filter: (^|\.)yourdomain\.com$
ISP location exclude Ask.com bot: ^(inktomi corporation|iac search and media europe ltd|iac search media inc|yahoo\! inc\.|facebook inc\.|stumbleupon inc\.|dub6 ec2|site confidence test agent servers|site ?confidence|apache ltd\.|nielsen netratings|affinity internet inc|microsoft corp)$
Top content report - Contains box: (email|add|postcode|zipcode|tel) or [?&](.+)=(.*)gmail\.com
Weekly scheduled report to check for the above
Check data stored in utm_content, User-defined, CustomFields & Event fields
Check all GA profiles, including the Raw Data profile, for PII, and add exclude parameters where necessary.
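Before saving filters like these, it can help to try the patterns against known good and bad values. The sketch below is only a local test harness, not part of GA. `auditHit` is a hypothetical helper and `yourdomain.com` is the placeholder from the filter list. The hostname pattern here escapes the dot before `com` so that it matches only a literal dot.

```javascript
// Hostname include pattern: the bare domain or any subdomain of it.
const hostnameInclude = /(^|\.)yourdomain\.com$/;

// Query-string pattern from the "Contains" box: any parameter whose value ends in gmail.com.
const piiInPath = /[?&](.+)=(.*)gmail\.com/;

// Hypothetical helper: reports how one hit would be treated.
function auditHit(hostname, pagePath) {
  return {
    keepHostname: hostnameInclude.test(hostname),
    possiblePii: piiInPath.test(pagePath),
  };
}

console.log(auditHit("www.yourdomain.com", "/thanks?utm_content=jane@gmail.com"));
console.log(auditHit("evil-yourdomain.com", "/"));
```

The first hit is kept but flagged as possible PII; the second is rejected by the hostname filter because it is a different domain that merely ends in the same text.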
Fixes (process changes): Account protection
• Training for developers and marketers
• Check scheduled reports are not being sent to unknown users
• Limit the number of Admin users
• Enable 2-stage authentication if possible
• Look for unusual variances or data spikes in GA (especially new visits to homepage)
• CPA audits (GA vs Affiliate report)
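The check for data spikes can be roughed out in a few lines. This is a simple heuristic sketch, assuming you have a list of daily visit counts exported from a report. `findSpikes` and the factor of 3 are illustrative choices, not a GA feature. The median is used as the baseline so that one huge day does not drag the baseline up with it.

```javascript
// Median of a list of numbers (mean of the two middle values for even lengths).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Returns the days whose visits exceed `factor` times the median day.
function findSpikes(dailyVisits, factor = 3) {
  if (dailyVisits.length === 0) return [];
  const baseline = median(dailyVisits);
  return dailyVisits
    .map((visits, day) => ({ day, visits }))
    .filter((entry) => entry.visits > factor * baseline);
}

console.log(findSpikes([100, 110, 95, 105, 900, 98, 102]));
```

A flagged day is a prompt to look at hostnames, referrers and new-visit share for that day, not proof of spam.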
Back to the present day…
Expected soon
Yikes… are they Disabling Tracking??
…California DNT track law Sept 2013
I’ll be track-ed (still)
No! California just asks for DNT visibility
(i.e. Does your server read the DNT signal?)
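Reading the signal on the server is straightforward. A minimal sketch, assuming a Node.js-style headers object in which header names are already lower-cased. `readDnt` and the three labels it returns are hypothetical names for illustration.

```javascript
// Interprets the DNT request header: "1" = do not track, "0" = tracking allowed,
// anything else (including a missing header) = no preference expressed.
function readDnt(headers) {
  const value = headers ? headers["dnt"] : undefined;
  if (value === "1") return "do-not-track";
  if (value === "0") return "tracking-allowed";
  return "unset";
}

console.log(readDnt({ dnt: "1" })); // do-not-track
console.log(readDnt({}));           // unset
```

What the site then does with each of the three states is a policy decision; the sketch only shows that the signal can be read and reported.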
Prevention
• Use a tag management system that is configured with digitalData layer privacy features enabled (see appendix)
• Try to use POST requests rather than GET requests where possible, or a form action=/thankyoupage.html
• Keep PDF reader, Flash & Java updated
• Lock down FTP to a fixed set of static IPs, use long passwords, and ideally use 2-stage authentication for GTM write-access.
Recent development… Privacy Vigilantism
Good:
• Egypt Gov “disconnected the Internet” to control dissidents
• Anonymous coordinated with dissidents to re-setup internet communications in Egypt
Bad:
• They ignore the law!
• Young & inexperienced
• “Splinter groups” & “out of control” - hacking random websites!
Small Group of Users are revolting: Anonymous
This is how things should be…(Closing Remarks)
Google acts even more responsibly
Facebook introduces a more human(friendly) privacy interface
Users should not need to rely on despicable class action lawyers
Enforcers become just watchers not needing to intervene
May the Data be on your side!
Party!
Party Tonight:
19:30 NVMERI
20:10 MyCool King + DJ Trush
21:00 Charlie Straight
22:15 midi lidi
May 4th be with you!
But.. be careful of the 5th November!
Sith
May the force
And 25th December - I feel your presents
Please Sign up to be a force for good… Google for “DAA code of ethics” or “MOA code of conduct” Please Sign!
www.digitalanalyticsassociation.org/codeofethics
www.moaweb.nl/Richtlijnen/internationale-gedragscodes-en-richtlijnen/2012-09-17%20GRBN%20Code%20Comparison.pdf/view
Thanks & Questions
#BlackhatAnalytics @philpearce
Appendix…
DISCLAIMER – I’m not a lawyer
GA terms of service
http://www.google.com/analytics/terms/us.html
http://www.google.com/analytics/learn/privacy.html
Privacy Trouble shooter
http://support.google.com/bin/static.py?hl=en&ts=1291807&page=ts.cs
Report a privacy concern
http://www.google.com/contact/
Contact Google Analytics
http://support.google.com/analytics/bin/request.py?hlrm=en&contact_type=contact_policy
https://support.google.com/adwords/answer/8206?contact=1&rd=1
Report a security [email protected]://www.google.com/security.html
Discussion Questions
• How much is your data worth?
• Can you afford to drive traffic in the dark with no insight?
• Is PII or sensitive data or urls being accidentally tracked?
• Can competitors detect that PII data is being sent into GA?
• Are you in a very competitive industry?
• When was the last time you audited your WA installation?
• Are you capturing data that easily allows an individual to be “linked” or “re-identified” by Google (e.g. detailed demographic data example, or Netflix.com + IMDB.com example1 or example2)?
Related presentations & resources
CookieTAB virus screenshots
https://www.dropbox.com/s/w0gprycb23ajguw/2011_03_18%20CookieTAB%20virus%20screenshots%20.pptx
Effect of EU Cookie law on US businesses:
https://www.dropbox.com/s/ces1m53mm7o4gmm/2012-10-04%20GAUGE%20Boston%20-%20Effect%20of%20EU%20Cookie%20law%20on%20US%20organisations.pptx
Recipe for a Cookie Law
https://www.dropbox.com/s/l9n3gchusdv57bm/2011_03_18%20Recipe%20for%20a%20Cookie%20Law%20by%20Phil%20Pearce%20.pptx
Cookie law Implementation Examples
https://www.dropbox.com/s/7q8qfxesk44tpkc/Implimentation%20Examples%20by%20Phil%20Pearce%202012_03_18.pptx
Cookie compliance Audit - Example.docx
https://www.dropbox.com/s/idyrql6c1aniaw6/01%20UK%20Cookie%20compliance%20Audit%20-%20Example.docx
CookieLaw research in 90mb Dropbox:
https://www.dropbox.com/s/uapu90d7rc2uxl1/2012_Cookie_Law_Resources_Folder_40mb_Download.zip
Appendix
External privacy feedback mechanisms:
safeharbor.export.gov/companyinfo.aspx?id=16626
feedback-form.truste.com/watchdog/request?url=www.google.com
www.bbb.org/sanjose/business-reviews/internet-services/google-in-mountain-view-ca-214105/file-a-complaint
www.networkadvertising.org/contact-support/report-problem/i-would-report-violation-of-nai-code-nai-member-company-2
www.snapsurveys.com/swh/surveylogin.asp?k=133707671186 [ICO.gov.uk form]
addons.mozilla.org/en-US/firefox/addon/privacy-dashboard/ [W3C feedback mechanism]
www.google.com/trends/explore?hl=en#cat=0-14-54-1281&geo=US&date=today%203-m&cmpt=q [user web searches in category of “privacy” per country]
Security & Privacy prize of up to £13K offered by Google for detecting holes:
www.google.com/about/appsecurity/reward-program/
blog.chromium.org/2012/08/announcing-pwnium-2.html
Example XSS hole in GA found in 2008: derkeiler.com/Mailing-Lists/Full-Disclosure/2008-12/msg00200.html
Open Source feedback techniques:
fourthparty.info/data
appanalysis.org/download.html
Free to check cookie databases:
www.cookielaw.org/cookie-search.aspx?domain=http://www.facebook.com
www.cookiecert.com/cookies-for-facebook.com
privacyscore.com/score_details/2a03b4fe8d9d4eb8b4fb0ccf356cbaaa/showcase