[photo of General Zorg] BlackHat Analytics 2: Detect & avoid Dark Tracking
2. Welcome: Phil Pearce, PPC, Privacy and Analytics Expert, Freelancer. Web Analytics Exchange mentor; 750 GA questions answered; Tracking Protection Group (DNT). #BlackhatAnalytics @philpearce www.linkedin.com/in/philpearce
3. Summary: 1. Definition 2. Example techniques 3. Classifications 4. Penalties 5. Industry issues 6. Group/class action wars 7. Big Data 8. Look at the future 9. Questions. #BlackhatAnalytics #emetrics @philpearce
4. A long time ago... in a Google universe far, far away...
5. Define: Blackhat Analytics
6. Define: Blackhat Analytics: 0 results
7. If you do this search now... Define: Blackhat Analytics
8. Me, me. It turns out... I know more than Google ;)
9. Hypothesis: At some point in the future, "BlackHat Analytics" or faking conversions might become more widespread, because: 1. WA (web analytics) is becoming more important for business decision making. 2. Automatic performance-based PPC bid management systems are becoming more widely used. 3. Online competitiveness is increasing and more revenue is at stake.
10. Definition: The intentional act of distorting, deleting, unethically using, or hijacking WA data using technical or legal loopholes, with the goal of making financial gains or obtaining a competitive advantage. (Phil Pearce, 2009)
11. Evil tracking from pre-2010: Referral backlink log spam (deprecated SEO technique); Ad behavioural targeting (Interest Based Stalking); Remarketing Ads (Return Visitor Stalking) - Star Wars stalker; Safari 3rd party POST cookie (Preference bypassing); NEW: Headless Browser spam; Flash cookie respawn (Zombie Cookies); Visited links CSS hack (History Sniffing); GA log spam (Spider visit loading JS); EverCookie (all of the above+)
12. Super evil: EverCookie
13. The EverCookie was so difficult to delete that even the NSA considered using it! Source: http://www.slideshare.net/jonbonachon/tor-stinks But they decided they did not need it ;)
14. Examples from USA
15.
Classification:
   Intent: Accidental vs Malicious
   Target: Own website vs Competitor's website
   Purpose of data collection: Same purpose vs Different purpose
   Scale: Niche vs Mass effect
   Impact: Data unaffected vs GA account deletion
16. Data Classifications (quadrant chart). Axes: Malintent vs Unintentional or Accidental; Bad/Unreliable Measure Data vs Good/Accurate Measure Data. Items plotted: Cashback cookies (e.g. Quidco); Flash Cookie; Flash Cookie Respawn; EverCookie; CSS history sniffing; Speed checking robots; Google Wifi incident; Hostname spam; Google (not provided); Phone call logs; App error logs; Fake conversions; Referral log spam
17. Updates (same quadrant chart; axes: Malintent vs Unintentional or Accidental; Bad/Unreliable Measure Data vs Good/Accurate Measure Data): Less accidental data mistakes; More good/reliable measure data. Items plotted: Speed checking robots; Hostname spam; Google Wifi incident
18. If nasty tracking code is installed - who is liable?
19. Liability for Privacy & Security. Is the agency liable? No. Local laws say the website owner is responsible (not the agency or vendor). BUT the agency is responsible for: upholding professional standards (e.g. GACP status); a pro-active client relationship.
20. Why do people still do this bad stuff?
21. The lure of the Dark Side is too strong!
22. It's all about the money! Affiliate networks looking to increase CPA and attract new affiliates. Online news websites looking to retain users & sell stories (e.g. NYT). Banner networks looking to improve CPM, reduce cookie deletion rates and overcome keyword (not provided). Sustained CPC bidding wars. Big data.
23. But there is a disturbance in the task force...
24. Meet the new Matt Cutts ... Google Privacy Red team soon to be hired in 2013 following FTC settlement.
Red team leader: mission to discover and prioritize subtle, unusual, and emergent privacy & security flaws. https://www.google.com/about/jobs/locations/mountain-view/engineering/systems/data-privacy-engineer-privacy-red-team-mountain-view.html Matt Cutts: hired as WebSpam fighter to Force quality improvements in 2000. http://www.mattcutts.com/blog/about-me/
25. Internal Imperial Bureau Security: new Google Product Manager of Privacy & Information Security
26. F@#K - GA account deleted! You will not collect any data that personally identifies an individual (such as a full name, email address or billing information), or other data which can be reasonably linked to such information by Google. You must post a Privacy Policy which provides notice that your use of cookies is to collect traffic data. You must not circumvent any privacy features (e.g. an opt-out) that are part of GA. www.google.com/analytics/terms/us.html
27. Why can't GA just remove the bad PII data? Free WA packages are unable to remove PII without deleting whole GA accounts! Raw logs are only stored for ~30 days. The right to be forgotten was introduced after GA was designed (although this might be possible with Universal, which is user-centric, not visitor-centric).
28. Sensitive data is also an issue. http://en.wikipedia.org/wiki/Personal_identifier#Examples_of_PID
29. Don't use userIDs that contain PII: R2D2 (random userID) vs KennyBaker (full name used for userID)
30. Example 1: Accidental PII. www.yoursite.com privacy@google.com https://support.google.com/adwords/answer/8206?contact=1&rd=1 site:competitor.com inurl:"utm_content * gmail.com http://www.google.de/#q=inurl:de+inurl:utm_content+*+gmail+-blog&pws=0&num=100&filter=0&as_qdr=all e.g. www.snsanalytics.com/xXiSy9?type=track_iframe&utm_medium=FacebookPage&utm_campaign=InfoFred&utm_source=yuppi.hu&utm_content=NAME.REMOVED@gmail.com
31.
Solution/Counter-measure for Accidental PII. Or use a temporary robots.txt fix:
   User-agent: *
   Disallow: /*utm_medium=email
   Disallow: /*gmail.com
   Noarchive: /*utm_medium=email
   Noarchive: /*gmail.com
Add exclude parameters to GWT (Google Webmaster Tools): email, mail, utm_source, utm_medium, utm_campaign, utm_content, utm_keyword, _ga
32. Disclaimer. Legal Disclaimer: The purpose of this example is to demonstrate a hole in all Analytics platforms, and how to patch this hole. It is used for TESTING purposes ONLY. By reading this example you agree NOT to use this on a live website, and agree that I (Phil Pearce) am NOT liable for any damage that a website owner may suffer arising out of this example & tool. If you are in any doubt, please seek the advice of the Google legal team www.google.com/contact/ or your local legal counsel BEFORE testing. Note: This issue was raised on the GACP private discussion forum 6 months ago, prior to this event.
33. Example 2: Do you recognise this number? It is a Quintillion or Big Integer
34. Intentional data damage. WARNING: Don't try this at home!
   javascript:_gaq.push(['_setAccount','UA-xxxxxx-1'],['_addTrans','8148350','affiliation','-9223372036854775807','-9223372036854775807','0.00','-','-','-'],['_addItem','SKU 00001','8148350','BIG refund','-','-9223372036854775807','1'],['_trackTrans']);
Resulting hit:
   http://www.google-analytics.com/__utm.gif?utmwv=5.4.6&utms=44&utmn=393079074&utmhn=domain.com&utmt=tran&utmtid=8148350&utmtst=affiliation&utmtto=-9223372036854775807&utmttx=-9223372036854775807&utmtsp=0.00&utmtci=-&utmtrg=-&utmtco=-&utmcs=UTF-8&utmsr=1366x768&utmvp=1366x550&utmsc=24-bit&utmul=en-us&utmje=1&utmfl=11.9 r900&utmdt=TITLE&utmhid=509485053&utmr=-&utmp=/&utmht=1385061484294&utmac=UA-XXXXX-1&utmcc=__utma=251194116.2116214072.1385060410.1385060410.1385060410.1; __utmz=251194116.1385060410.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none);&utmu=qjAL~
35.
Solution/Counter-measure for intentional data damage. Tool to manually fix: bit.ly/bigintegerfix Legal Disclaimer: The purpose of this example is to demonstrate a hole in all Analytics platforms, and how to patch this hole. It is used for TESTING purposes ONLY. By reading this example you agree NOT to use this on a live website, and agree that I (Phil Pearce) am NOT liable for any damage that a website owner may suffer arising out of this example & tool. If you are in any doubt, please seek the advice of the Google legal team www.google.com/contact/ or your local legal counsel BEFORE testing. Note: This issue was raised on the GACP private discussion forum 6 months ago, prior to this event.
36. Fine calculator: Fine = (No. of users affected * Scale of badness * Size of brand) less (Website risk assessment + Vendor privacy self-certification)
37. Here is a Fine example: Sony fined 320K by the ICO for an email & password breach. The Adobe password breach is expected to be A LOT more! http://www.ico.gov.uk/news/latest_news/2013/ico-news-release-2013.aspx http://www.youtube.com/watch?v=2vZHg2F4u5Q
38. Breach notification: PII data sucked out from exposed servers! Companies must notify the DPA within a reasonable amount of time, but are not (currently) obligated to notify the public! http://en.wikipedia.org/wiki/Data_breach http://www.symantec.com/content/de/de/about/downloads/press/2010_annual_study.pdf
39. Consumers vs Advertisers: But there is still an imbalance in the Force
40. Because: Maturity in the advertising sector. User data allows better ad targeting; MORE data, better targeting.
41. Data is power. We do'na the data capt
42. Rise of the Big Data Empire
43. Dark motivations: Data greed & fear of losing existing user data
44. Triggered
45. Group/Class Action Wars. Note: A class is a collective of users (e.g. South Bohemian Mothers group vs Temelin nuclear power plant)
46. Define: Class Action Prosecutor. They represent the users.
Like affiliates (i.e. revenue motivated), but with larger resources & cleverer. For example:
47. US Class Action Prosecutor: Like bounty hunters, but more sophisticated!
48. BIG class-action fines in US
49. Question: Do class action lawsuits exist in Europe, or only in the US?
50. Class Action Prosecutors: also now active in UK! e.g. Google UK vs Olswang class action (Safari 3rd party cookie bypassing on iOS)
51. First ever UK group action vs Google UK, in Feb 2013, claiming 10m Safari users affected. www.googlelawsuit.co.uk and www.facebook.com/SafariUsersAgainstGooglesSecretTracking UK test case could set a precedent for EU class-action cases!
52. Successful class action raids in US: Settlement funds split 50:50 between users and class action lawyers. Previous settlements were 70:30, thus a smaller % cut for class action lawyers, but a huge number of users in the claim. Figures: 13 million hit; 13 per user; 7.5 million
53. W3C republic: A new hope for truce. DNT user signal: must be UNSET by default
54. Browsers ignore the W3C consensus on DNT. Firefox: talks about a blockade of 3rd party cookies. MS: Windows 8 IE10 rolls out DNT=1 ON by default, although the signal should be UNSET by default!
55. Firefox lost battle: too many false positives. Firefox says its Han's are tied for a few months on 3rd party cookies. Dark Side too powerful ;)
56. IE10 DNT signal grounded: MS IE10 DNT=1 browser signal ON by default. Both Apache & Yahoo threaten to ignore DNT=1 from IE10. http://www.ypolicyblog.com/policyblog/2012/10/26/dnt/ http://www.admonsters.com/article/apache-ignores-ie10-dnt-signal
57. Alternative: Cookie Clearinghouse proposed (like the StopBadware list). Allow good cookies; block bad cookies.
58. W3C republic: 2 years' reign! Infighting & disunity between advertisers & privacy advocates. Definition of Tracking (DNT) still not defined! http://www.theregister.co.uk/2013/11/05/do_not_track_w3c_ads_privacy/
59. Group almost disbanded: Peter Swire (chief) resigns; Jonathan Mayer (Firefox) resigns; Digital Advertising Alliance (DAA) leaves group!
Old W3C republic. Key member: Thomas Roessler joins Google!
60. Imperial: Durnt, durnt, durnt durnt, dan ner!
61. New Imperial Advertising Principles: AdChoices proposed as replacement for W3C's DNT. Source: http://www.adweek.com/news/technology/daa-convene-new-do-not-track-group-updated-153023
62. Privacy in the Universe restored! Users have choice & freedom within the Global Imperial Empire
63. But: the secret arms race
64. The Dark Star: BIG data centre with ability to process: 1. Device signature tracking 2. UserID respawn 3. Custom remarketing. Also, affiliate networks start building device signature conversion tracking tools: "We (tradedoubler.com) are looking at options such as device recognition, using non-personally identifiable information that is freely available from a user's device. Using advanced matching algorithms a single device can be recognized at the point of impression/click and conversion without the use of cookies." http://www.tradedoubler.com/uk-en/blog/firefox-22-cookies/ [Jun 2013]
65. Belgium advanced scanner study (by KU Leuven University). Result: secret device signature tracking plans detected!
66. War for Anonymity (aka War of Shadows)
67. Browsers (excluding Chrome) secretly move to anonymise device signatures, so that all customised devices/extensions look the same, thus destroying any shadow tracking.
68. Facebook (Borg) & Google (Empire) counter-attack: use Force-browser power to set DNT=0 (Do Target Me) when the user signs into a service (Messenger/Gmail)
69. Prism Tracker: unexpected (Ed) Snowden monster. Enforcers/regulators get a boost of user support.
70. Headless browser robotic crawlers causing havoc in GA data! Impossible to differentiate from a real user! Definition: A headless browser is a web browser WITHOUT a user interface. Examples of headless browsers: Zombie.js, PhantomJS, HtmlUnit. www.webmasterworld.com/search_engine_spiders/4619880.htm http://nodejsmodules.org/new/tags/spider
71.
Authenticated/logged-in user tracking might be the only way to exclude headless browser tracking!
72. Polarisation: Dark get darker (e.g. IE favicon 3rd party cookie bypassing browser hole/exploit). White get whiter (e.g. duckduckgo.com, ixquick.com & mezzobit.com increase in usage).
73. Return of the Jedi Strike: 2015 invasion of privacy officers. Forced 5% global revenue fine power (max 100 million euro). University research divisions expand use of TaintDroids. Note: AntiTaintDroid link: http://gsbabil.github.io/AntiTaintDroid/ Source: bringyourownit.com/2014/04/09/eu-data-protection-reform-the-100-million-euro-fine/ & www.bbc.co.uk/news/technology-25825690
74. Balance of Power (diagram): Ad Revenue $ vs $ Fines/Lawsuits; Low Chance of Blackhat Detection vs High Chance of Blackhat Detection. Players: Google Data Empire; Facebook Borg; Browsers: neutral (in the middle); Class Action Prosecutors; Jedi Enforcers
75. HAS CAUSED USER CONFUSION & A MUDDLE. Because: LITTLE MISS INFORMATION
76. Data Dealer video: http://www.youtube.com/watch?v=x2eCAgQ1DTo&list=PL45AABD8BB96D3785&index=7
77. THIS HAS CAUSED USER CONFUSION & A MUDDLE
78. So: are we the bad guys?
79. In the eyes of the user: YES!!
80. How do WE prevent big corporations (and niche bad players) misusing user data/power?
81. With Great Data comes Great Responsibility
82. Look to the future: Industry needs to govern & enforce itself!
83. That means YOU need to agree not to break the analytics code of honour AND make sure no one else abuses the system! Good / Bad: report anything that looks a bit grey.
84. Standards & self-regulation: Vendor built-in privacy & misuse protection; AdWords & AdSense ToS levels; Affiliate network guidelines; WAA Code of Conduct; GA qualified individual; GACP certified partner; WAA Certified Ethical Analyst; Risk assessment / compliance audit; Third party reviews & automated compliance monitoring
85. Please look out for U.i.O: User Intent Override
86. Is this a User Intent Override (UiO)?
87. ONE exception (false U.i.O sighting): Track me! If user..
reads the tracking message & still says YES, track me! Then it's not UiO, just a Quantified Self tracking agreement.
88. Need for industry standards and honeypot/seed tests. Forced training & accreditation (e.g. Certified Analyst or MOWA member). Google AdWords privacy CPC tax and Google organic SERP ranking bonus (SSL as ranking signal is a start).
89. Fixes (GA profile filters):
   Hostname include filter: (^|\.)yourdomain\.com$
   ISP location exclude (e.g. Ask.com bot): ^(inktomi corporation|iac search and media europe ltd|iac search media inc|yahoo! inc\.|facebook inc\.|stumbleupon inc\.|dub6 ec2|site confidence test agent servers|site ?confidence|apache ltd\.|nielsen netratings|affinity internet inc|microsoft corp)$
   Top content report, Contains box: (email|add|postcode|zipcode|tel) or [?&](.+)=(.*)gmail\.com
   Weekly scheduled report to check for the above.
   Check data stored in utm_content, User-defined, Custom Fields & Event fields.
   Check all GA profiles, including the Raw Data profile, for PII, and add exclude parameters where necessary.
90. Fixes (process changes): Account protection. Training for developers and marketers. Check scheduled reports are not being sent to unknown users. Limit the number of Admin users. Enable 2-stage authentication if possible. Look for unusual variances or data spikes in GA (especially new visits to homepage). CPA audits (GA vs affiliate report).
91. Back to the present day
92. Expected soon: California DNT tracking law, Sept 2013. Yikes, are they disabling tracking??
93. I'll be track-ed (still). No! California just asks for DNT visibility (i.e. does your server read the DNT signal?)
94.
Prevention: Use a tag management system that is configured with digitalData layer privacy features enabled (see appendix). Try to use POST requests rather than GET requests where possible, or a form action=/thankyoupage.html. Keep PDF reader, Flash & Java updated. Lock down FTP to a fixed set of static IPs, use long passwords, and ideally use 2-stage authentication for GTM write-access.
95. Recent development: Privacy vigilantism. A small group of users are revolting: Anonymous. Good: the Egyptian government disconnected the Internet to control dissidents; Anonymous coordinated with dissidents to re-establish internet communications in Egypt. Bad: They ignore the law! Young & inexperienced. Splinter groups out of control - hacking random websites!
96. This is how things should be (closing remarks): Google acts even more responsibly. Facebook introduces a more human (friendly) privacy interface. Users should not need to rely on despicable class action lawyers. Enforcers become just watchers, not needing to intervene.
97. May the Data be on your side! Party Tonight: 19:30 NVMERI 20:10 MyCool King + DJ Trush 21:00 Charlie Straight 22:15 midi lidi
98. May 4th be with you! (May the Force.) But... be careful of the 5th of November! (Sith.) And 25th December - I feel your presents
99. Please sign up to be a force for good. Google for "DAA code of ethics" or "MOA code of conduct". Please sign! www.digitalanalyticsassociation.org/codeofethics www.moaweb.nl/Richtlijnen/internationale-gedragscodes-en-richtlijnen/2012-09-17%20GRBN%20Code%20Comparison.pdf/view
100. Thanks & Questions #BlackhatAnalytics @philpearce
101. Appendix
102.
DISCLAIMER: I'm not a lawyer.
   GA terms of service: http://www.google.com/analytics/terms/us.html http://www.google.com/analytics/learn/privacy.html
   Privacy troubleshooter: http://support.google.com/bin/static.py?hl=en&ts=1291807&page=ts.cs
   Report a privacy concern: http://www.google.com/contact/
   Contact Google Analytics: http://support.google.com/analytics/bin/request.py?hlrm=en&contact_type=contact_policy https://support.google.com/adwords/answer/8206?contact=1&rd=1
   Report a security concern: security@google.com http://www.google.com/security.html
103. Discussion Questions: How much is your data worth? Can you afford to drive traffic in the dark with no insight? Is PII, sensitive data or URLs being accidentally tracked? Can competitors detect that PII data is being sent into GA? Are you in a very competitive industry? When was the last time you audited your WA installation? Are you capturing data that easily allows an individual to be linked or re-identified by Google (e.g. detailed demographic data example, or Netflix.com + IMDB.com example1 or example2)?
104. Related presentations & resources:
   CookieTAB virus screenshots: https://www.dropbox.com/s/w0gprycb23ajguw/2011_03_18%20CookieTAB%20virus%20screenshots%20.pptx
   Effect of EU Cookie law on US businesses: https://www.dropbox.com/s/ces1m53mm7o4gmm/2012-10-04%20GAUGE%20Boston%20-%20Effect%20of%20EU%20Cookie%20law%20on%20US%20organisations.pptx
   Recipe for a Cookie Law: https://www.dropbox.com/s/l9n3gchusdv57bm/2011_03_18%20Recipe%20for%20a%20Cookie%20Law%20by%20Phil%20Pearce%20.pptx
   Cookie law Implementation Examples: https://www.dropbox.com/s/7q8qfxesk44tpkc/Implimentation%20Examples%20by%20Phil%20Pearce%202012_03_18.pptx
   Cookie compliance Audit - Example.docx: https://www.dropbox.com/s/idyrql6c1aniaw6/01%20UK%20Cookie%20compliance%20Audit%20-%20Example.docx
   CookieLaw research in 90mb Dropbox: https://www.dropbox.com/s/uapu90d7rc2uxl1/2012_Cookie_Law_Resources_Folder_40mb_Download.zip
105.
Appendix. External privacy feedback mechanisms:
   safeharbor.export.gov/companyinfo.aspx?id=16626
   feedback-form.truste.com/watchdog/request?url=www.google.com
   www.bbb.org/sanjose/business-reviews/internet-services/google-in-mountain-view-ca-214105/file-a-complaint
   www.networkadvertising.org/contact-support/report-problem/i-would-report-violation-of-nai-code-nai-member-company-2
   www.snapsurveys.com/swh/surveylogin.asp?k=133707671186 [ICO.gov.uk form]
   addons.mozilla.org/en-US/firefox/addon/privacy-dashboard/ [W3C feedback mechanism]
   www.google.com/trends/explore?hl=en#cat=0-14-54-1281&geo=US&date=today%203-m&cmpt=q [user web searches in category of privacy per country]
Security & Privacy prize of up to 13K offered by Google for detecting holes: www.google.com/about/appsecurity/reward-program/ blog.chromium.org/2012/08/announcing-pwnium-2.html
Example XSS hole in GA found in 2008: derkeiler.com/Mailing-Lists/Full-Disclosure/2008-12/msg00200.html
Open Source feedback techniques: fourthparty.info/data appanalysis.org/download.html
Free to check cookie databases: www.cookielaw.org/cookie-search.aspx?domain=http://www.facebook.com www.cookiecert.com/cookies-for-facebook.com privacyscore.com/score_details/2a03b4fe8d9d4eb8b4fb0ccf356cbaaa/showcase
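Appendix sketch (not from the original slides): the filter ideas on slide 89 (hostname include filter, and checking URLs for accidental PII such as the Gmail address in the slide 30 example) could be prototyped outside GA roughly as below. This is a minimal Python sketch under assumptions: the function names, the `yourdomain.com` placeholder host, the trimmed key list and the simple email pattern are all hypothetical and are not part of any GA API.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Hypothetical patterns, loosely adapted from slide 89 (assumptions, not GA settings).
HOSTNAME_INCLUDE = re.compile(r"(^|\.)yourdomain\.com$", re.IGNORECASE)
# Deliberately simple email shape; real addresses can be more exotic.
EMAIL_IN_VALUE = re.compile(r"[^@\s]+@[^@\s]+\.[a-z]{2,}", re.IGNORECASE)
# Trimmed version of the slide's "Contains" list; "add" is left out here
# because it would also match many harmless parameter names.
SUSPECT_KEYS = re.compile(r"(email|postcode|zipcode|tel)", re.IGNORECASE)


def is_own_hostname(hostname: str) -> bool:
    """True if the hostname is yourdomain.com or one of its subdomains."""
    return HOSTNAME_INCLUDE.search(hostname) is not None


def find_pii_params(url: str) -> list[str]:
    """Return the query parameter names whose key or value looks like PII."""
    flagged = []
    for key, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        if SUSPECT_KEYS.search(key) or EMAIL_IN_VALUE.search(value):
            flagged.append(key)
    return flagged


if __name__ == "__main__":
    sample = ("http://www.example.com/xXiSy9?type=track_iframe"
              "&utm_medium=FacebookPage&utm_content=NAME.REMOVED@gmail.com")
    print(find_pii_params(sample))
    print(is_own_hostname("www.yourdomain.com"))
    print(is_own_hostname("yourdomain.com.spam.example"))
```

A check like this could run over a weekly export of landing-page URLs, in the spirit of the "weekly scheduled report" suggestion; any flagged parameter would then be a candidate for the exclude-parameter list.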