Thank you Prof. Dr. Gerhard Boerner! Stephen, Thomas, Houjun, Me, Robert Jing
Dec 21, 2015
Large Scale Statistics in Internet Behaviors
Hongguang Bi
Greetingland, LLC
Los Angeles, CA
Chapter 1: Internet and WWW: history, how it works
Chapter 2: Internet User Behaviors & Privacy
Chapter 3: Collecting User Information: what and how
Chapter 4: Online Advertising: geo, contextual, and behavioral targeting; real-time bidding; yield management
Chapter 1: Internet and WWW
Cosmology: Nature defines physical laws
Internet: Humans define laws (or specifically: protocols)
Cosmology: photons, electrons, neutrinos … (monad? Leibniz)
Internet: bit
Cosmology: particles => stars => galaxies => clusters etc.
Internet: bits => bytes or integers => words => pages & emails
Cosmology: millions of galaxies detected => billions
Internet: millions to billions of users
Cosmology: goal => structures, statistics of galaxies
Internet: goal => behaviors, statistics of users
Cosmology: Real World
Internet: Information World, or Virtual World
Open Systems Interconnection Model: 7 layers
TCP, UDP
IP
HTTP
Encrypt
Information Age: Web and Email
WWW: March 1989, Tim Berners-Lee
HTTP/0.9: 1991; HTTP/1.0: 1996; HTTP/1.1: June 1999, RFC 2616
Mailbox Protocol: 1971
SMTP: 1982, RFC 821
Later developments: UUCP, sendmail
HTTP, how the web works
• User sends request
  • URL address
  • Browser (Firefox, IE, mobile etc.)
  • Language, who refers you, etc.
  • Cookies
• Web server responds
  • Message body
  • Message size, modified time etc.
  • Server information
  • Set-up cookies
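As a rough illustration of the request/response exchange listed above, the following Python sketch builds an HTTP/1.1 request by hand and parses a canned response. The host `www.example.com`, all header values, and the helper names `build_request` and `parse_response` are assumptions made for illustration; no network connection is opened.

```python
def build_request(host, path, headers):
    """Assemble the text of an HTTP/1.1 GET request."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # A blank line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_response(raw):
    """Split a raw HTTP response into status code, headers, and body.

    Simplification: headers go into a dict, so a repeated header
    (e.g. several Set-Cookie lines) would keep only the last value.
    """
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    status = int(status_line.split(" ")[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


# What the user sends: URL, browser, language, referrer (illustrative values).
request = build_request(
    "www.example.com",
    "/index.html",
    {
        "User-Agent": "Mozilla/5.0 (illustrative)",
        "Accept-Language": "en-US,en;q=0.8",
        "Referer": "http://www.example.com/start.html",
    },
)

# What the server answers: size, modified time, server info, cookie, body.
canned_response = (
    "HTTP/1.1 200 OK\r\n"
    "Server: ExampleServer/1.0\r\n"
    "Content-Length: 13\r\n"
    "Last-Modified: Mon, 21 Dec 2015 10:00:00 GMT\r\n"
    "Set-Cookie: uid=abc123\r\n"
    "\r\n"
    "Hello, world!"
)
status, headers, body = parse_response(canned_response)
```

Each item on the slide maps to one header line: the browser to `User-Agent`, the language to `Accept-Language`, and "who refers you" to `Referer`.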
A cookie is the standard HTTP mechanism by which a server stores data in the user’s browser. How does it work?
Client: sends a request without a cookie
Server: responds with a “Set-Cookie” header containing some information
Client: sends the next request with a “Cookie” header containing the SAME information
A cookie is bound to the specific server that set it, and one server can set multiple cookies
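The three-step cookie exchange can be sketched with Python's standard `http.cookies` module. The cookie name `uid`, its value, and the host name are hypothetical, and a real browser applies many more rules (expiry, domain and path matching) than this sketch does.

```python
from http.cookies import SimpleCookie

# Step 1: the first request carries no Cookie header.
first_request_headers = {"Host": "www.example.com"}

# Step 2: the server answers with a Set-Cookie header.
server_cookie = SimpleCookie()
server_cookie["uid"] = "abc123"      # hypothetical anonymous ID
server_cookie["uid"]["path"] = "/"
set_cookie_value = server_cookie["uid"].OutputString()

# Step 3: the browser stores the cookie and echoes the same
# name=value pair back in the Cookie header of later requests.
browser_jar = SimpleCookie()
browser_jar.load(set_cookie_value)
cookie_header = "; ".join(
    f"{name}={morsel.value}" for name, morsel in browser_jar.items()
)
second_request_headers = {"Host": "www.example.com", "Cookie": cookie_header}
```

Only the name and value travel back to the server; attributes such as `Path` stay in the browser and decide when the cookie is sent.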
Chapter 2: User Behaviors & Privacy
• 1 billion Internet users: a few hundred million in Europe, 100M in the US, China
• The IPv4 address space is full: 2^32 ≈ 4.3 billion addresses
• Google gets 80 billion views every day, e.g. one Internet user visits about 1 Google page every day (e.g. search, email, ad)
• The Internet brings new economics, lifestyles, and social phenomena, e.g. online shopping, social networks (Facebook), newspapers and publications, US elections
• For the first time in history, human beings might lose their privacy: their social activities can be tracked, studied, and finally manipulated by powerful players such as the US government or Google
Cases:
• Currently: “Tracking case”, Apple & Google
“Information is transmitted securely to the Apple iAd server via a cellular network connection or Wi-Fi Internet connection,” explained a letter Apple sent to US Rep. Edward Markey, D-Mass., on July 12 in response to his request for information. “The latitude/longitude coordinates are converted immediately by the server to a five-digit ZIP code.”
• 2008 “Suicide” case, MySpace
• On the technical side, the credit card industry has successfully built tools that have tracked user behaviors for 20 years!
What kind of Private Information?
• You definitely expose
  • Geographic information (via IP)
  • OS and browser, such as PC, Linux, iPhone
  • Language
• May lose, protected by laws
  • Your name, identity cards (credit card, SSN, driver's license etc.)
  • Via online shopping sites, government/university service sites, credit report sites, dating sites etc.
  • In practice, can still be stolen => virus, spyware, break-in
• May lose, unprotected
  • Demographic information, e.g. age, gender, income, household
  • Via ISP or cellular service provider, social network sites, other free services
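The items a user "definitely exposes" can all be read from a single request. The sketch below is a simplified illustration: `GEO_TABLE` is a hypothetical lookup table over reserved documentation address ranges (real systems use a GeoIP database), and the function names and the crude substring matching are assumptions, not a description of any production system.

```python
import ipaddress

# Hypothetical table; the networks are reserved documentation ranges
# and the place names are invented for illustration.
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "Los Angeles, US",
    ipaddress.ip_network("198.51.100.0/24"): "Munich, DE",
}


def guess_location(ip_text):
    """Map a client IP address to a coarse location."""
    ip = ipaddress.ip_address(ip_text)
    for network, place in GEO_TABLE.items():
        if ip in network:
            return place
    return "unknown"


def guess_platform(user_agent):
    """Very crude substring match on the User-Agent header."""
    # Android is checked before Linux because Android agents mention both.
    for token in ("iPhone", "Android", "Windows", "Macintosh", "Linux"):
        if token in user_agent:
            return token
    return "unknown"


def preferred_language(accept_language):
    """Return the language tag with the highest q-value."""
    best_tag, best_q = "unknown", -1.0
    for item in accept_language.split(","):
        tag, _, params = item.strip().partition(";")
        q = 1.0  # a tag without an explicit q-value has weight 1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        if tag and q > best_q:
            best_tag, best_q = tag, q
    return best_tag
```

For example, `preferred_language("de;q=0.7, en-US, fr;q=0.3")` picks `en-US`, because a tag with no explicit weight counts as 1.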
Chapter 4: Collect User Information
• User Profile
  • Uniquely identified by an anonymous ID
  • The ID is tracked using a cookie and permanently saved on disk
  • Every ID has a profile, consisting of geographic information, demographic information, interests, shopping histories, recent behavior types (or, audiences) => any information valuable to advertisers
• Existing Techniques
  • Relational database
  • Moving averages
  • Artificial neural network
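The user profile described above (an anonymous ID carried in a cookie, with a profile attached to it) might look like the following sketch. The field names, the in-memory `PROFILES` dictionary, and the helper `profile_for` are all hypothetical; a real system would persist profiles on disk as the slide says.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    anonymous_id: str
    geo: str = "unknown"
    demographics: dict = field(default_factory=dict)
    interests: set = field(default_factory=set)
    shopping_history: list = field(default_factory=list)
    audiences: set = field(default_factory=set)  # recent behavior types


# Stand-in for permanent storage, keyed by the anonymous ID.
PROFILES = {}


def profile_for(cookie_id=None):
    """Return (id, profile), minting a new anonymous ID if needed."""
    if cookie_id is None or cookie_id not in PROFILES:
        cookie_id = cookie_id or uuid.uuid4().hex
        PROFILES[cookie_id] = UserProfile(anonymous_id=cookie_id)
    return cookie_id, PROFILES[cookie_id]


# First visit: no cookie yet, so a new ID is created and would be
# sent to the browser in a Set-Cookie header.
uid, profile = profile_for()
profile.interests.add("cameras")

# Later visit: the browser returns the same ID, so the same profile is found.
same_uid, same_profile = profile_for(uid)
```

The ID identifies a browser, not a named person, which is why the slide calls it anonymous.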
Relational Database
• A database consists of many “normalized” tables
• A table consists of a primary key and multiple values
• One table can have many keys to search
ResearchGroup: group_id, name, description, head
Member: member_id, group_id, name, type (professor, postdoc, student), status (current, left)
Left: left_id, member_id, when, where
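The three-table example can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the column types, the index name, and every inserted row are invented placeholders. `Left`, `when`, and `where` are SQL keywords, so they are written in double quotes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ResearchGroup (
    group_id INTEGER PRIMARY KEY,
    name TEXT, description TEXT, head TEXT);
CREATE TABLE Member (
    member_id INTEGER PRIMARY KEY,
    group_id INTEGER REFERENCES ResearchGroup(group_id),
    name TEXT, type TEXT, status TEXT);
CREATE TABLE "Left" (
    left_id INTEGER PRIMARY KEY,
    member_id INTEGER REFERENCES Member(member_id),
    "when" TEXT, "where" TEXT);
-- A second search key on Member, besides its primary key.
CREATE INDEX member_by_group ON Member(group_id);
""")

# Placeholder rows, invented for illustration only.
conn.execute(
    "INSERT INTO ResearchGroup VALUES (?, ?, ?, ?)",
    (1, "Group A", "placeholder description", "Head A"),
)
conn.executemany(
    "INSERT INTO Member VALUES (?, ?, ?, ?, ?)",
    [(1, 1, "Alice", "postdoc", "current"), (2, 1, "Bob", "student", "left")],
)
conn.execute('INSERT INTO "Left" VALUES (?, ?, ?, ?)', (1, 2, "2015-12", "Somewhere"))

# Who is currently in group 1?
current_members = conn.execute(
    "SELECT name FROM Member WHERE group_id = ? AND status = 'current'", (1,)
).fetchall()

# Where did former members of group 1 go? (join across two tables)
departures = conn.execute(
    """
    SELECT m.name, l."where"
    FROM Member AS m
    JOIN "Left" AS l ON l.member_id = m.member_id
    WHERE m.group_id = ?
    """,
    (1,),
).fetchall()
```

Normalization shows up in the fact that a member's name is stored once, in `Member`, and the `Left` table refers to it only through `member_id`.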
Moving Average
• Each new value is a weighted average of the last N detections, with weights that decay over time.
A simplified time-series analysis tool
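One way to read the definition above is a window of the last N values in which the newest value has weight 1 and each older value is down-weighted by a constant factor. The class name, the geometric decay, and the parameter values below are assumptions; the slide does not say which decay law is used.

```python
from collections import deque


class DecayingMovingAverage:
    """Weighted average of the last `window` values.

    The value observed k steps ago gets weight decay**k, so with
    0 < decay < 1 older detections count less.
    """

    def __init__(self, window, decay):
        self.values = deque(maxlen=window)  # oldest values drop out automatically
        self.decay = decay

    def update(self, value):
        self.values.append(value)
        weights = [self.decay ** age for age in range(len(self.values))]
        newest_first = reversed(self.values)
        weighted_sum = sum(w * v for w, v in zip(weights, newest_first))
        return weighted_sum / sum(weights)


average = DecayingMovingAverage(window=3, decay=0.5)
first = average.update(4.0)   # only one value so far
second = average.update(8.0)  # weights 1 and 0.5
third = average.update(2.0)   # weights 1, 0.5 and 0.25
```

With `decay=1.0` this reduces to a plain moving average over the window.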
Artificial Neural Network
Machine learning
Training:
3,5 => 15
4,6 => 24
9,8 => 72
…
6,7 => 41
Neurons work in parallel => very fast
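The training pairs above look like products of two digits, with the last line being the network's approximate answer for an input it was not trained on. Under that reading, a small feed-forward network trained by stochastic gradient descent can be sketched in plain Python. The architecture (4 tanh hidden units), the scaling of inputs and outputs, the learning rate, and the choice of training on all other digit pairs are assumptions; the slide does not describe the actual network.

```python
import math
import random

random.seed(0)
HIDDEN = 4

# Training pairs (a, b) => a * b for single digits, holding out (6, 7).
# Inputs are scaled by 1/10 and targets by 1/100 to keep values small.
data = [((a / 10, b / 10), a * b / 100)
        for a in range(1, 10) for b in range(1, 10) if (a, b) != (6, 7)]

w1 = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(HIDDEN)]
b1 = [0.0] * HIDDEN
w2 = [random.uniform(-0.5, 0.5) for _ in range(HIDDEN)]
b2 = 0.0


def forward(x):
    """Return hidden activations and the scalar network output."""
    hidden = [math.tanh(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j])
              for j in range(HIDDEN)]
    return hidden, sum(w2[j] * hidden[j] for j in range(HIDDEN)) + b2


def mean_squared_error():
    return sum((forward(x)[1] - y) ** 2 for x, y in data) / len(data)


def train(epochs, rate):
    """Per-sample gradient descent on the squared error (backpropagation)."""
    global b2
    for _ in range(epochs):
        for x, y in data:
            hidden, out = forward(x)
            err = out - y
            for j in range(HIDDEN):
                # Gradient through tanh, computed before w2[j] is changed.
                grad_hidden = err * w2[j] * (1 - hidden[j] ** 2)
                w2[j] -= rate * err * hidden[j]
                w1[j][0] -= rate * grad_hidden * x[0]
                w1[j][1] -= rate * grad_hidden * x[1]
                b1[j] -= rate * grad_hidden
            b2 -= rate * err


loss_before = mean_squared_error()
train(epochs=1000, rate=0.1)
loss_after = mean_squared_error()

# Query the held-out pair; the answer is approximate, not exact.
prediction = forward((0.6, 0.7))[1] * 100
```

The network never sees the pair (6, 7) during training, so `prediction` shows how well it interpolates rather than how well it memorizes.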
Chapter 5: Online Advertising
The Good Side of User Tracking
Current Challenges
• The server processes 10,000 requests per second
• For each request, update a user profile with 100 attributes
• Pick one from 100 possible advertiser candidates
• 10^8 decisions per second
• 100 million impressions per day
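The figure of 10^8 decisions per second follows from multiplying the three numbers above, if each (attribute, candidate) pair is counted as one elementary decision. The sketch below checks that arithmetic and shows one hypothetical way to pick an advertiser: score every candidate against the profile and take the best. The dot-product scoring rule, the random data, and the name `pick_ad` are assumptions for illustration, not the system described in the talk.

```python
import random

REQUESTS_PER_SECOND = 10_000
ATTRIBUTES_PER_PROFILE = 100
CANDIDATE_ADS = 100

# Assumption: one elementary decision per (attribute, candidate) pair.
decisions_per_request = ATTRIBUTES_PER_PROFILE * CANDIDATE_ADS
decisions_per_second = REQUESTS_PER_SECOND * decisions_per_request

# Average time available per request if handled one at a time.
time_budget_ms = 1000 / REQUESTS_PER_SECOND


def pick_ad(profile, candidates):
    """Return the index of the candidate whose weights best match the profile."""
    scores = [sum(p * w for p, w in zip(profile, weights))
              for weights in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


# Random stand-in data: one profile and 100 candidate weight vectors.
random.seed(1)
profile = [random.random() for _ in range(ATTRIBUTES_PER_PROFILE)]
candidates = [[random.random() for _ in range(ATTRIBUTES_PER_PROFILE)]
              for _ in range(CANDIDATE_ADS)]
chosen = pick_ad(profile, candidates)
```

At 10,000 requests per second a single sequential worker has 0.1 ms per request, which is why the amount of work per request matters so much.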
The system we are developing
In the Future
• Statistics => dynamic, finding rules, clustering analysis, time-series analysis
• Instant change of behaviors, e.g. shopping intention
• How behaviors are affected by the environment: social effect, “friend-recommendation” effect
• THANKS!