Thank you Prof. Dr. Gerhard Boerner ! Stephen, Thomas, Houjun, Me, Robert Jing

Dec 21, 2015

Transcript
Page 1

Thank you, Prof. Dr. Gerhard Boerner!

Stephen, Thomas, Houjun, Me, Robert Jing

Page 2

Large Scale Statistics in Internet Behaviors

Hongguang Bi
Greetingland, LLC
Los Angeles, CA

Page 3

Chapter 1: Internet and WWW. History, how it works

Chapter 2: Internet User Behaviors & Privacy

Chapter 3: About Collecting User Information, what and how

Chapter 4: Online Advertising. Geo, contextual and behavior targeting, real-time bidding, yield management

Page 4

Chapter 1: Internet and WWW

Cosmology: Nature defines physical laws
Internet: Humans define laws (or, specifically, protocols)

Cosmology: photons, electrons, neutrinos … (monad? Leibniz)
Internet: bit

Cosmology: particles => stars => galaxies => clusters etc.
Internet: bits => bytes or integers => words => pages & emails

Cosmology: millions of galaxies detected => billions
Internet: millions to billions of users

Cosmology: goal => structures, statistics of galaxies
Internet: goal => behaviors, statistics of users

Cosmology: Real World
Internet: Information World, or Virtual World

Page 5

Open Systems Interconnection Model: 7 layers

[Figure: OSI layer diagram, labeled with HTTP, Encrypt, TCP, UDP, and IP]
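The labels on this slide can be placed on the seven OSI layers roughly as follows. This is a minimal sketch, not taken from the slide: the layer names are the standard OSI ones, while the placement of "Encrypt" at the presentation layer follows the textbook convention and is an assumption about what the figure showed. The names `OSI_LAYERS`, `SLIDE_LABELS`, and `layer_of` are hypothetical.

```python
# Standard OSI layer numbers and names (7 = top of the stack).
OSI_LAYERS = {
    7: "Application",
    6: "Presentation",
    5: "Session",
    4: "Transport",
    3: "Network",
    2: "Data link",
    1: "Physical",
}

# Assumed placement of the slide's labels; "Encrypt" at layer 6 is the
# conventional textbook placement, not something the slide text states.
SLIDE_LABELS = {
    7: ["HTTP"],
    6: ["Encrypt"],
    4: ["TCP", "UDP"],
    3: ["IP"],
}


def layer_of(label):
    """Return the OSI layer number that a slide label is assigned to."""
    for number, labels in SLIDE_LABELS.items():
        if label in labels:
            return number
    raise KeyError(label)


def describe_stack():
    """List all seven layers from top to bottom with any slide labels."""
    lines = []
    for number in sorted(OSI_LAYERS, reverse=True):
        labels = ", ".join(SLIDE_LABELS.get(number, [])) or "(no label on slide)"
        lines.append(f"{number} {OSI_LAYERS[number]}: {labels}")
    return lines
```

Calling `describe_stack()` returns one line per layer, which makes it easy to see that the slide only labels four of the seven layers.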

Page 6

Information Age: Web and Email

WWW: March 1989, Tim Berners-Lee
HTTP 0.9: 1991; HTTP 1.0: 1996; HTTP 1.1: June 1999, RFC 2616

Mailbox Protocol: 1971
SMTP: 1982, RFC 821
Later developments: UUCP, sendmail

Page 7

HTTP: how the web works

• User sends a request
  • URL address
  • Browser (Firefox, IE, mobile, etc.)
  • Language, who referred you, etc.
  • Cookies

• Web server responds
  • Message body
  • Message size, modified time, etc.
  • Server information
  • Set cookies

A cookie is the standard HTTP mechanism by which a server can store data in the user’s browser. How does it work?

Client: sends a request without a cookie
Server: responds with a “Set-Cookie” header containing some information
Client: sends later requests with a “Cookie” header containing the SAME information

A cookie is bound to a specific server, and one server can set multiple cookies.
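The three-step exchange can be sketched with Python's standard `http.cookies` module. This is an illustration only: it simulates the headers as plain dictionaries rather than opening a network connection, and the cookie name `uid`, the value `abc123`, and the `X-Seen-User` response header are hypothetical.

```python
from http.cookies import SimpleCookie


def server_respond(request_headers):
    """Simulated server: set a cookie on first contact, recognize it afterwards."""
    received = SimpleCookie(request_headers.get("Cookie", ""))
    if "uid" in received:
        # The client echoed the cookie back, so the server can recognize it.
        return {"X-Seen-User": received["uid"].value}
    cookie = SimpleCookie()
    cookie["uid"] = "abc123"  # hypothetical anonymous ID
    return {"Set-Cookie": cookie["uid"].OutputString()}


def client_store(response_headers, cookie_jar):
    """Simulated browser: remember whatever the server asked us to set."""
    if "Set-Cookie" in response_headers:
        cookie_jar.load(response_headers["Set-Cookie"])


def client_headers(cookie_jar):
    """Simulated browser: send stored cookies back in a Cookie header."""
    if not cookie_jar:
        return {}
    pairs = "; ".join(f"{name}={morsel.value}" for name, morsel in cookie_jar.items())
    return {"Cookie": pairs}


jar = SimpleCookie()
first_response = server_respond(client_headers(jar))    # step 1 and 2
client_store(first_response, jar)
second_response = server_respond(client_headers(jar))   # step 3
```

After the second request the simulated server sees the same value it set in the first response, which is what lets it tie the two requests to one browser.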

Page 8

Page 9

Chapter 2: User Behaviors & Privacy

• 1 billion internet users: a few hundred million in Europe, 100M in US, China
• IPv4 is full: 2^32 ≈ 4.3 billion addresses
• Google gets 80 billion views every day, e.g. one internet user visits about 1 Google page every day (e.g. search, email, ad)
• The Internet brings new economics, lifestyles, and social phenomena, e.g. online shopping, social networks (Facebook), newspapers and publications, US elections
• For the first time in history, human beings might lose their privacy: their social activities can be tracked, studied, and finally manipulated by powerful players such as the US government or Google

Page 10

Cases:

• Currently: “Tracking case”, Apple & Google
“Information is transmitted securely to the Apple iAd server via a cellular network connection or Wi-Fi Internet connection,” explained a letter Apple sent to US Rep. Edward Markey, D-Mass., on July 12 in response to his request for information. “The latitude/longitude coordinates are converted immediately by the server to a five-digit ZIP code.”

• 2008 “Suicide” case, MySpace
• On the technical side, the credit card industry has successfully built tools that have tracked user behaviors for 20 years!

Page 11

What kind of Private Information?

• You definitely expose
  • Geographic information (via IP)
  • OS and browser, such as PC, Linux, iPhone
  • Language

• May lose, protected by laws
  • Your name, identity cards (credit card, SSN, driver's license, etc.)
  • Via online shopping sites, government/university service sites, credit report sites, dating sites, etc.
  • In practice, can still be stolen => virus, spyware, break-in

• May lose, unprotected
  • Demographic information, e.g. age, gender, income, household
  • Via ISP or cellular service provider, social network sites, other free services

Page 12

Chapter 3: Collect User Information

• User Profile
  • Uniquely identified by an anonymous ID
  • The ID is tracked using a cookie and saved permanently on disk
  • Every ID has a profile consisting of geographic information, demographic information, interests, shopping histories, and recent behavior types (or audiences) => any information valuable to advertisers

• Existing Techniques
  • Relational database
  • Moving averages
  • Artificial neural network
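A user profile keyed by an anonymous ID might look like the sketch below. The field names follow the profile contents listed on this slide; the class names `UserProfile` and `ProfileStore`, the use of a random UUID as the ID, and the in-memory dictionary standing in for permanent disk storage are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """One profile per anonymous ID; fields mirror the slide's list."""
    anonymous_id: str
    geo: dict = field(default_factory=dict)
    demographics: dict = field(default_factory=dict)
    interests: set = field(default_factory=set)
    shopping_history: list = field(default_factory=list)
    recent_behaviors: list = field(default_factory=list)


class ProfileStore:
    """In-memory stand-in for the permanent store (hypothetical)."""

    def __init__(self):
        self._profiles = {}

    def get_or_create(self, cookie_id=None):
        """Look up the profile for a cookie ID, creating one if needed."""
        if cookie_id is None:
            # First visit: no cookie yet, so mint a new anonymous ID.
            cookie_id = uuid.uuid4().hex
        if cookie_id not in self._profiles:
            self._profiles[cookie_id] = UserProfile(anonymous_id=cookie_id)
        return self._profiles[cookie_id]


store = ProfileStore()
visitor = store.get_or_create()            # first request, no cookie
visitor.interests.add("astronomy")         # hypothetical interest
returning = store.get_or_create(visitor.anonymous_id)
```

On a later request carrying the same ID, the store returns the same profile object, so attributes accumulate across visits.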

Page 13

Relational Database

• A database consists of many “normalized” tables
• A table consists of a primary key and multiple values
• One table can have many keys to search

ResearchGroup: group_id, name, description, head
Member: member_id, group_id, name, type (professor, postdoc, student), status (current, left)
Left: left_id, member_id, when, where
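The three tables above can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the column types, the index name, and all sample rows are made up for illustration, and `Left`, `when`, and `where` are quoted because they collide with SQL keywords.

```python
import sqlite3


def build_example_db():
    """Create the three tables in an in-memory database with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ResearchGroup (
            group_id    INTEGER PRIMARY KEY,
            name        TEXT,
            description TEXT,
            head        TEXT
        );
        CREATE TABLE Member (
            member_id INTEGER PRIMARY KEY,
            group_id  INTEGER REFERENCES ResearchGroup(group_id),
            name      TEXT,
            type      TEXT,   -- professor, postdoc, student
            status    TEXT    -- current, left
        );
        CREATE TABLE "Left" (
            left_id   INTEGER PRIMARY KEY,
            member_id INTEGER REFERENCES Member(member_id),
            "when"    TEXT,
            "where"   TEXT
        );
        -- A secondary key to search members by group.
        CREATE INDEX idx_member_group ON Member(group_id);
    """)
    # Placeholder data, not from the slides.
    conn.execute("INSERT INTO ResearchGroup VALUES (1, 'Group A', 'Example group', 'Head A')")
    conn.execute("INSERT INTO Member VALUES (1, 1, 'Member 1', 'student', 'current')")
    conn.execute("INSERT INTO Member VALUES (2, 1, 'Member 2', 'postdoc', 'left')")
    conn.execute("INSERT INTO \"Left\" VALUES (1, 2, '2010', 'Placeholder City')")
    return conn


def former_members(conn, group_id):
    """Join Member and Left to list who left a group and where they went."""
    return conn.execute(
        'SELECT m.name, l."where" FROM Member AS m '
        'JOIN "Left" AS l ON l.member_id = m.member_id '
        "WHERE m.group_id = ?",
        (group_id,),
    ).fetchall()


conn = build_example_db()
rows = former_members(conn, 1)
```

Keeping departures in their own table, linked by `member_id`, is what "normalized" means here: each fact is stored once and recombined with a join when needed.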

Page 14

Moving Average

A simplified time-series analysis tool

• A new value is an average of the last N detections, with weights that decay with time.
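One common way to realize such a weighted average is with exponentially decaying weights, sketched below. The slide does not say which decay law is used, so the exponential form, the function name `decayed_moving_average`, and the `decay` parameter are assumptions.

```python
def decayed_moving_average(values, n, decay=0.5):
    """Weighted average of the last n values; older values count less.

    The newest value has weight 1, the one before it `decay`, then
    `decay ** 2`, and so on. With decay=1.0 this is a plain average.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not values:
        raise ValueError("values must not be empty")
    recent = values[-n:]
    # Oldest value in the window gets the largest age, newest gets age 0.
    weights = [decay ** age for age in range(len(recent) - 1, -1, -1)]
    weighted_sum = sum(w * v for w, v in zip(weights, recent))
    return weighted_sum / sum(weights)
```

For example, `decayed_moving_average([1, 2, 3, 4], n=2)` averages 3 and 4 with weights 0.5 and 1, so the result is pulled toward the newer value.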

Page 15

Artificial Neural Network
Machine learning

Training:
3,5 => 15
4,6 => 24
9,8 => 72
…
6,7 => 41

Neurons work in parallel => very fast
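The training pairs on this slide can be fed to a very small network to show the idea. This is a toy sketch, not the network from the talk: the architecture (one hidden layer of tanh units), the scaling of inputs and targets, the learning rate, and the name `train_tiny_network` are all assumptions, and three examples are far too few to learn the underlying rule reliably.

```python
import math
import random


def train_tiny_network(samples, hidden=4, epochs=2000, lr=0.05, seed=0):
    """Fit a 2-input network with one tanh hidden layer by gradient descent.

    Returns (predict, initial_loss, final_loss), where the losses are
    mean squared errors on the scaled training data.
    """
    rng = random.Random(seed)
    # Scale inputs and targets to roughly [0, 1] so tanh units do not saturate.
    data = [((a / 10.0, b / 10.0), y / 100.0) for (a, b), y in samples]
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j])
             for j in range(hidden)]
        return h, sum(w2[j] * h[j] for j in range(hidden)) + b2

    def loss():
        return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

    initial_loss = loss()
    for _ in range(epochs):
        for x, t in data:
            h, out = forward(x)
            err = out - t  # gradient of 0.5 * (out - t) ** 2 w.r.t. out
            for j in range(hidden):
                # Backpropagate through tanh using the pre-update w2[j].
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                w1[j][0] -= lr * grad_h * x[0]
                w1[j][1] -= lr * grad_h * x[1]
                b1[j] -= lr * grad_h
            b2 -= lr * err
    final_loss = loss()

    def predict(a, b):
        """Undo the scaling so the output is comparable to the targets."""
        return forward((a / 10.0, b / 10.0))[1] * 100.0

    return predict, initial_loss, final_loss


# Training pairs from the slide.
slide_samples = [((3, 5), 15), ((4, 6), 24), ((9, 8), 72)]
predict, loss_before, loss_after = train_tiny_network(slide_samples)
```

After training, `predict(6, 7)` gives the network's guess for an input it has not seen; with so little data the guess is only approximate, which matches the spirit of the slide's last line.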

Page 16

Chapter 4: Online Advertising

The good side of tracking

Page 17

The good side of user tracking

Current Challenges
• The server processes 10,000 requests per second
• For each request, update the user profile with 100 attributes
• Pick one of 100 possible advertiser candidates
• 10^8 decisions per second
• 100 million impressions per day
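One reading of the 10^8 figure is that it is the product of the three numbers above it: every request touches 100 attributes for each of 100 candidates. The sketch below checks that arithmetic; the interpretation and the variable names are assumptions, not something the slide states.

```python
# Figures taken from the slide.
requests_per_second = 10_000
attributes_per_profile = 100
advertiser_candidates = 100

# Assumed interpretation: one "decision" per (request, attribute, candidate).
decisions_per_second = (
    requests_per_second * attributes_per_profile * advertiser_candidates
)

# For comparison: the average request rate implied by the daily impression count.
impressions_per_day = 100_000_000
seconds_per_day = 24 * 60 * 60
average_impressions_per_second = impressions_per_day / seconds_per_day
```

Under this reading the product is exactly 10^8, while 100 million impressions per day averages out to a little over a thousand per second, so the 10,000 requests per second would describe peak load rather than the daily mean.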

The system we are developing

Page 18

In the Future

• Statistics => dynamic: finding rules, clustering analysis, time-series analysis
• Instant changes of behavior, e.g. shopping intention
• How behaviors are affected by the environment: social effect, “friend-recommendation” effect

• THANKS!