Analyzing Characteristic Host Access Patterns for Re-Identification of Web User Sessions
Dominik Herrmann, Christoph Gerber, Christian Banse, Hannes Federrath
Management of Information Security, University of Regensburg, Germany
Nordsec, 27–29 October 2010, Aalto University, Espoo, Finland
behavior-based web user re-identification Christoph Gerber 2
agenda
• problem description
• relation to text-mining
• case study and test setting
• re-identification
problem description
• small user group (e.g. users of a proxy server)
• all HTTP requests are recorded
• changing IP addresses / different surfing sessions
Requests as seen at the proxy server:
• IP 1, User 1: www.wikipedia.de
• IP 2, User 2: www-sec.uni-r.de
• IP 2, User 2: www.cse.tkk.fi
• IP 1, User 1: www.google.de
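The proxy's raw view can be sketched as follows. This is a minimal illustration, assuming each IP address marks one surfing session and that a session is summarized by how often each host was requested. The function name `sessions_by_ip` is hypothetical. The user labels on the slide are ground truth that the proxy does not see, so they are omitted.

```python
from collections import Counter, defaultdict

# Example requests as seen by the proxy (taken from the slide): (IP, host).
# The proxy only sees IP addresses, not which user is behind them.
requests = [
    ("IP 1", "www.wikipedia.de"),
    ("IP 2", "www-sec.uni-r.de"),
    ("IP 2", "www.cse.tkk.fi"),
    ("IP 1", "www.google.de"),
]

def sessions_by_ip(reqs):
    """Group requests by source IP (assumption: one IP = one session) and
    summarize each session as a host -> access-count profile."""
    sessions = defaultdict(Counter)
    for ip, host in reqs:
        sessions[ip][host] += 1
    return dict(sessions)

for ip, profile in sessions_by_ip(requests).items():
    print(ip, dict(profile))
```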
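perspective of a proxy server
[Figure: HTTP requests to hosts (vertical axis) over time t (horizontal axis), grouped into Session 1 (IP1), Session 2 (IP2), Session 3 (IP3) and Session 4 (IP4). Each symbol (x, o, ¤, Δ) marks an HTTP request to a certain host; legend: x = HTTP request to a certain host issued by a user with IP address 4.]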
perspective of a proxy server
[Figure: the same requests, with the sessions attributed to User 1 and User 2.]
perspective of a proxy server
[Figure: the same requests; for a further session it is unclear whether it belongs to User 1, User 2, or someone else.]
perspective of a proxy server
[Figure: sessions of the same user are combined into an aggregated session for User 1 and for User 2.]
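The text-mining analogy from the agenda suggests treating each session like a document and each host like a word. The classifier used in the study is not shown in this excerpt; as one plausible stand-in, the sketch below re-identifies a session by cosine similarity between host-frequency profiles and returns `None` when no training profile is similar enough (the "someone else?" case). All names and the threshold are hypothetical.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two host-frequency profiles."""
    dot = sum(a[h] * b[h] for h in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def reidentify(test_session: Counter, training: dict, min_similarity: float = 0.0):
    """Return the user whose training profile is most similar to the test
    session, or None if even the best match is below min_similarity."""
    best = max(training, key=lambda user: cosine(test_session, training[user]))
    if cosine(test_session, training[best]) < min_similarity:
        return None
    return best

# Hypothetical aggregated training profiles with known users.
training = {
    "User 1": Counter({"www.wikipedia.de": 5, "www.google.de": 3}),
    "User 2": Counter({"www-sec.uni-r.de": 4, "www.cse.tkk.fi": 2}),
}
new_session = Counter({"www.google.de": 2, "www.wikipedia.de": 1})
print(reidentify(new_session, training))  # User 1
```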
simulations
• simulation of simultaneous surfing sessions
  - combining chronologically succeeding sessions
  - always 28 users per session
• in each experiment, one parameter was varied:
  - session duration
  - number of simultaneous users
  - offset between the last training session and the first test session
  - number of consecutive training instances
• each experiment was repeated 25 times
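The simulation setup can be sketched as follows, under stated assumptions: requests are tuples of (time in minutes, user, host), each user's request stream is cut into windows of the chosen session duration, and one window serves as training data while a later window (the offset) is classified. The helper names `split_sessions`, `proportion_correct` and `overlap_classifier` are hypothetical, and the overlap classifier is only a placeholder for whatever classifier the study used.

```python
from collections import Counter, defaultdict

def split_sessions(requests, duration_min):
    """Assign (time_min, user, host) requests to fixed-length windows.

    Returns {window_index: {user: Counter(host -> count)}}; each
    (window, user) pair plays the role of one surfing session.
    """
    windows = defaultdict(lambda: defaultdict(Counter))
    for t, user, host in requests:
        windows[int(t // duration_min)][user][host] += 1
    return windows

def overlap_classifier(profile, training):
    """Placeholder classifier: pick the training user sharing the most host accesses."""
    return max(training, key=lambda u: sum((profile & training[u]).values()))

def proportion_correct(windows, train_w, test_w, classify):
    """Share of sessions in window test_w whose user is predicted correctly
    from the sessions in window train_w (offset = test_w - train_w)."""
    training, test = windows.get(train_w, {}), windows.get(test_w, {})
    if not training or not test:
        return 0.0
    hits = sum(classify(profile, training) == user for user, profile in test.items())
    return hits / len(test)

# Tiny illustrative log: two users, two 60-minute windows.
log = [(5, "A", "a.de"), (10, "B", "b.de"), (65, "A", "a.de"), (70, "B", "b.de")]
windows = split_sessions(log, duration_min=60)
print(proportion_correct(windows, train_w=0, test_w=1, classify=overlap_classifier))
```

Varying `duration_min`, the number of users in the log, or the gap between `train_w` and `test_w`, and averaging over repeated random draws of users, would mirror the experiments listed above.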
session duration
• longer sessions support re-identification
[Figure (SIM): proportion of correctly classified sessions vs. session duration in minutes (logarithmic axis, 1 to 10,000).]
number of simultaneous users
• the fewer simultaneous users, the better re-identification works
[Figure: proportion of correctly classified sessions vs. number of concurrent users (0 to 30), one curve per session duration: 24 hours, 3 hours, 1 hour, 10 min.]
offset between test and training sessions
• each user tends to behave similarly at the same time of day
[Figure: proportion of correctly classified sessions vs. offset between test and training in hours (0 to 160), for session durations of 3 hours and 1 hour.]
number of training instances
• more training instances improve re-identification, but only a few are needed
[Figure: proportion of correctly classified sessions vs. number of training instances (0 to 20), for session durations of 3 hours, 1 hour, 1 hour (48 hours train/test offset), and 10 min.]
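One way to read "number of consecutive training instances" is that several consecutive sessions of each user are available as training data. A minimal sketch, assuming they are simply summed into one host-frequency profile per user (the study may instead have kept them as separate instances); `merge_training_instances` is a hypothetical helper.

```python
from collections import Counter

def merge_training_instances(windows, last_train_w, k):
    """Hypothetical: build one profile per user by summing that user's sessions
    from the k consecutive windows ending at last_train_w (k >= 1)."""
    merged = {}
    for w in range(last_train_w - k + 1, last_train_w + 1):
        for user, profile in windows.get(w, {}).items():
            # A fresh Counter per user, so the input sessions are not modified.
            merged.setdefault(user, Counter()).update(profile)
    return merged

windows = {
    0: {"A": Counter({"a.de": 1})},
    1: {"A": Counter({"a.de": 2, "b.de": 1}), "B": Counter({"c.de": 4})},
}
print(merge_training_instances(windows, last_train_w=1, k=2))
```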
countermeasures
• using multiple, non-colluding proxy servers works, but is not practicable (at this early stage)
• further distribution schemes are conceivable
[Figure: re-identified sessions (proportion of correctly classified sessions) vs. number of proxy servers (0 to 50), for session durations of 1 day and 3 hours.]
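The slides do not specify how requests are spread over several proxies. A minimal sketch of one conceivable scheme, assuming every host is assigned to a fixed proxy by hashing its name, so that each non-colluding proxy observes only part of a session's host profile. The helper names `proxy_for_host` and `split_profile` are hypothetical.

```python
import hashlib
from collections import Counter

def proxy_for_host(host: str, n_proxies: int) -> int:
    """Deterministically map a host name to one of n_proxies proxies
    (hypothetical scheme: SHA-256 of the host name, modulo n_proxies)."""
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_proxies

def split_profile(profile: Counter, n_proxies: int) -> list:
    """Return what each non-colluding proxy observes of one session's host profile."""
    views = [Counter() for _ in range(n_proxies)]
    for host, count in profile.items():
        views[proxy_for_host(host, n_proxies)][host] += count
    return views

session = Counter({"www.wikipedia.de": 2, "www.google.de": 1, "www.cse.tkk.fi": 5})
for i, view in enumerate(split_profile(session, 3)):
    print(f"proxy {i}: {dict(view)}")
```

Hashing by host rather than by individual request keeps all requests for one host on the same proxy. `hashlib` is used instead of Python's built-in `hash()`, whose string hashing is randomized per process.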
countermeasures
• analyzing a part of the host frequency distribution
[Figure: access frequency vs. host ranking, both on logarithmic axes (1 to 100,000).]
countermeasures
• analyzing a part of the host frequency distribution
  - keep only the most popular hosts
  - does not prevent user re-identification
[Figure: re-identified sessions (proportion of correctly classified sessions) vs. proportion of most popular hosts kept (0 to 0.5), for session durations of 1 day, 3 hours, 1 hour and 10 minutes.]
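Keeping only the most popular hosts can be sketched as a filter on each session's host profile. This assumes popularity is measured by total access frequency over all sessions and that `fraction` corresponds to the "proportion of most popular hosts kept" axis. `keep_most_popular` is a hypothetical helper.

```python
from collections import Counter

def keep_most_popular(sessions, fraction):
    """Restrict every session profile to the globally most popular hosts.

    sessions: {session_id: Counter(host -> count)}; fraction in (0, 1].
    """
    total = Counter()
    for profile in sessions.values():
        total.update(profile)
    n_keep = max(1, int(len(total) * fraction))
    popular = {host for host, _ in total.most_common(n_keep)}
    return {
        sid: Counter({h: c for h, c in profile.items() if h in popular})
        for sid, profile in sessions.items()
    }

sessions = {
    "s1": Counter({"a.de": 5, "b.de": 1}),
    "s2": Counter({"a.de": 3, "c.de": 2}),
}
print(keep_most_popular(sessions, fraction=0.34))
```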
conclusion and discussion
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET http://www.ab.com/index.html HTTP/1.0" 200 2326
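The log line above follows the Common Log Format (client address, identity, user, timestamp, request line, status, size). A minimal sketch of how a proxy log could be reduced to the (IP, time, host) triples the analysis needs, assuming entries in exactly this format; `parse_clf` is a hypothetical helper.

```python
import re
from datetime import datetime
from urllib.parse import urlsplit

# Common Log Format: ip ident user [time] "method url protocol" status size
CLF = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<proto>[^"]+)" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf(line: str):
    """Extract (client IP, timestamp, requested host) from one log line,
    or return None if the line does not match the format."""
    m = CLF.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    host = urlsplit(m["url"]).hostname
    return m["ip"], ts, host

line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET http://www.ab.com/index.html HTTP/1.0" 200 2326')
print(parse_clf(line))
```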
• re-identification is a feasible attack
• evaluated on a privacy-preserving case study
• works well for small, closed groups
• not only relevant for proxy servers
• improvements possible by using context information
• improvements possible by gathering more realistic sessions