7/21/2014
1
Faculty of Engineering and Computer Science
Concordia Institute for Information Systems Engineering
Cloud Traffic Security
Wen Ming Liu
INSE 6620 July 23, 2014
Agenda
2
• Cloud Applications
• Side-Channel Attacks
• Challenges and Solutions
• Ceiling Padding
Cloud Computing Architecture
3
Cloud Computing Architecture
4
Agenda
5
• Cloud Applications
• Side-Channel Attacks
• Challenges and Solutions
• Ceiling Padding
6
Web-Based Applications
Untrusted Internet
Client   Server   Encryption
“Cryptography solves all security problems!” Really?
7
Side-Channel Attack on Encrypted Traffic
Internet
Client   Server   Encrypted Traffic
User Input Observed Directional Packet Sizes
a: 801→, ←54, ←509, 60→
00: 812→, ←54, ←505, 60→,
813→, ←54, ←507, 60→
b-byte s-byte
• Network packets’ sizes and directions between a user and a popular search engine
• Captured by acting as a normal user and eavesdropping on the traffic with Sniffer Pro 4.7.5
• Collected in May 2012
Indicator of the input itself
Fixed pattern: identified input string
8
Updated Patterns Dec 2013
Internet
Client   Server   Encrypted Traffic
User Input Observed Directional Packet Sizes
a: 590→, 67→, ←60, ←60, ←728, 60→
00: 590→, 67→, ←60, ←60, ←698, 60→,
590→, 68→, ←60, ←60, ←717, 60→
b-byte s-byte
• Patterns may change over time, but attacks will still work in similar ways
• Patterns may be different for different Web applications, but they are always there
Indicator of input itself
Fixed pattern: identified input string
9
To Make Things Worse
• The “Autocomplete” feature allows adversaries to combine the packets corresponding to multiple keystrokes
• Web applications are highly interactive
User Input Observed Directional Packet Sizes
a: 590→, 67→, ←60, ←60, ←728, 60→
00: 590→, 67→, ←60, ←60, ←698, 60→,
590→, 68→, ←60, ←60, ←717, 60→
b-byte s-byte
10
Longer Inputs, More Unique the Patterns
• s value for each character entered as the first keystroke:
  a    b    c    d    e    f    g
  509  504  502  516  499  504  502
  h    i    j    k    l    m    n
  509  492  517  499  501  503  488
  o    p    q    r    s    t
  509  525  494  498  488  494
  u    v    w    x    y    z
  503  522  516  491  502  501
• s value for each character entered as the second keystroke (rows: first keystroke and its s value; columns: second keystroke):
  First Keystroke    Second Keystroke
                     a    b    c    d
  a   509            487  493  501  497
  b   504            516  488  482  481
  c   502            501  488  473  477
  d   516            543  478  509  499
  Unique s values: 12 out of 16 (second keystroke alone), 16 out of 16 (both keystrokes combined)
The unique patterns leak users’ private information: the input string.
In reality, it may take more than two keystrokes to uniquely identify an input string.
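The uniqueness counts on this slide can be checked directly from the 4×4 table. The sketch below copies the s values for the letters a to d and counts how many two-letter inputs produce a pattern that no other input shares, first from the second keystroke alone and then from both keystrokes together. The names `FIRST`, `SECOND` and `count_unique` are illustrative and do not come from the slides.

```python
from collections import Counter

# s values copied from the table above (letters a-d only).
FIRST = {"a": 509, "b": 504, "c": 502, "d": 516}
SECOND = {
    "a": {"a": 487, "b": 493, "c": 501, "d": 497},
    "b": {"a": 516, "b": 488, "c": 482, "d": 481},
    "c": {"a": 501, "b": 488, "c": 473, "d": 477},
    "d": {"a": 543, "b": 478, "c": 509, "d": 499},
}


def count_unique(patterns):
    """Count inputs whose observed pattern is shared with no other input."""
    freq = Counter(patterns.values())
    return sum(1 for p in patterns.values() if freq[p] == 1)


# Pattern seen by an eavesdropper who looks at the second keystroke only.
second_only = {x + y: SECOND[x][y] for x in SECOND for y in SECOND[x]}
# Pattern seen when the packets of both keystrokes are combined.
combined = {x + y: (FIRST[x], SECOND[x][y]) for x in SECOND for y in SECOND[x]}

print(count_unique(second_only), "out of", len(second_only))
print(count_unique(combined), "out of", len(combined))
```

The collisions in the second keystroke (501 for "ac" and "ca", 488 for "bb" and "cb") disappear once the first-keystroke value is taken into account.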
Side-Channel Leaks
• To protect the information in critical applications against network sniffing, a common practice is to encrypt their network traffic. However, as discovered in the research, serious information leaks are still a reality.
• Even though the communications generated during these state transitions are protected by HTTPS, their observable attributes can still give away information about the user’s selection.
• The eavesdropper cannot see the contents, but can observe the number of packets and the timing and size of each packet.
11
Slides 11-27 are partially based on: S. Chen, R. Wang, X. Wang, and K. Zhang. Side-channel leaks in web applications: A reality today, a challenge tomorrow. In IEEE Symposium on Security and Privacy’10, pages 191–206, 2010.
Main Findings
• Analysis of the side-channel weakness in web applications.
  - Several high-profile and really popular web applications actually disclose surprisingly detailed sensitive user information: personal health data, family income, investment details, search queries.
  - The root causes of the side-channel information leaks are fundamental characteristics of today’s web applications: stateful communication, low-entropy input and significant traffic distinctions.
• In-depth study of the challenges in mitigating the threat.
  - Evaluate the effectiveness and the overhead of common mitigation techniques such as packet padding.
  - Show that effective solutions to the side-channel problem have to be application-specific, relying on an in-depth understanding of the application being protected.
  - This suggests the necessity of a significant improvement of the current practice for developing web applications.
12
Fundamental Characteristics of Web Applications
The root causes are some fundamental characteristics of today’s web applications:
• Low-entropy inputs for better interactions.
  - Small input space
  - Auto-suggestion, auto-complete
• Stateful communications.
  - Transitions to next states depend both on the current state and on its input.
  - Although the information from each transition may be insignificant, their combination can be really powerful.
• Significant traffic distinctions.
  - The chance of two different user actions having the same traffic pattern is really small.
  - Such distinctions often come from the objects updated by client-server data exchanges.
13
Significant traffic distinctions
14
Basics of Wi-Fi Encryption Schemes:
• WEP: susceptible to key-recovery attacks
• WPA: TKIP (RC4)
• WPA2: CCMP (128-bit AES block cipher in counter mode)
The ciphertext fully preserves the size of its plaintext!
15
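The size-preservation point can be illustrated with a toy counter-mode cipher. The sketch below is only a stand-in: it derives a keystream from SHA-256 instead of AES, it is not CCMP, and the key, nonce and function names are made up for the example. It shows why any keystream-XOR construction yields a ciphertext exactly as long as the plaintext, so a query that grows by one character grows the packet by one byte.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy counter-mode keystream: SHA-256(key || nonce || counter) blocks."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_ctr_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """XOR the plaintext with the keystream; decryption is the same operation."""
    ks = keystream(key, nonce, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, ks))


key, nonce = b"demo-key", b"demo-nonce"  # hypothetical values for the demo
for query in (b"l", b"li", b"lis", b"list"):
    ciphertext = toy_ctr_encrypt(key, nonce, query)
    print(len(query), len(ciphertext))  # the two lengths are always equal
```

Real protocols add fixed-size headers and integrity tags, which shift every packet size by a constant but do not hide the differences between sizes.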
Scenario: search using encrypted Wi-Fi (WPA/WPA2).
Example: user types “list” on a WPA2 laptop.
Consequence: Anybody on the street knows our search queries.
Attacker’s effort: linear, not exponential.
Observed packet sizes:
821→  910
822→  931
823→  995
824→  1007
16
OnlineHealthA
(“A” denoting a pseudonym)
• A web application by one of the most reputable companies of online services
• Illness/medication/surgery information is leaked, as well as the type of doctor being queried.
• Vulnerable designs
  - Entering health records
    - By typing: auto-suggestion
    - By mouse selecting: a tree-structure organization of elements
  - Finding a doctor
    - Using a dropdown list item as the search input
17
Attacker’s power
Entering health records: whether by keyboard typing or mouse selection, the attacker has a 2000× ambiguity reduction power.
Find-A-Doctor: the attacker can uniquely identify the specialty.
OnlineTaxA
• It is the online version of one of the most widely used applications for U.S. tax preparation.
• Design: a tax-preparation wizard
  - Tailors the conversation based on the user’s previous input.
• The forms that you work on tell a lot about your family
  - Filing status
  - Number of children
  - Paid big medical bill
  - The adjusted gross income (AGI)
19
Child credit state machine (two children scenario)
Entry page of Deductions & Credits → Full credit | Partial credit | Not eligible → Summary of Deductions & Credits
All transitions have unique traffic patterns.
Consult the IRS instruction: $1000 for each child.
Phase-out starting from $110,000. For every $1000 income, lose $50 credit.
AGI ranges: $0 to $110000: Full credit; $110000 to $150000: Partial credit; above $150000: Not eligible
20
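The child-credit rule quoted on this slide can be written as a small function that maps an AGI to one of the three states an eavesdropper can distinguish. This is a sketch of the slide's simplified rule only, not of actual tax law; the function names are illustrative, and the treatment of partial thousands (rounded down here) is an assumption.

```python
PER_CHILD = 1000           # "$1000 for each child"
PHASE_OUT_START = 110_000  # "Phase-out starting from $110,000"
LOSS_PER_1000 = 50         # "For every $1000 income, lose $50 credit"


def child_credit(agi: int, children: int = 2) -> int:
    """Credit in dollars under the simplified rule on the slide."""
    full = PER_CHILD * children
    # Assumption: only whole thousands above the threshold reduce the credit.
    excess_thousands = max(0, agi - PHASE_OUT_START) // 1000
    return max(0, full - LOSS_PER_1000 * excess_thousands)


def credit_state(agi: int, children: int = 2) -> str:
    """State of the wizard that the traffic pattern reveals."""
    credit = child_credit(agi, children)
    if credit == PER_CHILD * children:
        return "Full credit"
    if credit == 0:
        return "Not eligible"
    return "Partial credit"


for agi in (90_000, 130_000, 150_000):
    print(agi, credit_state(agi), child_credit(agi))
```

Because each state triggers a distinct traffic pattern, observing the state narrows the family's AGI to one of the three ranges. With one, two or three children the credit reaches zero at $130000, $150000 and $170000 respectively.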
Student-loan-interest credit
Entry page of Deductions & Credits → Full credit | Partial credit | Not eligible → Summary of Deductions & Credits
Even worse, most decision procedures for credits/deductions have asymmetric paths.
Eligible – more questions (Enter your paid interest)
Not eligible – no more questions
AGI ranges: $0 to $115000: Full credit; $115000 to $145000: Partial credit; above $145000: Not eligible
21
A subset of identifiable AGI thresholds
Disabled Credit: $24999
Retirement Savings: $53000
IRA Contribution: $85000, $105000
College Expense: $116000
Student Loan Interest: $115000, $145000
First-time Homebuyer credit: $150000, $170000
Earned Income Credit: $41646
Child credit *: $110000
Adoption expense: $174730, $214780
* $130000 or $150000 or $170000 …
• We are not tax experts.
• OnlineTaxA can find more than 350 credits/deductions.
22
OnlineInvestA
A major financial institution in the U.S.
Which funds do you invest in? No secret.
• Each price history curve is a GIF image from MarketWatch.
• Everybody in the world can obtain the images from MarketWatch.
• Just compare the image sizes!
Your investment allocation
• Given only the size of the pie chart, can we recover it?
• Challenge: hundreds of pie charts collide on the same size.
23
Inference based on the evolution of the pie-chart size in 4-or-5 days
• The financial institution updates the pie chart every day after the market is closed.
• The mutual fund prices are public knowledge.
≅80000 charts → (size of day 1) ≅800 charts → (size of day 2; prices of the day) ≅80 charts → (size of day 3; prices of the day) ≅8 charts → (size of day 4; prices of the day) 1 chart
24
Challenging to Mitigate the Vulnerabilities
25
• Traffic differences are everywhere. Which ones result in serious data leaks?
  - Need to analyze the application semantics, the availability of domain knowledge, etc.
  - Hard.
• Is there a vulnerability-agnostic defense to fix the vulnerabilities without finding them?
  - Obviously, padding is a must-do strategy.
  - We found that even for the discussed apps, the defense policies have to be case-by-case.
Why challenging?
26
• See if the problem can be solved without analyzing individual applications
• Application-agnostic manner: Padding
• Rounding:
• Random padding:
• Average overhead:
• Given Δ, the reduction power is calculated after padding
Universal Mitigation Policies
27
Any Problem?
Agenda
28
• Cloud Applications
• Side-Channel Attacks
• Challenges and Solutions
• Ceiling Padding
29
Trivial problem?
30
Don’t Forget the Cost
(Prefix) char          s Value   Rounding (Δ)
                                 Δ=64    Δ=160   Δ=256
(c) c                  473       512     480     512
(c) d                  477       512     480     512
(d) b                  478       512     480     512
(d) d                  499       512     640     512
(a) c                  501       512     640     512
(b) a                  516       576     640     768
Padding Overhead (%)             6.5%    14.1%   13.0%
1 S. Chen, R. Wang, X. Wang, and K. Zhang. Side-channel leaks in web applications: A reality today, a challenge tomorrow. In IEEE Symposium on Security and Privacy’10, pages 191–206, 2010.
• No guarantee of better privacy at a higher cost
  - Δ↑ ⇏ privacy↑
  - Δ↑ ⇏ overhead↑
• Making all inputs indistinguishable by rounding would result in a 21074% overhead for a well-known online tax system [1]
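The rounding table on this slide can be reproduced with a few lines of code. The sketch below assumes rounding means padding each size up to the nearest multiple of Δ, which matches the padded values in the table; the helper names are illustrative. It reports the overhead and the size of the smallest group of inputs that become indistinguishable.

```python
from collections import Counter

S_VALUES = [473, 477, 478, 499, 501, 516]  # s values from the table


def round_up(size: int, delta: int) -> int:
    """Pad a packet size up to the nearest multiple of delta."""
    return -(-size // delta) * delta


def overhead_percent(sizes, delta):
    """Total padding added, as a percentage of the original bytes."""
    padded = [round_up(s, delta) for s in sizes]
    return 100.0 * (sum(padded) - sum(sizes)) / sum(sizes)


def smallest_group(sizes, delta):
    """Size of the smallest set of inputs that share a padded size."""
    return min(Counter(round_up(s, delta) for s in sizes).values())


for delta in (64, 160, 256):
    print(
        delta,
        [round_up(s, delta) for s in S_VALUES],
        round(overhead_percent(S_VALUES, delta), 1),
        smallest_group(S_VALUES, delta),
    )
```

Going from Δ=160 to Δ=256 lowers the overhead from 14.1% to 13.0%, yet leaves one input alone in its group, which is the sense in which a larger Δ guarantees neither more privacy nor more overhead.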
Two Conflicting Goals
31
• To prevent side-channel attacks, we face two seemingly conflicting goals:
  - Privacy protection: reduce the differences in packet sizes
  - Cost: minimize the overhead (communication, processing, …)
• Similar goals arise in privacy-preserving data publishing (PPDP), which may allow us to borrow existing expertise.
How?
Grouping and Breaking
Slides 31-45 are partially based on: W. M. Liu, L. Wang, K. Ren, P. Cheng, M. Debbabi, S. Zhu, “PPTP: Privacy-Preserving Traffic Padding in Web-Based Applications,” IEEE Trans. on Dependable and Secure Computing (TDSC),
Solution: Ceiling Padding
32
(Prefix) char   s Value   Padding Option 1   Padding Option 2
(c) c           473       477                478
(c) d           477       477                478
(d) b           478       499                478
(d) d           499       499                516
(a) c           501       516                516
(b) a           516       516                516
PPDP analogy: s Value ↔ Quasi-ID; Padding Option 1 / Option 2 ↔ Generalization Function 1 / Function 2; (Prefix) char ↔ Sensitive Attribute
PPTP: padding group ↔ PPDP: anonymized group
• PPTP goals:
  - Privacy
  - Cost
• PPDP goals:
  - Privacy
  - Data utility
So we can apply existing techniques in data publication to achieve ceiling padding.
However, there are a few differences, and hence challenges...
• Ceiling padding: pad every packet to the maximum size in the group
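Ceiling padding as defined here can be sketched in a few lines. The example below applies it to the two groupings shown in the table (Option 1 with groups of two, Option 2 with groups of three) and sums the bytes added. The function names are illustrative, and keying the result by size relies on the six s values being distinct.

```python
S_VALUES = [473, 477, 478, 499, 501, 516]

# Padding groups taken from the two options in the table.
OPTION_1 = [[473, 477], [478, 499], [501, 516]]
OPTION_2 = [[473, 477, 478], [499, 501, 516]]


def ceiling_pad(groups):
    """Map each original size to the maximum size in its padding group."""
    return {size: max(group) for group in groups for size in group}


def padding_cost(groups):
    """Total number of padding bytes added across all packets."""
    padded = ceiling_pad(groups)
    return sum(padded[size] - size for size in padded)


print(ceiling_pad(OPTION_1), padding_cost(OPTION_1))
print(ceiling_pad(OPTION_2), padding_cost(OPTION_2))
```

Option 2 forms larger groups and still adds fewer bytes than Option 1, so group size alone does not determine the cost.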
33
PPTP Components
• Interaction:
  - action a:
    - Atomic user input that triggers traffic
    - A keystroke, a mouse click, …
  - action-sequence $\vec{a}$:
    - A sequence of actions with complete input info
    - Consecutive keystrokes, …
  - action-set A_i:
    - Collection of all i-th actions in a set of action-sequences
• Observation:
  - flow-vector v:
    - A sequence of flows (directional packet sizes)
    - Triggered by an action
  - vector-sequence $\vec{v}$:
    - A sequence of flow-vectors
    - Triggered by an equal-length action-sequence
  - vector-set V_i:
    - Collection of all i-th vectors in a set of vector-sequences
• Vector-action set VA_i:
  - Pairs of i-th actions and corresponding i-th flow-vectors
User Input Observed Directional Packet Sizes
a: 801→, ←54, ←509, 60→
00: 812→, ←54, ←505, 60→,
813→, ←54, ←507, 60→
34
Privacy and Cost
• k-indistinguishability: given a vector-action set VA
  - Padding group: any S ⊆ VA such that all the pairs in S have identical flow-vectors and no S′ ⊃ S satisfies this property.
  - We say VA satisfies k-indistinguishability (k is an integer) if the cardinality of every padding group is no less than k.
• Goal of privacy protection:
  - Upon observing any flow-vector in the traffic, the eavesdropper cannot determine which action in the table (vector-action set) has triggered this flow-vector.
• l-diversity:
  - Addresses the cases where not all inputs should be treated equally in padding (for example, some statistical information regarding the likelihood of different inputs may be publicly known).
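The k-indistinguishability definition can be turned into a direct check: group the pairs of a vector-action set by flow-vector and test that every group has at least k members. The sketch below represents each pair as an `(action, flow_vector)` tuple, which is a representation chosen for the example, and uses the six two-keystroke inputs from the ceiling-padding table as data.

```python
from collections import defaultdict


def padding_groups(vector_action_set):
    """Group actions by identical flow-vector; each group is a padding group."""
    groups = defaultdict(set)
    for action, flow_vector in vector_action_set:
        groups[tuple(flow_vector)].add(action)
    return dict(groups)


def satisfies_k_indistinguishability(vector_action_set, k):
    """True if every padding group contains at least k actions."""
    groups = padding_groups(vector_action_set)
    return all(len(actions) >= k for actions in groups.values())


# Single-flow vectors before padding and after ceiling padding in groups of three.
unpadded = [("cc", (473,)), ("cd", (477,)), ("db", (478,)),
            ("dd", (499,)), ("ac", (501,)), ("ba", (516,))]
padded = [("cc", (478,)), ("cd", (478,)), ("db", (478,)),
          ("dd", (516,)), ("ac", (516,)), ("ba", (516,))]

print(satisfies_k_indistinguishability(unpadded, 2))
print(satisfies_k_indistinguishability(padded, 3))
```

Grouping by the full flow-vector makes each group maximal automatically, which corresponds to the "no S′ ⊃ S" condition in the definition.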
35
Privacy and Cost
• Vector-distance:
  - Given two equal-length flow-vectors v_1 and v_2, the vector-distance is the total number of bytes by which their flows differ: $dist(v_1, v_2) = \sum_{i=1}^{|v_1|} |v_{1i} - v_{2i}|$.
• Padding cost:
  - Given a vector-set V, the padding cost is the sum of the vector-distances between each flow-vector in V and its counterpart after padding.
• Processing cost:
  - Given a vector-set V, the processing cost is the number of flows in V whose corresponding packets need to be padded.
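The three cost definitions translate almost word for word into code. The sketch below uses the flow-vectors from the search-engine example earlier in the deck and pads the third flow of each vector to 509 bytes; that padding choice and the function names are assumptions made for illustration only.

```python
def vector_distance(v1, v2):
    """Total number of bytes by which two equal-length flow-vectors differ."""
    if len(v1) != len(v2):
        raise ValueError("flow-vectors must have equal length")
    return sum(abs(a - b) for a, b in zip(v1, v2))


def padding_cost(original, padded):
    """Sum of vector-distances between each flow-vector and its padded counterpart."""
    return sum(vector_distance(v, p) for v, p in zip(original, padded))


def processing_cost(original, padded):
    """Number of flows whose packets actually have to be padded."""
    return sum(1 for v, p in zip(original, padded)
               for a, b in zip(v, p) if a != b)


original = [(801, 54, 509, 60), (812, 54, 505, 60), (813, 54, 507, 60)]
padded = [(801, 54, 509, 60), (812, 54, 509, 60), (813, 54, 509, 60)]

print(padding_cost(original, padded))
print(processing_cost(original, padded))
```

The two costs measure different things: the padding cost counts extra bytes sent, while the processing cost counts how many packets the server has to touch.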
Agenda
36
• Cloud Applications
• Side-Channel Attacks
• Challenges and Solutions
• Ceiling Padding
Challenge 1
37
(Prefix) char   s Value   Padding Option 1   Padding Option 2
(c) c           473       477                478
(c) d           477       477                478
(d) b           478       499                478
(d) d           499       499                516
(a) c           501       516                516
(b) a           516       516                516
Differences and challenges:
• Data utility measures & padding cost
  - Traffic padding: cost of option 1 is worse than that of option 2
  - Data publication: utility of function 1 is better than that of function 2
38
Challenge 2
• Recall that adversaries may combine multiple keystrokes
• Example:
• One obvious, but invalid solution: pad every keystroke (separately)
• Another obvious, but invalid solution: pad on the whole string!
s values of the second keystroke (rows: first keystroke; columns: second keystroke):
       a    b    c    d
  a    487  493  501  497
  b    516  488  482  481
  c    501  488  473  477
  d    543  478  509  499
Second-keystroke values shared by more than one input:
       a    b    c    d
  a    …    …    501  …
  b    …    488  …    …
  c    501  488  …    …
  d    …    …    …    …
With the s value of the first keystroke:
            a    b    c    d
  a   509   …    …    501  …
  b   504   …    488  …    …
  c   502   501  488  …    …
  d   516   …    …    …    …
After padding the first keystroke:
            a    b    c    d
  a   516   …    …    501  …
  b   504   …    488  …    …
  c   504   501  488  …    …
  d   516   …    …    …    …
Strings   1st keystroke   2nd keystroke
ac        509             501
ca        502             501
ad        509             497
dd        516             499
…         …               …
39
PPTP - Overview of Algorithms
• Intention:
  - To demonstrate the existence of abundant possibilities in approaching the PPTP issue, not to design an exhaustive list of solutions.
• Design three algorithms for partitioning inputs into padding groups.
  - Main difference: the algorithms handle increasingly complicated cases.
  - Computational complexity:
• Collect data from two real-world web applications:
  - A popular search engine (users’ search keywords need to be protected)
    Collect flow-vectors for the query suggestion widget for all possible combinations of four letters by crafting requests to simulate the normal AJAX connection request.
  - Authoritative drug information system from a national institute (users’ possible health information needs to be protected)
    Collect the vector-action set for all the drug information by mouse-selecting, following the application’s three-level tree-hierarchical navigation.
41
Overhead - Padding Cost
• The padding cost against k:
  - For comparison, rounding uses Δ=512 (engineB) and Δ=5120 (drugB), which achieves only 5-indistinguishability.
  - Our algorithms have less padding cost in both cases.
  - Observe that our algorithms are superior especially when the number of flow-vectors is larger.
42
Overhead – Execution Time
• Generate n-size flow data by synthesizing n/|VA| copies of engineB and drugB.
• The computation time of mvmdGreedy increases slowly with n.
  - Practically efficient (1.2s for 2.7m flow-vectors),
  - Requires slightly more overhead than rounding when it is applied to a single Δ value.
• The computation time of mvmdGreedy against privacy property k
  - A tighter upper bound: Ο(&* ⨯ & ⨯ 21 ⨯ λ) (worst case), Ο(&* ⨯ & ⨯ log(21 ⨯ λ)) (average case)
  - The computation time increases slowly with k for engineB, and decreases slowly for drugB.
43
Overhead – Processing Cost
• An application can choose to incorporate the padding at different stages of processing a request; however, we must minimize the number of packets to be padded.
  - Pad the flow-vectors on the fly,
  - Modify the original data beforehand.
• The processing cost against k:
  - Rounding must pad each flow-vector regardless of k and the application, while our algorithms have much less cost for engineB and slightly less for drugB.
Extension
44
• Adapt l-diversity to address cases where:
  - Not all inputs should be treated equally in padding.
• Model:
  - Capture the information about the inequality.
• Algorithms:
  - Need additional constraints on the partition.
45
Experiments
• Collect data from two real-world web applications:
  - Another popular search engine (users’ search keywords need to be protected)
  - Authoritative patent information system from a national institute (a company’s patent interests need to be protected)
46
Challenge 3: Ceiling Padding Defeated
Condition              s Value   Rounding (Δ)              Ceiling Padding
                                 Δ=112   Δ=144   Δ=176
Cancer                 360       448     432     528       360
Cervicitis             290       336     432     352       360
Cold                   290       336     432     352       290
Cough                  290       336     432     352       290
Padding Overhead (%)             18.4%   40.5%   28.8%     5.7%
2-indistinguishability
• Observation: a patient received a 360-byte packet after login
  - Cancer? Cervicitis? ⇒ 50%, 50%
• Extra knowledge: this patient is male
  - Cancer? Cervicitis? ⇒ 100%, 0%
• Facts (Ceiling Padding):
Slides 46-58 are partially based on: W. M. Liu, L. Wang, K. Ren, M. Debbabi, “Background Knowledge-Resistant Traffic Padding for Preserving User Privacy in Web-Based Applications,” Proc. The 5th IEEE International Conference on Cloud Computing Technology and Science (IEEE CloudCom 2013),
47
Solution: Add Randomness
Condition    s Value
Cancer       360
Cervicitis   290
Cold         290
Cough        290
• Random Ceiling Padding
  - Instead of deterministically forming padding groups, the server randomly (uniformly, in this example) selects one of the other three conditions and groups it with the real condition to form a padding group for ceiling padding.
Cancerous person: always receives a 360-byte packet (360, 360, 360 for the three possible selections).
Cervicitis patient: 66.7%: 290-byte packet; 33.3%: 360-byte packet (360, 290, 290 for the three possible selections).
48
Better Privacy Protection
Diseases     s Value
Cancer       360
Cervicitis   290
Cold         290
Cough        290
• Can tolerate adversaries’ extra knowledge
  - Suppose an adversary knows a patient is male and he saw s = 360
    Patient Has   Server Selects
    Cancer        Cervicitis
    Cancer        Cold
    Cancer        Cough
    Cold          Cancer
    Cough         Cancer
  - The adversary can now be only 60%, instead of 100%, sure that the patient has Cancer.
• Cost is not necessarily worse
  - In this example, these two methods actually lead to exactly the same expected padding and processing costs
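The 60% figure and the cost comparison can be checked with exact fractions. The sketch below models the example as described: for a patient with a given condition the server picks one of the other three conditions uniformly at random and pads to the larger of the two sizes. A uniform prior over the conditions the adversary still considers possible is an assumption of this sketch, as are the function names.

```python
from fractions import Fraction

SIZES = {"Cancer": 360, "Cervicitis": 290, "Cold": 290, "Cough": 290}


def observation_distribution(condition):
    """P(observed size | condition) when one partner is chosen uniformly."""
    others = [c for c in SIZES if c != condition]
    dist = {}
    for partner in others:
        padded = max(SIZES[condition], SIZES[partner])  # ceiling padding
        dist[padded] = dist.get(padded, Fraction(0)) + Fraction(1, len(others))
    return dist


def posterior(observed, candidates):
    """Adversary's belief over candidates, assuming a uniform prior."""
    likelihood = {c: observation_distribution(c).get(observed, Fraction(0))
                  for c in candidates}
    total = sum(likelihood.values())
    return {c: p / total for c, p in likelihood.items()}


def expected_padding_cost():
    """Expected bytes added when each condition occurs exactly once."""
    return sum((size - SIZES[c]) * p
               for c in SIZES
               for size, p in observation_distribution(c).items())


male_candidates = ["Cancer", "Cold", "Cough"]  # the adversary rules out Cervicitis
print(posterior(360, male_candidates)["Cancer"])
print(expected_padding_cost())
```

Deterministic ceiling padding in this example pads only the Cervicitis packet from 290 to 360 bytes, a total of 70 bytes, which is the same as the expected total under the random scheme.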
Analysis
49
• Scenario:
  - Algorithm: randomness drawn from a uniform distribution
  - Data: action-sequence and flow-vector are of length one
• Analysis of privacy preservation:
  - Lemma 6.1
• Analysis of costs:
  - Lemma 6.2, Lemma 6.3
Model, Scheme and Experiment
50
• Model:
  - component, privacy, method, cost
• Scheme:
  - Main idea:
    - The server randomly selects members to form the group.
    - Different choices of random distribution lead to different algorithms.
  - Two instantiations of the scheme:
    - Bounded uniform distribution; normal distribution
    - Computation complexity: O(k)
• Experiment:
  - Data from real-world web applications
  - Low overheads and high uncertainty
51
Privacy Properties
• k-Indistinguishability:
  - For any flow-vector, at least k different actions can trigger it.
• Given vector-action set VA, padding algorithm M, range Range(M,VA)
• To respond to a user input, the server randomly selects members to form the group.
  - Different choices of random distribution lead to different algorithms.
• Goal:
  - The privacy properties need to be ensured.
  - The costs of achieving such privacy protection should be minimized.