INHA UNIVERSITY, INCHEON, KOREA
http://eslab.inha.ac.kr/

ALPACAS: A Large-scale Privacy-aware Collaborative Anti-spam System
Z. Zhong, L. Ramaswamy and K. Li, IEEE INFOCOM 2008

Intelligent E-Commerce System Lab.
Aettie, Ji

Dec 18, 2015




OUTLINE

INTRODUCTION
PRIOR WORK
THE ALPACAS ANTI-SPAM FRAMEWORK
  • Feature-Preserving Fingerprint
  • Privacy-Preserving Collaboration Protocol
  • System Structure
EXPERIMENTS & RESULTS
DISCUSSION
CONCLUSION


INTRODUCTION

Motivations
• Recent spam attacks pose strong challenges to the statistical filters that have been popular.
• Collaborative spam filtering is a natural defense paradigm: because spammers send similar emails to many target receivers, information about spam can be shared among them.
• However, protecting the privacy of the participants in the collaboration is an important challenge.
• Digest-based approaches have been proposed to protect privacy, but they are not sufficient.


INTRODUCTION

Contributions
• ALPACAS: A Large-scale Privacy-Aware Collaborative Anti-spam System.
  • A resilient fingerprint generation technique, "feature-preserving transformation", is proposed.
  • A privacy-preserving protocol is designed to control the amount of information that is shared.
• The experimental results demonstrate that ALPACAS outperforms traditional stand-alone statistical filters.


PRIOR WORK

Drawbacks of the existing collaborative anti-spam schemes (using DCC as the example)

How does it work?
• Participating servers in DCC share digests of emails, computed with hash functions such as MD5.
• The DCC system replies with recent statistics about those digests.

Drawbacks
• Hashing schemes like MD5 generate a completely different hash value even if a single byte is altered.
• The DCC scheme does not completely address the privacy issue: it remains open to inference-based privacy breaches.
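
The first drawback is easy to reproduce. The following is a minimal sketch using Python's standard hashlib module; the two sample messages are invented for illustration and are not taken from the paper or from DCC.

```python
import hashlib


def md5_digest(text: str) -> str:
    """Return the hexadecimal MD5 digest of a message body."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Two made-up spam bodies that differ by a single trailing character.
original = "Buy cheap watches now at example.com"
altered = "Buy cheap watches now at example.com!"

d1 = md5_digest(original)
d2 = md5_digest(altered)

# Count how many of the 32 hex characters differ between the two digests.
differing = sum(a != b for a, b in zip(d1, d2))

print(d1)
print(d2)
print(f"{differing} of {len(d1)} hex characters differ")
```

A digest-sharing scheme sees these two messages as unrelated, which is why a spammer who appends random characters to each copy can evade an exact-match digest.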


THE ALPACAS ANTI-SPAM FRAMEWORK (1/2)

Challenges
• To protect email privacy,
  • The messages have to be encrypted.
  • The encryption should retain the important features of the messages.
• To avoid inference-based privacy breaches,
  • It is necessary to minimize the information revealed during the collaboration.

ALPACAS framework components
• Feature-preserving fingerprint
• Privacy-preserving protocol
• DHT-based architecture


THE ALPACAS ANTI-SPAM FRAMEWORK (2/2)

Fig. 1: ALPACAS System Overview

(a) ALPACAS Network (b) Internal mechanism of EA4


Feature-Preserving Fingerprint (1/4)

Shingle-based Message Transformation
• Shingle property: if two documents differ by a small amount, their shingle sets also differ by only a small amount.

Fig. 2: ALPACAS Feature Sets, DCC and Razor Digests for 2 spam emails (Texts in bold font indicate differences)


Feature-Preserving Fingerprint (2/4)

Shingle-based Message Transformation
• Generation of the transformed feature set of message Ma, TFSet(Ma):
  • Compute the Rabin fingerprint [11] of the consecutive tokens in a sliding window of length W.
  • Each fingerprint is in the range (0, 2^K - 1).
  • For a message with X tokens, X - W + 1 fingerprints are obtained.
  • The smallest Y fingerprints are retained.
  • The similarity between Ma and Mb can be calculated as

$$\frac{\left| TFSet_{(W,Y)}(M_a) \cap TFSet_{(W,Y)}(M_b) \right|}{\left| TFSet_{(W,Y)}(M_a) \cup TFSet_{(W,Y)}(M_b) \right|}$$
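
The transformation on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: a truncated SHA-1 hash stands in for the Rabin fingerprint, tokens are obtained by splitting on whitespace, and the function names `tf_set` and `similarity` as well as the default values of W, Y and K are hypothetical. The similarity is computed as intersection size over union size of the two feature sets.

```python
import hashlib


def tf_set(message, w=3, y=10, k=32):
    """Build a transformed feature set from W-token sliding windows.

    Assumption: SHA-1 reduced to K bits replaces the Rabin fingerprint.
    """
    tokens = message.split()
    fingerprints = set()
    # A message with X tokens yields X - W + 1 windows (none if X < W).
    for i in range(max(len(tokens) - w + 1, 0)):
        shingle = " ".join(tokens[i:i + w])
        digest = hashlib.sha1(shingle.encode("utf-8")).digest()
        fingerprints.add(int.from_bytes(digest, "big") % (1 << k))
    # Keep only the Y smallest fingerprint values.
    return set(sorted(fingerprints)[:y])


def similarity(a, b):
    """Intersection size divided by union size of two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


m_a = tf_set("one two three four five")
m_b = tf_set("one two three four six")
# The messages share 2 of their 4 distinct 3-token shingles.
print(similarity(m_a, m_b))
```

Changing one word alters only the windows that contain it, so the two feature sets still overlap, unlike the MD5 digests of the whole messages.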


Feature-Preserving Fingerprint (3/4)

Shingle-based Message Transformation
• Regarding privacy preservation,
  • The Rabin fingerprint algorithm is a one-way hash function, so it is infeasible to reverse.
  • However, it is still possible to infer a word or a group of words from an individual feature value, for example by fingerprinting candidate word sequences and comparing the results.


Feature-Preserving Fingerprint (4/4)

Term-level Privacy Preservation
• Controlled shuffling
  • The email text is divided into h consecutive chunks of z consecutive tokens each.
  • The tokens in each chunk are shuffled in a pre-defined manner, while the order of the chunks is preserved.
  • Each chunk is divided into y sub-chunks (y is a factor of z).
  • The tokens in chunk CKh are shuffled such that the token at the r-th position in the s-th sub-chunk is moved to the (r × y + s)-th position in CKh.
  • Even if two messages contain an identical term, shuffling can place its tokens next to different neighbours, so the resulting feature values can differ.
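
The shuffling rule above can be sketched as follows. This is an illustrative reading, not the authors' code: positions r and s are taken to be 0-based, a trailing chunk shorter than z tokens is left unshuffled, and the function names are hypothetical.

```python
def shuffle_chunk(chunk, y):
    """Move the token at position r of sub-chunk s to position r * y + s."""
    z = len(chunk)
    if z % y != 0:
        raise ValueError("y must be a factor of the chunk length z")
    sub_len = z // y  # number of tokens per sub-chunk
    shuffled = [None] * z
    for s in range(y):
        for r in range(sub_len):
            shuffled[r * y + s] = chunk[s * sub_len + r]
    return shuffled


def controlled_shuffle(tokens, z, y):
    """Shuffle tokens inside each z-token chunk; chunk order is preserved."""
    result = []
    for start in range(0, len(tokens), z):
        chunk = tokens[start:start + z]
        # Assumption: an incomplete final chunk is kept as it is.
        result.extend(shuffle_chunk(chunk, y) if len(chunk) == z else chunk)
    return result


tokens = list("abcdefgh")
print(controlled_shuffle(tokens, z=6, y=2))
# -> ['a', 'd', 'b', 'e', 'c', 'f', 'g', 'h']
```

With z = 6 and y = 2 the two sub-chunks (a, b, c) and (d, e, f) are interleaved, so consecutive words of the original text no longer fall into the same sliding window in their original order.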


Privacy-Preserving Collaboration Protocol (1/3)

Spam/ham dichotomy
• Revealing the contents of a spam email does not affect privacy, whereas revealing information about a ham email constitutes a privacy breach.

Protocol
• Email agent EAj receives Ma and computes TFSet(Ma).
• EAj sends a query containing a subset of TFSet(Ma) to other agents.
• EAk receives the query and checks its spam and ham knowledge bases (KBs).
• For each matching entry in the spam KB, EAk sends back the complete transformed feature set.
• For each matching entry in the ham KB, EAk sends back a small, randomly selected part of the transformed feature set.
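
The asymmetric treatment of spam and ham on the responding agent can be sketched as below. This is a simplified illustration with several assumptions that the slides do not state: an entry "matches" when it shares at least one feature with the query, knowledge bases are plain lists of feature sets, and the function name `respond` and the parameter `ham_reveal` (how many ham features to disclose) are hypothetical.

```python
import random


def respond(query, spam_kb, ham_kb, ham_reveal=2, rng=None):
    """Answer a query the way EAk does in the protocol described above.

    Assumption: an entry matches if it shares any feature with the query.
    """
    rng = rng or random.Random()
    # Spam entries carry no privacy risk: return the complete feature set.
    spam_replies = [set(entry) for entry in spam_kb if entry & query]
    # Ham entries are private: reveal only a small random part of each.
    ham_replies = []
    for entry in ham_kb:
        if entry & query:
            count = min(ham_reveal, len(entry))
            ham_replies.append(set(rng.sample(sorted(entry), count)))
    return spam_replies, ham_replies


spam_kb = [{1, 2, 3}, {9}]
ham_kb = [{2, 5, 6, 7}]
spam_replies, ham_replies = respond({2}, spam_kb, ham_kb, rng=random.Random(0))
print(spam_replies)  # the full matching spam entry
print(ham_replies)   # two randomly chosen features of the matching ham entry
```

Because only a fragment of each ham feature set leaves the agent, a querying agent learns that a similar ham message exists without receiving enough features to reconstruct it.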


Privacy-Preserving Collaboration Protocol (2/3)

Fig. 3: ALPACAS Protocol: Query and Response


Privacy-Preserving Collaboration Protocol (3/3)

Protocol (cont'd)
• EAj now computes a score from MaxSpamOvlp(Ma) and MaxHamOvlp(Ma) and decides whether Ma is spam or ham.
• If the score is greater than a threshold λ, Ma is classified as spam; otherwise it is classified as ham.

$$\mathrm{Score} = \frac{1 + \mathrm{MaxSpamOvlp}(M_a) - \mathrm{MaxHamOvlp}(M_a)}{2}$$
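
The decision step can be sketched as follows. Two points here are assumptions rather than statements from the slides: the score is taken to be (1 + MaxSpamOvlp - MaxHamOvlp) / 2, which is one reading of the formula on this slide, and the overlap between the query and a reply is taken to be the fraction of query features found in that reply. The names `overlap` and `classify` and the default threshold of 0.5 are hypothetical.

```python
def overlap(query, reply):
    """Fraction of the query's features that appear in a reply (assumed)."""
    return len(query & reply) / len(query) if query else 0.0


def classify(query, spam_replies, ham_replies, threshold=0.5):
    """Label a message from the best spam and ham overlaps it received."""
    max_spam = max((overlap(query, r) for r in spam_replies), default=0.0)
    max_ham = max((overlap(query, r) for r in ham_replies), default=0.0)
    # Assumed score form: maps the difference of overlaps into [0, 1].
    score = (1.0 + max_spam - max_ham) / 2.0
    label = "spam" if score > threshold else "ham"
    return label, score


label, score = classify({1, 2, 3, 4}, [{1, 2, 3, 9}], [{4}])
print(label, score)  # spam 0.75
```

Under this reading a message with no replies at all scores exactly 0.5, so the threshold λ decides how unknown messages are treated.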


System Structure (1/2)

Design principle
• A query should be sent to an email agent only if it has a reasonable chance of containing information about the email that is being verified. Contacting any other email agent not only introduces inefficiencies but also leads to unnecessary exposure of data.

DHT-based Architecture
• EAj is responsible for maintaining information about all the emails whose TFSet has at least one feature element in the sub-range allocated to it.


System Structure (2/2)

DHT-based Architecture (cont'd)
• There are N email agents.
• All feature elements lie within (0, 2^K - 1).
• The range (0, 2^K - 1) is divided into N non-overlapping regions {(MinF_0, MaxF_0), (MinF_1, MaxF_1), ..., (MinF_(N-1), 2^K - 1)}.
• (MinF_j, MaxF_j) denotes the sub-range allocated to EAj.
  • For spam, EAj stores the entire TFSet.
  • For ham, EAj stores a subset of the TFSet.
• If MinF_j ≤ F_t ≤ MaxF_j, then EAj is called the rendezvous agent of feature element F_t.
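
A rendezvous lookup can be sketched as below. The slides do not say how the sub-ranges are sized, so this sketch assumes N equal-width regions with the last region absorbing any remainder; the function names and the default K are hypothetical.

```python
def rendezvous_agent(feature, n_agents, k=32):
    """Return the index j of the agent whose sub-range contains the feature.

    Assumption: the range 0 .. 2^K - 1 is split into equal-width regions.
    """
    space = 1 << k
    if not 0 <= feature < space:
        raise ValueError("feature element outside the fingerprint range")
    width = space // n_agents
    # The last agent also covers the remainder when 2^K is not divisible by N.
    return min(feature // width, n_agents - 1)


def agents_to_query(tf_set, n_agents, k=32):
    """Agents that could hold information about a message's feature set."""
    return {rendezvous_agent(f, n_agents, k) for f in tf_set}


# Toy setting: 4-bit fingerprints (0..15) shared among 4 agents.
print(agents_to_query({0, 1, 15}, n_agents=4, k=4))  # {0, 3}
```

Only the agents returned by `agents_to_query` are contacted, which follows the design principle that a query goes solely to agents with a reasonable chance of knowing the message.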


EXPERIMENTS & RESULTS

Benchmarked algorithms
• Bogofilter, based on Bayesian filtering
  • Calculates a spamminess score for the email.
• DCC, based on simple hash-based collaborative filtering
  • Counts the number of times the hash value of the email has been reported as spam.


Experimental Setup

Dataset
• TREC email corpus and SpamAssassin email corpus.
• The TREC corpus is partitioned into 67 email sets according to target address (67 agents).
• Half of each email set, including both ham and spam, is used for training and the remainder for testing.
• Each individual agent has a pre-classified email corpus (SpamAssassin) as its initial knowledge base.


Performance Metrics

Spam filtering accuracy
• A ham email that the filtering scheme classifies as spam is termed a false positive.

Privacy of the collaborative anti-spam system
• The message-level privacy breach percentage is defined as the ratio of the number of test ham messages suffering privacy compromises to the total number of test ham messages.

Communication overhead of the system
• The per-test communication cost metric is defined as the total number of messages circulated in the system during the entire experiment.
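
As a worked example of the first two metrics, the sketch below computes them from a list of per-message outcomes. The record layout (true label, predicted label, breach flag) and the toy data are invented for illustration, and expressing false positives as a percentage of all test ham messages is an assumption, since the slide defines only what a false positive is.

```python
def percentage(part, whole):
    """Return part / whole as a percentage, or 0.0 for an empty set."""
    return 100.0 * part / whole if whole else 0.0


def evaluate(results):
    """Compute ham-side metrics from (true, predicted, breached) records."""
    ham = [r for r in results if r[0] == "ham"]
    false_positives = sum(1 for r in ham if r[1] == "spam")
    breached = sum(1 for r in ham if r[2])
    return {
        # Assumption: false positives are reported relative to all test ham.
        "false_positive_pct": percentage(false_positives, len(ham)),
        "privacy_breach_pct": percentage(breached, len(ham)),
    }


# Made-up outcomes for five test messages (four ham, one spam).
results = [
    ("ham", "ham", False),
    ("ham", "spam", True),
    ("spam", "spam", False),
    ("ham", "ham", False),
    ("ham", "ham", True),
]
metrics = evaluate(results)
print(metrics)
```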


Spam Filtering Effectiveness

Fig. 4: False Positive Percentages of ALPACAS, BogoFilter and DCC

Fig. 5: False Negative Percentages of ALPACAS, BogoFilter and DCC

Fig. 6: System Overall Accuracy (DCC is not displayed because its FP is 0)


Robustness Against Attacks

Fig. 7: System Robustness Against Good-Word Attacks

Fig. 8: System Robustness Against Character Replacement Attacks


Privacy Awareness


Fig. 9: Privacy Breach in ALPACAS (Varying Number of Agents)


Communication Overheads

Fig. 10: Communication Overheads of the ALPACAS and the DCC systems


Message Transformation Algorithm Analysis

Fig. 11: False Positive of ALPACAS for Various Parameter Setup

Fig. 12: False Negative of ALPACAS for Various Parameter Setup

Fig. 13: Effectiveness of Controlled Shuffling Strategy


DISCUSSION

• Combining approaches such as statistical filtering with the feature-preserving transformation scheme.

• Handling the dynamic nature of email agents in the system by using replication and finger-table-based routing.

• Approaches for guarding against malicious email agents.


CONCLUSION

• In this paper, the design and evaluation of ALPACAS are presented.

• Its two novel features:
  • A feature-preserving transformation technique
  • A privacy-preserving protocol

• Our initial experiments show that ALPACAS
  • is very effective in filtering spam,
  • is highly resilient to various attacks, and
  • provides strong privacy protection to the participating entities.