CANTINA: A Content-Based Approach to Detecting Phishing Web Sites. WWW 2007. 2008.09.09. Yue Zhang, Jason Hong, and Lorrie Cranor

Dec 16, 2015


Sherilyn Barker
Page 1:

CANTINA: A Content-Based Approach to Detecting Phishing Web Sites

WWW 2007

2008.09.09

Yue Zhang, Jason Hong, and Lorrie Cranor

Page 2:

CS710 | KAIST

Agenda

Phishing Attacks
Motivation & Goal
Related Work
CANTINA
Evaluation
Conclusion

Page 3:

Phishing Attacks (1/2)

The act of stealing personal information via the Internet for the purpose of committing financial fraud
Create a fake site that imitates a legitimate site, such as a bank
Send it to users by various methods
  • Spam e-mail, XSS vulnerabilities, malware, …
Technical issues
  URL obfuscation
    • Similar domain, URL encoding, …
  DNS hijacking
    • Modifying the hosts file, DNS server settings, …
  Malware
    • BHO (Browser Helper Object), browser toolbar, key logger, …

Page 4:

Phishing Attacks (2/2)

Criminals often create phishing sites by copying and then modifying a legitimate site’s web pages
  • Similar to the original web site
Phishing pages often contain brand names and other terms that are common on a given web page
  • The site owner’s brands

Page 5:

Motivation & Goal

Phishing is a rapidly growing problem, with 9,255 unique phishing sites reported in 2006
84 anti-phishing toolbars
  • Low accuracy
There is a strong need for better automated detection algorithms
A novel content-based approach for detecting phishing web sites
  • Achieve higher accuracy than existing approaches

Page 6:

Related Work (1/3)

Anti-phishing research falls into four categories
  Why people fall for phishing attacks
    • Studies examining the reasons that people fall for phishing attacks
  Educating people about phishing attacks
    • Focused on online training materials, testing, and situated learning
  Anti-phishing user interfaces
    • Focused on the development of better user interfaces for anti-phishing tools
  Automated detection of phishing

Page 7:

Related Work (2/3)

Anti-phishing user interfaces
  Toolbar-based approach
  Browser extensions
    • Dynamic Security Skins
    • Web Wallet

Page 8:

Related Work (3/3)

Automated detection of phishing
  Use heuristics to judge whether a page has phishing characteristics
    • Host name, domain name, URLs, …
  Use a blacklist of reported phishing URLs

Page 9:

CANTINA | Basic Concept

Criminals often create phishing sites by copying and then modifying a legitimate site’s web pages
  • Phishing pages contain the brand names and terms of the legitimate pages
Robust Hyperlinks
  A technique for recovering from broken links
  Add a lexical signature to URLs
    • If the link doesn’t work, feed the signature to a search engine
    • Ex. http://aaa.com/a.html?lexical-signature=“word1+word2+...+word5”
TF/IDF (Term Frequency / Inverse Document Frequency)
  Frequency-based algorithm
  Basic algorithm for search engines
    • Comparing and classifying documents
    • A term has a high TF-IDF weight when it has a high term frequency in a given document and a low document frequency in the whole collection of documents
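The TF-IDF weighting and the lexical signature described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the smoothed IDF formula, the toy reference corpus, and the helper names (tokenize, tf_idf_weights, top_terms, lexical_signature_url) are not from the slides, which do not say which TF-IDF variant or corpus the authors used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf_weights(document, corpus):
    """Return {term: TF-IDF weight} for `document` relative to `corpus`.

    TF is the term's relative frequency in the document. IDF uses a
    smoothed form, log((1 + N) / (1 + df)) + 1, which is an assumption
    chosen here so that every weight stays positive.
    """
    tokens = tokenize(document)
    counts = Counter(tokens)
    doc_sets = [set(tokenize(d)) for d in corpus]
    weights = {}
    for term, count in counts.items():
        tf = count / len(tokens)
        df = sum(1 for terms in doc_sets if term in terms)
        idf = math.log((1 + len(doc_sets)) / (1 + df)) + 1
        weights[term] = tf * idf
    return weights


def top_terms(document, corpus, k=5):
    """Return the k terms with the highest weight (ties broken alphabetically)."""
    weights = tf_idf_weights(document, corpus)
    ranked = sorted(weights, key=lambda t: (-weights[t], t))
    return ranked[:k]


def lexical_signature_url(url, terms):
    """Append a lexical signature to a URL in the style of the slide's example."""
    return url + '?lexical-signature="' + "+".join(terms) + '"'


# Hypothetical reference corpus and page text, for illustration only.
corpus = [
    "sign in to your account",
    "help with your account",
    "forgot your password help",
]
page = "ebay ebay ebay sign in help"
print(lexical_signature_url("http://aaa.com/a.html", top_terms(page, corpus)))
```

A term such as "ebay" that is frequent on the page but absent from the reference corpus ends up with the highest weight, which is the property the lexical signature relies on.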

Page 10:

CANTINA | Basic Concept

Web page
Calculate the TF-IDF weight of each term
Take the five terms with the highest TF-IDF weights
Search for the top five terms (term1+term2..) using Google
Compare the page’s domain name with the Google search results
Phishing site: the domain name of the current page does not match the domain name of any of the top N search results (N = 30)
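The decision step at the end of this flow can be sketched as follows. This is a rough illustration, not the authors' implementation: the `search` callable stands in for a Google query and is hypothetical, and comparing full host names is a simplification of matching registered domain names.

```python
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercase host name of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def looks_like_phishing(page_url, top_terms, search, n_results=30):
    """Flag the page if its domain is absent from the top N search results.

    `search` is a caller-supplied function (hypothetical here) that takes a
    query string and returns a list of result URLs, best match first.
    """
    query = " ".join(top_terms)
    results = search(query)[:n_results]
    result_domains = {domain_of(u) for u in results}
    return domain_of(page_url) not in result_domains


def fake_search(query):
    """Stand-in for a real search engine; returns canned result URLs."""
    return ["https://www.ebay.com/signin", "https://pages.ebay.com/help"]


terms = ["ebay", "user", "sign", "help", "forgot"]
print(looks_like_phishing("http://www.ebay.com/login", terms, fake_search))
print(looks_like_phishing("http://ebay.example-login.test/signin", terms, fake_search))
```

Passing the search function in as a parameter keeps the check testable without network access; a real deployment would substitute an actual search API client.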

Page 11:

CANTINA | Basic Concept

Faked Page
TF/IDF Top 5: eBay, user, sign, help, forgot

Page 12:

CANTINA | Basic Concept

Real Page
TF/IDF Top 5: eBay, user, sign, help, forgot

Page 13:

CANTINA | Basic Concept

Page 14:

CANTINA | Additional Solutions

Basic CANTINA produces a number of false positives
Solutions
  Add the current domain name to the lexical signature
  ZMP (Zero results Means Phishing)
    • Google returns zero search results
      – Meaningless domain (e.g., “u-s-j.be”)
  Larger set of heuristics based on related work
    • From existing approaches (e.g., SpoofGuard, PILFER)
    • Age of Domain, Known Images, Suspicious URL, …
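The first two solutions on this slide can be sketched as small changes to the basic check. The function names are hypothetical, and two behaviors are assumptions rather than facts from the slides: the domain is appended to the query as a plain extra term, and a page with zero search results is treated as legitimate when ZMP is switched off.

```python
def build_query(top_terms, page_domain, add_domain=True):
    """Build the search query, optionally appending the page's own domain."""
    terms = list(top_terms)
    if add_domain and page_domain not in terms:
        terms.append(page_domain)
    return " ".join(terms)


def classify(page_domain, result_domains, zmp=True):
    """Return 'phishing' or 'legitimate' for a page given the search results.

    With ZMP (Zero results Means Phishing) enabled, an empty result list is
    treated as phishing. Without it, this sketch assumes 'legitimate'.
    """
    if not result_domains:
        return "phishing" if zmp else "legitimate"
    return "legitimate" if page_domain in result_domains else "phishing"


# Illustrative values only; "u-s-j.be" is the meaningless domain from the slide.
print(build_query(["ebay", "user", "sign", "help", "forgot"], "ebay.com"))
print(classify("u-s-j.be", []))
print(classify("ebay.com", ["ebay.com", "pages.ebay.com"]))
```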

Page 15:

Evaluation | Effectiveness #1 (1/2)

Four conditions
  • Basic TF-IDF
  • Basic TF-IDF + domain name
  • Basic TF-IDF + ZMP
  • Basic TF-IDF + domain name + ZMP
100 phishing URLs and 100 legitimate URLs
  • Phishing URLs: PhishTank.com
  • Legitimate URLs: from a previous study

Page 16:

Evaluation | Effectiveness #1 (2/2)

Basic TF-IDF + ZMP + domain name is adopted as Final TF-IDF
False positives are still a little high
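For reference, the true positive and false positive rates used throughout the evaluation can be computed as in the sketch below. The labels and predictions are made-up toy values for illustration and are not the experiment's data.

```python
def rates(labels, predictions):
    """Return (true_positive_rate, false_positive_rate).

    Both lists hold booleans where True means 'phishing'. The true positive
    rate is the share of phishing pages that were flagged; the false positive
    rate is the share of legitimate pages that were flagged by mistake.
    """
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives


# Toy data: 4 phishing pages followed by 4 legitimate pages.
labels = [True, True, True, True, False, False, False, False]
predictions = [True, True, True, False, True, False, False, False]
tpr, fpr = rates(labels, predictions)
print(tpr, fpr)
```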

Page 17:

Evaluation | Effectiveness #2 (1/2)

Want to reduce false positives
  • Combine several heuristic methods

Page 18:

Evaluation | Effectiveness #2 (2/2)

Determining the best weights for these heuristics is a typical classification problem
  • Use a simple forward linear model
  • Used 100 phishing URLs and 100 legitimate URLs to find the weights
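A linear combination of heuristics of the kind mentioned on this slide might look like the sketch below. The sign convention (+1 for a legitimate vote, -1 for a phishing vote), the threshold of zero, and the weights are all assumptions for illustration; the slides do not give the weights that were actually learned.

```python
def combined_score(votes, weights):
    """Return the weighted sum of heuristic votes.

    `votes` maps a heuristic name to +1 (looks legitimate) or -1 (looks like
    phishing); `weights` maps the same names to non-negative weights.
    """
    return sum(weights[name] * vote for name, vote in votes.items())


def is_phishing(votes, weights, threshold=0.0):
    """Classify as phishing when the combined score falls below the threshold."""
    return combined_score(votes, weights) < threshold


# Hypothetical weights; the heuristic names come from the earlier slide.
weights = {"tf_idf": 0.5, "age_of_domain": 0.25, "suspicious_url": 0.25}
votes = {"tf_idf": -1, "age_of_domain": 1, "suspicious_url": -1}
print(combined_score(votes, weights), is_phishing(votes, weights))
```

Giving the TF-IDF check the largest weight here simply reflects that it is the core of the approach; in practice the weights would be fitted on the labeled URLs.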

Page 19:

Evaluation | Effectiveness #3 (1/2)

Compare the effectiveness of Final-TF-IDF, Final-TF-IDF + heuristics, SpoofGuard, and Netcraft
  SpoofGuard: the highest true positive rate
    • Relies entirely on heuristics
  Netcraft: one of the best toolbars overall
    • Uses a combination of heuristics and an extensive blacklist
100 phishing URLs from PhishTank.com
100 legitimate URLs
  • 35 sites often attacked (e.g., Citibank, PayPal)
  • 35 top pages from Alexa (most popular sites)
  • 30 random web pages from random.yahoo.com

Page 20:

Evaluation | Effectiveness #3 (2/2)

Reduced false positives from 6% to 1% by combining Final-TF-IDF with simple heuristics
However, the true positive rate also decreased (from 97% to 89%)

Page 21:

Discussion

Limitations
  Does not apply to non-English web sites
  System performance
    • Depends on the performance of the Google search engine
Attacks by criminals
  Use images instead of words
  Add invisible text
  Circumvent TF-IDF and PageRank
    • Using “Google bombs”
  Attempt a DoS attack on Google

Page 22:

Conclusion

CANTINA uses TF-IDF + search engines + heuristics to find phishing web sites
  • 97% true positives with 6% false positives
  • 89% true positives with 1% false positives
Shifts the problem of identifying phishing sites to a search engine problem

Page 23:

Q&A