Lexical Feature Based Phishing URL Detection Using Online Learning Reporter: Jing Chiu Advisor: Yuh-Jye Lee Email: [email protected] 2011/3/17 Data Mining and Machine Learning Lab. 1

Page 1: Lexical Feature Based Phishing URL Detection  Using Online Learning

Lexical Feature Based Phishing URL Detection Using Online Learning

Reporter: Jing Chiu

Advisor: Yuh-Jye Lee

Email: [email protected]

2011/3/17 Data Mining and Machine Learning Lab. 1

Page 2: Lexical Feature Based Phishing URL Detection  Using Online Learning

Paper Information

Authors:
  Aaron Blum (University of Alabama, Birmingham)
  Brad Wardman (University of Alabama, Birmingham)
  Thamar Solorio (University of Alabama, Birmingham)

Source:
  3rd ACM Workshop on Artificial Intelligence and Security (AISec), 2010

Page 3: Lexical Feature Based Phishing URL Detection  Using Online Learning

Outline

Introduction
Related Work
Approach
Data
Evaluation
Conclusion

Page 4: Lexical Feature Based Phishing URL Detection  Using Online Learning

Introduction

Phishing
  A cybercrime carried out through spam emails and fraudulent websites
  Entices victims to provide sensitive information
  The information is used to steal identities or gain access to money

Characteristics
  Highly dynamic environment
  Models need to be updated frequently

New ideas
  Combine online learning with a content-inspection-based approach
  Train the model largely on lexical features only (without host-based features)
  Provide results showing that URL-inspection-based detection performs as well as content-inspection-based detection

Page 5: Lexical Feature Based Phishing URL Detection  Using Online Learning

Related Work

Content-based phishing URL detection
  Uses the similarity between content files to detect phishing websites

Purely URL-based malicious URL detection
  Uses host information and URL lexical features with online learning algorithms

PhishNet
  Extends the usability of blacklists

Domain blacklisting
  Expands blacklists using DNS zone file data and WHOIS information

Page 6: Lexical Feature Based Phishing URL Detection  Using Online Learning

Approach

Feature extraction
  Delimiters: “/”, “?”, “.”, “=” and “_”
  Bigram combination
  Lexical feature groups

Learning algorithm
  Confidence-Weighted (CW) algorithm
  Updates the model with different weights according to how often each feature occurs
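
The feature extraction step can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the names `tokenize` and `lexical_features`, the lowercasing, and the `uni:`/`bi:` feature prefixes are hypothetical, and the lexical feature groups mentioned on the slide are not modeled here.

```python
import re

# Delimiters listed on the slide: "/", "?", ".", "=" and "_".
DELIMITERS = re.compile(r"[/?.=_]")


def tokenize(url):
    """Split a URL on the slide's delimiters and drop empty tokens.

    Lowercasing is an assumption made for this sketch.
    """
    return [tok for tok in DELIMITERS.split(url.lower()) if tok]


def lexical_features(url):
    """Return a binary bag of token (unigram) and adjacent-token (bigram) features."""
    tokens = tokenize(url)
    features = {f"uni:{t}" for t in tokens}
    # Bigrams pair each token with the token that follows it in the URL.
    features |= {f"bi:{a}|{b}" for a, b in zip(tokens, tokens[1:])}
    return features


# Hypothetical URL used only for illustration.
example_tokens = tokenize("http://a.b/c_d?e=f")
example_features = lexical_features("http://a.b/c_d?e=f")
```

Note that ":" is not among the listed delimiters, so the scheme survives as the token `http:`; whether the original system strips the scheme is not stated on the slide.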

Page 7: Lexical Feature Based Phishing URL Detection  Using Online Learning

Approach (cont.)

MD5 Matching
  Uses files' MD5 checksums to check file similarity
  Easy to evade (by varying the content)
  Examples

Deep MD5 Matching
  Download all the associated content files
  Compare the similarity between two websites' content files using the Kulczynski 2 coefficient
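
As a rough illustration of the Deep MD5 comparison, the sketch below hashes each content file with MD5 and scores two sites with the Kulczynski 2 coefficient, the mean of the two overlap fractions: K2 = 0.5 * (a/|A| + a/|B|), where a is the number of hashes the sites share. The function names and the toy byte strings are hypothetical, and returning 0.0 for an empty site is an assumption of this sketch.

```python
import hashlib


def md5_set(files):
    """Map an iterable of file contents (bytes) to the set of their MD5 hex digests."""
    return {hashlib.md5(data).hexdigest() for data in files}


def kulczynski2(set_a, set_b):
    """Kulczynski 2 coefficient between two sets of file hashes."""
    if not set_a or not set_b:
        return 0.0  # assumption: an empty site matches nothing
    common = len(set_a & set_b)
    return 0.5 * (common / len(set_a) + common / len(set_b))


# Toy stand-ins for the content files of two websites.
site_a = md5_set([b"logo", b"style", b"form-v1"])
site_b = md5_set([b"logo", b"style", b"form-v2", b"extra"])
similarity = kulczynski2(site_a, site_b)
```

Changing a single file only lowers the score slightly, whereas plain MD5 matching on one file fails outright once its content varies.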

Page 8: Lexical Feature Based Phishing URL Detection  Using Online Learning

Data

Data sources
  UAB Phishing Data Mine
    Two and a half years of collection
    Benign URLs may look “phishy” (e.g.)
    9,506 unique domains
    25,203 URLs (6,114 malicious)
  Cyveillance
    18,990 unique domains
    34,234 URLs (all malicious)
  All feeds are fully de-duplicated

Datasets
  UAB Feeds
  Cyveillance full
  Cyveillance abridged
  Mixed

Page 9: Lexical Feature Based Phishing URL Detection  Using Online Learning

Data (cont.)

Percentage of total URLs vs. Individual Domains

Page 10: Lexical Feature Based Phishing URL Detection  Using Online Learning

Evaluation

Experiment setting
  Training and testing were conducted on daily batches
  Training was initially conducted on UAB data
  The model is updated by a daily URL blacklist/whitelist feed
  False positive and false negative error rates were computed for every prediction
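
The predict-then-update protocol described above might look like the sketch below. It is a hedged illustration, not the system from the paper: `DiagonalCW` is a simplified diagonal-covariance learner whose update follows the closed-form variance update as commonly stated for confidence-weighted learning (the exact constants should be checked against the original CW publication), `eta=0.9` is an arbitrary choice, the error-rate denominators (benign count for false positives, phishing count for false negatives) are an assumption, and the feature sets are toy data.

```python
import math
from statistics import NormalDist


class DiagonalCW:
    """Confidence-weighted linear classifier with a diagonal covariance (sketch)."""

    def __init__(self, eta=0.9, initial_variance=1.0):
        self.phi = NormalDist().inv_cdf(eta)  # positive for eta > 0.5
        self.initial_variance = initial_variance
        self.mu = {}     # mean weight per feature
        self.sigma = {}  # variance (uncertainty) per feature

    def score(self, features):
        return sum(self.mu.get(f, 0.0) for f in features)

    def update(self, features, label):
        """label is +1 (phishing) or -1 (benign); features are binary."""
        if not features:
            return
        var = {f: self.sigma.get(f, self.initial_variance) for f in features}
        margin = label * self.score(features)
        total_var = sum(var.values())
        b = 1.0 + 2.0 * self.phi * margin
        disc = b * b - 8.0 * self.phi * (margin - self.phi * total_var)
        alpha = max(0.0, (-b + math.sqrt(disc)) / (4.0 * self.phi * total_var))
        if alpha == 0.0:
            return  # already confident enough about this example
        for f, s in var.items():
            # Rarely seen features keep a large variance and so move further.
            self.mu[f] = self.mu.get(f, 0.0) + alpha * label * s
            self.sigma[f] = 1.0 / (1.0 / s + 2.0 * alpha * self.phi)


def run_daily_batches(model, batches):
    """Predict each URL, record the error, then update; returns (FP rate, FN rate)."""
    fp = fn = benign = phishing = 0
    for day in batches:
        for features, label in day:
            predicted = 1 if model.score(features) > 0 else -1
            if label == 1:
                phishing += 1
                fn += predicted == -1
            else:
                benign += 1
                fp += predicted == 1
            model.update(features, label)
    return fp / max(benign, 1), fn / max(phishing, 1)


# Toy daily batches of (feature set, label); real features would come from URLs.
day1 = [({"paypal", "login", "verify"}, 1), ({"news", "sports", "index"}, -1)]
day2 = [({"bank", "login", "verify"}, 1), ({"news", "weather"}, -1)]
model = DiagonalCW()
fp_rate, fn_rate = run_daily_batches(model, [day1, day2])
```

Because every URL is scored before its label is used for the update, the reported rates reflect performance on data the model has not yet seen.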

Page 11: Lexical Feature Based Phishing URL Detection  Using Online Learning

Evaluation (cont.)

Page 12: Lexical Feature Based Phishing URL Detection  Using Online Learning

Evaluation (cont.)

Page 13: Lexical Feature Based Phishing URL Detection  Using Online Learning

Evaluation (cont.)

Page 14: Lexical Feature Based Phishing URL Detection  Using Online Learning

Conclusion

Lexical-feature-based learning with the CW algorithm provides robust performance

Quality, diverse training data can achieve an accuracy higher than 97%

For the proposed system
  Training data can be collected from any blacklist
  Easy to implement, with robust performance

Page 15: Lexical Feature Based Phishing URL Detection  Using Online Learning

Thanks for your attention

Q&A?

Page 16: Lexical Feature Based Phishing URL Detection  Using Online Learning

Lexical Feature Group

Page 17: Lexical Feature Based Phishing URL Detection  Using Online Learning

URLs including the recipient’s email

Page 18: Lexical Feature Based Phishing URL Detection  Using Online Learning

Data in UAB Phishing Data Mine
