Technical Advisor: Dr. Lidror Troyansky Presents: Academic Advisor: Dr. Yuval Elovic

Dec 21, 2015

Transcript
Page 1: Technical Advisor: Dr. Lidror Troyansky Presents: Academic Advisor: Dr. Yuval Elovic.

Technical Advisor: Dr. Lidror Troyansky

Presents:

Academic Advisor: Dr. Yuval Elovic

Page 2: Technical Advisor: Dr. Lidror Troyansky Presents: Academic Advisor: Dr. Yuval Elovic.

• As the world becomes computerized and connected, organizations are increasingly exposed to data leaks (both malicious and innocent).

• 70% of network traffic is occupied by P2P (BitTorrent, eMule, FreeNet, Gnutella…). People who transfer files can deliberately or unintentionally distribute sensitive information to the whole world in seconds!

• Examples:

– An Israeli Air Force lieutenant colonel shared his laptop via P2P, revealed confidential Israeli Air Force documents, and was suspended from his post.

– The chief of intelligence of the Israeli Police in Eilat also shared a secret police plan with the whole world, risking the lives of many policemen…

Page 3: Technical Advisor: Dr. Lidror Troyansky Presents: Academic Advisor: Dr. Yuval Elovic.
Page 4: Technical Advisor: Dr. Lidror Troyansky Presents: Academic Advisor: Dr. Yuval Elovic.

Nothing!!!!

A Google search for the terms “P2P networks Information leaks” returns just 148 pages!!!

After checking the first 50 for relevance, we got tired…

As a world leader in ILP (Information Leaks Prevention), PortAuthority© Technologies addressed this problem.

The research will be done using the “P2P Inspector Gadget” system.

Page 5: Technical Advisor: Dr. Lidror Troyansky Presents: Academic Advisor: Dr. Yuval Elovic.

[Diagram: Gnutella network]

Computer A: sharing non-confidential files

Laptop B: containing an organization’s confidential file

PDA C: searches for and downloads the organization’s confidential file

Other elements shown: routers, Organization Firewall, P2P Inspector Gadget, Client Organization

Page 6: Technical Advisor: Dr. Lidror Troyansky Presents: Academic Advisor: Dr. Yuval Elovic.

• Develop a system which will:

– Connect to P2P networks, perform smart searches, and download suspicious files while avoiding the P2P networks’ anti-bot algorithms.

– Analyze the files (PDF, DOC, TXT, source code and other types) using smart Machine Learning, the industry’s most advanced algorithms and a user feedback mechanism, with very few false positives.

– Produce history and statistics, such as IP geolocation and file information, stored in a database.

– Enable research into information leaks in P2P networks.

Page 7: Technical Advisor: Dr. Lidror Troyansky Presents: Academic Advisor: Dr. Yuval Elovic.

[Architecture diagram]

External elements: Gnutella Network, IGDB, P2PIG User, Local File System

P2PInspectorGadget: GUI controller, written in Java

IGtellaHandler: written in Java (Gnutella network connection)

IGConfClassifier: written in Python

IGDBHandler

IGFileConverter

IGStatisticsHandler

Other labels: SWT, JEP

Page 8: Technical Advisor: Dr. Lidror Troyansky Presents: Academic Advisor: Dr. Yuval Elovic.

[State diagram: file life cycle]

File states: Not in System → Downloaded → Converted → Analyzed → FeedBacked/Learned

Transitions: Search and Download file → Convert to .txt → Analyze File → FeedBack file → System Reinitialized

• The file is in the P2P network and we have no information about it.

• The file is found by the system’s search engine and is fully downloaded.

• In this stage the file is first saved to the system’s database.

• The file is converted to text format as a preliminary action before it is analyzed by the system.

• The system currently works with all text formats and with binary file types such as PDF, Word and PowerPoint.

• The file is analyzed by the system and its probability of being confidential is determined.

• The user is able to view the file’s content and give feedback to the system.

• In this stage the system adds the file to its database and to its probability hash tables.

• The system is reinitialized.
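
The file life cycle on this slide can be read as a small state machine. The following is a minimal sketch under stated assumptions, not the system's actual code: the names `FileState`, `TRANSITIONS` and `advance` are hypothetical, and the "System Reinitialized" step is left out because the slide does not say which state it leads to.

```python
from enum import Enum


class FileState(Enum):
    # State names are taken from the slide; the enum itself is hypothetical.
    NOT_IN_SYSTEM = "Not in System"
    DOWNLOADED = "Downloaded"
    CONVERTED = "Converted"
    ANALYZED = "Analyzed"
    FEEDBACKED = "FeedBacked/Learned"


# Allowed transitions, keyed by (current state, action name).
# Action names are illustrative labels for the arrows on the slide.
TRANSITIONS = {
    (FileState.NOT_IN_SYSTEM, "search_and_download"): FileState.DOWNLOADED,
    (FileState.DOWNLOADED, "convert_to_txt"): FileState.CONVERTED,
    (FileState.CONVERTED, "analyze"): FileState.ANALYZED,
    (FileState.ANALYZED, "feedback"): FileState.FEEDBACKED,
}


def advance(state, action):
    """Return the next state, or raise ValueError if the action is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(
            f"cannot {action!r} a file in state {state.value!r}"
        ) from None
```

Keeping the transitions in one table makes it easy to reject out-of-order steps, such as analyzing a file that has not yet been converted to text.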

Page 9: Technical Advisor: Dr. Lidror Troyansky Presents: Academic Advisor: Dr. Yuval Elovic.

• The problem of analyzing the files for confidential information is a part of the Categorization Problem Domain.

• In our case, there are two well-defined sets of documents (Confidential and Non-Confidential).

• There are many kinds of algorithms for categorization problems. After researching the area, and on a warm recommendation from our professional advisor, we chose an algorithm based on Bayes’ theorem and conditional probabilities (with some improvements).

• Bayes’ theorem is very commonly used for SPAM filtering (a problem which resembles ours).

Page 10: Technical Advisor: Dr. Lidror Troyansky Presents: Academic Advisor: Dr. Yuval Elovic.

• The algorithm works in two phases:

– First phase: Learning (building the probabilities). At first, the algorithm is given two training sets: a set of confidential files and a set of non-confidential files. Using Bayes’ conditional probability formula, the probability of each of the terms in the files is saved in a dedicated data structure.

– Second phase: Analyzing (combining). In the second phase, each of the terms in the analyzed file gets its probability (computed in the learning phase). The algorithm then combines the probabilities of the most frequent terms. We use the Robinson-Fisher combiner, which greatly improves the algorithm’s accuracy and significantly reduces the number of false positives.

$$\Pr(A_i \mid B) = \frac{\Pr(B \mid A_i)\,\Pr(A_i)}{\sum_j \Pr(B \mid A_j)\,\Pr(A_j)}$$

$$f(x; v) = \frac{x^{v/2-1} \exp(-x/2)}{2^{v/2}\,\Gamma(v/2)}$$
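
The two phases on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's IGConfClassifier: the function names, the tokenizer, the equal priors, the smoothing constants `s` and `x`, and the `max_terms` cutoff are all hypothetical choices. The combining step follows the commonly published form of Robinson's Fisher (chi-square) combining used in spam filters.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple stand-in tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def learn(confidential_docs, public_docs, s=1.0, x=0.5):
    """Learning phase: estimate Pr(confidential | term) for each training term."""
    conf = Counter(t for d in confidential_docs for t in set(tokenize(d)))
    pub = Counter(t for d in public_docs for t in set(tokenize(d)))
    probs = {}
    for term in set(conf) | set(pub):
        b = conf[term] / len(confidential_docs)  # Pr(term | confidential)
        g = pub[term] / len(public_docs)         # Pr(term | non-confidential)
        p = b / (b + g)                          # Bayes' theorem, equal priors
        n = conf[term] + pub[term]
        # Robinson's smoothing pulls rarely seen terms toward the neutral x.
        probs[term] = (s * x + n * p) / (s + n)
    return probs


def chi2_sf_even(chi2, dof):
    """Chi-square survival function, valid for an even number of dof."""
    m = chi2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def analyze(text, probs, max_terms=15):
    """Analysis phase: combine the probabilities of the most frequent known terms."""
    known = [t for t in tokenize(text) if t in probs]
    top = [t for t, _ in Counter(known).most_common(max_terms)]
    if not top:
        return 0.5  # no evidence either way
    n = len(top)
    conf_stat = -2.0 * sum(math.log(1.0 - probs[t]) for t in top)
    pub_stat = -2.0 * sum(math.log(probs[t]) for t in top)
    conf_ev = 1.0 - chi2_sf_even(conf_stat, 2 * n)  # near 1: looks confidential
    pub_ev = 1.0 - chi2_sf_even(pub_stat, 2 * n)    # near 1: looks public
    return (1.0 + conf_ev - pub_ev) / 2.0
```

A score near 1 suggests a confidential file, a score near 0 a non-confidential one, and a score near 0.5 means the evidence is weak or conflicting. That middle band is where the user feedback mentioned in the deck would be most useful.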

Page 11: Technical Advisor: Dr. Lidror Troyansky Presents: Academic Advisor: Dr. Yuval Elovic.
Page 12: Technical Advisor: Dr. Lidror Troyansky Presents: Academic Advisor: Dr. Yuval Elovic.

Option to force connection to a specific Ultra-peer

Connecting to several Ultra-peers simultaneously

Status bar shows the current status of the system and displays help messages.

Page 13: Technical Advisor: Dr. Lidror Troyansky Presents: Academic Advisor: Dr. Yuval Elovic.

Insert the keywords. When the search is started, the system generates several search queries based on the keywords and file types.

The user can choose which files to download, or configure the system to download all files.

When a file starts downloading, the system begins saving information about it (source IPs, the number of users currently holding the file, and more).
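
The slide does not say how the search queries are built from the keywords and file types. One plausible reading, shown here purely as an assumption, is to cross small keyword combinations with each file extension. The function name `generate_queries` and the `max_words` parameter are hypothetical.

```python
from itertools import combinations


def generate_queries(keywords, file_types, max_words=2):
    """Build query strings from keyword subsets crossed with file extensions.

    Assumption: each query is a space-separated keyword subset followed by
    one file extension. The real system may build its queries differently.
    """
    queries = []
    for size in range(1, max_words + 1):
        for words in combinations(keywords, size):
            for ext in file_types:
                queries.append(" ".join(words) + " " + ext)
    return queries
```

For example, `generate_queries(["budget", "secret"], ["pdf", "doc"])` produces one query per keyword and extension, plus the two-keyword combinations, so a single user entry fans out into several searches.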

Page 14: Technical Advisor: Dr. Lidror Troyansky Presents: Academic Advisor: Dr. Yuval Elovic.

The user can view the download progress.

The user can cancel any download at any time.

The user can send a downloaded file to be analyzed.

Page 15: Technical Advisor: Dr. Lidror Troyansky Presents: Academic Advisor: Dr. Yuval Elovic.
Page 16: Technical Advisor: Dr. Lidror Troyansky Presents: Academic Advisor: Dr. Yuval Elovic.

• Remaining tasks:

– Statistics gathering.

– Improve smart search and filtering.

– Add more GUI functionality.

– Conduct official algorithm test and document it.

– ARD, Test Document, and more.

• Start date: Oct’ 2005.

• Estimated End date: Aug’ 2006.

• Over 15,500 lines of code and still counting…

– More than 1267 Python lines.

• Over 800 hours of work per man.

• 18 pizza platters

Page 17: Technical Advisor: Dr. Lidror Troyansky Presents: Academic Advisor: Dr. Yuval Elovic.