Security Data Science (SDS) Prof. Tudor Dumitraș Assistant Professor, ECE University of Maryland, College Park ENEE 759D | ENEE 459D | CMSC 858Z http://ter.ps/759d https://www.facebook.com/SDSAtUMD

Dec 26, 2015


Arlene Richards
Transcript
Page 1:

Security Data Science (SDS)

Prof. Tudor Dumitraș
Assistant Professor, ECE
University of Maryland, College Park

ENEE 759D | ENEE 459D | CMSC 858Z

http://ter.ps/759d

https://www.facebook.com/SDSAtUMD

Page 2:

Introducing Your Instructor

Tudor Dumitraș
Office: AVW 3425
Email: [email protected]
Website: http://ter.ps/759d
Office Hours: Mon 2-3 pm

Page 3:

My Background

• Ph.D. at Carnegie Mellon University
  – Research in distributed systems and fault-tolerant middleware
• Worked at Symantec Research Labs
  – Built WINE platform for Big Data experiments in security
  – WINE currently used by academic researchers and Symantec engineers
• Joined UMD faculty
• Research and teaching on applied security and systems
  – Focus on solving security problems with data analysis techniques

Page 4:

SDS In A Nutshell

• Course objectives
  – Ability to understand and interpret scholarly publications, to explain their key ideas, and to provide constructive feedback
  – Ability to apply some of these ideas in practice
• Topics
  – Vulnerabilities and exploits
  – Failures of cryptosystems
  – Internet worms
  – Denial of service
  – Botnets
  – Spam infrastructures
  – Pay per install
  – Attacks against physical infrastructure
  – Targeted attacks
  – Economic implications of cybercrime
• Grading
  – 50% paper reviews and class participation
  – 50% projects

Page 5:

We Are Swimming in Data

• Data created/reproduced in 2010: 1,200 exabytes
• Data collected to find the Higgs boson: 1 gigabyte / s
• Yahoo: 200 petabytes across 20 clusters
• Security:
  – Global spam in 2011: 62 billion / day
  – Malware variants created in 2011: 403 million

Page 6:

Why So Much Data?

• We can store it
  – 6¢ / GB
  – 29¢ / GB (SAS HDD)
• We can generate it
  – Most data is machine-generated
  – Most malware samples are variants of other malware, generated automatically (repacking, obfuscation)

What to do with all this data?

Page 7:

Three Stories about Data

Page 8:

ONE

WHAT QUESTIONS TO ASK ON A FIRST DATE?

The Power of Big Data

Page 9:

If You Want to Know …

Do my date and I have long-term potential?

Page 10:

If You Want to Know …

Do my date and I have long-term potential?

… ask:

Q Do you like horror movies?
Q Have you ever traveled around another country alone?
Q Wouldn't it be fun to chuck it all and go live on a sailboat?

Likelihood of coincidence: 3.7×

Data: 275,000 user-submitted questions, 34,260 real-world couples

Psychology: top 3 user-rated questions, about:
• God
• Sex
• Smoking
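One way to read the 3.7× figure is as a ratio of two agreement rates: how often real couples give matching answers to all three questions, divided by how often randomly paired users do. The sketch below illustrates that reading on toy data. The names `agreement_rate` and `agreement_lift` and all sample answers are hypothetical; this is not OkCupid's data or method.

```python
from typing import Sequence, Tuple

Answers = Tuple[bool, ...]        # one user's yes/no answers to the questions
Pair = Tuple[Answers, Answers]    # the answers of two users


def agreement_rate(pairs: Sequence[Pair]) -> float:
    """Fraction of pairs whose members gave identical answers to every question."""
    if not pairs:
        raise ValueError("need at least one pair")
    return sum(1 for a, b in pairs if a == b) / len(pairs)


def agreement_lift(couples: Sequence[Pair], random_pairs: Sequence[Pair]) -> float:
    """How many times more often couples agree than randomly paired users."""
    return agreement_rate(couples) / agreement_rate(random_pairs)


# Hypothetical toy data: three yes/no answers per user.
yes3 = (True, True, True)
no3 = (False, False, False)
mixed = (True, False, True)

couples = [(yes3, yes3), (no3, no3), (yes3, mixed), (mixed, no3)]        # 2 of 4 agree
random_pairs = [(yes3, no3), (yes3, mixed), (mixed, mixed), (no3, mixed),
                (no3, yes3), (mixed, yes3), (yes3, no3), (mixed, no3)]   # 1 of 8 agree

lift = agreement_lift(couples, random_pairs)
print(f"{lift:.1f}x")  # 4.0x
```

With only a handful of pairs the ratio is very noisy, which is one reason the slide stresses the size of the data set.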

Page 11:

Online Dating and Big Data

• eHarmony
  – Analyzes hundreds of behavioral variables, most collected automatically
  – CTO: former search engineer at Yahoo!
• OkCupid: We do math to get you dates
  – Founded by Harvard math & CS majors
• PlentyOfFish: Building this matching system was harder than [being] cited in the paper that won the Fields Medal

Source: CNN Money

Page 12:

Early 1900s: Most Factories Had Private Generators

Electricity was critical for business, but not widely available

Source: Nicholas Carr

Page 13:

Is he an engineer?

Does she date engineers?

Source: OkCupid

Data analytics provide remarkable insight

Applications in many disciplines

Page 14:

What Is Data Science?

• Also known as …
  … Big Data analytics
  … Machine intelligence
  … Data-intensive computing
  … Data wrangling
  … Data munging
  … Data jujitsu

Source: Drew Conway

Page 15:

TWO

IMPROVING MACHINE TRANSLATION

The Unreasonable Effectiveness of Data

Page 16:

2005 NIST Machine Translation Competition

English-Arabic competition

• Google’s first entry
  – None of the engineers spoke Arabic
• Simple statistical approach
• Trained using United Nations documents
  – 200 million translated words
  – 1 trillion monolingual words

Page 17:

For many hard problems there appears to be a threshold of sufficient data (A. Halevy et al., IEEE Intelligent Systems, 2009).

Page 18:

What is Security Data Science?

• Also known as …
  … Security analytics
  … Surveillance analytics
• Applying data science methods to security problems

Page 19:

Security Principles in 60 Seconds [J. Saltzer & M. Schroeder, SOSP 1973]

• Economy of mechanism: Keep the protection mechanism as simple and small as possible
• Fail-safe defaults: Base access decisions on permission rather than exclusion
• Complete mediation: Check every access to every object
• Open design: Do not keep the design secret
• Separation of privilege: Require two keys to unlock, not one
• Least privilege: Grant every program/user the least set of privileges necessary to complete the job
• Least common mechanism: Minimize the amount of mechanism common to more than one user and depended on by all users
• Psychological acceptability: Design interfaces for ease of use
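Three of these principles (fail-safe defaults, complete mediation, least privilege) can be illustrated with a default-deny access check. The following is a minimal sketch, not a real access-control system; the users, the file name, and the functions `is_allowed` and `access` are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

# Explicit grants only: (user, object) -> allowed operations.
# Anything not listed here is denied (fail-safe default).
Permissions = Dict[Tuple[str, str], FrozenSet[str]]

PERMISSIONS: Permissions = {
    ("alice", "grades.csv"): frozenset({"read", "write"}),
    ("bob", "grades.csv"): frozenset({"read"}),  # least privilege: read only
}


def is_allowed(user: str, obj: str, op: str, perms: Permissions = PERMISSIONS) -> bool:
    """Default-deny check: True only if an explicit grant covers the operation."""
    return op in perms.get((user, obj), frozenset())


def access(user: str, obj: str, op: str) -> str:
    """Single entry point, so every access passes the check (complete mediation)."""
    if not is_allowed(user, obj, op):
        raise PermissionError(f"{user} may not {op} {obj}")
    return f"{user} {op} {obj}"


print(is_allowed("alice", "grades.csv", "write"))   # True
print(is_allowed("bob", "grades.csv", "write"))     # False
print(is_allowed("mallory", "grades.csv", "read"))  # False: no grant, so denied
```

Basing the decision on a permission table means a forgotten entry fails closed; an exclusion list would fail open for any user nobody thought to exclude.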

Page 20:

Security in Practice (Source: C. Nachenberg, Symantec)

• 1986: Simple computer viruses
  – Defense: anti-virus
• 1990: Polymorphic viruses (decryption logic + encrypted malicious code)
  – Defense: “universal” decoder, emulation
• 1995: Macro viruses
  – Defense: AV vendor cooperation, digital signatures for macros
• 1999: Worms
  – Defense: Vulnerability-specific signatures
• 2004: Web-based malware
  – Defense: behavior blocking
• 2006: Auto-generated malware
  – Defense: reputation based security
• 2010 (but probably earlier): Targeted attacks (physical infrastructure, 0-day, etc.)
  – Defense: ??
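The move away from exact signatures in this timeline can be illustrated with a toy experiment: if each automatically generated variant encodes the same body differently, a hash computed over one variant does not match the others. The sketch below uses a one-byte XOR as a stand-in for repacking and a harmless placeholder string as the "sample"; both are assumptions for illustration and do not reflect how real packers or anti-virus engines work.

```python
import hashlib

KEYS = (0x11, 0x22, 0x33)  # hypothetical per-variant keys


def xor_pack(payload: bytes, key: int) -> bytes:
    """Toy 'packer': XOR every byte with a one-byte key (applying it twice undoes it)."""
    return bytes(b ^ key for b in payload)


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest, used here as an exact-match signature."""
    return hashlib.sha256(data).hexdigest()


payload = b"harmless placeholder payload"  # benign stand-in for the sample body
variants = [xor_pack(payload, key) for key in KEYS]

# Signature database built from the first variant only.
known_hashes = {sha256_hex(variants[0])}

detected = [sha256_hex(v) in known_hashes for v in variants]
print(detected)  # [True, False, False]

# Every variant still decodes to the same payload.
assert all(xor_pack(v, k) == payload for v, k in zip(variants, KEYS))
```

Each new key yields a file the signature database has never seen, which is consistent with the later defenses on the slide looking at behavior or reputation instead of file contents.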

Page 21:

THREE

UNDERSTANDING ZERO-DAY ATTACKS

The Need for Security Data Science

Page 22:

Zero-Day Attacks: Recent Examples

• 2009: Operation Aurora against Google
• 2010: Stuxnet
• 2011: Attack against RSA

Zero-day attack = cyber attack exploiting a software vulnerability before the public disclosure of the vulnerability
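The definition above reduces to a date comparison: an attack counts as zero-day if the exploit was in use before the vulnerability was publicly disclosed. A minimal sketch follows, assuming both dates are known; the function name `zero_day_window` and the dates are hypothetical and do not describe any real incident.

```python
from datetime import date
from typing import Optional


def zero_day_window(first_attack: date, disclosure: date) -> Optional[int]:
    """Days the exploit was used before public disclosure, or None if not a zero-day."""
    days = (disclosure - first_attack).days
    return days if days > 0 else None


# Hypothetical dates, for illustration only.
print(zero_day_window(date(2010, 1, 4), date(2010, 11, 10)))  # 310
print(zero_day_window(date(2011, 3, 1), date(2011, 2, 1)))    # None
```

In practice the hard part is the first argument: the earliest attack is rarely observed directly, so it has to be estimated from large collections of field data.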

Page 23:

Price of Zero-Day Exploits on the Black Market

The Economist, March 2013

Page 24:

The Elderwood Project

Group with “seemingly unlimited” supply of zero-day exploits (Source: Symantec)

Page 25:

Zero-Day Attacks: Open Questions

Decade-long open questions
• How common are zero-day attacks?
• How long can they remain undiscovered?
• What happens after disclosure?

[Figure: vulnerability timeline. Events: creation; vulnerability disclosed (“day zero”); exploit used in attacks; security patch released; all hosts patched. Labels: zero-day attack; prior work (Arbaugh 2000, Frei 2008, McQueen 2009, Shahzad 2012).]

Page 26:

Zero-Day Attacks: Open Questions (cont’d)

Decade-long questions: Why still open?
• Rare events, hard to observe in small data sets
• Need data analysis at scale

[Figure: vulnerability timeline, in weeks. Events: creation; vulnerability disclosed (“day zero”); exploit used in attacks; security patch released; all hosts patched. Labels: before disclosure, targeted attacks; after disclosure, large-scale attacks; rare events.]

Page 27:

Research in Security Data Science

Challenge 1: Find the needle in the haystack
  – Example: Identify and measure zero-day attacks

Challenge 2: Ensure generally applicable and repeatable results
  – The threat landscape changes frequently

Challenge 3: Deal with new and advanced threats
  – Skilled and persistent hackers can bypass firewalls, anti-virus, password-protected systems, two-factor authentication, physical isolation

[…]

Your thesis topic goes here

[Figure: number of variants (log scale, 10 to 10^5) over time in weeks, from -100 to 150 around T0. Annotations: 403 million new malware variants created in 2011; targeted attacks before disclosure; rare events.]

Page 28:

What is Security Data Science? (re-visited)

• Systems knowledge: develop technologies needed to store and process massive data sets
• Statistics & machine learning knowledge: analyze the data and extract information
• Security knowledge: ask the right questions about cyber attacks
• Data scientists are in high demand in the cybersecurity industry

Booz Allen may be recruiting more [data scientists] than Google or Facebook
The Economist, June 2013

Page 29:

Course Content

• Introduction to Security Data Science
• Hands-on emphasis – this is largely an unexplored research area
  – Team-based projects
  – Reviews of scholarly publications
  – No textbook
• Specific things you can expect to learn
  – Selected topics in security
  – System skills: Experiment design, data analysis, scalability
  – Team skills: Cooperating to achieve your team goals
  – Speaking/writing skills: Presenting paper/project findings, providing constructive feedback

Page 30:

This is an Advanced Course

• You are responsible for holding up your end of the educational bargain
  – I expect you to attend classes and to complete reading assignments
  – I expect you to learn how to analyze data and to try things out for yourself
  – I expect you to know how to find research literature on security topics
    • The required readings provide starting points
  – I expect you to manage your time
    • In general there will be one written assignment due before each lecture
• Learning material in this course requires participation
  – This is not a sit-back-and-listen kind of course; class participation is required for understanding the material and makes up a part of your grade!
• Different grading criteria for graduate and undergraduate students

Page 31:

Reading Assignments

• Readings: 1-2 papers before each lecture
  – Not light reading – some papers require several readings to understand
  – For next time: C. Kanich et al., “Spamalytics: An Empirical Analysis of Spam Marketing Conversion,” ACM CCS, 2008.
  – Check course web page (still in flux) for next readings and links to papers
• Homeworks: review the papers you read using a defined template
  – Submit homework by email to [email protected]
    • We might switch to a Web based submission system in the future
  – Due at 6 pm the evening before class
  – BibTeX template: Summary, Contributions, Weaknesses, Opinion (optional)
  – I will provide feedback on some of your written critiques; no email means your writeup is satisfactory
• In-class discussion: stand up and talk about the papers
  – Volunteers are preferred
  – Students randomly selected if no volunteers

Page 32:

Discuss …

Do my date and I have long-term potential?

… ask:

Q Do you like horror movies?
Q Have you ever traveled around another country alone?
Q Wouldn't it be fun to chuck it all and go live on a sailboat?

Likelihood of coincidence: 3.7×

Data: 275,000 user-submitted questions, 34,260 real-world couples

Psychology: top 3 user-rated questions, about:
• God
• Sex
• Smoking

Page 33:

Course Projects

• Pilot project: two-week individual projects
  – Propose a security problem and a data set that you could analyze to solve it
    • Some ideas are available on the web page
  – Conduct preliminary data analysis and write a report
  – Propose projects by September 9th (soft deadline)
  – Submit report by September 18th
• Group project: ten-week group project
  – Deeper investigation of promising approaches
  – Submit written report and present findings during last week of class
    • 2 checkpoints along the way (schedule on the course web page)
  – Form teams and propose projects by September 30th
• Peer reviews: review at least 2 project reports from other students
  – Use skills learned from paper reviews
  – Post project proposals, reports and reviews on Piazza

Page 34:

Pre-Requisite Knowledge

• Good programming skills
  – Knowledge of languages commonly used in data analysis, like Matlab or R, is a plus
  – To brush up: ‘Data Analysis and Visualization with MATLAB for Beginners’ seminar, on September 12 at 5pm, Room 1110 Kim Engineering Building
• Ability to come up to speed on advanced security topics
  – Covered in the paper readings
  – Basic knowledge of security (CMSC 414, ENEE 459C or equivalent) is a plus
• Ability to come up to speed on data analytics
  – Lectures provide light-duty tutorials, but you will need to pick up the details as you go along

Page 35:

Policies

• “Showing up is 80% of life” – Woody Allen
  – Participation in in-class discussions is required for full credit
  – You can get an “A” with a few missed assignments, but reserve these for emergencies (conference trips, waking up sick, etc.)
  – Notify the instructor if you need to miss a class, and submit your homework on time
• UMD’s Code of Academic Integrity applies, modified as follows:
  – Complete your homework entirely on your own. After you hand in your homework, you are welcome (and encouraged) to discuss it with others
  – Discuss the problems and concepts involved in the project, but produce your own project implementation, report and presentation
    • Group projects are the result of team work
• See class web site for the official version

Page 36:

Classroom Protocol

• Please arrive on time; lecture begins promptly
  – I also promise to end on time
  – Handouts, readings and homework templates are posted on the class web page
• Questions are encouraged
  – If you don’t understand, ask; other students are probably struggling too
  – Explain the content of your reading assignment, and the underlying reasoning, to the rest of the class
  – Your reasons don't have to be “right” – you just have to be able to explain them
• There is no way to cover everything
  – If there is an interesting aspect that we do not cover in class, feel free to incorporate that in your projects

Page 37:

Grading Criteria

• Straight scale: A≥90; B≥80; C≥70; D<70
  – 50% Written paper critique and class discussion
    • 24 assignments × 2 points each + 2 points for this lecture
  – 50% Projects
    • 30 points for group project, 10 points for pilot project, 10 points for project reviews
  – 10% Subjective evaluation
• Expectations
  – Graduate students: you can explain the contributions and weaknesses of the papers you read
  – Undergraduates: you demonstrate a general understanding of the papers
• Unsatisfactory participation means:
  – You did not read the papers
  – You did not produce a working implementation for your project, or you do not understand how the implementation works
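The point arithmetic on this slide can be checked with a short sketch. It covers only the two 50-point components (24 × 2 + 2 = 50 for critiques and discussion, 30 + 10 + 10 = 50 for projects) and leaves out the subjective evaluation. The function names, the all-or-nothing 2 points per assignment, and the sample student are assumptions for illustration, not the official grading procedure.

```python
def paper_points(assignments_completed: int, attended_first_lecture: bool) -> int:
    """2 points per written assignment (24 in total) plus 2 points for the first lecture."""
    if not 0 <= assignments_completed <= 24:
        raise ValueError("there are 24 assignments")
    return 2 * assignments_completed + (2 if attended_first_lecture else 0)


def letter_grade(total: float) -> str:
    """Straight scale from the slide: A>=90, B>=80, C>=70, D<70."""
    if total >= 90:
        return "A"
    if total >= 80:
        return "B"
    if total >= 70:
        return "C"
    return "D"


# Hypothetical student: 22 of 24 critiques, group project 27/30, pilot 9/10, reviews 10/10.
total = paper_points(22, True) + 27 + 9 + 10
print(total, letter_grade(total))  # 92 A
```

Under these assumptions, two missed critiques cost 4 points, which still leaves room for an A.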

Page 38:

Review of Lecture

• What did we learn?
  – Data analytics provide real benefits
  – Analyzing large data sets allows tackling long-standing hard problems
  – Difference between security principles and security in practice
  – Examples of security problems that require insights from large data sets
• I want to emphasize
  – This is a systems course, not a pen-and-paper course
  – You will be expected to build a real, working data analysis tool
• What’s next?
  – Basic statistics and experimental design
  – Pilot project: proposal, approach, expectations
• Deadline reminder
  – Post pilot project proposal on Piazza by Monday (soft deadline)
  – First homework due on Sunday at 6 pm

Page 39:

Dive In

http://ter.ps/759d