Leveling the Playing Field Aaron Bedra Chief Security Officer, Eligible @abedra keybase.io/abedra
Right now, your web applications are being
attacked
And it will happen again, and again, and again
As you grow so will the target on you
Keeping up with security is difficult
Actually, it’s unfair
Things you have to get right vs. things the attacker has to get right
Time the attacker has to focus on you vs. time you have to focus on the attacker
It’s asymmetric warfare
There’s no way to manually keep up
Manual → Automated → Intelligent
Scaling your defenses means strategic
automation
STOP!
Let’s talk about the problem we are solving
for a minute
Problems
• We don’t know what people are doing
• We don’t know how often they are doing it
• We don’t know how effective we are
• We don’t have enough resources to keep up
Goals
• Reduce noise
• Generate better signal
• Reduce operational overhead
• Build better business cases
• Spend energy on the really important stuff
Reducing Noise
It starts with really simple stuff
Tie up the loose ends with static configuration
Static configuration checklist
• At least a B+ rating on SSL Labs*
• Reject extensions that you don’t want to accept
• Reject known bad user agents
• Reject specific known bad actors
• Custom error pages that fit your application
• Basic secure headers
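As a concrete sketch, parts of the checklist above might translate into nginx configuration like the fragment below. The patterns, extensions, and header values are illustrative, not a complete hardening guide.

```nginx
# Reject known bad user agents (illustrative patterns)
if ($http_user_agent ~* (sqlmap|nikto|masscan)) {
    return 403;
}

# Reject extensions you never serve
location ~* \.(php|asp|aspx|cgi)$ {
    return 404;
}

# Basic secure headers
add_header X-Frame-Options "DENY";
add_header X-Content-Type-Options "nosniff";
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains";

# Custom error pages that fit your application
error_page 403 404 /error.html;
```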
You’ll be surprised how well this works
It has a fringe benefit of creating better
awareness
You can feed this back to your intelligence
Reducing Operational Overhead
Dealing with malicious actors has to be easy
It shouldn’t require deploys, reloads, or any potential forward impact
Let’s talk about how to create something that will
help
Step 1: Put everything in one place!
Centralization of events is critical
If you can’t see it, it didn’t happen
There are options
Log aggregation and a query engine
The query engine can serve as your discovery
agent
A nice first step
But it will eventually fall over
That’s when you reach for a messaging system
Log to topics in a queue
Create processors to understand events
Step 2: Process Events
For every event type you will need to understand
how to process it
Structured logging can help, but it doesn’t fit
everywhere
The goal is to accept an event and return
consumable details
type logEntry struct {
	Address      string
	Method       string
	Uri          string
	ResponseCode string
}

func processEntry(entry string) logEntry {
	parts := strings.Split(entry, " ")
	return logEntry{
		Address:      parts[0],
		Method:       strings.Replace(parts[5], "\"", "", 1),
		Uri:          parts[6],
		ResponseCode: parts[8],
	}
}
You will likely have multiple processors
Split topics by event type or application
Once you have the data accessible, figure out
what happened
Track everything!
• HTTP Method
• Time since last request/average requests per sec
• Failed responses
• Failure of intended action (e.g. login, add credit card, edit, etc)
• Anything noteworthy
type Actor struct {
	Methods         map[string]int
	FailedLogins    int
	FailedResponses map[string]int
}

func updateEvents(event logEntry, counts map[string]*Actor) {
	actor, ok := counts[event.Address]
	if !ok {
		actor = &Actor{Methods: map[string]int{}, FailedResponses: map[string]int{}}
		counts[event.Address] = actor
	}
	actor.Methods[event.Method] += 1
	if event.ResponseCode != "200" && event.ResponseCode != "302" {
		actor.FailedResponses[event.ResponseCode] += 1
	}
	// A POST to /login that renders the form again (200) instead of
	// redirecting (302) is treated as a failed login
	if event.Method == "POST" && event.Uri == "/login" && event.ResponseCode == "200" {
		actor.FailedLogins += 1
	}
}
Once you have things in one place, it’s all about counting
Simple counts with thresholds go a long way
Step 3: Thresholds, Patterns, and Deviations
Exceeding a count is a signal that something
needs to be done
There are a lot of signals that could be malicious
You can start with simple thresholds
• Too many failed logins
• Too many bad response codes (4xx, 5xx)
• Request volume too high
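The simple thresholds above reduce to a small predicate over the per-actor counters. A minimal sketch, reusing the Actor type from earlier; the limits are hypothetical starting points to be tuned per application:

```go
package main

import "fmt"

// Actor mirrors the per-address counters built by the event processors.
type Actor struct {
	Methods         map[string]int
	FailedLogins    int
	FailedResponses map[string]int
}

// exceedsThresholds sketches the three simple checks: too many failed
// logins, too many bad responses, request volume too high. The limits
// are illustrative, not recommendations.
func exceedsThresholds(a Actor) bool {
	failed := 0
	for _, n := range a.FailedResponses {
		failed += n
	}
	requests := 0
	for _, n := range a.Methods {
		requests += n
	}
	return a.FailedLogins > 5 || failed > 25 || requests > 1000
}

func main() {
	suspect := Actor{FailedLogins: 8}
	fmt.Println(exceedsThresholds(suspect)) // prints "true"
}
```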
These provide a lot of signal
But they don’t get you all the way there
There are patterns of behavior that signal
malicious intent
Example
10.20.253.8 - - [23/Apr/2013:14:20:21 +0000] "POST /login HTTP/1.1" 200 267 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0) Gecko/20100101 Firefox/8.0" "77.77.165.233"
10.20.253.8 - - [23/Apr/2013:14:20:22 +0000] "POST /users/king-roland/credit_cards HTTP/1.1" 302 2085 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0) Gecko/20100101 Firefox/8.0" "77.77.165.233"
10.20.253.8 - - [23/Apr/2013:14:20:23 +0000] "POST /users/king-roland/credit_cards HTTP/1.1" 302 2083 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0) Gecko/20100101 Firefox/8.0" "77.77.165.233"
10.20.253.8 - - [23/Apr/2013:14:20:24 +0000] "POST /users/king-roland/credit_cards HTTP/1.1" 302 2085 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0) Gecko/20100101 Firefox/8.0" "77.77.165.233"
That was a carding attack
As you dig in, you will find many patterns like
these
But again it doesn’t cover everything
There will also be interesting deviations
[Pie chart: distribution of request methods — GET, POST, HEAD, PUT, DELETE at roughly 59%, 27%, 5%, 5%, 4%]
Deviations in normal flow are interesting but not necessarily malicious
You will have to build more intelligent processing to
understand them
Example
A password reset request comes from a new
location
Is it a harmless request or an account takeover?
Your processors will have to make complicated choices based on lots of information
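A processor for the password reset case might weigh several signals into a rough risk score before deciding. Everything below — the signal names, the weights, the scoring shape — is purely illustrative:

```go
package main

import "fmt"

// resetRequest bundles the signals a processor might consider when a
// password reset arrives. Fields and weights here are hypothetical.
type resetRequest struct {
	NewLocation    bool // request came from a location not seen before
	NewDevice      bool // unrecognized user agent or device
	RecentFailures int  // failed logins shortly before the reset
}

// resetRisk returns a rough risk score; above some threshold the
// processor would mark the actor for observation rather than block
// outright, since the request may be harmless.
func resetRisk(r resetRequest) int {
	score := 0
	if r.NewLocation {
		score += 2
	}
	if r.NewDevice {
		score += 2
	}
	score += r.RecentFailures
	return score
}

func main() {
	fmt.Println(resetRisk(resetRequest{NewLocation: true, RecentFailures: 3})) // prints "5"
}
```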
Nailing deviation requires the largest amount of
effort
Step 4: Act
Once you have enough information to make a decision, you must act
There are multiple ways to act
• Blacklist
• Whitelist
• Mark
• Do nothing
Blacklist and whitelist are pretty straightforward
Blacklist when thresholds are exceeded or
patterns/deviation fit
Whitelist things you never want to be
blacklisted
Marking is more interesting
Marking allows you to tag actors as potentially
malicious
This allows you to dynamically modify your
responses
And choose how you react
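The decision recorded for an actor can then drive the response the web tier serves. A minimal sketch, with hypothetical status values and responses; the point is that a marked actor gets a subtly modified response while you continue to observe:

```go
package main

import "fmt"

// Status is the decision recorded for an actor in the shared cache.
type Status int

const (
	OK Status = iota
	Marked
	Blacklisted
	Whitelisted
)

// respond sketches how the web tier might vary behavior by status.
// The degraded response for marked actors is the kind of small change
// that renders bots useless and exposes them as bots.
func respond(s Status) string {
	switch s {
	case Blacklisted:
		return "403 Forbidden"
	case Marked:
		// e.g. add latency, serve a captcha, or strip sensitive
		// functionality while continuing to watch the actor
		return "200 OK (degraded response)"
	default:
		return "200 OK"
	}
}

func main() {
	fmt.Println(respond(Marked)) // prints "200 OK (degraded response)"
}
```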
“Of course machines can't think as people do. A machine is different from a person. Hence, they think differently.”
-- Alan Turing, The Imitation Game
You can often render bots useless with small
changes
Which exposes them as bots
And gives you the confidence you need to
blacklist them
Marking also helps you lower the rate of false
positives
Step 5: Visualize
Visualization is incredibly helpful
You need a window into your automation
Spending a few minutes a day looking at what
happened is vital
You can pretty easily catch bugs this way
Architecture & Performance
There are three main ideas
• The thing that acts on actors
• The shared cache
• The event processors
Acting on actors should be fast
Fast in a web request is single digit milliseconds
You can choose to embed this in your applications
or your web servers
Data locality is important
It usually involves replicating the global cache
to each decision point
The cache should hold everything needed to act
on actors
The web server asks the cache what to do
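In the request path this is a single lookup against the locally replicated cache. A minimal in-process sketch (a real deployment would replicate from a shared store and the status values here are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// decisionCache sketches the local cache each decision point holds;
// in practice it is replicated from the shared global cache.
type decisionCache struct {
	mu       sync.RWMutex
	statuses map[string]string // address -> "blacklisted" | "whitelisted" | "marked"
}

// lookup is what the web server calls per request: a single
// read-locked map access, comfortably inside a millisecond budget.
func (c *decisionCache) lookup(addr string) string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if s, ok := c.statuses[addr]; ok {
		return s
	}
	return "ok"
}

func main() {
	cache := &decisionCache{statuses: map[string]string{"10.20.253.8": "blacklisted"}}
	fmt.Println(cache.lookup("10.20.253.8")) // prints "blacklisted"
	fmt.Println(cache.lookup("203.0.113.7")) // prints "ok"
}
```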
The event processors work out of band
Their sole purpose is to populate the cache
Processors tend to be more custom
But the cache and the acting logic is common
github.com/repsheet
Pitfalls
Things to consider
• False positives
• Decision latency
• Incorrect modeling
• Bad data
• Monitoring
There’s a good chance you will block incorrectly
Make use of whitelisting
Mobile carriers will be a problem
So will NATed IP addresses
Time to decision should be monitored
Create a solid regression suite
Run all your models through it when you make
even a single change
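A regression suite for this can be as simple as replaying recorded cases through the models and asserting the decisions never change unexpectedly. The case shape and the decide function below are placeholders for your real models and captured traffic:

```go
package main

import "fmt"

// regressionCase pairs a recorded input with the decision the model
// is expected to make. The fields here stand in for real event data.
type regressionCase struct {
	name     string
	failures int
	want     bool
}

// decide stands in for a real model; replace with the logic under test.
func decide(failures int) bool { return failures > 5 }

// runRegressions replays every case and returns the names of any
// whose decision has drifted.
func runRegressions(cases []regressionCase) []string {
	var broken []string
	for _, c := range cases {
		if decide(c.failures) != c.want {
			broken = append(broken, c.name)
		}
	}
	return broken
}

func main() {
	cases := []regressionCase{
		{"known brute forcer stays blocked", 20, true},
		{"normal user stays clean", 1, false},
	}
	fmt.Println(len(runRegressions(cases))) // prints "0"
}
```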
Understand where bad data can impact you
Build tolerance of bad data so you don’t make
incorrect decisions
Monitor everything!
This type of automation deserves every monitor and metric you can get
Questions?