Advanced Threat Detection on Streaming Data
Carol McDonald, Solution Architect
Strata + Hadoop World, March 2016
© 2016 MapR Technologies
Meeting Advanced Threats Head On
• Solutionary: Managed Security Services Provider
  – Provides threat intelligence as a service
Real-time Detection of Advanced Threats
• Objective:
  – Provide real-time threat intelligence on trillions of messages per year
  – Store and process large volumes of unstructured security data
  – Combine machine learning and predictive analytics
Event-based Detection of Advanced Threats
[Diagram labels: Threat Alerts; Store and Process Unstructured Data; Anomaly Detection; Real-time Threat Intelligence; Predictive Analytics; Machine Learning]
Meeting Advanced Threats Head On
• Challenges:
  – Expanding data storage in an RDBMS was expensive ($$)
  – Could not process unstructured data at scale
[Diagram labels: Challenges; RDBMS Economics; Scaling Unstructured Data Processing; Unstructured Data]
What Did the Solution Need to Do?
[Diagram: Data Sources (security feeds: HTTP, Syslog, Firewall, Other) → Collect Data → Process Data → Store Data → Serve Data; the technology for each of the four stages is marked "?"]
How to Do This with High Performance at Scale?
• Parallel, partitioned = fast, scalable
Solution: Stream Processing Architecture
• Data Ingest:
  – Kafka or MapR Streams: fast distributed messaging
[Diagram: Sources (security feeds: HTTP, Syslog, Firewall, Other) → Data Ingest (Topics)]
Fast Distributed Messaging
• Topics organize events into categories
• Topics decouple producers from consumers
Fast Distributed Messaging
• Topics are partitioned for fast throughput and scalability
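A minimal sketch of how keyed messages might be spread across a topic's partitions so that producers and consumers can work in parallel. It does not use a Kafka or MapR Streams client: the `partition_for` and `publish` helpers and the CRC32 hash are assumptions for illustration, and real clients ship their own partitioners.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition with a stable hash (CRC32, chosen here as an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def publish(events, num_partitions):
    """Append each (key, value) event to the partition selected by its key.

    Events with the same key land in the same partition, so their relative
    order is preserved within that partition.
    """
    partitions = defaultdict(list)
    for key, value in events:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions
```

Keying by source IP, for example `publish([("10.0.0.1", "login failure"), ("10.0.0.2", "firewall deny")], 3)`, keeps every event from one source in a single partition while different sources can be consumed in parallel.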
How to Do This with High Performance at Scale?
• Parallel, partitioned:
  – Messaging
Complex Event Processing with Storm and Esper
• Stream Processing:
  – Storm: distributed real-time computation
  – Esper: Complex Event Processing
[Diagram labels: Data Ingest (Topics); Stream Processing: Kafka Spout, Parser Bolt, Enrich Bolts, Esper Kafka Bolt, Topic, Esper Spout, Alert Bolts; cross-topology correlation of events]
Complex Event Processing with Esper
• Detect a related set or pattern of events within a time window
• Example pattern, Excess Login Failure:
  – Same user, same source login failure
SELECT * FROM Event(ip_src IS NOT NULL AND ec_activity='Logon' AND ec_outcome='Failure')
  .std:groupwin(ip_src).win:time(300 sec)
GROUP BY ip_src
HAVING COUNT(*) = 10
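The same rule can be sketched outside Esper as a per-source sliding window. This is an illustration of the windowing idea only, not Esper's implementation: the `ExcessLoginFailureDetector` class and the `timestamp` field (seconds) are assumptions, and events are assumed to arrive in time order. Like the query, it groups by `ip_src` only.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 300   # matches win:time(300 sec)
THRESHOLD = 10         # matches HAVING COUNT(*) = 10


class ExcessLoginFailureDetector:
    def __init__(self, window=WINDOW_SECONDS, threshold=THRESHOLD):
        self.window = window
        self.threshold = threshold
        self.failures = defaultdict(deque)  # ip_src -> timestamps of recent failures

    def on_event(self, event):
        """Return True when this event brings the source's in-window failure count to the threshold."""
        ip = event.get("ip_src")
        if (ip is None
                or event.get("ec_activity") != "Logon"
                or event.get("ec_outcome") != "Failure"):
            return False  # filtered out, as in the FROM Event(...) clause
        ts = event["timestamp"]
        recent = self.failures[ip]
        # Evict failures that have fallen out of the time window.
        while recent and ts - recent[0] >= self.window:
            recent.popleft()
        recent.append(ts)
        return len(recent) == self.threshold
```

Feeding ten failed logons from one source within five minutes makes the tenth call return `True`; failures from other sources are counted separately.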
How to Do This with High Performance at Scale?
• Parallel, partitioned:
  – Processing
Real-time Detection of Advanced Threats: Examples
• Data transferred from critical database servers
• Large traffic flows from a host to a given IP address
• Employee accessing database servers at unusual hours
• User logging in from two different countries within a short window
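As one illustration, the last example (logins from two countries within a short window) could be sketched as follows. The `MultiCountryLoginDetector` class, the one-hour default window, and the assumption that each login already carries a resolved country code are all hypothetical; events are assumed to arrive in time order.

```python
from collections import defaultdict, deque


class MultiCountryLoginDetector:
    def __init__(self, window_seconds=3600):
        self.window = window_seconds
        self.logins = defaultdict(deque)  # user -> deque of (timestamp, country)

    def on_login(self, user, country, timestamp):
        """Return the set of other countries this user logged in from within the window.

        An empty set means nothing suspicious was seen for this login.
        """
        recent = self.logins[user]
        # Drop logins that are older than the window.
        while recent and timestamp - recent[0][0] > self.window:
            recent.popleft()
        others = {c for _, c in recent if c != country}
        recent.append((timestamp, country))
        return others
```

A non-empty result, such as `{"US"}` for a login from `"FR"`, is the condition that would raise an alert.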
Complex Event Processing with Storm and Esper
Cross-topology correlation of events
Solution: Stream Processing Architecture
• NoSQL Storage
  – HBase: fast, scalable storage and caching
  – Elasticsearch: indexing for real-time search analytics
[Diagram labels: Stream Processing: HDFS Bolt, Index Bolt, HBase Bolt; NoSQL Storage: MapR-FS, MapR-DB]
Scalability with HBase (MapR-DB)
[Diagram: three tables, each with columns Key, colB, colC]
Storage model:
• RDBMS: normalized schema → joins for queries can cause bottlenecks
• HBase: de-normalized schema → data that is read together is stored together
MapR-DB (HBase API) is Designed to Scale
[Diagram: three key ranges, each mapped to its own table partition with columns Key, colB, colC]
• Fast reads and writes by key!
• Data is automatically partitioned by key range!
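A toy sketch of key-range partitioning, to show why a read or write by key touches only one partition. This is not the MapR-DB or HBase API: the `KeyRangeTable` class and its fixed split keys are assumptions for illustration (the real systems choose and adjust splits automatically).

```python
import bisect


class KeyRangeTable:
    """Toy table split into regions by sorted split keys."""

    def __init__(self, split_keys):
        self.split_keys = sorted(split_keys)
        # One region per key range: (-inf, s0), [s0, s1), ..., [s_last, +inf)
        self.regions = [dict() for _ in range(len(self.split_keys) + 1)]

    def region_for(self, row_key):
        """Binary-search the split keys to find the region that owns row_key."""
        return bisect.bisect_right(self.split_keys, row_key)

    def put(self, row_key, columns):
        self.regions[self.region_for(row_key)][row_key] = columns

    def get(self, row_key):
        return self.regions[self.region_for(row_key)].get(row_key)
```

With split keys `["g", "p"]`, the row key `"kiwi"` is routed to the middle region, and neither `put` nor `get` consults the other two.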
How to Do This with High Performance at Scale?
• Parallel, partitioned:
  – Storage
Solution: Stream Processing Architecture
• Machine Learning
  – Threat modeling
  – Anomaly detection
• Security Analytics
[Diagram labels: NoSQL Storage (MapR-FS, MapR-DB); Serve Data]
Data Driven Forensics Investigation
• What can the data tell us?
  – What happened within a time range?
  – How did the threat get in?
  – What are all the activities associated with a specific IP/user?
  – How much data was affected?
  – Has this occurred elsewhere in the past?
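Questions such as "what happened within a time range?" and "what are all the activities for a specific IP?" map naturally onto a range scan over sorted row keys. The sketch below is an in-memory illustration only: the `EventStore` class and the `ip|timestamp` composite key layout are assumptions, not a schema described in the talk.

```python
import bisect


def row_key(ip, ts):
    """Composite key: source IP, then zero-padded epoch seconds so string order matches time order."""
    return f"{ip}|{ts:010d}"


class EventStore:
    def __init__(self):
        self._keys = []   # row keys, kept sorted
        self._rows = {}   # row key -> event description

    def put(self, ip, ts, event):
        key = row_key(ip, ts)
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = event

    def scan(self, ip, start_ts, end_ts):
        """Return events for one IP with start_ts <= ts < end_ts, in time order."""
        lo = bisect.bisect_left(self._keys, row_key(ip, start_ts))
        hi = bisect.bisect_left(self._keys, row_key(ip, end_ts))
        return [self._rows[k] for k in self._keys[lo:hi]]
```

Because all events for one IP sit next to each other in key order, `scan("10.0.0.1", start, end)` reads one contiguous slice rather than filtering the whole store.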
Solution: Stream Processing Architecture
Key to Real Time: Event-based Data Flows
Key to Scale = Parallel, Partitioned:
• Messaging
• Processing
• Storage
Building a Complete Data Architecture
[Diagram labels: Sources/Apps; Stream Processing; Bulk Processing; MapR-FS (Web-Scale Storage); MapR-DB (Database); MapR Streams (Event Streaming)]
Key to Real Time: Convergence
[Diagram: Converged Data Platform]
• Apps: Customer Experience; Data Architecture Optimization; Security Investigation & Event Management; Operational Intelligence; Managed Services & Custom Apps
• Platform: Event Streaming; Database; Storage
• Features: High Availability; Data Protection; Unified Security; Real Time; Multi-tenancy; Unified Management & Monitoring
Why Hadoop for Security Analytics?
• Cost-effective for storing and analyzing large volumes of data in real time
• Provides search & query, machine learning for activity correlation and anomaly detection
• When it comes to Hadoop, select an enterprise distribution (e.g. MapR Converged Data Platform) so you can focus on your primary objective
To Learn More:
• http://learn.mapr.com/
To Learn More:
• Download example code
  – https://github.com/caroljmcdonald/mapr-streams-sparkstreaming-hbase
• Read explanation of example code
  – https://www.mapr.com/blog/spark-streaming-hbase
Q & A
@mapr
https://www.mapr.com/blog/author/carol-mcdonald
Engage with us!
mapr-technologies