Jul 21, 2015
• Collect as much information as possible from files/binary objects
– Other contained files/objects
– Metadata, e.g. mobile app permissions, geolocation, IP addresses, domains, etc.
• Strip protection layers for additional analysis
• Do it really, really fast
• Do it at scale
• Forensics
• Anti-Virus
• Threat Intelligence
• ...
• Files can be
– Packed
– Obfuscated
– Encrypted
– Broken
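Packed and encrypted content tends to have near-uniform byte distributions. The slides do not say how the engine detects this. One common heuristic is Shannon entropy in bits per byte. The sketch below is an illustration under that assumption. The 7.2 threshold is a hypothetical value, and entropy alone also flags ordinary compressed data such as images or archives.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # byte value -> number of occurrences
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical cut-off; values close to 8.0 suggest compression or encryption.
HIGH_ENTROPY_THRESHOLD = 7.2

def looks_packed_or_encrypted(data: bytes,
                              threshold: float = HIGH_ENTROPY_THRESHOLD) -> bool:
    """Heuristic flag only: compressed media will also trip it."""
    return shannon_entropy(data) >= threshold
```

In practice a heuristic like this would more likely be applied per section or per window than to the whole file, because one encrypted region can be hidden inside otherwise low-entropy data.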
• Large amounts of data to process
• Speed
• Consolidating metadata and files/objects
• Scheduling
• Reporting
• Communication
[Diagram: FILES → ENGINE → FILES + METADATA]
• Preprocessing
– Identification
– Initial analysis
• Analysis
– Unpacking
– Validation
• Post-processing
– Consolidating metadata
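The three stages above can be illustrated as an ordered list of steps that share one metadata dictionary. This is a minimal sketch, not the engine's actual design. The step functions (`identify`, `initial_analysis`, `unpack`, `validate`) and the `run_pipeline` helper are hypothetical, and the unpacking step is a placeholder that extracts nothing.

```python
from typing import Callable, Dict, List

Metadata = Dict[str, object]
# Each step sees the raw bytes plus everything collected by earlier steps.
Step = Callable[[bytes, Metadata], Metadata]

def identify(data: bytes, meta: Metadata) -> Metadata:
    # Hypothetical check: "MZ" starts DOS/Windows executables.
    return {"type": "pe" if data[:2] == b"MZ" else "unknown"}

def initial_analysis(data: bytes, meta: Metadata) -> Metadata:
    return {"size": len(data)}

def unpack(data: bytes, meta: Metadata) -> Metadata:
    # Placeholder: a real unpacker would extract embedded objects here.
    return {"extracted": 0}

def validate(data: bytes, meta: Metadata) -> Metadata:
    # Depends on the identification result from preprocessing.
    return {"valid": meta["type"] != "unknown"}

STAGES: List[List[Step]] = [
    [identify, initial_analysis],  # preprocessing
    [unpack, validate],            # analysis
]

def run_pipeline(data: bytes) -> Dict[str, Dict[str, object]]:
    collected: Metadata = {}
    provenance: Dict[str, object] = {}
    for steps in STAGES:
        for step in steps:
            for key, value in step(data, collected).items():
                collected[key] = value
                provenance[key] = step.__name__
    # Post-processing: one consolidated record, noting which step produced each key.
    return {"metadata": collected, "provenance": provenance}
```

Keeping provenance alongside the merged metadata is one way to make a consolidated report traceable back to the module that produced each field.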
[Diagram: SCHEDULER and MODULES (Identification, Analysis, Validation, Unpacking, ...); output: REPORT, METADATA, FILES]
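The slides name a scheduler but do not describe how it works. As a sketch under stated assumptions, extracted objects could be fed back into a breadth-first work queue. Identical objects could be skipped by SHA-256 digest, and recursion could be capped with a depth limit. The `Module` interface, `schedule`, and the toy `ARC:` container format are all hypothetical. The sketch is also single-threaded, whereas a scheduler built for scale would presumably dispatch work in parallel.

```python
import hashlib
from collections import deque
from typing import Callable, Dict, List, Tuple

# Hypothetical module interface: bytes in, (metadata, extracted child objects) out.
Module = Callable[[bytes], Tuple[Dict[str, object], List[bytes]]]

def schedule(root: bytes, modules: List[Module],
             max_depth: int = 8) -> Dict[str, Dict[str, object]]:
    report: Dict[str, Dict[str, object]] = {}
    queue = deque([(root, 0)])
    while queue:
        data, depth = queue.popleft()
        digest = hashlib.sha256(data).hexdigest()
        if digest in report:  # identical object already processed
            continue
        entry: Dict[str, object] = {"depth": depth}
        report[digest] = entry
        for module in modules:
            meta, children = module(data)
            entry.update(meta)
            if depth < max_depth:  # stop descending past the depth limit
                queue.extend((child, depth + 1) for child in children)
    return report

def toy_archive(data: bytes) -> Tuple[Dict[str, object], List[bytes]]:
    """Hypothetical container: b"ARC:" followed by members separated by b"|"."""
    if data.startswith(b"ARC:"):
        parts = data[4:].split(b"|")
        return {"container": True, "count": len(parts)}, parts
    return {"container": False}, []
```

For example, `schedule(b"ARC:a|b|a", [toy_archive])` yields three report entries (the container, `a`, and `b`), because the second `a` has the same digest as the first and is skipped.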
• Speed
• Security
• We can emulate
• Various identification engines
– Signature-based
– Heuristics
– ...
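To illustrate running several identification engines side by side, the sketch below combines a signature-based check on well-known magic numbers with a simple heuristic. All engine results are returned rather than only the first match. The magic numbers are standard file headers. The printable-text heuristic, its 95% ratio, and the function names are hypothetical.

```python
from typing import List

# Well-known file signatures ("magic numbers") at offset 0.
MAGIC = [
    (b"MZ", "pe"),            # DOS/Windows executable
    (b"\x7fELF", "elf"),      # ELF executable
    (b"PK\x03\x04", "zip"),   # ZIP, also used by APK and JAR
    (b"%PDF-", "pdf"),
    (b"dex\n", "dex"),        # Android Dalvik executable
]

def identify_by_signature(data: bytes) -> List[str]:
    return [name for magic, name in MAGIC if data.startswith(magic)]

def identify_by_heuristic(data: bytes) -> List[str]:
    # Hypothetical heuristic: >95% printable ASCII (plus tab/CR/LF) means text.
    sample = data[:4096]
    if not sample:
        return []
    printable = sum(32 <= b < 127 or b in (9, 10, 13) for b in sample)
    return ["text"] if printable / len(sample) > 0.95 else []

def identify(data: bytes) -> List[str]:
    # Collect candidates from every engine; later stages decide how to use them.
    return identify_by_signature(data) + identify_by_heuristic(data)
```

Because the engines can disagree or overlap (a PDF is also mostly printable text), returning every candidate leaves the choice to later stages.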
• Problems
• Signatures
• Various complexity
– Simple (e.g. PEiD): byte matching with wildcards, e.g. 12 ?? 56 ?8 9?, and hash matching
– Medium (e.g. TitanMist): a small regex-like subset
– High (e.g. TLang): an almost full-fledged programming language
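The simple signature style above, for example `12 ?? 56 ?8 9?`, can be matched by turning each byte token into a mask/value pair. In this sketch, `??` matches any byte, `?8` matches any byte whose low nibble is 8, and `9?` matches any byte whose high nibble is 9. It is illustrative only. The function names are hypothetical, and the sketch does not reproduce PEiD's actual signature format (for example, it has no entry-point-only option).

```python
from typing import List, Optional, Tuple

Pattern = List[Tuple[int, int]]  # (mask, value) per byte

def compile_signature(sig: str) -> Pattern:
    """Turn '12 ?? 56 ?8 9?' into (mask, value) pairs; '?' is a wildcard nibble."""
    pattern: Pattern = []
    for token in sig.split():
        if len(token) != 2:
            raise ValueError(f"bad token: {token!r}")
        hi, lo = token[0], token[1]
        mask = (0x00 if hi == "?" else 0xF0) | (0x00 if lo == "?" else 0x0F)
        value = (0 if hi == "?" else int(hi, 16) << 4) | (0 if lo == "?" else int(lo, 16))
        pattern.append((mask, value))
    return pattern

def match_at(data: bytes, offset: int, pattern: Pattern) -> bool:
    if offset + len(pattern) > len(data):
        return False
    # Masked-out nibbles are ignored, so wildcards compare equal to anything.
    return all((data[offset + i] & mask) == value
               for i, (mask, value) in enumerate(pattern))

def find_signature(data: bytes, sig: str) -> Optional[int]:
    """Return the first offset where the signature matches, or None."""
    pattern = compile_signature(sig)
    for offset in range(len(data) - len(pattern) + 1):
        if match_at(data, offset, pattern):
            return offset
    return None
```

Hash matching, the other simple technique listed, reduces to a set lookup of digests. The medium and high tiers trade this linear scan for richer pattern languages.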
• Other
• Some parts depend on identification
• Dedicated analysis modules
• Internal/external modules
• Unpacking
• Validation
• Collecting metadata
• Repairing broken files
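As one example of what a validation module might check, the sketch below verifies the basic structure of a PE file. It checks for the `MZ` DOS header, reads `e_lfanew` (a 32-bit little-endian offset stored at 0x3C), and confirms that offset points at the `PE\0\0` signature. The offsets follow the PE format. The function name and the reason strings are hypothetical, and a real validator would check far more (file header, optional header, section table).

```python
import struct

def validate_pe_headers(data: bytes) -> dict:
    """Minimal structural check of a PE file's DOS header and PE signature."""
    result = {"valid": False, "reason": ""}
    if len(data) < 0x40 or data[:2] != b"MZ":
        result["reason"] = "missing DOS header"
        return result
    # e_lfanew: offset of the "PE\0\0" signature, little-endian uint32 at 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if e_lfanew + 4 > len(data):
        result["reason"] = "e_lfanew out of bounds"
        return result
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        result["reason"] = "bad PE signature"
        return result
    result["valid"] = True
    return result
```

Returning a reason instead of a bare boolean gives a downstream repair step something concrete to act on, such as an out-of-bounds `e_lfanew` in a truncated file.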