2. Another Year, Another Talk
Good News and Bad News. Good News: We're going to fix this thing. We have no choice. The global economy
is based on Information Technology being trustworthy. An economy
where you have to be big enough to field a cyber army in order to
participate is a broken economy indeed. It's not like the big guys
are doing a great job defensively. Bad News: We're not going to fix
it according to dogma. How's the status quo working out for us?
There are many alternatives to dogma that are even worse. How do we
find those that are better?
3. A Riddle
What is the fundamental difference between attack and defense?
4. Answer
When an attack doesn't work, you can tell. Offense has an inherent
quality filter: put up or shut up. That doesn't mean there aren't bugs
in offensive disclosure. The "Oracle Critical" is just an unencrypted
transport; if it's a bug, then Wireshark is dropping hundreds of 0day.
The press will report anything. But it's not the same.
5. The Reality Of Defense
Too much dogma, not enough science. You have to defend against every
bug -> That's impossible -> You don't have to show you've defended
against anything. Critiques of defenses aren't much better: nobody is
measuring or critiquing effectiveness. So this is a talk about
skepticism, and the processes of finding effective defenses to the
real and legitimate threats we cannot ignore. You shouldn't agree with
everything I'm going to present. My goal is to show you some new
ideas, and give you a framework to consider them as worthwhile or not.
This is the only way we're going to get defense to work. Lest you
think there's nothing concrete here...
6. The Fundamental Test
Take 2000 systems with a defense. Take 2000 systems without. Come back
in six months, and manually audit all 4000 systems. Is there or is
there not a statistically significant difference in the infection
rate? Even if we don't do the above, let us at least respect a gold
standard when we see one! The time may come when we spend as much
money on security research as we do on medical research. Medicine
took hundreds of years to become scientific, and they had dead bodies
to motivate them. We don't have dead bodies, or hundreds of years. We
still need to fix these problems. Some vendors out there care along
these lines. Reward them!
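To make "statistically significant" concrete, here is a minimal Python sketch of a standard two-proportion z-test on the infection counts such an audit would produce. The counts in the usage line are hypothetical, and the normal approximation assumes large samples (2000 per group qualifies) with some infections in each group.

```python
import math

def two_proportion_z_test(infected_a, n_a, infected_b, n_b):
    """Two-sided z-test for a difference between two infection rates.

    Returns (z, p_value). Assumes large samples and a pooled rate strictly
    between 0 and 1 (otherwise the standard error is zero).
    """
    rate_a = infected_a / n_a
    rate_b = infected_b / n_b
    # Pooled rate under the null hypothesis that the defense changes nothing
    pooled = (infected_a + infected_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical audit: 120 of 2000 defended vs. 160 of 2000 undefended systems infected
z, p = two_proportion_z_test(120, 2000, 160, 2000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A p-value well under 0.05 is the conventional bar; it says the difference is unlikely to be chance, not how large or durable the effect is.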
7. The Three Heads Of The Security Hydra
1) The inability to authenticate
2) The inability to write secure code
3) The inability to bust the bad guys
What we're not talking about today: Authentication (DNSSEC: no time,
ask me in private, or wait a few months). Busting the bad guys: there
is a remarkable lack of consensus regarding which bad guys are most
important. I tend to worry about the Aurora attack, which involved
espionage against (let's face it) the entire Fortune 500, and about
those raiding SMB payrolls, because that calls into question the very
viability of SMB. Others have different priorities. What we are
talking about: the inability to write secure code.
8. An immediate clarification
It's not that it's impossible to write secure code. It's not
impossible to deploy X.509 PKI. It's not impossible to bust the bad
guys. It's just plainly and utterly improbable, at least in most
organizations. Possible is not enough. Probable or bust.
9. What are we looking at today?
How do we address timing attacks? How do we generate random numbers?
How do we suppress SQL Injection? How do we detect network
manipulation? How do we scan the Internet? These are all things that
are possible today. How do we make them more deployable, less
expensive, more probable?
10. Timing Attacks
Many systems are modeled in terms of just what data they send, not in
terms of when they send it. Sometimes the timing of data leaks
security-sensitive data. It is possible to distinguish 15-100
microseconds of latency over the Internet, and 100 nanoseconds of
latency over a LAN (1000 samples): "Opportunities and limits of remote
timing attacks" (Scott A. Crosby, Rudolf H. Riedi, Dan S. Wallach). It
is possible to exploit string comparison functions in widespread
scripting languages, thus breaking HMAC compare (OpenID/OAuth):
"Exploiting timing attacks in widespread systems", Nate Lawson and
Taylor Nelson @ Black Hat 2010.
11. The Proposed Fix
Any time values need to be compared in a security-critical context,
compare them in constant time (so that there's no correlation between
what's compared and how long it takes):
    public static boolean isEqual(byte[] a, byte[] b) {
        if (a.length != b.length) { return false; }
        int result = 0;
        for (int i = 0; i < a.length; i++) {
            result |= a[i] ^ b[i];
        }
        return result == 0;
    }
Looks good, right?
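For comparison, a minimal Python port of the same XOR-and-OR accumulation, next to the standard library's `hmac.compare_digest`, which Python documents as avoiding content-based short-circuiting. The hand-rolled loop only illustrates the technique; an interpreter gives no hard guarantee that it runs in constant time at the machine level, so the stdlib helper is the one to reach for in practice.

```python
import hmac

def is_equal(a: bytes, b: bytes) -> bool:
    """Port of isEqual(): OR together the XOR of every byte pair, no early exit."""
    if len(a) != len(b):
        return False  # length is not treated as secret, same as the Java version
    result = 0
    for x, y in zip(a, b):
        result |= x ^ y  # accumulate differences instead of returning at the first one
    return result == 0

def is_equal_stdlib(a: bytes, b: bytes) -> bool:
    """Preferred in real Python code: the standard library's timing-resistant compare."""
    return hmac.compare_digest(a, b)
```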
12. The Problem
You have to remember to do this everywhere there's a security-critical
comparison. You don't get to do it all the time, because the
performance impact is too high. You thus must actually identify all
the security-critical comparisons. It's possible. But it's not
probable.
13. A Solution?
I seem to note that distinguishing against Internet noise yields less
accuracy (15,000-100,000ns) than against LAN noise (100ns). That's two
to three orders of magnitude! And Internet noise is not actually
random. What if we actually did have a random delay?
    tc qdisc change dev eth0 root netem delay 3ms 1ms
For all packets emitted from the first Ethernet interface, add a
random amount of lag between 2,000,000ns and 4,000,000ns (3ms plus or
minus 1ms; use "add" instead of "change" if no netem qdisc is
installed yet). Boltzmann Filter. At minimum, the LAN should be as
secure as the Internet. Maybe Internet attackers are also impacted.
This is a lot easier to deploy. That really matters. But does it work?
14. What Could Go Wrong?
All timing noise can be averaged out eventually, so a global random
delay can't work. Pretty much all password comparisons are done with
non-constant-time compares, so I guess all passwords are vulnerable?
Here's some SSH 0day:
    sys_auth_passwd(Authctxt *authctxt, const char *password)
    {
        /* Encrypt the candidate password using the proper salt. */
        encrypted_password = xcrypt(password,
            (pw_password[0] && pw_password[1]) ? pw_password : "xx");
        return (strcmp(encrypted_password, pw_password) == 0);
strcmp is not constant time. So you just brute force offline for
passwords whose encrypted form starts with certain characters, and see
how far you get. It is highly unlikely that the above attack actually
works: nanosecond differentials are too small to recover. Maybe not
locally... hmmm.
15. What We Really Need To Know
How much timing noise, of what nature, will permanently obscure how
much timing signal, beyond the point of infeasible return? Somewhere
between 1 nanosecond and 1 day there is an amount of noise that will
indefinitely obscure an n-nanosecond differential. There's likely to
be an equation here; CSI "Enhance" has its limits. There is a limit to
how much lag we can ask for from the performance guys. It is higher
for some requests than for others. We might require more lag than perf
is willing to give (at least in general). Need to discover these
numbers.
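One piece of that equation can be sketched under strong assumptions: the only noise is independent jitter, uniform over a window of width w (netem's default jitter, assumed uniform here, spreads "delay 3ms 1ms" over roughly 2 ms), and the attacker compares sample means. Two means of n samples each differ by a standard error of σ·√(2/n), with σ = w/√12 for a uniform distribution, so resolving a difference δ at k standard errors needs roughly n ≥ 2(kσ/δ)². The second function is a rough order-of-magnitude count for an attacker who keeps the minimum sample instead: the minimum of n uniform draws sits about w/(n+1) above the floor, which is why the shape of the jitter distribution matters so much.

```python
import math

def samples_for_mean_attack(jitter_width_ns, delta_ns, k=3.0):
    """Samples per candidate to separate two means by k standard errors,
    assuming only independent uniform jitter of the given width."""
    sigma = jitter_width_ns / math.sqrt(12)             # std dev of Uniform(0, w)
    return math.ceil(2 * (k * sigma / delta_ns) ** 2)   # n >= 2 (k * sigma / delta)^2

def samples_for_min_attack(jitter_width_ns, delta_ns):
    """Rough count for a minimum-filtering attacker: E[min of n Uniform(0, w)]
    is w / (n + 1), which falls below delta once n >= w / delta - 1."""
    return math.ceil(jitter_width_ns / delta_ns - 1)

# A 100 ns signal under ~2 ms of uniform jitter
print(samples_for_mean_attack(2_000_000, 100))   # on the order of 6e8 samples
print(samples_for_min_attack(2_000_000, 100))    # on the order of 2e4 samples
```

Real networks add other noise and correlations, so treat both numbers as illustrations of scale, not as security margins.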
16. What could actually go wrong
The distribution of lag from the interface may be easy to filter.
Quantized into 1ms chunks? Gaussian when it should be uniform, or
uniform when it should be Gaussian? Could be filterable thanks to TCP
timestamps (which have ~10ms accuracy, but also have sharp edges). All
of the above can be fixed; the question is whether they need to be.
The perfect (constant-time comparisons) is the enemy of the good
(interface-wide jitter). Jitter does not need to apply to all packets;
it could be a TCP setsockopt or whatnot. It could also be applied at
the end of a PHP script.
17. Another Day, Another Time
RSA is broken! No, not the thing with the smartcards that would
(maybe, depending on vendor) leak their private key. No, not the thing
with the SecurID seeds that were stolen. The thing with certificates
with easily breakable RSA keys: something like 1 in 200 RSA keys on
the Internet failed! Hughes and Lenstra announced first; Nadia
Heninger had parallel research. At the time, the break was blamed on
RSA itself. There are two primes in RSA (p and q). If either is
repeated across keys (p and q1, p and q2), then all of them are easy
to derive with Euclid's Greatest Common Divisor. "RSA is bad!"
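A toy Python illustration of the Greatest Common Divisor attack. The primes are tiny and chosen only for readability (real RSA primes are hundreds of digits long), but `math.gcd` works the same way on big integers; scanning millions of keys efficiently calls for a batch-GCD approach rather than the pairwise call shown here.

```python
import math

def shared_prime_break(n1, n2):
    """Factor two RSA moduli if they share exactly one prime; return None otherwise."""
    p = math.gcd(n1, n2)
    if p in (1, n1, n2):
        return None  # no shared prime (or identical moduli): nothing learned
    return (p, n1 // p), (p, n2 // p)

# Toy primes for readability only; both keys reused the prime p
p, q1, q2 = 1000003, 1000033, 1000037
print(shared_prime_break(p * q1, p * q2))
```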
18. Reality
Bad random number generators create trapdoor functions in all
cryptosystems: rather than breaking the crypto, you guess the key.
This was the basic concept of 2011's Phidelius (it expanded a password
into a pseudorandom stream, which was then used to feed a key
generator for RSA/DSA/ECC). Bad RNG isn't a bug, it's a feature! They
thought they'd shown RSA was bad. They actually showed that RNGs are
still broken. Debian's bug wasn't just Debian's. Weren't operating
systems supposed to fix this?
19. Theory
Collecting and providing entropy is hard; let the operating system do
it for you. /dev/random for good bits, /dev/urandom for best-effort
bits. If /dev/random runs out of bits, block until more are found.
Sources for entropy: hardware RNG, keyboard, mouse, disk rotation (as
impacted by air). Problem: lots of environments don't have any of
that.
20. Actual Environments
Desktops: humans w/ keyboards and mice, often disks. Servers:
sometimes have disks. VMs. Embedded devices.
21. The Reality of Hardware RNG
It's just not there. Yes, I know Ivy Bridge is coming out with a
hardware RNG. In 2012. That's top-of-the-line gear now. Yes, I know
some TPMs are reported to have hardware RNGs. For some reason, people
treat TPM hardware as unstable radioactive gunk. It's also rarely in
embedded kit.
22. What's Happening: An Analogy
Proteins cause cancer:
http://ukpmc.ac.uk/abstract/MED/3007842/reload=0;jsessionid=3X3Cs6G7VbyRT1xEPcUX.4
Carbohydrates cause cancer:
http://www.smh.com.au/lifestyle/diet-and-fitness/high-carbohydrate-diet-tied-to-cancer-20110616-1g4o9.html
Fats cause cancer:
http://www.telegraph.co.uk/health/healthnews/5650141/High-fat-diet-can-increase-risk-of-deadly-cancer.html
Alcohol causes cancer:
http://pubs.niaaa.nih.gov/publications/arh25-4/263-270.htm
So you don't consume proteins, carbohydrates, fats, or booze. You
starve to death.
23. What Actually Happens
How do I know? I actually asked some devs.
1) They have some code that depends on /dev/random.
2) On initialization of their embedded device, the code tries to
generate a key.
3) There's no human at the keyboard, no hand at the mouse, no disk to
spin, and no hardware RNG. /dev/random blocks. The device is a brick.
Quite literally, starving for entropy.
4) At best, they switch to /dev/urandom. At worst they switch to
rand(), and then they ship. /dev/urandom is underseeded, though, and
is still broken.
24. A comparison
What perfectionists think will happen: "It's broken! Surely they'll
demand hardware RNG!" What developers actually do: "Security failed us
again. Let's ship something that works." Perfectionism caused (at
least) 1 out of 200 RSA keys on the Net to be easily broken. It's
almost certainly worse than that: those are just the keys we can
easily detect. We can do better.
25. TrueRand: An Old Hack [0]
Why do we like measuring keyboards and mice? Humans and computers are
not synchronized. Humans do not operate on nanosecond clocks like
computers do: the human is the slow clock, the CPU is the fast clock.
Any system with two clocks has a hardware random number generator.
Even if the error is one part per million, that's a bit per second per
megahertz. The error is generally much larger than a part per million,
just from thermal noise (and not just thermal noise).
26. TrueRand: An Old Hack [1]
What TrueRand (from Matt Blaze and D.P. Mitchell, in 1996) does: run
the CPU in a tight loop (count++). Every 16ms, fire an interrupt. On
interrupt, shuffle the count variable and integrate it into a buffer;
the entropy comes in here (the timer is the slow clock, the CPU is the
fast clock). After 11 shuffles, return the buffer as an integer. Hash
two buffers together using SHA-1, and return only the first byte. It
ain't bad. But it's disowned. That's too bad, because it would have
prevented (at least) 1/200 keys from being broken.
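A rough Python rendering of the TrueRand idea, assuming a counter spun against `time.perf_counter_ns()` can stand in for the 16ms timer interrupt. The 1ms interval, the 8-byte serialization, and the helper names are illustrative choices, and it does not reproduce TrueRand's exact shuffle; it only shows the two-clock structure (fast counter, slow timer) followed by the SHA-1 fold down to one byte. It makes no entropy estimate, so treat it as a demonstration, not a key source.

```python
import hashlib
import time

def count_until(interval_ns):
    """Spin a counter (fast clock) until the timer (slow clock) passes a deadline."""
    deadline = time.perf_counter_ns() + interval_ns
    count = 0
    while time.perf_counter_ns() < deadline:
        count += 1
    return count

def truerand_like_byte(interval_ns=1_000_000, rounds=11):
    """Fill two buffers with counter samples, SHA-1 them together, keep the first byte."""
    buf_a = [count_until(interval_ns) for _ in range(rounds)]
    buf_b = [count_until(interval_ns) for _ in range(rounds)]
    h = hashlib.sha1()
    for sample in buf_a + buf_b:
        h.update(sample.to_bytes(8, "little"))
    return h.digest()[0]

print(truerand_like_byte())
```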
27. Why is it disowned?
(Literally, Matt Blaze was vaguely horrified that I'm revisiting this
code.) Perfectionism: "We can't model its behavior. We don't know how
good or bad it is, so we shouldn't do it at all." This attitude has
actually led to a reduction in available entropy in the Linux kernel.
It used to look at interrupt counts from various devices; now they
aren't used, because they might be polluted.
28. DakaRand 1.0 [0]
An update to the old model, with multiple generators. Sleepers:
measure usleep with CLOCK_MONOTONIC, CLOCK_REALTIME, and RDTSC (on x86
platforms; it's the CPU counter, and there are equivalents for ARM and
MIPS). Incrementer: see how many times we can increment an integer
within a certain time period (100% CPU).
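The "sleeper" generator can be sketched in a few lines of Python, assuming `time.monotonic_ns()` (CLOCK_MONOTONIC on Linux) as the measuring clock: request a fixed short sleep and record how long it actually took. The 100µs sleep, the sample count, and keeping only the low 8 bits are illustrative assumptions. On platforms with a coarse monotonic clock the low bits may barely move, which is exactly the kind of platform weakness worth hunting for.

```python
import time

def sleeper_samples(n=64, sleep_s=0.0001):
    """Request a fixed sleep n times and record the low byte of how long it really took."""
    samples = []
    for _ in range(n):
        start = time.monotonic_ns()
        time.sleep(sleep_s)                # the scheduler decides when we actually wake up
        elapsed = time.monotonic_ns() - start
        samples.append(elapsed & 0xFF)     # keep only the jittery low-order bits
    return samples

print(sleeper_samples(8))
```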
29. DakaRand 1.0 [1]
RTC: measure interrupts from the real-time clock using CLOCK_MONOTONIC
(dedicated IRQ!), at 128Hz and 8192Hz. Threads: measure the status of
an integer modulated by a runaway thread (100% CPU). Anyone who thinks
computers are completely deterministic creations has never written
threaded code ;) Two Threads, One Int (one adds, one subtracts, main
polls). Two Threads, Two Ints (both add, main compares). One Thread,
One Int (one adds, main polls). Possible addition: noisier functions
than add.
30. DakaRand Flow
Short version: push all bits into a SHA-256 hash; don't undercount
entropy. Only count bits as entropy when they pass Von Neumann's
debiasing check: count 1s to decide whether 0 or 1; throw away 00 and
11, count only 01 and 10; actually insert a 0 or a 1 when you count a
bit. Don't overcount entropy: scrypt (a time/memory-hard function) the
resulting SHA-256 value, to make it miserable to guess entropy. Use
the output of scrypt as the input to AES-256-CTR, and emit the
resulting stream.
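A compressed Python sketch of that flow, under several stated assumptions: timing samples arrive as non-negative integers below 2^64, the low bit of each is the candidate entropy bit, the salt and scrypt cost parameters are arbitrary illustrative values, and `hashlib.scrypt` is available (it needs Python built against OpenSSL 1.1 or later). The final AES-256-CTR stage is left out because it needs a third-party library; the 32-byte scrypt output is what would key it.

```python
import hashlib

def von_neumann(bits):
    """Von Neumann debiasing: pair up bits, 01 -> 0, 10 -> 1, drop 00 and 11."""
    return [a for a, b in zip(bits[0::2], bits[1::2]) if a != b]

def dakarand_like(samples, needed_bits=256):
    """Hash every raw sample, count only debiased bits, then stretch with scrypt."""
    pool = hashlib.sha256()
    for s in samples:
        pool.update(s.to_bytes(8, "little"))   # push ALL raw data into the hash
    counted = 0
    for bit in von_neumann([s & 1 for s in samples]):
        pool.update(bytes([bit]))              # explicitly insert each counted 0 or 1
        counted += 1
    if counted < needed_bits:
        raise ValueError(f"only {counted} bits counted; collect more samples")
    # Time/memory-hard stretch: makes guessing the pool contents expensive
    return hashlib.scrypt(pool.digest(), salt=b"dakarand-sketch",
                          n=2**14, r=8, p=1, dklen=32)
```

Note that a perfectly alternating input passes the debiasing check; the counter estimates entropy, it does not prove it, which is why the scrypt stage is there.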
31. Attacking DakaRand
The game: find a platform (Desktop/Server/VM/Embedded) or an OS under
which DakaRand provides poor entropy in one of its modes.
Userspace/hypervisor scheduling: we're only called some number of
times per second. These times per second may be at predictable
intervals. If sufficiently predictable, they'll bias the output. Will
they simultaneously and identically bias both clocked entities?
Autoclocking: if you time something against itself, you're going to
have a bad time. Clocks are highly correlated to themselves. RTC and
CLOCK_MONOTONIC could be the same underlying timer in a VM. VMs, more
than anything else, should be exposing a random device (even if the
random device itself uses clock differentials). Still, this code seems
to work on VMs.
32. The VM Cloning Issue
/dev/random keeps bits around for a long time. When you clone an
image, you end up with those bits being static for a long time,
meaning you keep generating the same entropy for a long time.
DakaRand's attempted guarantee: each read is atomic. The results of a
read may be used across multiple images, but two separate calls at two
separate times MUST yield two uncorrelated streams. We can't do
anything about a clone taken after the read is fully completed, and a
clone taken during the read (which does last a second, due to scrypt)
is effectively already after. I actually don't think you can do better
than this, though I was considering XORing the keystream with
/dev/urandom anyway.
33. Is The Underlying Use Of Crypto Safe?
Modified Von Neumann: we absorb a tremendous amount of data into our
hash structure that has obvious patterns. If you have 100GB of 0s and
128 bits of actual randomness, the output of the hash has 128 bits of
randomness. We do explicitly include the 0 and 1. Stream function vs.
raw output: lots of raw output from a function tends to leak external
state, so let's not leak external state. Cryptographic stream
function: RNGs tend to have their own family of functions that are
distinctly not cryptographically validated (Mersenne Twister, not
AES-256 in Counter Mode). Is it in fact the case that strong (not RC4)
cryptographic functions encompass all properties of RNGs? Well, what
does dieharder say?
34. DieHarder CipherSuite Test
About 16,000 CPU hours of DieHarder entropy tests were run across 21
ciphers, with inputs of either 16MB of zeros or (the same) 16MB of
/dev/urandom output: about 24,000 different tests per cipher/content
class. Thanks to Jamie Schwettman, who did all the work to make this
sweep happen. No obvious statistical leanings to the data. Machine
learning people are taking a look (thanks, Prior Knowledge, Aleks
Jakulin!). No conclusive findings yet. Releasing this data too.
35. Neat tool, want it? csql: run SQL against CSV files
    $ cat pass2.csv | head -n 20000 | ./csql - "SELECT cipher, content, test, subtest, count(pv), avg(pv) from c group by cipher, content, test, subtest;" | head -n 10
    aes-128-cbc,urandom,dab_bytedistrib,0,10,0.0
    aes-128-cbc,urandom,dab_dct,256,10,0.47393035
    aes-128-cbc,urandom,diehard_2dsphere,2,10,0.627572674
    aes-128-cbc,urandom,diehard_3dsphere,3,10,0.664239991
    aes-128-cbc,urandom,diehard_birthdays,0,10,0.50850473
    aes-128-cbc,urandom,diehard_bitstream,0,10,0.017056331
    aes-128-cbc,urandom,diehard_count_1s_byt,0,10,0.441374983
    aes-128-cbc,urandom,diehard_count_1s_str,0,10,0.538731369
    aes-128-cbc,urandom,diehard_craps,0,20,0.0394997795
    aes-128-cbc,urandom,diehard_dna,0,10,0.396250338
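To play with the same idea in the meantime, a hypothetical stand-in is easy to build on Python's `sqlite3` module: load the CSV into an in-memory table named `c` (matching the query above) and run SQL against it. This is not csql itself, just a sketch. Column names come from the CSV header row, and every value is stored as text, which SQLite's `avg()` still treats as a number when it looks numeric.

```python
import csv
import io
import sqlite3

def csql(csv_text, query, table="c"):
    """Load CSV text (first row is the header) into an in-memory SQLite table, run a query."""
    header, *data = list(csv.reader(io.StringIO(csv_text)))
    conn = sqlite3.connect(":memory:")
    try:
        cols = ", ".join(f'"{name}"' for name in header)   # untyped columns
        conn.execute(f'CREATE TABLE "{table}" ({cols})')
        marks = ", ".join("?" for _ in header)
        conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', data)
        return conn.execute(query).fetchall()
    finally:
        conn.close()

sample = "cipher,content,pv\naes,zero,0.25\naes,zero,0.75\nrc4,zero,0.5\n"
print(csql(sample, "SELECT cipher, count(pv), avg(pv) FROM c GROUP BY cipher"))
```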
36. Kernel Recommendations
/dev/random MUST not block. Make an IOCTL if you must. Return data
slowly if you like. CryptGenRandom on Windows does not appear to
block, so it's not likely that 1 out of 200 RDP keys are corrupt.
Don't be so shy about interrupt sources. Care less about interrupt
counts than interrupt timings. ftrace exposes microsecond timings,
which might not be fine-grained enough. Use nanosecond arrival times,
as much as possible, from devices on foreign buses. The slower the
foreign device is, the better: you want to be measuring slow clocks
against fast clocks. By definition, the kernel is interrupted at finer
grain than userspace. Obviously you don't have to include every last
interrupt; it takes time to check the time. Maybe consider this
Modified Von Neumann construction.
37. From The Bottom To The Top
Our biggest problems in security do not revolve around random number
generation. They revolve around languages. Language Theoretic
Security: the hypothesis that security vulnerabilities are the
consequence of the languages code is written in. Coined by Len
Sassaman and Meredith Patterson. Sapir-Whorf is true for code.
Corollary: if language got us into this mess, language can get us
out. More important corollary: languages are spoken or written by
humans. Ignore their needs at your peril.
38. The Shift
One way to look at language theoretic security is through the lens of
computability theory. Different classes of code have different amounts
of power, and communication should be limited to the least amount of
power necessary. Attacks expand power from Declarative, through
Regular Expression, through Turing Complete. This is indeed a valid
lens. Another lens...
39. Diagramming Sentences: IT WAS ACTUALLY USEFUL
40. Injection Vulnerabilities: When Trees Disagree
Parsers, almost by definition, turn streams of bytes into trees.
Injection vulnerabilities exist when a sending language and a
receiving language (which may or may not be the same) disagree on the
nature of the tree sent. An extreme case of this is when bytes flow
out into surrounding memory. But SQL Injection, LDAP Injection, XSS,
etc. are all just situations where (generally) the sender thought it
sent the user's data, but the receiver thought it received a peer's
code. A purely declarative language can still (easily) be injected
into, and complexity can remain declarative and still yield damage.
The attack is not in the increase of complexity, but in the transition
of content from one identity/context to another through parse tree
differentials. So what?
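A minimal Python and SQLite illustration of a parse tree differential. With string interpolation, the attacker's quote characters change the shape of the WHERE clause; with a bound parameter, the tree is fixed before the data arrives and the same bytes can only ever be a value. The table and input are made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")])

user_input = "nobody' OR '1'='1"

# Unsafe: the input is spliced into the query text, so the SQL parser reads
# OR '1'='1' as code and the WHERE clause matches every row.
unsafe = conn.execute(f"SELECT name FROM users WHERE name = '{user_input}'").fetchall()

# Safe: the query tree is parsed first; the input is bound as a single value.
safe = conn.execute("SELECT name FROM users WHERE name = ?", (user_input,)).fetchall()

print(unsafe)  # both users
print(safe)    # no rows: nobody is literally named that
```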
41. We have to stop injection vulnerabilities
They're killing us. They're not l33t. They're totally effective.
They're the vast majority of vulnerabilities ever written and
discovered. We haven't actually fixed them; if we did fix them, they
wouldn't still be costing billions of dollars. [Yes, we're going to
revisit Interpolique. It's OK, we're going to bash it too.]
42. What is the importance of another theoretical model?
It declares the rules of the game: 1) We want to synchronize parse
trees. 2) We want developers to actually use our method. A language
unspoken has a term: a dead language. It explains what is
surprisingly not understood. Why did XML become popular? Instead of
spending months figuring out just how to say hello, they have their
code, you have your code, and it's self-describing strings in each
direction. No fiddly "the eighth bit on the fourth byte changes
everything". Why did JSON become popular? XML invented its own modes
of being fiddly.
43. The Hard Truth
Developers are in charge. Not architects (they love ASN.1 and XML and
WS-ZOMG). Not academics (they love Haskell). Not management (they love
money). Money is made by performance, reliability, maintainability,
features, rapid development. Money is later lost by security, maybe.
So, not us. What is the #1 thing developers like? Code working.
44. Thus, the biggest explanation
Why is PHP so popular? If you don't think it is, see here: What is PHP
incredibly good at? Copy and paste code, and it works. We understand
that CPAN makes Perl. We don't understand that PHP sample code makes
PHP. Java alternative: "Look how much code my IDE can write for me!"
Copy and paste with a suit on.
45. The Language Success Metric
"What are the odds, if I try this, that it will work?" Not "When it
fails, it fails fast!" Surprisingly, nobody tracks this metric (except
maybe Processing, which is incredible). That's why all the successful
languages tend to be the brainstorms of one guy. Art is science before
we know what we're doing. PHP beats your favorite language. If we want
to fix security, here is a good place to work.
46. What's Wrong With ORMs?
Object-Relational Mapping: problems with SQL Injection? Don't use SQL!
Instead, the database just looks like your favorite language's native
objects. Great, right up until the moment you need to make a query.
47. Look at this. It matters.
+[,+[-[>+>+>++++++++[-]>>+[-]
++[-]+[[-]++++++++[-]>++++[-]>+[-][-]< +[-]+
+[-]+[[-]+[-]++[-]< [>+[[-]< select($name); 32 characters
of punctuation, deeply interspersed. $result = query(SELECT $name
FROM $names WHERE length($name)prepare("SELECT * FROM foo where x=? and y=?");
$stmt->bind_param("ss", $x, $y); $stmt->execute();
Finally, evaluate the generated code: eval(b("SELECT * FROM foo where
x=^^x and y=^^y")); Eval is, surprisingly, the only way to retrieve
the values of $x and $y from inside the function b().
53. What's Wrong With Interpolique?
What if the dev writes: eval(b("SELECT * FROM foo where x=$x and
y=$y")); If $x and $y are attacker controlled, he's not far from an
eval that will run code in PHP's context! The b() function is in a
position to defend the code that ultimately enters eval, but now
you're entirely dependent on b() knowing what PHP will do given
arbitrary bytes. GOOD LUCK WITH THAT. Highly greppable error case, but
it's pretty scary.
54. Building A Safe Interpolique
Eval only exists so that variables from the calling scope can be
dereferenced. One approach is to implement
create_selfscoped_function(): it returns a function that always runs
in the scope of its parent. Could implement proxies so it can only
read variables, and can't rewrite them.
$rows = $mysql_safequery("select * from foo where x=^^x and y=^^y");
Requires a patch to PHP -- Daniel Zulla is working on this!
55. Code Rewriting?
If we know what we would have liked developers to have written, why
don't we just transform code once? Never really been a fan of this.
Have you ever audited autogenerated code? What do you do when the code
looks like: $z = "SELECT * from foo where x=$x and y=$y;"; $rows =
mysql_query($z); Static analysis can of course find such situations
(thus knowing $x came in from an HTTP variable), but most devs don't
have access to such static analysis tools. Should they?
56. Tainting
What if we actually marked every character that came in from an HTTP
query as tainted? Metadata, on a character-by-character basis. It
would survive passing from function to function. It might even
survive reasonable mangling by built-in filters. Then, you could write
something like: mysql_query_safe("select * from foo where x=$x and
y=$y;"); Even though $x and $y would expand, the wrapper function
would see that those particular characters were once tainted with the
mark of the web, and could rewrite the unsafe query around it. This
still works with mysql_query_safe($x) when $x was assembled elsewhere,
even concatenated. Could have problems with silent failure with
filtering functions. Requires a patch to PHP; Daniel Zulla is also
working on this.
57. SuperEncoding as Explicit Tainting
Based on discussions with Zane Lackey and Nick Galbreath at Etsy,
based on an approach they're already running in production. What if
all variables from the web were encoded in a whitelisted format?
Simple hex encoding: &#x41; which, coincidentally, renders as an A in
any HTML parser. All non-DB access would have to go through accessors:
r($x) to read, w($x) to write. Surprisingly easy to grep for access
that isn't wrapped. Could do two things: mysql_query_safe($x) could
simply treat all superencoded characters as data and parameterize
accordingly, or mysql itself could have its lexer modified to handle
HTML encoding, exposing such characters to less of the SQL parser
("this is just a string"). Very LangSec.
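A hypothetical sketch of the superencoding step in Python. `superencode` turns every character into an HTML hex character reference, so the stored form only ever uses `&`, `#`, `x`, `;` and hex digits, and `r()` stands in for the read accessor. The names mirror the slide's r($x) but are assumptions for illustration, not Etsy's code.

```python
import html

def superencode(s: str) -> str:
    """Encode every character as an HTML hex character reference ('A' -> '&#x41;')."""
    return "".join(f"&#x{ord(ch):X};" for ch in s)

def r(x: str) -> str:
    """Hypothetical read accessor: decode only where code truly needs the raw string."""
    return html.unescape(x)

stored = superencode("x' OR 1=1")
print(stored)      # nothing but character references: no live quote characters
print(r(stored))   # the original text, recovered at the point of use
```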
58. A Last Minute Alternative
Perhaps we've got this backwards. Rather than tainting data as data,
we mark code as code. SQL tends not to be passed around from function
to function, let alone parsed in the frontend.
$sql = c("select * from foo where x="); $sql .= $x; $sql .= c(" and y="); $sql .= $y;
Then either mysql_query_safe or mysql itself (cowardly) refuses to
execute anything with unmarked code. Or, if this is baked into MySQL,
it just doesn't see bytes as code if they're not deeply marked as
code. Moderately greppable: you're basically finding all SQL in your
code and wrapping it with some sort of taint, either implicit as per
Zulla, or explicit as per Etsy. The most likely failure mode is an
attacker-controlled variable somehow getting inside of c().
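The same idea sketched in Python against SQLite, with hypothetical names. A `Code` marker type plays the role of c(); the query is passed as a list of parts rather than concatenated, because in Python `str + str` would silently drop the marker. Anything unmarked is treated as data and bound as a parameter, which is the "doesn't see bytes as code" variant rather than the refuse-to-run variant.

```python
import sqlite3

class Code(str):
    """Marker type: only strings wrapped by c() may become SQL text."""

def c(sql: str) -> Code:
    return Code(sql)

def query_safe(conn, parts):
    """Code parts go into the SQL text; every unmarked part becomes a bound parameter."""
    sql, params = [], []
    for part in parts:
        if isinstance(part, Code):
            sql.append(part)
        else:
            sql.append("?")        # unmarked bytes are data, never code
            params.append(part)
    return conn.execute(" ".join(sql), params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (x TEXT, y TEXT)")
conn.execute("INSERT INTO foo VALUES ('1', '2')")
x, y = "1", "2' OR '1'='1"       # y is attacker controlled
print(query_safe(conn, [c("SELECT * FROM foo WHERE x ="), x, c("AND y ="), y]))
```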
59. This is what LangSec means
What are people trying to say? How can we make it easier to say that?
How hard will it be for people to migrate? What errors will they make
when trying to use this? Can we limit how much code might contain a
bug? CARE ABOUT YOUR DEVS OR THEY WILL NOT CARE ABOUT YOU
60. What's Going On With The Web?
It doesn't matter what code you write, if there are parties in the
middle changing or blocking what you send. Content alteration and
blocking is becoming a real thing. Verizon is claiming the First
Amendment right to rewrite Internet connections. Entire countries are
silently blocking web pages: Indonesia's blocking a million porn sites
in the run-up to Ramadan.
61. What Went Wrong With N00ter
N00ter was a really fun (and really powerful) mechanism for detecting
network manipulation. It allowed a remote server and a cooperating
client to pretend to have a conversation with anyone on the Internet,
using any protocol. To any MITM, it would look like a real, unmodified
conversation, so any alterations that might normally hit the real
server would hit this too. Unfortunately, N00ter does a lot of very
low-level packet crafting, meaning (realistically) it requires custom
hardware in front of user machines. This is not fun to deploy,
especially if you need to get between NAT and the actual network
connection. Not impossible. Definitely improbable.
62. What Else Can We Use?
Executable code on the client: OONI-Probe. Web pages with iframes:
Herdict (Herd Verdict), which needs either user cooperation, or a
Chrome extension, to know if content is up or down. Is it possible to
determine whether content is up or not, from just a web page? Can we
crowdsource censorship data? Maximize data per user; minimize
installation load per user.
63. Imaging Browsers
The Same Origin Policy usually prevents webpages from doing much with
one another. You wouldn't want Yahoo able to read from your Gmail
account. But there is one exception: any domain is allowed to load
any other domain's images. Beyond that, it's allowed to know that the
load was successful. Not merely that there was a file at that
location, but that it was actually an image. You even get image
dimensions (which you'd have to, because it resizes the page). If a
domain is being censored, the image will not load. What one image is
on most domains?
64. Favicon.ico
(It's the picture to the left of Google in the tab.)
65. So this is CensorSweeper
(Also by Joseph Van Geffen and Michael Tiffany.) Written for the Wall
Street Journal Data Transparency Hackathon.
66. What's going on
    img = new Image();
    img.onload = function(event) { /* render favicon */ };
    img.onerror = function(event) { validate(); };
    img.src = "http://somesite.com/favicon.ico";
The above is done in parallel, reading from a list of sites that have
confirmed presence of favicon.ico. Six failures are required before a
bomb is dropped on the map.
67. Error Handling
Six failures isn't actually enough! Web browsers provide remarkably
little feedback to a developer to know what's failing, and why. Put
simply, flow control hasn't really been implemented for the web;
everything's been designed around infinite bandwidth. For reliability,
we're going to need to shut down all other traffic, and then do two
simultaneous lookups: one for a known-up site, the other for the
supposedly-down site. That being said, CensorSweeper works pretty
well. Can we do better?
68. Sockets
Once upon a time, web browsers could act like proxies, giving you
connections anywhere. There were bugs in Flash and Java; we fixed
them. They can now only create connections to IP addresses that
invite them. But ~20% of the time there are transparent proxies
between web servers and their users. See "Staring into the Abyss" by
me, or "Socket Capable Browser Plugins Result In Transparent Proxy
Abuse" by Robert Auger. This has been known, but not explored for
mapping censorship!
69. HTTP Censorship Detection
1) Using Flash (or HaXe), create an HTTP connection back to your own
IP on port 80. Host a socket policy file, so Flash allows this.
2) Request anything, from any domain. If the request comes to you,
there is no transparent proxy. Otherwise, the request will be hijacked
by the proxy, serviced, and sent back to your Flash app. You now see
what that user would see, if they browsed to that site! You can then
submit it back to yourself.
70. HTTPS Certificate Extraction
Just as HTTP traffic on 80/tcp is hijacked, so may HTTPS traffic on
443/tcp be. The MITM may have an alternate certificate for you. But
(if you're careful) it can't tell the difference between the browser
starting SSL and Flash/HaXe starting SSL. It has to know which domain
to pretend to have a certificate for. The proxy can parse the Server
Hello, with its certificate (it's your server saying hello). The proxy
can parse the Client Hello, with its Server Name Indication (it's your
Flash app saying hello). You can actually host the real Facebook
certificate, or even proxy the real Facebook SSL endpoint; it's hard
to keep track of all of Facebook's IPs. It has to forge the
certificate before you have to prove you actually have Facebook's
private key (assuming you aren't proxying).
71. Slight Annoyance
There's no normal way, via the browser DOM, to determine the
certificate that provided content. This at least allows a page to
query for its exposed certificates; kinda cool! Limitations: you can
test anyone's certificate, as long as the attacker isn't interposing
themselves via DNS hijacking. The Flash app sees what's at the named
IP; if hijacking is at the DNS layer, then Flash won't get hijacked.
You are able to test your own certificate, but then the attacker has
already MITM'd you and can alter your security validation layer.
72. Full Proxying
One of the goals of N00ter was seeing if everyday content was being
altered or slowed down. One of the headaches with these custom probes
is writing these custom probes. How do you look just like a real web
browser trying to access YouTube? Answer: be a real web browser trying
to access YouTube. The last time we played with Flash and Sockets, we
created a full VPN. But now sockets are limited to a single
destination. It turns out that it may still be possible/useful to
proxy an entire browser (at the server) down to the Flash app (in the
client), which will then make open connections back to the server,
which will proxy them to the rest of the Internet. This will allow, at
minimum, a protocol-correct sequence of messages for HTTP and HTTPS
that are only incorrect by destination IP. So basically, if the
intercepting server doesn't care about IP correctness, you get to
interrogate its ruleset with no installed code on the client.
73. Last but not least: Scanning Networks Quickly Actionable
Intelligence: What can an attacker do today that he couldn't do
yesterday, for what class of attacker, against what class of victim?
Rather related to this: How many potential victims are out there?
I've run two major scans this year (that I've talked about). Telnet:
determining presence of Telnet Encryption support. Answer: very rare.
RDP: determining presence of open RDP access. Answer: VERY common.
74. My Process Once upon a time, simply flooding TCP SYNs was
enough to find out what was out there. Nowadays, many, many IP
addresses will complete a three-way handshake, but there won't actually
be anything there. Solution: split the process. 1) Identify candidate IP
addresses that are listening on a given port. 2) Given a candidate,
actually connect to the IP.
75. More Detail Candidate collection: For each IP, incrementing
the first byte first (1.1.1.1, 2.1.1.1, 3.1.1.1), send a TCP SYN on
the required port (23 for Telnet, 3389 for RDP). In a separate
window, log TCP SYN|ACKs with tcpdump:
tcpdump -w log 'tcp[tcpflags] = (tcp-syn|tcp-ack)'. Scanrand was
being buggy; this maximized logging. Candidate inspection: for Telnet
Encryption, the nmap team whipped up a quick check, so I just fed the
IP list to it. Very few found.
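The "first byte first" ordering can be generated by treating a 32-bit counter as little-endian, so its lowest byte becomes the first octet of the address. A minimal sketch, assuming you want dotted-quad strings to hand to whatever sends the SYNs. `first_octet_fastest` is a hypothetical name, and it does not skip reserved or blacklisted ranges.

```python
import socket
import struct

def first_octet_fastest(start: int = 0x01010101, count: int = 2**32):
    """Yield dotted-quad IPv4 strings, incrementing the FIRST octet fastest.

    Packing the counter little-endian puts its low byte in the first octet,
    so consecutive probes land in different /8s (presumably to spread load
    instead of hammering one network).
    """
    for n in range(start, start + count):
        yield socket.inet_ntoa(struct.pack("<I", n % 2**32))
```

Starting the counter at 0x01010101 reproduces the 1.1.1.1, 2.1.1.1, 3.1.1.1 sequence above; the SYN|ACKs logged by tcpdump then become the candidate list for the second, connecting phase.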
76. RDP Sweep: Black Mamba Probably the most pleasant
environment for reasonable-scale TCP probing ever devised:
http://rootfoo.org/blackmamba

    from blackmamba import *

    def get(host, port=80):
        # Minimal HTTP/1.1 request; the Host header is mandatory in 1.1
        msg = "GET / HTTP/1.1\r\nHost: %s\r\n\r\n" % host
        yield connect(host, port)
        yield write(msg)
        response = yield read()
        yield close()
        print(response)

    def generate(host, count=100):
        # One probe per iteration, all against the same host
        for i in range(count):
            yield get(host)

    run(generate("example.com"))

You end up getting ~3000 IPs a second. You may need to increase
ulimit -n, and may need to alter hard-coded limits in blackmamba.py.
77. Can We Get Faster? I always wanted to write a userspace TCP
stack. HD Moore kinda kicked me into working on one for
critical.io, his mysterious new scanning project. I am not at all
beyond being motivated by other people's awesome and mysterious
projects, especially when they give me CPU and network bandwidth. So:
Scanrand3! A new scanner that doesn't just flood SYNs, but actually
connects to every node and extracts data. Original plan: a TCP stack
with SQLite as the backend. SELECT * FROM sockets WHERE
data_sent!=data_acked AND now()-data_sent_time>3 (to find sockets
where a retransmit is needed) is just funny! SQLite, in memory-only
mode, is really, really fast: 160K inserts/sec fast. Unfortunately,
that speed disappears when you add indexes: 20K inserts/sec with two
indexes.
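The retransmit query can be tried directly with Python's built-in `sqlite3` module. SQLite has no `now()` function, so this sketch passes the current time in as a bound parameter. The `sockets` schema and the `open_socket_table` / `needs_retransmit` names are hypothetical, and the table deliberately has no indexes, since indexes are what cut the insert rate above.

```python
import sqlite3

def open_socket_table() -> sqlite3.Connection:
    """In-memory table of outstanding connections (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    # No indexes: each one costs insert throughput.
    db.execute("CREATE TABLE sockets (ip TEXT, port INTEGER, data_sent INTEGER,"
               " data_acked INTEGER, data_sent_time REAL)")
    return db

def needs_retransmit(db: sqlite3.Connection, now: float, timeout: float = 3.0):
    """Sockets with unacknowledged data sent more than `timeout` seconds ago."""
    return db.execute(
        "SELECT ip, port FROM sockets"
        " WHERE data_sent != data_acked AND ? - data_sent_time > ?"
        " ORDER BY ip, port",
        (now, timeout),
    ).fetchall()
```

Without an index on `data_sent_time`, each call is a full table scan, which is the trade-off against the faster inserts.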
78. New Plan: Let The Servers Keep TCP State
79. Details! Details! Scanrand didn't get its speed by keeping
track of who it did or didn't send traffic to. Why should Scanrand3?
1) Send SYN. Maximum Segment Size == 1460, Window Size == 1460 (for all
packets). 2) Upon receiving a SYN|ACK, reply with an ACK. Include a GET
/ HTTP/1.0 payload. Yes, you can put a payload in the initial ACK!
3) Upon receiving an ACK, if there is a payload, ACK it. Save the
payload. 4) Upon receiving a FIN|ACK, RST. Save the payload, if
any.
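Because every reply is computed from the segment that triggered it, the four steps reduce to a pure function. A minimal sketch, assuming a hypothetical `Segment` record of (flags, seq, ack, payload); the flag constants are the standard TCP header bit values. Crafting and sending the raw packets themselves (including the 1460-byte MSS and window on the SYN) is left out.

```python
from typing import NamedTuple, Optional

FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10   # TCP header flag bits
REQUEST = b"GET / HTTP/1.0\r\n\r\n"

class Segment(NamedTuple):
    flags: int
    seq: int
    ack: int
    payload: bytes = b""

def respond(pkt: Segment, saved: list) -> Optional[Segment]:
    """Compute the reply to one incoming segment using only that segment."""
    if (pkt.flags & (SYN | ACK)) == (SYN | ACK):
        # Step 2: SYN|ACK -> ACK carrying the HTTP request.
        return Segment(ACK, pkt.ack, (pkt.seq + 1) % 2**32, REQUEST)
    if pkt.flags & FIN:
        # Step 4: FIN|ACK -> save any payload, then RST.
        if pkt.payload:
            saved.append(pkt.payload)
        return Segment(RST, pkt.ack, 0)
    if pkt.flags & ACK and pkt.payload:
        # Step 3: data -> save it and acknowledge exactly these bytes.
        saved.append(pkt.payload)
        return Segment(ACK, pkt.ack, (pkt.seq + len(pkt.payload)) % 2**32)
    return None  # bare ACK or RST: nothing to send
```

No per-connection table is consulted: our next sequence number always comes from the peer's ACK field, and our acknowledgment number from the peer's sequence number plus whatever it just sent.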
80. No Local State If the first SYN is dropped: OK, nobody's
around to retransmit it. May want to log RST|ACK to avoid future
retransmits. If the SYN|ACK is dropped on the way to the client, the
server retransmits the SYN|ACK. If the ACK with the initial payload is
dropped on the way to the server, the server retransmits the SYN|ACK,
causing a new ACK with payload. If any ACK with response payload is
dropped on the way to the client, the server will retransmit the ACK
with response payload. Same with FIN|ACK. A window size of 1460 means
we always know which particular packet to acknowledge: only one is in
flight (usually).
81. Performance Relatively unoptimized code on a well-hosted
but underpowered server (cheap dual Opteron): 50-80K servers/sec with
full payloads. 3.25M IPs takes 60-80 seconds and retrieves about 800MB
of content. The task is embarrassingly parallelizable across threads,
databases, etc. Should be able to use multiple BPF filters to route
packets to their appropriate thread with kernel filtering. Writing to
a SQLite DB, and then backing up to disk, is really fast
(substantially faster than fwrite, though I haven't tested a large mmap
yet). You basically reassemble payloads in SQLite as a
postprocess.
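Reassembly as a postprocess can be as simple as ordering captured segments by sequence number. A sketch under stated assumptions: a hypothetical `segments(ip, port, seq, data)` table, retransmits arriving with identical `seq` values, and no sequence-number wraparound. `Connection.backup` is the standard library's wrapper around SQLite's online backup API, shown here for the in-memory-then-disk step.

```python
import sqlite3
from itertools import groupby

def reassemble(db: sqlite3.Connection) -> dict:
    """Join captured segments per (ip, port) in seq order, dropping retransmits."""
    rows = db.execute(
        "SELECT ip, port, seq, data FROM segments ORDER BY ip, port, seq")
    out = {}
    for key, group in groupby(rows, key=lambda r: (r[0], r[1])):
        seen, chunks = set(), []
        for _ip, _port, seq, data in group:
            if seq not in seen:          # a retransmit repeats the same seq
                seen.add(seq)
                chunks.append(data)
        out[key] = b"".join(chunks)
    return out

def save_to_disk(db: sqlite3.Connection, path: str) -> None:
    """Copy the in-memory database to a file using SQLite's backup API."""
    disk = sqlite3.connect(path)
    try:
        db.backup(disk)
    finally:
        disk.close()
```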
82. Security Scanrand pioneered inverse SYN cookies: you protect
against spoofed responses by validating fields in the response
against hashes of data plus a secret only you know. 16 bits in the
source port + 32 bits in the sequence number are possible. May be
able to get another 32 bits out of TCP Timestamps, which are usually
supported. Haven't implemented this yet, so it's very easy to poison
me. Sequence space becomes less secure the more data you actually
send. You do know the exact size of each payload, so you can say "I
only accept responses with no-payload seq, payload-1 seq, payload-2
seq, etc." Technically the other side can ACK at any byte offset,
but that doesn't mean they actually will.
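The inverse SYN cookie idea can be sketched with a keyed hash over the destination. HMAC-SHA256 is an assumed choice here, not necessarily the hash Scanrand used, and `SECRET`, `syn_cookie`, and `valid_response` are hypothetical names. The source port is mapped into 1024-65535, so it carries slightly under the full 16 bits.

```python
import hashlib
import hmac
import os
import struct

SECRET = os.urandom(16)  # hypothetical per-scan secret; never sent on the wire

def syn_cookie(dst_ip: str, dst_port: int, secret: bytes = SECRET):
    """Derive (source port, initial sequence number) for the SYN to dst."""
    msg = dst_ip.encode("ascii") + struct.pack("!H", dst_port)
    d = hmac.new(secret, msg, hashlib.sha256).digest()
    # Keep the port out of 0-1023 (so slightly under 16 bits of cookie).
    src_port = 1024 + struct.unpack("!H", d[:2])[0] % (65536 - 1024)
    isn = struct.unpack("!I", d[2:6])[0]  # full 32 bits
    return src_port, isn

def valid_response(src_ip: str, src_port: int, dst_port: int, ack: int,
                   sent_lengths=(0,), secret: bytes = SECRET) -> bool:
    """Accept a reply only if it matches the cookie for the host that sent it.

    `sent_lengths` lists how many payload bytes we may have sent so far, so
    the ACK must land exactly on one of those known offsets.
    """
    port, isn = syn_cookie(src_ip, src_port, secret)
    if dst_port != port:
        return False
    return any(ack == (isn + 1 + n) % 2**32 for n in sent_lengths)
```

A spoofer who does not know `SECRET` has to guess both the port and the sequence number for a given target, which is where the roughly 48 bits of protection come from.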
83. Some Notes Kernels have actually gotten kind of fast.
Non-blocking connect() plus epoll should be able to get pretty fast.
Certainly easier to code for that model! Didn't work for me (not
sure why). This approach ultimately becomes fastest. Probably need a
writev call to spew many packets without a write for each.
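For comparison, the kernel-socket model looks roughly like this in Python: non-blocking connects, all registered with one selector. `selectors.DefaultSelector` picks epoll on Linux. A minimal sketch: `probe` is a hypothetical name, it only reports whether the three-way handshake completed, and it opens every socket at once, so large target lists will run into `ulimit -n`.

```python
import errno
import selectors
import socket
import time

def probe(targets, timeout: float = 3.0) -> dict:
    """Map each (host, port) to True if a TCP handshake completed in time."""
    sel = selectors.DefaultSelector()            # epoll on Linux
    results = {}
    for target in targets:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setblocking(False)
        rc = s.connect_ex(target)                # returns immediately
        if rc not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            results[target] = False              # e.g. refused right away
            s.close()
            continue
        sel.register(s, selectors.EVENT_WRITE, data=target)

    deadline = time.monotonic() + timeout
    while sel.get_map() and time.monotonic() < deadline:
        for key, _events in sel.select(timeout=deadline - time.monotonic()):
            # Writable means the connect finished; SO_ERROR says how.
            err = key.fileobj.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            results[key.data] = err == 0
            sel.unregister(key.fileobj)
            key.fileobj.close()

    for key in list(sel.get_map().values()):    # still pending: timed out
        results[key.data] = False
        sel.unregister(key.fileobj)
        key.fileobj.close()
    sel.close()
    return results
```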
84. More Notes Can also try more efficient stores than SQLite: a
giant allocation of RAM with fixed offsets per IP, or MemSQL, a neat
project by ex-Facebookers that compiles SQL to C++. They think even with
the indexes they can do 100K+. Can have merged approaches too: only
start keeping state if I like the response from the server. Note
that stateless client + stateless server = no retransmits.
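The "giant allocation of RAM with fixed offsets per IP" store can be sketched as one flat buffer where an address's slot is found by arithmetic rather than lookup. This is a hypothetical `SlotStore` over a single prefix; the slot size and the truncation policy are assumptions. Covering all of IPv4 at 64 bytes per slot would take 2^32 × 64 bytes = 256 GiB, which is why the sketch takes a prefix.

```python
import ipaddress

class SlotStore:
    """Fixed-size byte slot per IPv4 address within one prefix (hypothetical)."""

    def __init__(self, prefix: str, slot_size: int = 64):
        self.net = ipaddress.IPv4Network(prefix)
        self.slot = slot_size
        self.buf = bytearray(self.net.num_addresses * slot_size)

    def _offset(self, ip: str) -> int:
        addr = ipaddress.IPv4Address(ip)
        if addr not in self.net:
            raise KeyError(ip)
        # Offset is pure arithmetic: no index, no hashing.
        return (int(addr) - int(self.net.network_address)) * self.slot

    def put(self, ip: str, data: bytes) -> None:
        off = self._offset(ip)
        chunk = data[: self.slot]               # truncate to fit the slot
        self.buf[off : off + self.slot] = chunk.ljust(self.slot, b"\x00")

    def get(self, ip: str) -> bytes:
        off = self._offset(ip)
        # Trailing NULs are treated as padding, so data ending in NUL is lossy.
        return bytes(self.buf[off : off + self.slot]).rstrip(b"\x00")
```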
85. What should the coding model be? Flat file / command line?
C? JavaScript? Lua? Could implement support for nmap scripts.
86. Most Important Feature Blacklist support. Most networks don't
mind getting swept (they certainly are, already). Some do. Part of
being a whitehat is that you let people know who you are, and listen to
their requests. So you end up with a pile of IP ranges not to sweep.
It can actually take a substantial amount of CPU if you check the
list naively; you need to compile it into a quickly queryable structure.
I don't think firewall rules apply to spoofed traffic.
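One way to compile the do-not-sweep list: normalize the CIDR ranges, sort and merge them into disjoint intervals once, then answer each lookup with a binary search instead of a linear walk. A minimal sketch; `Blacklist` is a hypothetical name and it handles IPv4 only.

```python
import bisect
import ipaddress

class Blacklist:
    """CIDR ranges compiled into sorted, merged intervals for O(log n) lookups."""

    def __init__(self, cidrs):
        spans = sorted(
            (int(n.network_address), int(n.broadcast_address))
            for n in (ipaddress.IPv4Network(c, strict=False) for c in cidrs))
        merged = []
        for lo, hi in spans:
            if merged and lo <= merged[-1][1] + 1:   # overlapping or adjacent
                merged[-1][1] = max(merged[-1][1], hi)
            else:
                merged.append([lo, hi])
        self.starts = [lo for lo, _ in merged]
        self.ends = [hi for _, hi in merged]

    def __contains__(self, ip: str) -> bool:
        x = int(ipaddress.IPv4Address(ip))
        i = bisect.bisect_right(self.starts, x) - 1  # last interval starting <= x
        return i >= 0 and x <= self.ends[i]
```

Usage: build it once from the opt-out list, then check `if ip in blacklist: continue` before each SYN.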
87. Simple Architectural Note Don't try to interact with the
Linux firewall. Just pick another IP on the LAN and send from there.
Respond to ARP traffic for it. (Yes, it is an advantage of the
socket model that you don't need to requisition another IP.)
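Answering ARP for the borrowed IP only needs a 42-byte Ethernet frame. A sketch following the RFC 826 layout for Ethernet/IPv4; `arp_reply` and `is_request_for` are hypothetical names, and actually transmitting the frame (on Linux, through a raw `AF_PACKET` socket, which needs root) is not shown.

```python
import socket
import struct

def arp_reply(our_mac: bytes, our_ip: str, peer_mac: bytes, peer_ip: str) -> bytes:
    """Ethernet frame carrying an ARP reply: 'our_ip is at our_mac'."""
    eth = struct.pack("!6s6sH", peer_mac, our_mac, 0x0806)  # dst, src, EtherType ARP
    arp = struct.pack("!HHBBH6s4s6s4s",
                      1,          # hardware type: Ethernet
                      0x0800,     # protocol type: IPv4
                      6, 4,       # hardware / protocol address lengths
                      2,          # opcode 2 = reply
                      our_mac, socket.inet_aton(our_ip),
                      peer_mac, socket.inet_aton(peer_ip))
    return eth + arp

def is_request_for(frame: bytes, ip: str) -> bool:
    """True if `frame` is an ARP request asking who has `ip`."""
    if len(frame) < 42 or frame[12:14] != b"\x08\x06":
        return False
    oper = struct.unpack("!H", frame[20:22])[0]   # ARP header starts at byte 14
    target_ip = frame[38:42]                      # target protocol address
    return oper == 1 and target_ip == socket.inet_aton(ip)
```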
88. Whew! Lots of stuff! Hope you enjoyed! This may not be how
you try to fix stuff, but it's what I try to do. Thanks to everyone
cited in the slides. Thanks also to Nick, Johnny, Blackstock, Alex,
Allessandra, Allessandra, and Andrew of The Sub for putting up with
me in DEFCON mode ;)