BLUEPRINT: Robust Prevention of Cross-site Scripting Attacks for Existing Browsers
Mike Ter Louw, V.N. Venkatakrishnan
University of Illinois at Chicago
Mar 28, 2015
Outline
• Intro to Cross-site Scripting
• Objective
• Approach
• Technical details
• Evaluation
• Related work
Cross-site scripting (XSS)
• A widespread web application vulnerability
– In the last few weeks…
– Time magazine “Top 100 influential people” poll defaced by XSS (Apr 2009)
– Twitter XSS worm (Apr 2009)
– McAfee web site attacked (May 2009)
• The #1 threat on the Internet (OWASP)
Problem: Malicious user created content!
Benign comment: “Pete is…”
Malicious comment: “<script>doEvil()</script>”
Our Objective
To develop a robust defense against cross-site scripting attacks
Typical Web Application Goals
• Allow user created content to be expressive, containing rich HTML content
– Format text (<b>bold</b>, <i>italics</i>)
– Hyperlinks (<a href="http://g.com">…</a>)
– Embedded images
• Prevent scripts in user created content
• Today’s web browsers and standards do not make it easy to meet both goals simultaneously
Content Isolation
• User-created content should always be treated as “data”, never as “code”
• Need to isolate user created content as “data only”
Content Isolation for Browsers
• Content Isolation can be achieved for future browsers
– Requires changes to standards and browser parser implementations
– Standards / Browsers’ revision cycles may take several years
• Today’s browsers remain vulnerable to XSS in the near term
Our Goal
Construct a robust defense for cross-site scripting attacks that
– permits rich HTML content
– works on today’s browsers
• configured to default settings
• without requiring changes of any form, including patches, plug-ins, add-ons, etc.
Most popular defense: Content filtering
• Involves sanitization of untrusted HTML by removing script content
– Mainly done using regular expressions / parsing HTML
• Absence of strong isolation facilities for HTML has made content filtering the current main line of defense
Problem with Content Filtering
• The web application’s interpretation of sanitized content may differ from the browser’s interpretation
• Example: +ADw-SCRIPT+AD4-attack();
• Web Application’s understanding: raw text
• Browser’s understanding: “<SCRIPT>attack();”
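The gap in this example can be sketched in a few lines. The `+ADw-` and `+AD4-` runs are UTF-7 escapes for `<` and `>`, so a filter scanning the raw characters finds no tag, while a browser that treats the page as UTF-7 does. The names `naiveFilterSeesScript` and `decodeUtf7` are hypothetical, and the decoder handles only the basic escape form; this is an illustration, not a complete UTF-7 implementation.

```javascript
const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode UTF-7 escapes of the form +<base64>- into UTF-16 code units.
function decodeUtf7(input) {
  return input.replace(/\+([A-Za-z0-9+\/]*)-?/g, (match, b64) => {
    if (b64 === "") return "+"; // "+-" is the escape for a literal plus sign
    let bits = 0;
    let nbits = 0;
    let out = "";
    for (const ch of b64) {
      bits = (bits << 6) | B64.indexOf(ch);
      nbits += 6;
      if (nbits >= 16) {
        nbits -= 16;
        out += String.fromCharCode((bits >> nbits) & 0xffff);
        bits &= (1 << nbits) - 1; // keep only the leftover bits
      }
    }
    return out;
  });
}

// A filter in the style the slide criticizes: it looks for a literal tag.
function naiveFilterSeesScript(text) {
  return /<\s*script/i.test(text);
}

const sanitized = "+ADw-SCRIPT+AD4-attack();";
const asServerSeesIt = naiveFilterSeesScript(sanitized);
const asBrowserMaySeeIt = naiveFilterSeesScript(decodeUtf7(sanitized));
console.log(asServerSeesIt, asBrowserMaySeeIt);
```

The server-side check and the browser-side interpretation disagree on the same bytes, which is the parsing gap that the following slides describe.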
The parsing “gap”
[Figure: two parse trees for the same content. The server-intended parse tree contains only div and text nodes; in the browser-generated parse tree, one of the text nodes appears as a script node.]
XSS Cheat Sheet provides approx. 100 examples of such browser “quirks”
Our Approach: Reproduce the server-intended parse tree of untrusted content on the browser
[Figure: the same parse tree of div and text nodes, shown on the server and reproduced on the browser.]
Challenge: Parsers on existing browsers are unreliable
The Blueprint Approach
• Take control of the content interpretation process on the browser
– Avoid untrusted content parsing by browser
No parsing of untrusted content by browser
No scripts identified in untrusted content!
Robust XSS Prevention
High level overview
• Generate a parse tree of untrusted content on the server
– Remove script content by applying whitelist of known-static content types
• Automatically generate a (trusted) JavaScript program to reconstruct this parse tree on the browser
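The whitelist step can be sketched as a recursive prune over a parse tree. The node shape (`{ text }` or `{ tag, children }`), the `ALLOWED_TAGS` set, and the function name `prune` are assumptions made for illustration; the sketch ignores attributes and is not Blueprint's actual data model.

```javascript
// Hypothetical whitelist of known-static node types; a real site would use a vetted list.
const ALLOWED_TAGS = new Set(["div", "p", "b", "i", "a", "li", "ul"]);

// Return a copy of the tree that contains only whitelisted elements and text.
function prune(node) {
  if ("text" in node) return { text: node.text }; // text nodes are static data
  const tag = node.tag.toLowerCase();
  if (!ALLOWED_TAGS.has(tag)) return null; // drop script, object, and the like
  const children = node.children.map(prune).filter((child) => child !== null);
  return { tag, children };
}

const untrusted = {
  tag: "div",
  children: [
    { tag: "b", children: [{ text: "Pete is" }] },
    { tag: "script", children: [{ text: "doEvil()" }] },
  ],
};

const approved = prune(untrusted);
console.log(JSON.stringify(approved));
```

Anything not named in the whitelist is dropped, so an unknown element type fails closed rather than open.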
Approach Overview
HTML parse tree via document.createElement() et al.
Problem: Transporting data without invoking browser’s parser
• Parse tree is constructed using both JavaScript code and data
– Code constructs various tree nodes (e.g. <div>)
– Data annotates tree nodes (e.g. text content)
• Exposing raw data to browser parser may lead to unpredictable behavior
• Our Solution: Encode data using safe alphabet
– E.g. “a-z”
– Transport encoded data to the JavaScript interpreter
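One possible safe-alphabet encoding, shown here only as a sketch: each UTF-16 code unit is written as four letters from "a" to "p" (one letter per 4-bit nibble), so nothing the HTML parser treats specially ever appears in the transported data. The function names `encodeSafe` and `decodeSafe` and the choice of alphabet are assumptions, not the encoding Blueprint itself uses.

```javascript
const BASE = "a".charCodeAt(0);

// Encode every UTF-16 code unit as four letters in the range a-p.
function encodeSafe(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    for (let shift = 12; shift >= 0; shift -= 4) {
      out += String.fromCharCode(BASE + ((unit >> shift) & 0xf));
    }
  }
  return out;
}

// Reverse the encoding inside the JavaScript interpreter.
function decodeSafe(encoded) {
  let out = "";
  for (let i = 0; i < encoded.length; i += 4) {
    let unit = 0;
    for (let j = 0; j < 4; j++) {
      unit = (unit << 4) | (encoded.charCodeAt(i + j) - BASE);
    }
    out += String.fromCharCode(unit);
  }
  return out;
}

const comment = "<script>doEvil()</script>";
const wire = encodeSafe(comment);
console.log(wire, decodeSafe(wire) === comment);
```

The decoded string exists only as a JavaScript value, so it can be handed to DOM calls without passing through the HTML parser.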
Transporting data
HTML parse tree via document.createElement() et al.
Text node
Plain text
String variable
DOM API used
document.
• createElement()
• createTextNode()
• getElementById()
element.
• appendChild()
• insertBefore()
• parentNode
• removeChild()
• setAttribute()
• style[ ]
• style.setExpression()
Instrumenting web application with Blueprint
Before:
<?php foreach ($comments as $comment): ?>
  <li><?php echo($comment); ?></li>
<?php endforeach; ?>

After:
<?php foreach ($comments as $comment): ?>
  <li><?php
    $model = Blueprint::cxPCData($comment);
    echo($model);
  ?></li>
<?php endforeach; ?>
Transformed web application output
XSS Vector II:Cascading Style Sheets
CSS without XSS
• Use style object to apply style rules
– element.style['width'] = decode( untrusted );
• Dynamic properties not allowed by whitelist
– element.style['behavior'] = …
– element.style['-moz-binding'] = …
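A minimal sketch of the property whitelist, assuming a hypothetical `STATIC_CSS` set and helper name `applyStyle`. A plain object stands in for a DOM element so the example runs outside a browser. The sketch covers only the property-name check; it does not address expression() values.

```javascript
// Hypothetical whitelist of CSS properties considered static.
const STATIC_CSS = new Set(["width", "height", "color"]);

// Apply one style rule through the style object, refusing unlisted properties.
function applyStyle(element, property, value) {
  if (!STATIC_CSS.has(property)) return false; // e.g. behavior, -moz-binding
  element.style[property] = value;
  return true;
}

// Stand-in for a DOM element (assumption for running without a browser).
const element = { style: {} };
const widthApplied = applyStyle(element, "width", "100px");
const behaviorApplied = applyStyle(element, "behavior", "url(evil.htc)");
console.log(widthApplied, behaviorApplied, element.style);
```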
CSS expression vector
• Any “static” property can be promoted to dynamic via expression() syntax
– element.style["width"] = "expression( attack())";
• Threat exists only on Internet Explorer
• IE has no DOM interface to directly force static value
Protection against CSS expressions
• Use setExpression( … ) to apply style rules
– Forces all CSS rules to be dynamic
– Trusted script invoked to retrieve property value
– Script looks up untrusted value in array, then returns it
– Returned value observed to be static
• Evaluated unobfuscated expression() for all allowed CSS properties
XSS vector III:Uniform Resource Identifiers (URI)
URI
• http://www.example.com/a.html?param#a
• URI scheme indicates static / dynamic nature
– Static: http:, https:, ftp:, mailto:
– Dynamic: javascript:
• No direct interface to URI parser to enforce a particular (whitelisted) scheme
• We use a 3-tiered defense
Evaluation
Effectiveness at preventing XSS attacks on existing browsers
Compatibility with common use cases
Performance overhead on server and browser
Browser evaluation
8 browsers tested
• Chrome 1
• Firefox 3
• Firefox 2
• Internet Explorer 7
• Internet Explorer 6
• Opera 9.6
• Safari 3.2
• Safari 3.1
Total over 96% market share of browsers in active use
Defense effectiveness
• XSS Cheat Sheet [Ha09]
– 94 XSS attack examples
– Designed to target server-side defenses
– Embedded in several syntactic contexts
• Developed automated test platform
– Identified which attacks are successful on which browser
– Evaluated defense effectiveness
• All 94 attacks successfully defended on all 8 evaluated browsers
Compatibility
• Modified source code for two popular web applications:
– WordPress
– MediaWiki
• Modified output of two popular websites
– NY Times blog
– Slashdot.org
WordPress (compatibility)
• Added protection for 3 low integrity outputs (per user comment to blog article)
– Name (plain text)
– Website link (anchor element)
– Comment body (mixed HTML)
• Allows testing of pages with hundreds of (relatively simple) models
• Tested real-world blogs, 23 to 516 comments
• No negative compatibility impact observed
MediaWiki (compatibility)
• Added protection for 2 low integrity outputs
– Article (i.e., web page) title
– Article content
• Allows testing of large, complex models
• Tested “Featured” article from Wikipedia
• Content rendered very faithfully to original
• Problems:
– <imagemap> not in whitelist
– Relocate trusted script
Performance overhead measurements
• Server page generation latency
• Browser memory overhead
• Browser page rendering latency
• Combined effect of server and browser latencies
WordPress page generation latency
• Measured significant overhead
• Partly due to redundant content filter (KSES)
MediaWiki page generation latency
• Better performance than WordPress
• Redundant intermediate HTML stage
Client memory overhead
Minor overhead
WordPress page rendering latency
MediaWiki page rendering latency
User experience impact of combined latencies
• Tested with Firefox 2 (mid-road performance)
• WordPress with 100 blog comments
• Low perception of delays for common case
Related Work
• Server-side (XSS-Guard, NeatHTML)
– Prevent injected scripts in final output
– Vulnerable to attacks exploiting parsing differences
• Client-side (NoMoXSS, Noxes)
– Identification and prevention of data leaks
– Cannot detect XSS within same origin
• Black box / proxy (XSS-DS, Taint inference)
– Server: Detect and prevent reflected scripts
– Client: Detect and prevent data leaks
Related work (cont.)
• Server and browser collaboration (BEEP, DSI, Noncespaces)
– Server: Identify policy regions and declare policies
– Client: Enforce policies over policy regions
– Require browser changes
• Systems supporting benign scripts in user-created content
– Caja, Web Sandbox, Facebook
– Complementary to our approach
Conclusion
• Cross-site scripting attacks can be prevented entirely if browsers and web applications can come to a common understanding of the structure of untrusted content
• Blueprint facilitates this goal and provides a novel defense against XSS
• Project page:
– http://www.sisl.rites.uic.edu/blueprint
References
• [Ha09] Hansen, Robert. XSS Cheat Sheet
• [Di07] Di Paola, Stefano. Preventing XSS with Data Binding
XSS Detail
• Challenge for attacker: Embed content the browser will interpret as script
• Many vectors
– Script tags: <script> attack(); </script>
– Script attributes: onmousemove="attack();"
– CSS Style rules: "width: expression( attack() );"
– URI: src="javascript:void attack()"
Encoding
• Search engine optimization (SEO)
• Screen readers
• View source
• Solutions:
– Less destructive encoding
– Modify reader
– Add feature to browser
Dynamic attacks
• User-created content (UCC) added to a page dynamically must also be protected
• Current implementation requires remote procedure call (via XHR / AJAX) to request model
• Blueprint can ensure a base document free of user-embedded scripts
• Trusted code must then take precautions to maintain security
Whitelist
• Whitelist can be site-specific
• Whitelist can be grown, gradually adding content known to be static
• Used off-the-shelf whitelist from HTMLPurifier
URI Defense
3-tiered defense:
1. Character-level whitelist
– Only allow syntactically-inert untrusted chars
2. Parse behavior sensing
– a.protocol DOM property [Di07]
– Assumes URI parsing same for all contexts: a.href, img.src, url()
3. Impact mitigation
– Rewrite URI pointing to redirection service
– Attacks execute in different origin, void of sensitive data
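The three tiers can be sketched as one function. Several parts are assumptions: the exact set of "inert" characters, the redirection service URL `https://redirect.example/?to=`, and the use of the standard `URL` class as a stand-in for reading `a.protocol` in the browser (tier 2 in the slides senses how the browser itself parsed the URI, which this sketch cannot do outside a browser). Relative URIs are simply rejected here.

```javascript
// Tier 1: characters assumed to be syntactically inert (illustrative set only).
const INERT_CHARS = /^[A-Za-z0-9\-._~:\/?#&=%+@]*$/;
// Tier 2: schemes the slides list as static.
const STATIC_SCHEMES = new Set(["http:", "https:", "ftp:", "mailto:"]);
// Tier 3: hypothetical redirection service hosted on a separate origin.
const REDIRECTOR = "https://redirect.example/?to=";

// Return a rewritten URI that is safe to place in a.href, or null to reject.
function safeUri(untrusted) {
  if (!INERT_CHARS.test(untrusted)) return null; // tier 1
  let scheme;
  try {
    scheme = new URL(untrusted).protocol; // stand-in for a.protocol
  } catch (err) {
    return null; // unparsable or relative URI
  }
  if (!STATIC_SCHEMES.has(scheme)) return null; // tier 2
  return REDIRECTOR + encodeURIComponent(untrusted); // tier 3
}

console.log(safeUri("http://www.example.com/a.html?param#a"));
console.log(safeUri("javascript:attack"));
```

Even if a dynamic URI slipped past the first two tiers, the rewritten link would navigate to the redirection origin first, so any script would run without access to the protected site's data.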
Eliminate dependency on browser parser
• Transform user-created content into static content models on web server
– Model reflects approved content parse tree
• Propagate static content models into JavaScript interpreter of web browser
• Reconstruct server-approved parse tree using client-side model interpreter
Create static content model
• Parse untrusted HTML
• Prune resulting parse tree in accordance with whitelist of known-static node types
• Serialize parse tree into stream of benign data characters
• Wrap in <code> … </code> tags
• Attach trusted script for invoking model interpreter
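The serialize, wrap, and attach steps might look roughly like the sketch below. The statement format (`open`, `text`, `close`), the hex encoding, the node shape, and the client-side entry point `bootstrapModel` are all hypothetical; the slides do not specify Blueprint's actual wire format.

```javascript
// Flatten an already-pruned parse tree into declarative statements, depth-first.
// A node is { text } or { tag, children } (assumed shape for this sketch).
function toStatements(node, out = []) {
  if ("text" in node) {
    out.push(["text", node.text]);
  } else {
    out.push(["open", node.tag]);
    node.children.forEach((child) => toStatements(child, out));
    out.push(["close"]);
  }
  return out;
}

// Hex-encode each UTF-16 code unit so only [0-9a-f] reaches the HTML parser.
function toHex(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += text.charCodeAt(i).toString(16).padStart(4, "0");
  }
  return out;
}

// Wrap the encoded model in <code> tags and attach the trusted script.
// modelId must be generated by the server, never taken from user input.
function wrapModel(tree, modelId) {
  const payload = toHex(JSON.stringify(toStatements(tree)));
  return (
    `<code id="${modelId}" style="display:none">${payload}</code>` +
    `<script>bootstrapModel("${modelId}");</script>`
  );
}

const html = wrapModel({ tag: "b", children: [{ text: "<script>" }] }, "m1");
console.log(html);
```

The untrusted text appears in the page only as hex digits, so the browser's HTML parser has nothing in it to misinterpret.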
Model interpreter
• Interprets model as stream of declarative statements
• Uses reliable DOM API to generate content
– document.createElement( … )
– element.appendChild( … )
• Enforces server-intended parse tree in browser
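A model interpreter along these lines can be sketched as a small loop over statements. The statement names (`open`, `text`, `close`) and the function name `interpretModel` are assumptions, and a tiny fake document stands in for the browser DOM so the example runs anywhere; in a browser, the real `document` and a real container element would be passed instead.

```javascript
// Rebuild the server-approved tree under `root` using only DOM construction calls.
function interpretModel(statements, doc, root) {
  const stack = [root];
  for (const [op, arg] of statements) {
    const parent = stack[stack.length - 1];
    if (op === "open") {
      const el = doc.createElement(arg);
      parent.appendChild(el);
      stack.push(el);
    } else if (op === "text") {
      // Text goes in through createTextNode, so it is never parsed as markup.
      parent.appendChild(doc.createTextNode(arg));
    } else if (op === "close" && stack.length > 1) {
      stack.pop();
    }
  }
  return root;
}

// Minimal stand-in for the DOM (assumption for running outside a browser).
function makeNode(name) {
  return {
    name,
    children: [],
    appendChild(child) {
      this.children.push(child);
      return child;
    },
  };
}
const fakeDocument = {
  createElement: (tag) => makeNode(tag),
  createTextNode: (text) => ({ name: "#text", text }),
};

const root = interpretModel(
  [["open", "b"], ["text", "<script>doEvil()</script>"], ["close"]],
  fakeDocument,
  makeNode("li")
);
console.log(JSON.stringify(root));
```

The malicious comment ends up as the content of a text node inside a bold element, which matches the tree the server approved.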