BLUEPRINT: Robust Prevention of Cross-site Scripting Attacks for Existing Browsers

Mike Ter Louw, V.N. Venkatakrishnan
University of Illinois at Chicago


Outline

• Intro to Cross-site Scripting
• Objective
• Approach
• Technical details
• Evaluation
• Related work

Cross-site scripting (XSS)

• A widespread web application vulnerability
  – In the last few weeks:
    • Time magazine “Top 100 influential people” poll defaced by XSS (Apr 2009)
    • Twitter XSS worm (Apr 2009)
    • McAfee web site attacked (May 2009)

• Ranked #1 in the OWASP Top 10 (2007)

Problem: Malicious user-created content!

Benign comment: “Pete is…”

Malicious comment: “<script>doEvil()</script>”

Our Objective

To develop a robust defense against cross-site scripting attacks

Typical Web Application Goals

• Allow user-created content to be expressive, containing rich HTML content
  – Format text (<b>bold</b>, <i>italics</i>)
  – Hyperlinks (<a href="http://g.com">…</a>)
  – Embedded images

• Prevent scripts in user-created content

• Today’s web browsers and standards make it difficult to meet these goals simultaneously

Content Isolation

• User-created content should always be treated as “data”, never as “code”

• Need to isolate user-created content as “data only”

Content Isolation for Browsers

• Content isolation can be achieved in future browsers
  – Requires changes to standards and to browser parser implementations
  – Standards and browser revision cycles may take several years

• Today’s browsers remain vulnerable to XSS in the near term

Our Goal

Construct a robust defense against cross-site scripting attacks that
  – permits rich HTML content
  – works on today’s browsers
    • configured to default settings
    • without requiring changes of any form, including patches, plug-ins, add-ons, etc.

Most popular defense: Content filtering

• Involves sanitizing untrusted HTML by removing script content
  – Mainly done using regular expressions or by parsing HTML

• The absence of strong isolation facilities for HTML has made content filtering the current main line of defense
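A minimal illustration of why regex-based filtering is brittle, using a hypothetical single-pass filter (not any particular product's implementation). Removing one matched block can reassemble the very tag it removed, and attribute-based vectors are never matched at all:

```javascript
// Hypothetical naive content filter: strips <script>...</script> blocks
// with one regex pass. Shown only to illustrate why such filters are fragile.
function naiveFilter(html) {
  return html.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, '');
}

// Nested markup: removing the inner block reassembles an outer <script> tag.
const nested = '<scr<script>x</script>ipt>alert(1)</script>';
console.log(naiveFilter(nested)); // <script>alert(1)</script>

// Event-handler attributes are never matched by the pattern.
const handler = '<img src="x" onerror="alert(1)">';
console.log(naiveFilter(handler) === handler); // true
```

Patching each bypass with another pattern leads to an ever-growing filter that still has to guess how every browser will parse its output.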

Problem with Content Filtering

• The web application’s interpretation of sanitized content may differ from the browser’s interpretation

• Example: +ADw-SCRIPT+AD4-attack();
  – Web application’s understanding: raw text
  – Browser’s understanding (when it reads the page as UTF-7): “<SCRIPT>attack();”
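The example can be reproduced with a small decoder. This sketch assumes Node.js (for `Buffer`) and handles only UTF-7's base64 shift sequences (RFC 2152). That is enough to show that the payload contains no `<` for a server-side filter to strip, yet decodes to a script tag once the browser reads the page as UTF-7:

```javascript
// Minimal UTF-7 decoder sketch: "+<base64>-" encodes UTF-16 code units,
// and "+-" encodes a literal '+'. Direct characters pass through unchanged.
function decodeUtf7(input) {
  return input.replace(/\+([A-Za-z0-9+\/]*)-?/g, (match, b64) => {
    if (b64 === '') return '+';
    const bytes = Buffer.from(b64, 'base64');
    let out = '';
    // Combine big-endian byte pairs into UTF-16 code units.
    for (let i = 0; i + 1 < bytes.length; i += 2) {
      out += String.fromCharCode((bytes[i] << 8) | bytes[i + 1]);
    }
    return out;
  });
}

const payload = '+ADw-SCRIPT+AD4-attack();';
console.log(payload.includes('<')); // false: nothing for a filter to strip
console.log(decodeUtf7(payload));   // <SCRIPT>attack();
```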

The parsing “gap”

[Figure: server-intended parse tree (div and text nodes only) vs. browser-generated parse tree, in which a node the server intended as text is parsed as a script node]

The XSS Cheat Sheet [Ha09] provides approximately 100 examples of such browser “quirks”

Our Approach: Reproduce the server-intended parse tree of untrusted content on the browser

[Figure: the server-intended parse tree (div and text nodes) reproduced identically on the browser, with no script node]

Challenge: Parsers on existing browsers are unreliable

The Blueprint Approach

• Take control of the content interpretation process on the browser
  – Avoid parsing of untrusted content by the browser

No parsing of untrusted content by the browser → No scripts identified in untrusted content → Robust XSS prevention

High level overview

• Generate a parse tree of untrusted content on the server
  – Remove script content by applying a whitelist of known-static content types

• Automatically generate a (trusted) JavaScript program to reconstruct this parse tree on the browser
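The server-side pruning step can be sketched over a toy tree representation. The node shapes, the `prune` function, and the small whitelist are hypothetical (the deck reports using HTMLPurifier's off-the-shelf whitelist, which is far larger). The sketch assumes the server-side parser has already produced the tree and lowercased element and attribute names:

```javascript
// Hypothetical parse-tree node shapes (not Blueprint's internal format):
//   element: { tag: 'a', attrs: { href: '...' }, children: [...] }
//   text:    { text: '...' }
// Illustrative whitelist: known-static element types -> allowed attributes.
// URI-valued attributes (href, src) still need a separate URI defense.
const WHITELIST = new Map([
  ['div', []], ['p', []], ['b', []], ['i', []], ['li', []],
  ['a', ['href']],
  ['img', ['src', 'alt']],
]);

// Returns a pruned copy of the tree, or null if the node must be dropped.
function prune(node) {
  if ('text' in node) return node;             // text nodes are pure data
  const allowedAttrs = WHITELIST.get(node.tag);
  if (allowedAttrs === undefined) return null; // unknown type: drop whole subtree
  const attrs = {};
  for (const [name, value] of Object.entries(node.attrs || {})) {
    if (allowedAttrs.includes(name)) attrs[name] = value; // e.g. onmouseover dropped
  }
  const children = (node.children || []).map(prune).filter((c) => c !== null);
  return { tag: node.tag, attrs, children };
}

const untrusted = {
  tag: 'div', attrs: { onmouseover: 'attack()' }, children: [
    { text: 'Pete is ' },
    { tag: 'b', attrs: {}, children: [{ text: 'great' }] },
    { tag: 'script', attrs: {}, children: [{ text: 'doEvil()' }] },
  ],
};
console.log(JSON.stringify(prune(untrusted)));
// {"tag":"div","attrs":{},"children":[{"text":"Pete is "},{"tag":"b","attrs":{},"children":[{"text":"great"}]}]}
```

Using a `Map` rather than a plain object avoids accidentally "whitelisting" inherited names such as `constructor`.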

Approach Overview

[Figure: the HTML parse tree is reconstructed on the browser via document.createElement() et al.]

Problem: Transporting data without invoking browser’s parser

• The parse tree is constructed using both JavaScript code and data
  – Code constructs the various tree nodes (e.g. <div>)
  – Data annotates tree nodes (e.g. text content)

• Exposing raw data to the browser parser may lead to unpredictable behavior

• Our solution: encode data using a safe alphabet
  – E.g. “a-z”
  – Transport the encoded data to the JavaScript interpreter
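As an illustration of encoding with a safe alphabet, the hypothetical codec below maps each UTF-16 code unit to four letters from `a` to `p` (one per 4-bit nibble). It is an assumed scheme chosen for simplicity, not the encoding Blueprint actually uses. The point is that the encoded string contains no character (`<`, `&`, quotes, ...) that an HTML parser could act on:

```javascript
// Hypothetical "a-z" codec: each UTF-16 code unit -> four letters 'a'..'p'.
function encodeSafe(text) {
  let out = '';
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    for (let shift = 12; shift >= 0; shift -= 4) {
      out += String.fromCharCode(97 + ((unit >> shift) & 0xf)); // 97 = 'a'
    }
  }
  return out;
}

// Inverse mapping, run by the trusted script inside the JavaScript interpreter.
function decodeSafe(encoded) {
  if (!/^[a-p]*$/.test(encoded) || encoded.length % 4 !== 0) {
    throw new Error('not a valid safe-alphabet string');
  }
  let out = '';
  for (let i = 0; i < encoded.length; i += 4) {
    let unit = 0;
    for (let j = 0; j < 4; j++) {
      unit = (unit << 4) | (encoded.charCodeAt(i + j) - 97);
    }
    out += String.fromCharCode(unit);
  }
  return out;
}

console.log(encodeSafe('<b>'));             // aadmaagcaado
console.log(decodeSafe(encodeSafe('<b>'))); // <b>
```

A four-to-one expansion is wasteful; a denser alphabet would serve equally well as long as it stays free of parser-significant characters.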

Transporting data

[Figure: plain text is carried in a JavaScript string variable and becomes a text node in the HTML parse tree built via document.createElement() et al.]

DOM API used

• document.createElement(), document.createTextNode(), document.getElementById()

• element.appendChild(), element.insertBefore(), element.parentNode, element.removeChild(), element.setAttribute(), element.style[ ], element.style.setExpression()

Instrumenting web application with Blueprint

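Original template:

<?php foreach ($comments as $comment): ?>
  <li><?php echo($comment); ?></li>
<?php endforeach; ?>

Instrumented with Blueprint:

<?php foreach ($comments as $comment): ?>
  <li><?php
    // Build a content model for the untrusted comment instead of echoing it raw
    $model = Blueprint::cxPCData($comment);
    echo($model);
  ?></li>
<?php endforeach; ?>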

Transformed web application output

XSS Vector II: Cascading Style Sheets

CSS without XSS

• Use the style object to apply style rules
  – element.style['width'] = decode( untrusted );

• Dynamic properties are not allowed by the whitelist
  – element.style['behavior'] = …
  – element.style['-moz-binding'] = …
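A sketch of the style-object approach, assuming a hypothetical, deliberately small whitelist of static properties written in the camelCase form the style object accepts. Because the value is assigned to a single property through the DOM, it cannot introduce additional declarations. On Internet Explorer, however, a whitelisted property can still receive an expression() value, which is the vector discussed next:

```javascript
// Hypothetical whitelist of static CSS properties (illustrative, not exhaustive).
const STATIC_CSS_PROPERTIES = new Set(['width', 'height', 'color', 'fontWeight', 'textAlign']);

// Apply one untrusted (already decoded) style value through the style object.
// Dynamic properties such as 'behavior' or '-moz-binding' are refused.
function applyUntrustedStyle(element, property, decodedValue) {
  if (!STATIC_CSS_PROPERTIES.has(property)) return false;
  element.style[property] = decodedValue;
  return true;
}

// A plain object stands in for a DOM element so the sketch runs under Node.js.
const el = { style: {} };
console.log(applyUntrustedStyle(el, 'width', '10px'));          // true
console.log(applyUntrustedStyle(el, 'behavior', 'url(x.htc)')); // false
```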

CSS expression vector

• Any “static” property can be promoted to dynamic via expression() syntax
  – element.style["width"] = "expression( attack())";

• Threat exists only on Internet Explorer

• IE has no DOM interface to directly force a static value

Protection against CSS expressions

• Use setExpression( … ) to apply style rules
  – Forces all CSS rules to be dynamic
  – Trusted script is invoked to retrieve the property value
  – Script looks up the untrusted value in an array, then returns it
  – Returned value observed to be static

• Evaluated with unobfuscated expression() values for all allowed CSS properties
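The setExpression() technique can be sketched as follows. `setExpression(propertyName, expression)` is a real but Internet Explorer-only (and long deprecated) method. The names `bpUntrustedStyleValues` and `bpStyleValue` are hypothetical, and a recording stub replaces the IE element so the sketch runs elsewhere. Whether IE treats the returned value as static is the observation reported above; the stub cannot demonstrate it. The sketch uses `var` and plain functions because IE of that era predates ES2015 syntax:

```javascript
// Untrusted (already decoded) values live only in this trusted array.
var bpUntrustedStyleValues = [];

// Trusted lookup function; every generated expression calls only this.
function bpStyleValue(index) {
  return bpUntrustedStyleValues[index];
}

// IE-only style.setExpression(property, expression): the expression text is
// produced entirely by trusted code, so the untrusted value never reaches the
// CSS parser as source text; it is only returned when IE evaluates the expression.
function applyStyleViaExpression(element, property, decodedValue) {
  var index = bpUntrustedStyleValues.push(decodedValue) - 1;
  element.style.setExpression(property, 'bpStyleValue(' + index + ')');
}

// Stand-in for an IE element so the sketch runs outside IE: it records calls.
var recordedCalls = [];
var fakeElement = {
  style: {
    setExpression: function (property, expression) {
      recordedCalls.push([property, expression]);
    }
  }
};
applyStyleViaExpression(fakeElement, 'width', 'expression(attack())');
console.log(recordedCalls[0]); // [ 'width', 'bpStyleValue(0)' ]
console.log(bpStyleValue(0));  // expression(attack())
```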

XSS Vector III: Uniform Resource Identifiers (URI)

URI

• Example: http://www.example.com/a.html?param#a

• The URI scheme indicates static / dynamic nature
  – Static: http:, https:, ftp:, mailto:
  – Dynamic: javascript:

• No direct interface to the URI parser to enforce a particular (whitelisted) scheme

• We use a 3-tiered defense

Evaluation


Effectiveness at preventing XSS attacks on existing browsers

Compatibility with common use cases

Performance overhead on server and browser

Browser evaluation

• 8 browsers tested, together accounting for over 96% market share among browsers in active use
  – Chrome 1
  – Firefox 3
  – Firefox 2
  – Internet Explorer 7
  – Internet Explorer 6
  – Opera 9.6
  – Safari 3.2
  – Safari 3.1

Defense effectiveness

• XSS Cheat Sheet [Ha09]
  – 94 XSS attack examples
  – Designed to target server-side defenses
  – Embedded in several syntactic contexts

• Developed an automated test platform
  – Identified which attacks succeed on which browser
  – Evaluated defense effectiveness

• All 94 attacks were successfully defended on all 8 evaluated browsers

Compatibility

• Modified source code for two popular web applications
  – WordPress
  – MediaWiki

• Modified output of two popular websites
  – NY Times blog
  – Slashdot.org

WordPress (compatibility)

• Added protection for 3 low-integrity outputs (per user comment to a blog article)
  – Name (plain text)
  – Website link (anchor element)
  – Comment body (mixed HTML)

• Allows testing of pages with hundreds of (relatively simple) models

• Tested real-world blogs with 23 to 516 comments

• No negative compatibility impact observed

MediaWiki (compatibility)

• Added protection for 2 low-integrity outputs
  – Article (i.e., web page) title
  – Article content

• Allows testing of large, complex models

• Tested a “Featured” article from Wikipedia

• Content rendered very faithfully to the original

• Problems:
  – <imagemap> not in whitelist
  – Relocate trusted script

Performance overhead measurements

• Server page generation latency
• Browser memory overhead
• Browser page rendering latency
• Combined effect of server and browser latencies

WordPress page generation latency

• Measured significant overhead
• Partly due to a redundant content filter (KSES)

MediaWiki page generation latency

• Better performance than WordPress
• Redundant intermediate HTML stage

Client memory overhead

Minor overhead

WordPress page rendering latency

MediaWiki page rendering latency

User experience impact of combined latencies

• Tested with Firefox 2 (middle-of-the-road performance)
• WordPress with 100 blog comments
• Low perception of delays for the common case

Related Work

• Server-side (XSS-Guard, NeatHTML)
  – Prevent injected scripts in final output
  – Vulnerable to attacks exploiting parsing differences

• Client-side (NoMoXSS, Noxes)
  – Identification and prevention of data leaks
  – Cannot detect XSS within the same origin

• Black box / proxy (XSS-DS, Taint inference)
  – Server: Detect and prevent reflected scripts
  – Client: Detect and prevent data leaks

Related work (cont.)

• Server and browser collaboration (BEEP, DSI, Noncespaces)
  – Server: Identify policy regions and declare policies
  – Client: Enforce policies over policy regions
  – Require browser changes

• Systems supporting benign scripts in user-created content
  – Caja, Web Sandbox, Facebook
  – Complementary to our approach

Conclusion

• Cross-site scripting attacks can be prevented entirely if browsers and web applications can come to a common understanding of the structure of untrusted content

• Blueprint facilitates this goal and provides a novel defense against XSS

• Project page: http://www.sisl.rites.uic.edu/blueprint


References

• [Ha09] Hansen, Robert. XSS Cheat Sheet.

• [Di07] Di Paola, Stefano. Preventing XSS with Data Binding.

XSS Detail

• Challenge for the attacker: embed content that the browser will interpret as script

• Many vectors
  – Script tags: <script> attack(); </script>
  – Script attributes: onmousemove="attack();"
  – CSS style rules: "width: expression( attack() );"
  – URI: src="javascript:void attack()"

Encoding

• Encoded content may affect:
  – Search engine optimization (SEO)
  – Screen readers
  – View source

• Solutions:
  – Less destructive encoding
  – Modify reader
  – Add feature to browser

Dynamic attacks

• User-created content (UCC) added to a page dynamically must also be protected

• The current implementation requires a remote procedure call (via XHR / AJAX) to request the model

• Blueprint can ensure a base document free of user-embedded scripts

• Trusted code must then take precautions to maintain security

Whitelist

• Whitelist can be site-specific
• Whitelist can be grown, gradually adding content known to be static
• Used the off-the-shelf whitelist from HTMLPurifier

URI Defense

3-tiered defense:

1. Character-level whitelist
   – Only allow syntactically inert untrusted characters

2. Parse behavior sensing
   – a.protocol DOM property [Di07]
   – Assumes URI parsing is the same for all contexts (a.href, img.src, url())

3. Impact mitigation
   – Rewrite the URI to point to a redirection service
   – Attacks then execute in a different origin, void of sensitive data
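A sketch of the three tiers, with stated assumptions: the inert character set and the redirection URL `https://redirect.example.net/?to=` are hypothetical. Because `a.protocol` [Di07] requires a browser, Node's WHATWG `URL` parser stands in for tier 2 here, which carries the same caveat that URI parsing must behave alike across contexts:

```javascript
// Tier 1: hypothetical set of syntactically inert URI characters.
const SAFE_URI_CHARS = /^[A-Za-z0-9\-._~:\/?#=&%+]*$/;
// Static schemes from the whitelist; javascript: is deliberately absent.
const STATIC_SCHEMES = new Set(['http:', 'https:', 'ftp:', 'mailto:']);
// Tier 3: hypothetical redirection service hosted in a separate origin.
const REDIRECTOR = 'https://redirect.example.net/?to=';

// Returns a rewritten URI, or null if any tier rejects the untrusted input.
function defendUri(uri, baseUrl) {
  if (!SAFE_URI_CHARS.test(uri)) return null;            // tier 1
  let parsed;
  try {
    parsed = new URL(uri, baseUrl);                      // tier 2 stand-in for a.protocol
  } catch (e) {
    return null;
  }
  if (!STATIC_SCHEMES.has(parsed.protocol)) return null; // tier 2
  return REDIRECTOR + encodeURIComponent(parsed.href);   // tier 3
}

const base = 'http://www.example.com/';
console.log(defendUri('javascript:void(0)', base)); // null (tier 1: parentheses)
console.log(defendUri('javascript:attack', base));  // null (tier 2: scheme)
console.log(defendUri('http://www.example.com/a.html?param#a', base));
// https://redirect.example.net/?to=http%3A%2F%2Fwww.example.com%2Fa.html%3Fparam%23a
```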

Eliminate dependency on browser parser

• Transform user-created content into static content models on the web server
  – Model reflects the approved content parse tree

• Propagate static content models into JavaScript interpreter of web browser

• Reconstruct server-approved parse tree using client-side model interpreter

Create static content model

• Parse untrusted HTML
• Prune the resulting parse tree in accordance with a whitelist of known-static node types
• Serialize the parse tree into a stream of benign data characters
• Wrap in <code> … </code> tags
• Attach a trusted script for invoking the model interpreter
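The serialize, wrap, and attach steps can be sketched as below, under several assumptions. The statement format, the `bp-model-1` id, the hidden `<code>` styling, and the interpreter entry point `bpInterpret` are all hypothetical. Base64 serves as the benign character stream because its alphabet (A to Z, a to z, 0 to 9, `+`, `/`, `=`) contains nothing an HTML parser acts on. The input is assumed to be an already-pruned tree of `{ tag, attrs, children }` and `{ text }` nodes, and the model id must come from trusted server code:

```javascript
// Hypothetical statement format: a flat list of declarative steps.
//   ['open', tag]  ['attr', name, value]  ['text', value]  ['close']
function toStatements(node, out = []) {
  if ('text' in node) {
    out.push(['text', node.text]);
    return out;
  }
  out.push(['open', node.tag]);
  for (const [name, value] of Object.entries(node.attrs || {})) {
    out.push(['attr', name, value]);
  }
  for (const child of node.children || []) toStatements(child, out);
  out.push(['close']);
  return out;
}

// Serialize to base64 (benign characters only), wrap in <code>, and attach a
// trusted script that invokes the (hypothetical) model interpreter.
// The display:none style is an assumption, used to hide the encoded text.
function emitModel(prunedTree, modelId) {
  const json = JSON.stringify(toStatements(prunedTree));
  const payload = Buffer.from(json, 'utf8').toString('base64');
  return '<code id="' + modelId + '" style="display:none">' + payload + '</code>' +
    '<script>bpInterpret("' + modelId + '");</script>';
}

const pruned = { tag: 'b', attrs: {}, children: [{ text: '<script>x</script>' }] };
console.log(emitModel(pruned, 'bp-model-1'));
```

The untrusted `<script>` text survives only as base64 inside the `<code>` element, so the browser's HTML parser never sees it as markup.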

Model interpreter

• Interprets model as stream of declarative statements

• Uses the reliable DOM API to generate content
  – document.createElement( … )
  – element.appendChild( … )

• Enforces server-intended parse tree in browser
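A minimal sketch of a model interpreter for a hypothetical statement format (`['open', tag]`, `['attr', name, value]`, `['text', value]`, `['close']`), assuming the model has already been decoded from its `<code>` element. It only calls `createElement`, `createTextNode`, `setAttribute`, and `appendChild`, so no untrusted string is ever handed to the HTML parser. The small `fakeDocument` is a stand-in so the sketch runs under Node.js; in a browser the real `document` would be passed instead:

```javascript
// Rebuilds the server-approved tree under `container` using DOM calls only
// (no innerHTML, no document.write).
function interpretModel(doc, container, statements) {
  const stack = [container];
  for (const stmt of statements) {
    const current = stack[stack.length - 1];
    switch (stmt[0]) {
      case 'open': {
        const el = doc.createElement(stmt[1]);
        current.appendChild(el);
        stack.push(el);
        break;
      }
      case 'attr':
        if (current === container) throw new Error('attr outside an element');
        current.setAttribute(stmt[1], stmt[2]); // URI/style values need their own defenses
        break;
      case 'text':
        current.appendChild(doc.createTextNode(stmt[1])); // always data, never markup
        break;
      case 'close':
        if (stack.length === 1) throw new Error('unbalanced close');
        stack.pop();
        break;
      default:
        throw new Error('unknown statement: ' + stmt[0]);
    }
  }
  return container;
}

// Minimal stand-in for the browser DOM so the sketch runs under Node.js.
const fakeDocument = {
  createElement: (tag) => ({
    tag, attrs: {}, children: [],
    appendChild(child) { this.children.push(child); return child; },
    setAttribute(name, value) { this.attrs[name] = value; },
  }),
  createTextNode: (text) => ({ text }),
};

const root = fakeDocument.createElement('li');
interpretModel(fakeDocument, root, [['open', 'b'], ['text', '<script>doEvil()</script>'], ['close']]);
console.log(JSON.stringify(root.children[0]));
// {"tag":"b","attrs":{},"children":[{"text":"<script>doEvil()</script>"}]}
```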
