UC Berkeley Web 2.0 Applications EuroSys 2010 Tutorial Armando Fox UC Berkeley Reliable Adaptive Distributed Systems Lab [email protected] 1
Page 1: tutorial

UC Berkeley

Web 2.0 Applications

EuroSys 2010 Tutorial

Armando Fox

UC Berkeley Reliable Adaptive Distributed Systems Lab

[email protected]

Page 2: tutorial

Who I Am

• Adjunct Prof. at UC Berkeley Computer Science
• Research
  – 2006-now: applying machine learning to problems of datacenter-scale applications
  – 2001-2006: Recovery-Oriented Computing (ROC)
  – 1996-2000: Mobile computing meets SaaS
• Teaching: undergraduate Software-as-a-Service/Software Engineering
• Developer & maintainer of active Web app
• Know just enough about languages to be dangerous

Page 3: tutorial

Where I Work: RAD Lab 5-year mission

Enable 1 person to develop, deploy, and operate a next-generation Internet application at scale

• Key enabling technology: statistical machine learning
  – management, scaling, anomaly detection, performance prediction...
• Interdisciplinary: 7 faculty, ~30 PhD students, ~6 undergrads, ~1 sysadmin
• Engagement with industrial affiliates keeps us honest

Page 4: tutorial

Goals & Non-Goals

• Goals
  – New Web 2.0 features, technologies, challenges
  – Web 2.0 & Software Engineering Education
  – Server-centric view, though client highly nontrivial
  – Assumption: basic familiarity with Web 1.0
• Non-goals
  – Plug our own research (you can read it elsewhere)
  – Teach you to code (plenty of good frameworks, docs)
  – Instead, know the landscape & where to go next
• Disclaimers
  – My views are mine alone, etc.
  – Specific tools mentioned for sake of example only

Page 5: tutorial

Key Messages

• Social Computing & Rich UI’s
• DADO teams (develop, assess, deploy, operate) vs. waterfall
• Agile, Behavior-Driven Development vs. Big Design Up Front
• High-productivity tools, languages, frameworks: undergrads deploy ready-to-use apps in ~weeks
• Cloud computing is a game changer for Web education, research, & business

Page 6: tutorial

Outline of topics

• Web 1.0 review & what’s new in 2.0
• Web 2.0 application frameworks
• Service-oriented architecture
• DADO, a new view of software development
• Deployment
• Education
• Research Challenges

Page 7: tutorial

UC Berkeley

WEB 1.0 REVIEW & WHAT’S NEW IN 2.0

Page 8: tutorial

Software-as-a-Service (SaaS) Evolution

• (Dates are approximate...)
• 1990: Web 0.9 (physicists using NCSA Mosaic)
• 1995: Web 1.0 (static & some dynamic content, e-commerce, Netscape)
• 1997: "Content is King" => "Services are King" (email, search engines, photo sharing...)
• 2000: Web 2.0 (rich UI's, social computing)
• 2004: SaaS & SOA (Service Oriented Architectures) (Google Maps, Amazon S3...)
• 2008: Cloud Computing (pay as you go)

Page 9: tutorial

The Web is a Client-Server, Request-Reply Architecture

• HTTP (Hypertext Transfer Protocol): ASCII-based request/reply protocol that runs over TCP
  – HTTPS: variant that first establishes a symmetrically-encrypted channel via a public-key handshake, so suitable for sensitive info
• By convention, servers listen on TCP port 80 (HTTP) or 443 (HTTPS)
• Uniform Resource Identifier (URI) format: scheme, host, port, resource, parameters, fragment

    http://search.com:80/img/search/file?t=banana&client=firefox#p2

[Figure: Web browser, DNS server, and Web server connected by "a series of tubes"; steps 1 and 2 labeled]
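The URI format above can be illustrated with Ruby's standard `uri` library. This is a minimal sketch: the helper name `uri_components` and the hash keys are assumptions chosen to mirror the slide's vocabulary, not part of the tutorial.

```ruby
require 'uri'

# Split a URL into the components named on the slide.
# (uri_components is a hypothetical helper, not a library function.)
def uri_components(url)
  uri = URI.parse(url)
  {
    scheme:     uri.scheme,
    host:       uri.host,
    port:       uri.port,
    resource:   uri.path,
    # decode_www_form turns "t=banana&client=firefox" into key/value pairs
    parameters: URI.decode_www_form(uri.query || "").to_h,
    fragment:   uri.fragment
  }
end

parts = uri_components("http://search.com:80/img/search/file?t=banana&client=firefox#p2")
puts parts[:host]        # search.com
puts parts[:parameters]  # the decoded query parameters as a Hash
```

Note that the fragment (`p2`) is interpreted by the browser only; it is not sent to the server as part of the request.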

Page 10: tutorial

A Conversation With a Web Server

    GET /index.html HTTP/1.0
    User-Agent: Mozilla/4.73 [en] (X11; U; Linux 2.0.35 i686)
    Host: www.yahoo.com
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
    Accept-Language: en
    Accept-Charset: iso-8859-1,*,utf-8

• Server replies:

    HTTP/1.0 200 OK
    Content-Length: 16018
    Set-Cookie: B=2vsconq5p0h2n
    Content-Type: text/html

    <html><head><title>Yahoo!</title><base href=http://www.yahoo.com/> …etc.

• Repeat for embedded content (images, stylesheets, scripts...)

    <img width=230 height=33 src="http://us.a1.yimg.com/us.yimg.com/a/an/anchor/icons2.gif">

Callouts: HTTP method & URI (request line); cookie data, up to 4KiB (Set-Cookie); MIME content type (Content-Type)
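To make the structure of the reply concrete, here is a small sketch that splits a raw HTTP response into status line, headers, and body. The function name `parse_http_response` and the shortened sample response are assumptions for illustration; a real client would use a library such as `net/http` and handle repeated headers, chunking, and so on.

```ruby
# A shortened, hypothetical response in the same shape as the one on the slide.
RAW_RESPONSE = "HTTP/1.0 200 OK\r\n" \
               "Content-Length: 13\r\n" \
               "Set-Cookie: B=2vsconq5p0h2n\r\n" \
               "Content-Type: text/html\r\n" \
               "\r\n" \
               "<html></html>"

# Split a raw response into its parts. Header names are lowercased;
# a repeated header would overwrite the earlier one in this sketch.
def parse_http_response(raw)
  head, body = raw.split("\r\n\r\n", 2)
  status_line, *header_lines = head.split("\r\n")
  version, code, reason = status_line.split(" ", 3)
  headers = header_lines.map do |line|
    name, value = line.split(":", 2)
    [name.downcase, value.strip]
  end.to_h
  { version: version, code: code.to_i, reason: reason, headers: headers, body: body }
end

reply = parse_http_response(RAW_RESPONSE)
puts reply[:code]                     # 200
puts reply[:headers]["content-type"]  # text/html
```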

Page 11: tutorial

Cookies

• On first visit to a server, browser may receive a cookie from server in HTTP header
  – Data is arbitrary (up to 4KB long)
  – typically opaque, interpretation is up to the server
  – usually HMAC’d or encrypted, since client untrusted
• Browser automatically passes appropriate cookie back to server on each request
  – Server may update cookie value with any response
  – Thus can synthesize concept of “session” using this
• Many, many uses
  – track user’s ID (canonical use: authentication)
  – track session state (up to 4KB) or a handle to it
  – before cookies, “fat URL’s” used for this in Web 1.0
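The "HMAC'd, since client untrusted" point can be sketched with Ruby's standard `openssl` library. Everything named here (`SECRET`, `sign_cookie`, `verify_cookie`, the `payload.mac` layout) is an assumption for illustration, not how any particular framework formats its cookies.

```ruby
require 'openssl'

SECRET = "hypothetical-server-side-secret"  # never sent to the client

def mac_for(payload, secret)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), secret, payload)
end

# Compare two strings without stopping at the first differing byte.
def secure_compare(a, b)
  return false unless a.bytesize == b.bytesize
  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# Cookie value = base64(payload) + "." + HMAC of that base64 text.
def sign_cookie(value, secret = SECRET)
  payload = [value].pack("m0")
  "#{payload}.#{mac_for(payload, secret)}"
end

# Returns the original value, or nil if the cookie was altered.
def verify_cookie(cookie, secret = SECRET)
  payload, sep, mac = cookie.rpartition(".")
  return nil if sep.empty?
  return nil unless secure_compare(mac, mac_for(payload, secret))
  payload.unpack1("m0").force_encoding("UTF-8")
end

cookie = sign_cookie("user_id=42")
forged = sign_cookie("user_id=43").split(".").first + "." + cookie.split(".").last
puts verify_cookie(cookie)          # user_id=42
puts verify_cookie(forged).inspect  # nil
```

Signing only detects tampering; the value is still readable by the client, so secrets would need encryption as well.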

Page 12: tutorial

XML (eXtensible Markup Language)

    <?xml version="1.0" encoding="UTF-8"?>
    <book year="1967">
      <title>The politics of experience</title>
      <author>
        <firstname>Ronald</firstname>
        <lastname>Laing</lastname>
      </author>
    </book>

• Really a metalanguage for describing hierarchical, semistructured, schema-less data
• XML Document Type Definition (DTD) specifies structural & content constraints on a particular document type

Page 13: tutorial

XML (eXtensible Markup Language)

[Same XML example and bullets as the previous slide, with callouts labeling Element, Attribute, and Value]

Page 14: tutorial

HTML, XHTML & Beyond

• XHTML: a document conforming to a particular DTD describing a hierarchical collection of HTML elements
  – Variants: Strict, loose, transitional (for compatibility with deteriorating HTML syntax 1990-95)

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

  – inline (headings, tables, lists...)
  – embedded (images, video, Java applets, JavaScript code...)
  – fill-in forms (text, radio/check buttons, dropdown menus...), marshaling arguments into either URI or request body
• CSS (Cascading Stylesheets) for presentation
  – Strict XHTML forbids presentational markup
  – Idea: complete separation of appearance from structure

Page 15: tutorial

Selectors identify specific tag(s)

    <link rel="stylesheet" href="mystyles.css"/>
    <div class="pageFrame" id="pageHead">
      <h1> Welcome, <span id="custName">Armando</span> </h1>
    </div>

• tag name: h1
• class name: .pageFrame
• element ID: #pageHead
• tag name & class: div.pageFrame
• tag name & id: span#custName
• descendant relationship: .pageFrame h1, div h1
• descendant relationship: div #custName
• child relationship: h1 > #custName

Callout: “both of these match the outer div above”

Page 16: tutorial

CSS Styles apply visual styling based on selectors

    <link rel="stylesheet" href="mystyles.css"/>
    <div class="pageFrame" id="pageHead">
      <h1> Welcome, <span id="custName">Armando</span> </h1>
    </div>

• In mystyles.css (static asset with MIME type text/css):

    div.pageFrame { background-image: url('/banner.gif'); }
    h1 { font-size: large; float: left; }
    #custName:hover { background-color: yellow; font-weight: bold; }

• Style properties include borders, background images, and layout directives (floating, absolute positioning, min/max scaled sizes, etc.)
• Changing style properties has side effect of re-rendering
  – e.g., change display or visibility property to show/hide elements

Page 17: tutorial

17

Page 18: tutorial

18

Page 19: tutorial

Dynamic content generation

• Most Web 1.0 (e-commerce) sites actually run a program that generates the output
• Originally: templates with embedded code “snippets”
• Eventually, embedded code became “tail that wagged the dog” and moved out of the Web server
• Languages/frameworks evolved to capture common tasks
  – Perl, PHP, Python, Ruby on Rails, ASP, ASP.NET, JavaServer Pages (JSP), Java Beans/J2EE, ...

Page 20: tutorial

SaaS 3-tier architecture

• Common gateway interface (CGI): allows Web server to run a program
  – Server maps some URI’s to application names
  – App is run, gets handed complete HTTP request including headers
• “Arguments” embedded in URL with “&” syntax or sent as request body (with POST)

    http://www.foo.com/search?term=white%20rabbit&show=10&page=1

• App generates entire response
  – content (HTML? an image? some JavaScript?)
  – HTTP headers & response code
• Plug-in modules for Web servers allow long-running CGI programs & link to language interpreters
• Various frameworks have evolved to capture this common structure

[Figure: three tiers: HTTP server, application (app server), persistent storage]
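The CGI flow above (server hands the request to a program, which unmarshals the arguments and generates the entire response) can be sketched in a few lines. The handler name `handle_search`, the `env` hash, and the response text are assumptions for illustration; a real CGI program would read `ENV["QUERY_STRING"]` and write to standard output.

```ruby
require 'uri'

HTML_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&#39;"
}.freeze

# Escape untrusted text before placing it in HTML.
def escape_html(text)
  text.gsub(/[&<>"']/, HTML_ESCAPES)
end

# A CGI-style handler: takes the request environment, returns the
# complete response (headers, blank line, then content).
def handle_search(env)
  params = URI.decode_www_form(env.fetch("QUERY_STRING", "")).to_h
  term = params.fetch("term", "")
  show = params.fetch("show", "10").to_i
  body = "<p>Showing #{show} results for #{escape_html(term)}</p>"
  headers = [
    "Status: 200 OK",
    "Content-Type: text/html",
    "Content-Length: #{body.bytesize}"
  ]
  headers.join("\r\n") + "\r\n\r\n" + body
end

response = handle_search({ "QUERY_STRING" => "term=white%20rabbit&show=10&page=1" })
puts response
```

Spawning a fresh process per request in this style is what the plug-in modules mentioned above are designed to avoid.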

Page 21: tutorial

3 Tier Deployment

• HTTP server (“web server”)
  – “fat” (e.g. Apache): support virtual hosts, plugins for multiple languages, URL rewriting, reverse proxying, ....
  – “thin” (nginx, thin, Tomcat, ...): bare-bones machinery to support one language/framework; no frills
• App server
  1. separate server process, front-ended by a “thin” HTTP server
  2. or linked to an Apache worker via FastCGI or web server plug-in: mod_perl, mod_php, mod_rails, ...
  – Apache can spawn/quiesce/reap independent processes
• Persistent storage
  – most commonly RDBMS (MySQL, PostgreSQL, etc.)
  – communicate w/app via proprietary or standardized database “connector” (ODBC, JDBC, ...)
• Hence LAMP: Linux, Apache, MySQL, PHP/Perl

[Figure: three tiers: HTTP server, application (app server), persistent storage]

Page 22: tutorial

Frameworks

• Support for more languages: Apache modules (mod_perl, mod_php, mod_rails ...)
  – avoid spawning new process per request
  – typically embed language interpreter in Apache
• Support for common idioms like sessions
  – Cookie management
  – virtualize connection to database
  – “dispatcher” interactions with front-end HTTP server
• Early “templating systems” (e.g. PHP) vs. modern “full stack frameworks” (e.g. Rails)

Page 23: tutorial

Example: Rails, a Ruby-based Model/View/Controller Framework

[Figure: request flow from the browser (firefox) through apache and CGI or other dispatching into your app, which runs on the Ruby interpreter and is backed by a relational database (mysql or sqlite3)]

Model, View, Controller
• Model: models/*.rb, subclasses of ActiveRecord::Base, backed by database tables
• Controller: controllers/*.rb, subclasses of ApplicationController, reached via Rails routing
• View: views/*.html.erb, subclasses of ActionView, produced by Rails rendering

• Implemented almost entirely in Ruby
• Distributed as a Ruby “gem” (collection of related libraries & tools)
• Connectors for most popular databases

Page 24: tutorial

A trip through a Rails app

1. Declarative routes map URL’s to actions (methods in a class) and unmarshal parameters from URL or form
2. Actions can set variables that are visible to views
3. Every controller action eventually renders something
   1. HTML page: view template with variables expanded
   2. Response to AJAX request
   3. Error page

http://.../foo/my_action?x=Howdy

routes.rb

app/controllers/foo_controller.rb

    def my_action
      @var = params[:x]
    end

app/views/foo/my_action.html.erb

    <p> Hey, <%= @var %></p>
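The route, action, and view steps can be imitated without Rails using Ruby's standard `erb` library. This is a toy sketch of the idea only: `ROUTES`, `dispatch`, and the inline `VIEWS` table are assumptions and do not reflect how Rails is implemented. Unlike current Rails, this sketch does not HTML-escape the interpolated value.

```ruby
require 'erb'

class FooController
  # Stand-in for app/views/foo/my_action.html.erb
  VIEWS = { my_action: "<p> Hey, <%= @var %></p>" }.freeze

  def initialize(params)
    @params = params
  end

  # The action sets an instance variable that the view can see.
  def my_action
    @var = @params[:x]
  end

  # Render the template in this object's context so @var is visible.
  def render(action)
    ERB.new(VIEWS.fetch(action)).result(binding)
  end
end

# Stand-in for routes.rb: map a URL path to a controller class and action.
ROUTES = { "/foo/my_action" => [FooController, :my_action] }.freeze

def dispatch(path, params)
  controller_class, action = ROUTES.fetch(path)
  controller = controller_class.new(params)
  controller.public_send(action)
  controller.render(action)
end

html = dispatch("/foo/my_action", { x: "Howdy" })
puts html  # <p> Hey, Howdy</p>
```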

Page 25: tutorial

ActiveRecord, an object-relational mapping layer

    class User < ActiveRecord::Base

• table name inferred from class name
• columns introspected from database
• example of convention over configuration

[Schema: table users with columns id*, name, birthdate]

    # To find by column values:
    armando = User.find_by_name('fox')
    armando = User.find_by_name_and_birthdate('fox', Date.parse('May 12, 1968'))
    armando.birthdate = Date.parse('June 6, 1969')
    armando.save!

    # To find only a few, and sort by an attribute
    # (callout on the "?" placeholder: protect from SQL injection attacks)
    old_guys = User.find(:all,
      :conditions => ["birthdate < ?", Date.parse("1/1/80")],
      :order => "birthdate")

Page 26: tutorial

ActiveRecord Associations

[Schema: users (id*, name, description); pics (id*, user_id**, filename)]

    SELECT * FROM users u JOIN pics p ON u.id = p.user_id;

    class User < ActiveRecord::Base
      has_many :pics
    end
    class Pic < ActiveRecord::Base
      belongs_to :user
    end
    thisuser.pics << Pic.new(...)
    thisuser.pics.sort_by { |p| p.user.birthdate }

Page 27: tutorial

Multiple joins

    user        has_many :groups, :through => :memberships
    group       has_many :users, :through => :memberships
    membership  belongs_to :user, belongs_to :group

• Can now write user.groups, group.users, etc.
• Separates relationships from storage schema

[Schema: users (id*, name, description); memberships (user_id**, group_id**, status); groups (id*, name, topic)]

Page 28: tutorial

Rails & Security

• Application-based attacks on Web 2.0 apps
  – SQL injection (defense: sanitize untrusted user input)
  – Cross-site request forgery (defense: include session authentication token); cross-site scripting (defense: escape untrusted content on output)
  – Good frameworks help protect against these
• Infrastructure-based attacks (DDoS, etc.)
  – Your deployment provider matters (more on this later)

Page 29: tutorial

What’s new in Web 2.0?

• Primitive UI => Rich UI
  – enable “desktop-like” interactive Web apps
  – enable browser as universal app platform on cell phones
• “Mass customize” to consumer => Social computing
  – tagging (Digg), collaborative filtering (Amazon reviews), etc. => primary value from users & their social networks
  – write-heavy workloads (Web 1.0 was read-mostly)
  – lots of short writes with hard-to-capture locality (hard to shard)
• Libraries => Service-oriented architecture
  – Integrate power of other sites with your own (e.g. mashups that exploit Google Maps; Google Checkout shopping cart/payment)
  – Pay-as-you-go democratization of “services are king”
  – Focus on your core innovation
• Buy & rack => Pay-as-you-go Cloud Computing

Page 30: tutorial

Rich Internet Apps (RIAs)

• Closing gap between desktop & Web
  – Highly responsive UI’s that don’t require server roundtrip per-action
  – More flexible drawing/rendering facilities (e.g. sprite-based animation)
  – Implies sophisticated client-side programmability
  – Local storage, so can function when disconnected
• Early example: Google Docs + Google Gears
  – include offline support, local storage, support for video, support for arbitrary drawing, ...
• Currently many technologies: Google Gears, Flash, MS Silverlight...
  – client interpreter must be embedded in browser (plugin, extension, etc.)
  – typically has access to low-level browser state => new security issues
  – N choices for framework * M browsers = N*M security headaches
• Proposed HTML5 may obsolete some of these

Page 31: tutorial

Rich UI with AJAX (Asynchronous JavaScript and XML)

• Web 1.0 GUI: click → page reload
• Web 2.0: click → page can update in place
  – also timer-based interactions, drag-and-drop, animations, etc.

How is this done?
1. Document Object Model (c.1998, W3C) represents document as a hierarchy of elements
2. JavaScript (c.1995; now ECMAScript) makes DOM available programmatically, allowing modification of page elements after page loaded
3. XMLHttpRequest (c.2000) allows async HTTP transactions decoupled from page reload
4. JavaScript libraries (jQuery, Prototype, script.aculo.us) encapsulate useful abstractions

Page 32: tutorial

DOM & JavaScript: Document = tree of objects

• Hierarchical object model representing HTML or XML doc
• Exposed to JavaScript interpreter
  – Inspect DOM element value/attribs
  – Change value/attribs → redisplay or fetch new content from server
• Every element can be given a unique ID
• JavaScript code can walk the DOM tree or select specific nodes via provided methods

    <input type="text" name="phone_number" id="phone_number"/>
    <script type="text/javascript">
      var phone = document.getElementById('phone_number');
      phone.value='555-1212';
      phone.disabled=true;
      document.images[0].src="http://.../some_other_image.jpg";
    </script>

Page 33: tutorial

JavaScript

• A browser-embedded scripting language
  – OOP: classes, objects, first-class functions, closures
  – dynamic: dynamic types, code generation at runtime
  – JS code can be embedded inline into document...

    <script type="text/javascript">
      <!-- # protect older browsers
      calculate = function() { ... }
      // -->
    </script>

  – ...or referenced remotely:

    <script src="http://evil.com/Pwn.js"></script>

• Current page DOM available via window, document objects
  – Handlers (callbacks) for UI & timer events can be attached to JS code, either inline or by function name: onClick, onMouseOver,...
• Changing attributes/values of DOM elements has side-effects, e.g.:

    <a href="#" onClick="this.innerHTML='Presto!'">Click me</a>

Page 34: tutorial

AJAX == Asynchronous JavaScript And XML

• Recipe:
  – attach JS handlers to events on DOM objects
  – in handler, inspect/modify DOM elements and optionally do asynchronous HTTP request to server
  – register callback to receive server response
  – response callback modifies DOM using server-provided info
• JavaScript as a target language
  – Google Web Toolkit (GWT): compile Java => emit JS
  – Rails: runtime code generation ties app abstractions to JS

Page 35: tutorial

JavaScript example for AJAX

    var r = new XMLHttpRequest();
    r.open("GET", "http://www.example.com", true);
    // last arg true means script should not block (important!)

• Callbacks during XHR processing

    r.onreadystatechange = function() {
      // inspect r.readyState: uninitialized, open, sent, receiving, loaded
      // r.status contains HTTP status of response
      // r.responseText contains response content
    };
    r.send(request_content);  // e.g., form fields (sent as the body of a POST)

• Libraries like jQuery and Prototype abstract this and provide some cross-browser support

Page 36: tutorial

Example: AJAX via Rails

• Embedded Ruby code in HTML template:

    link_to_remote('Show article',
      :update  => 'article_content',
      :url     => {:action => 'get_article_text', :id => article},
      :before  => "Element.show('spinner')",
      :loading => "Element.hide('spinner'); Element.show('stopwatch')",
      :success => "Element.hide('stopwatch')",
      404      => "alert('Article text not found!')",
      :failure => "alert('Some other error')")

• Delivered page contains JS that embeds calls to Prototype, defines and dispatches to callback handlers, etc.
• Simple auto-completion handler:

    observe_field('student[last_name]',
      :url    => {:controller => 'students', :action => 'lookup_by_lastname'},
      :update => 'lastname_completions')

Page 37: tutorial

Sidebar: It’s Tough Being a Browser

• Users now expect “Web apps” to include animation, sound, 3D graphics, disconnection, responsive GUI...
  – Browser =~ new OS: manage mutually-untrusting apps (sites)

Source: Robert O’Callahan (Mozilla.org), Inside Firefox

Page 38: tutorial

Social Computing

• Web 1.0: add value via mass customization
  – select content/presentation for you based on best guesses about your interests
  – resource: demographic/analytic data about you
• Web 2.0: add value via connecting to social network
  – vendor: your friends’ interests are a good indicator of your interests
  – user: value added to existing content == how your friends interact with it
  – resource: your social network
• From social networking site to social network as a way of structuring applications

Page 39: tutorial

Social Computing

• Amount of content “created” by each user small!
  – e.g., Digg article, rate video, play a Facebook game
• But still creates lots of short random writes
  – consider “Like” feature on Facebook
  – social graphs naturally hard to partition (though would love to see a paper about this from FB)
• Question for Web 2.0 developers is not whether social computing is part of your app, but how
• Later we will discuss technical architecture of “connecting” an app to social networks

Page 40: tutorial

UC Berkeley

SOA

40

Page 41: tutorial

Amazon.com: Web 1.0 SOA

• ~50 “two-pizza” teams of “developer/operators”
• ~10 operators
  – monitor the whole site
  – page the resolvers on alarm
• ~1000 resolvers
  – 10-15 per team, 1 on-call 24x7
  – monitor own service, fix problems
• Over 140 code change commits/month
• Internal microcosm of service-oriented architecture (as were Yahoo, Google, others)

[Figure: replicated web servers in front of replicated services A, B, and C, each with its own DB]

P. Bodík et al., Advanced Tools for Operators at Amazon.com, Proc. ICAC 2005

Page 42: tutorial

What is SOA?

• Use other services as RPC servers for your app
• Web 1.0: large sites organized this way internally
  – Yahoo!, Amazon, Google, ...
  – External “Services” available, but getting them is high-touch: Doubleclick ads, Akamai content distribution
• Web 2.0: consumer-facing service API’s and typically pay-as-you-go (vs. contractual)
  – Services: Google AdSense, Google Analytics, Amazon CloudFront...
  – Platforms: Facebook, Google Maps, ...
  – Mashups, e.g. housingmaps.com
  – User-composable services, e.g. Yahoo Pipes

Page 43: tutorial

SOA == RPC

• Transport: HTTP(S)
• Data interchange: XML DTD (e.g., RSS), JSON
• Request protocol:
  – SOAP (Simple Object Access Protocol)
  – JSON-RPC
• On the horizon: WebHooks (HTTP POST callback, for “push”)

Page 44: tutorial

JSON-RPC

• Open connection to designated port on server
• Send HTTP method & request URI, with MIME type of body set to application/json
• Then send request body:

    { "version": "1.1",
      "method": "confirmFruitPurchase",
      "id": "194521489",
      "params": [ [ "apple", "orange", "pear" ], 1.123 ] }

• Response might be something like this:

    { "version": "1.1",
      "result": "done",
      "error": null,
      "id": "194521489" }

• You have to handle substantially all errors
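The request/response exchange above can be sketched on the server side with Ruby's standard `json` library. The `HANDLERS` table, the `handle_json_rpc` function, and the plain-string error value are simplifying assumptions; they are not taken from a JSON-RPC library, and the HTTP transport is left out.

```ruby
require 'json'

# Hypothetical table of remotely callable procedures.
HANDLERS = {
  "confirmFruitPurchase" => ->(fruits, price) { fruits.empty? ? "nothing to buy" : "done" }
}.freeze

# Decode the request body, dispatch to the named procedure, and
# encode a response carrying the same "id" so the caller can match it.
def handle_json_rpc(raw_body)
  request = JSON.parse(raw_body)
  handler = HANDLERS[request["method"]]
  result, error =
    if handler.nil?
      [nil, "unknown method: #{request['method']}"]
    else
      begin
        [handler.call(*request["params"]), nil]
      rescue ArgumentError => e
        [nil, "bad params: #{e.message}"]
      end
    end
  JSON.generate({ "version" => "1.1", "result" => result,
                  "error" => error, "id" => request["id"] })
end

request_body = JSON.generate({
  "version" => "1.1", "method" => "confirmFruitPurchase",
  "id" => "194521489", "params" => [["apple", "orange", "pear"], 1.123]
})
reply = JSON.parse(handle_json_rpc(request_body))
puts reply["result"]  # done
```

As the last bullet says, error handling is largely the application's job: this sketch covers only an unknown method and a wrong argument count.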

Page 45: tutorial

RSS

• Request is a regular HTTP GET to a specified URL

    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <channel>
        <title>Altarena Playhouse Ticket Availability</title>
        <link>http://www.audience1st.com/altarena/store</link>
        <description>Altarena Ticket Availability</description>
        <item>
          <title>Sylvia - Friday, May 14, 8:00 PM – Buy now</title>
          <link>http://.../store?showdate_id=347</link>
          <guid isPermaLink="false">http://.../store?ts=1271058414</guid>
        </item>
        <item> ... </item>
      </channel>
    </rss>

Page 46: tutorial

AJAX and SOA

• AJAX: client ↔ server
  – client makes (async) requests to HTTP server
  – client-side JavaScript upcall receives reply and decides what to do
  – commonly, response includes XHTML/XML to update page, or JavaScript to execute
  – Doesn’t really make sense except in context of client
• SOA: server ↔ server or client ↔ server
  – one principal makes (sync or async) requests to an HTTP server
  – formerly, principal was a server running some app
  – today, powerful JavaScript clients blur the line

Page 47: tutorial

Facebook

• Facebook plug-in apps
• Facebook platform (“Facebook Connect”)

[Figure, two panels with numbered steps 1-4. Panel labels: AJAX; Facebook.com; Your app; FB data; FBQL; html; HTML IFRAME w/FB content. Second panel labels: SOA; Your app; Facebook.com; FB data; html+xfbml; step 2 optional; REST; REST via JavaScript & XFBML]

Page 48: tutorial

Google Maps

• Your app embeds JavaScript-heavy client code (provided by Google)
  – client-side functionality: clear/draw overlays, etc.
  – server-side functionality: fetch new map, rescale, geocoding
• Attach callbacks (handled by your app) to UI actions
• Result of callback can trigger additional calls to Google Maps code, which in turn contact GMaps servers

[Figure: browser receives html+js from your app and talks to Google Maps; steps numbered 1-4]

Page 49: tutorial

Mashups: housingmaps.com

49

Page 50: tutorial

Two ways to do it...

[Figure: two architectures. (a) “Thin” browser client talks to a Web 2.0 app, which in turn calls Craigslist.org and Google Maps. (b) “Fat” browser client calls Craigslist.org and Google Maps directly.]

+ Client portability
+/– Client performance (both app download & JavaScript execution)
+ Availability of utility libraries for app development
– Privacy/trustworthiness of aggregator app
– Caching

Page 51: tutorial

REST (Representational State Transfer) Philosophy

• Architectural style (not a standard per se):
  – Client-server, Stateless, Cacheability indicated
  – a/k/a post-hoc description of properties that made Web 1.0 successful by constraining SOA interactions
• In context of SOA for Web 2.0
  – HTTP is transport; HTTP methods (GET, PUT, etc.) are the only commands
  – Reify idea that URI names resource (broadly...)
  – A client that holds a representation of a resource has enough info to request modification of that resource on the server
  – cookie can encode part of transferred state
• If your app is RESTful, it’s easy to “SOA”-ify

Page 52: tutorial

REST with HTTP examples

Collection URI, such as http://example.com/customers/257/orders
  – HTTP GET: List the members of the collection, complete with their member URIs for further navigation
  – HTTP PUT: Replace the entire collection with another collection
  – HTTP POST: Create a new entry in the collection. The ID created is usually included as part of the data returned by this operation.
  – HTTP DELETE: Delete the entire collection

Element URI, such as http://example.com/resources/7HOU57Y
  – HTTP GET: Retrieve a representation of the addressed member of the collection in an appropriate MIME type
  – HTTP PUT: Update (or create) the addressed member of the collection
  – HTTP POST: Treats the addressed member as a collection in its own right and creates a new subordinate of it
  – HTTP DELETE: Delete the addressed member of the collection
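The method-by-URI table can be sketched as a small in-memory dispatcher. The class name `OrdersResource`, the `/orders` paths, and the chosen status codes are assumptions for illustration; collection PUT and element POST from the table are omitted to keep the sketch short.

```ruby
# In-memory sketch of a RESTful "orders" collection.
class OrdersResource
  def initialize
    @orders = {}
    @next_id = 1
  end

  # Returns [status_code, body] for an HTTP method and path.
  def handle(http_method, path, data = nil)
    id = path[%r{\A/orders/(\d+)\z}, 1]
    if path == "/orders"
      handle_collection(http_method, data)
    elsif id
      handle_element(http_method, id.to_i, data)
    else
      [404, nil]
    end
  end

  private

  def handle_collection(http_method, data)
    case http_method
    when "GET"   # list member URIs for further navigation
      [200, @orders.keys.map { |k| "/orders/#{k}" }]
    when "POST"  # create a new entry; return its URI
      id = @next_id
      @next_id += 1
      @orders[id] = data
      [201, "/orders/#{id}"]
    when "DELETE"
      @orders.clear
      [204, nil]
    else
      [405, nil]
    end
  end

  def handle_element(http_method, id, data)
    case http_method
    when "GET"
      @orders.key?(id) ? [200, @orders[id]] : [404, nil]
    when "PUT"   # update (or create) the addressed member
      @orders[id] = data
      @next_id = [@next_id, id + 1].max
      [200, data]
    when "DELETE"
      return [404, nil] unless @orders.key?(id)
      @orders.delete(id)
      [204, nil]
    else
      [405, nil]
    end
  end
end

orders = OrdersResource.new
p orders.handle("POST", "/orders", { "item" => "book" })  # [201, "/orders/1"]
p orders.handle("GET", "/orders")                         # [200, ["/orders/1"]]
```

Because each request names the resource and the operation completely, no per-client state is needed on the server beyond the resources themselves.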

Page 53: tutorial

UC Berkeley

AGILE DEVELOPMENT & WEB 2.0

53

Page 54: tutorial

New models of software development

• Process: Support DADO Evolution, 1 group
• Waterfall: Static Handoff Model, N groups

[Figure: two diagrams of the Develop, Assess, Deploy, Operate stages, one for the DADO process and one for the waterfall handoff model]

Page 55: tutorial

Why is this here?

• For many, a new way to develop software
• Highly productive: undergraduates produce complete working apps, with tests, in weeks
• Great structural fit for Web 2.0 applications
• Amazingly good tools: “make it fun” just as important for testing as for development

Page 56: tutorial

(Short) History of Software Engineering

• “1/3 of software development projects fail or are abandoned outright because of cost overruns, delays, and reduced functionality”
• IRS Tax Modernization System:
  – “The IRS must recognize that technology is an enabler, not a driver, of business success, and that it needs a strategic plan with business objectives that drive the use of technology.” House Commission on Restructuring the IRS, 1997 report
• Denver Airport Baggage Handling System
  – 1.5 year delay, $1M/day during modifications/repairs, ultimately abandoned 10 years later (source: Wikipedia)
• Software Development Failures: Anatomy of Abandoned Projects, K. Ewusi-Mensah, 2003

Page 57: tutorial

“Big Design Up Front”

• Started with elaborate, detailed specification of what customer wants
  – 100s of pages
• Problem: Customers may change mind
  – change wrecks schedule in unpredictable ways
  – some use cases may have been forgotten or misrepresented
• But change is inevitable
  – “If a problem has no solution, it may not be a problem, but a fact; not to be solved, but to be coped with over time” (Israeli foreign minister Shimon Peres)

Page 58: tutorial

Agile Development

Big Design Up Front:
• Time, resources, and scope “fixed”
• Changing one affects the others, as well as quality
• Manage the plan
• Try to minimize change

Agile:
• Time, resources, and quality fixed
• Changing time or resources affects scope
• Manage the priorities
• Change as you learn more

Agile methods break tasks into small increments with minimal planning, and do not directly involve long-term planning. Each iteration involves a team working through a full SW development cycle including planning, requirements analysis, design, coding, unit testing, and acceptance testing when a working product is demonstrated to stakeholders. This helps minimize overall risk, and lets the project adapt to changes quickly. An iteration may not add enough functionality to warrant a market release, but the goal is to have an available release (with minimal bugs) at the end of each iteration. Multiple iterations may be required to release a product or new features.

Page 59: tutorial

Test-Driven/Behavior-Driven Development

Behavior driven: start from behaviors, and behavior spec == acceptance test
  – Start from user behavior by writing the code you wish you had (results in better API than top-down design)
  – Script the tests you’d single-step manually
  – When done, get automatable integration/acceptance test for free

Test driven: write tests first
  – Debugging, testing, and isolating bugs all need modular code
  – Writing the test first ensures code is modular/debuggable

Page 60: tutorial

User Stories for Acceptance/Integration Testing

• A story from user perspective that provides business value to stakeholder and is testable
• As a [type of stakeholder] I want to [perform some task] so that I can [reach some goal]
  – Complete Web app has 100’s or 1000’s of stories
  – Long stories (“epics”) broken down to smaller chunks
• Development proceeds in fixed-period iterations (typically 2 weeks)
  – Each story small enough to implement in 1 iteration
  – Developer estimates difficulty (points) to implement
  – “Deliver” (release) N new points/iteration (velocity)
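The points-per-iteration idea lends itself to a short worked calculation. This sketch assumes velocity is the average of points delivered in past iterations and that the backlog is measured in the same points; the numbers and the helper names `velocity` and `iterations_remaining` are hypothetical.

```ruby
# Average story points delivered per iteration.
def velocity(points_per_iteration)
  return 0.0 if points_per_iteration.empty?
  points_per_iteration.sum.to_f / points_per_iteration.size
end

# Iterations needed to finish the backlog at the observed velocity
# (nil when there is no history to estimate from).
def iterations_remaining(backlog_points, points_per_iteration)
  v = velocity(points_per_iteration)
  return nil if v.zero?
  (backlog_points / v).ceil
end

delivered = [8, 10, 9]   # hypothetical points from three past iterations
puts velocity(delivered)                  # 9.0
puts iterations_remaining(40, delivered)  # 5
```

With 2-week iterations, the 40-point backlog in this example would take roughly 10 more weeks at the current pace.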

Page 61: tutorial

A Feature Comprises Several User Stories

    Feature: Subscriber purchases additional tickets

      As a season subscriber
      I want to go to the Store page
      So that I can buy discounted tickets for a show

      Scenario: Subscriber logs in
        Given I am logged in as a subscriber
        When I visit the "Store" page
        Then I should see the Subscriber message

      Scenario: Subscriber offered discount ticket price
        Given I am on the "Store" page
        And there are upcoming performances of "Chicago"
        When I select the show "Chicago"
        Then "Subscriber Discount" should appear in the "Ticket Prices" menu

Page 62: tutorial

A Feature Comprises Several User Stories

[Same feature file as the previous slide, with callouts: 1. Title (the Feature line); 2. Narrative (As a... I want... So that...); 3. User stories (the Scenarios)]

Page 63: tutorial

Rails Testing Ecosystem

• Unit testing: RSpec (based on Java Spec)
  – more expressive, and Ruby-specific
  – extensive support for isolation (mocking & stubbing) by exploiting Ruby dynamic language features
• Integration/acceptance testing: Cucumber
  – can be used for non-Ruby systems
  – bridges user stories and integration tests
• Cucumber on Rails
  – Web browser interactions: use Webrat or Selenium to emulate or script browser interactions, incl. JavaScript
  – (Optional) Use RSpec facilities to set up preconditions, check postconditions of tests

Page 64: tutorial

Given...

• Regular expressions match scenario text to test code
• “Steps” implement Given, When, Then
• Given: set up preconditions either directly or via Webrat/Selenium

    Given /^I am logged in as a subscriber$/ do
      visit '/customers/login'
      @customer = customers(:tom_the_subscriber)
      fill_in 'customer_login', :with => @customer.login
      fill_in 'customer_password', :with => @customer.pass
      click_button 'Login'
      response.should match(/Login successful/)
    end
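How regular expressions tie scenario text to test code can be shown with a toy step registry in plain Ruby. This is a sketch of the general idea, not Cucumber's implementation; `define_step`, `run_step`, `STEPS`, and the "world" object are assumptions introduced for illustration.

```ruby
STEPS = []

# Register a step: a pattern plus the code to run when text matches it.
def define_step(pattern, &block)
  STEPS << [pattern, block]
end

# Strip the Given/When/Then/And keyword, find the first matching
# pattern, and run its block with the regexp captures as arguments.
# The block runs inside `world`, so steps share instance variables.
def run_step(line, world)
  text = line.sub(/\A(Given|When|Then|And)\s+/, "")
  STEPS.each do |pattern, block|
    match = pattern.match(text)
    next unless match
    return world.instance_exec(*match.captures, &block)
  end
  raise "Undefined step: #{text}"
end

define_step(/^I am logged in as a (\w+)$/) { |role| @role = role }
define_step(/^I should see the (.*) message$/) { |msg| "Welcome, #{@role}: #{msg}" }

world = Object.new
run_step("Given I am logged in as a subscriber", world)
puts run_step("Then I should see the Subscriber message", world)
# Welcome, subscriber: Subscriber
```

Running every block in one shared object is what lets a Given step store state (such as `@customer` on the slide) for later When and Then steps to use.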

Page 65: tutorial

When... Then...

• When: use Webrat or Selenium to emulate browser or drive a real browser
• Then: use RSpec (unit test) facilities to check outcome (should, should_receive, etc.)

    When /^I visit the "Store" page$/i do
      visit '/store'
    end

    Then /^I should see the (.*) message$/ do |msg|
      response.should have_selector("div.#{msg}")
      response.should match(Regexp.new("Welcome,.*" + @customer.first_name))
    end

Page 66: tutorial

Expectations example

describe "transferring a ticket" do
  context "when recipient doesn't exist" do
    # Preconditions before each test
    before(:each) do
      @t = Ticket.new(...)
      @from = Customer.find(:first)
      @from.tickets << @t
      @from.save!
      @to = create_nonexistent_customer_id()
    end
    # Test case 1
    it "should not cause an error" do
      lambda { @t.transfer_to_customer(@to) }.should_not raise_error
    end
    # Test case 2
    it "should not remove from original owner" do
      @t.transfer_to_customer(@to)
      @from.tickets.should include(@t)
    end
  end
end

66

Page 67: tutorial

Expectations example

describe "successful purchase" do
  it "should contact the payment gateway" do
    Store.should_receive(:pay_via_gateway).
      with(@amount, @credit_card, @params).
      exactly(1).times.and_return(@success)
    Store.purchase!(...)
  end
end

• Expectation modifiers: at_least(n).times, any_number_of_times

• Argument modifiers: with(:any_args), with()

• Return value modifiers: and_return(val)

• Ruby dynamic language features used to implement this test scaffolding
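To see why a dynamic language makes this scaffolding cheap, here is a minimal sketch of a message expectation in plain Ruby. It is not how RSpec is implemented; `Expectation`, `expect_call`, and `called_exactly?` are hypothetical names, and the `Store` class below is a stand-in for the app's class.

```ruby
# Toy message expectation built on Ruby's dynamic features.
# NOT RSpec's implementation; all names here are hypothetical.
class Expectation
  attr_reader :calls

  def initialize(target, method_name, return_value)
    @calls = []
    calls = @calls
    # Replace the method on this one object at run time, recording each call.
    target.define_singleton_method(method_name) do |*args|
      calls << args
      return_value
    end
  end

  # True if the replaced method was called exactly n times.
  def called_exactly?(n)
    @calls.length == n
  end
end

def expect_call(target, method_name, return_value)
  Expectation.new(target, method_name, return_value)
end

# Stand-in for the app's Store class; the real gateway is never contacted.
class Store
  def self.pay_via_gateway(amount, card)
    raise "would contact the real payment gateway"
  end

  def self.purchase!(amount, card)
    pay_via_gateway(amount, card) == :success
  end
end

expectation = expect_call(Store, :pay_via_gateway, :success)
Store.purchase!(25, "4111-xxxx")
puts expectation.called_exactly?(1)
```

The key move is `define_singleton_method`, which swaps the behavior of one method on one object while the test runs, so the code under test is isolated from the payment gateway.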

67

Page 68: tutorial

Outside-In Development: Red/Green/Refactor

For each step in user story:

1. Write the step definition

2. Run & watch it fail

   For each behavior of underlying objects/models:

   – Write unit test (expectation)

   3. Watch it fail

   4. Implement just enough to pass

   5. Refactor if needed

6. Watch user story step pass

7. Refactor step(s) if needed
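The inner red/green loop can be shown in a few lines of plain Ruby without any test framework. This is a sketch only: `run_expectation` is a hypothetical helper, and `Ticket` with integer prices in cents is an assumed example, not code from the tutorial's app.

```ruby
# Minimal red/green illustration in plain Ruby (no test framework).
# run_expectation and Ticket are hypothetical names for this sketch.
def run_expectation(description)
  ok = yield
  ok ? "PASS: #{description}" : "FAIL: #{description}"
rescue NameError => e
  "FAIL: #{description} (#{e.class})"
end

# Red: the expectation is written first, before Ticket exists.
expectation = lambda { Ticket.new(1000).discounted_price(20) == 800 }
puts run_expectation("subscriber discount", &expectation)

# Green: implement just enough to make the expectation pass.
class Ticket
  def initialize(price_cents)
    @price_cents = price_cents
  end

  # Price after taking off the given whole-number percentage.
  def discounted_price(percent)
    @price_cents * (100 - percent) / 100
  end
end

puts run_expectation("subscriber discount", &expectation)
```

The same expectation fails on the first run and passes on the second, which is the concrete measure of progress the process relies on.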

68

Page 69: tutorial

Tracking Progress with PivotalTracker.com

69

Page 70: tutorial

Summary: Agile & Behavior-Driven Development

• Agile, iteration-based process based on user stories

• Planning, coding, testing proceed as a cycle by 1 person

• Test-first promotes modularity, debuggability, and a concrete measure of progress

• Attention to productivity in testing tools as well as dev tools

– Student projects in Berkeley SaaS class: ~50% LOC were testing-related

70

Page 71: tutorial

UC Berkeley

DEPLOYMENT

71

Page 72: tutorial

Scaling via Replication

• The “most general” deployment scenario for a 3-tier Web app

– Many Web servers

– possibly including static-asset servers

– L4/L7 load balancers distribute load among them

• Caches and reverse proxies remember previously-computed content

– whole page caching

– page fragment caching, query caching

– Apache in reverse-proxy mode, or memcached process(es) addressed by app server

• Integration of caching with app logic varies by framework

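[Diagram: components of the replicated deployment: the Internets, load balancers (LB), Web servers (WS), asset server (AssetSvr), caches ($), app servers (App), database (DB, possibly replicated)]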
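Query caching of the kind listed above usually follows a cache-aside pattern. Below is a minimal sketch in which a Ruby Hash stands in for memcached and a lambda stands in for the database; `QueryCache`, `fetch`, and `invalidate` are hypothetical names, not any framework's API.

```ruby
# Cache-aside sketch: a Hash stands in for memcached and a lambda for the
# database. QueryCache, fetch, and invalidate are hypothetical names.
class QueryCache
  attr_reader :hits, :misses

  def initialize(backend)
    @backend = backend   # callable that runs the expensive query
    @store = {}
    @hits = 0
    @misses = 0
  end

  # Return the cached value, computing and storing it on a miss.
  def fetch(key)
    if @store.key?(key)
      @hits += 1
    else
      @misses += 1
      @store[key] = @backend.call(key)
    end
    @store[key]
  end

  # Call when the underlying data changes so stale content is not served.
  def invalidate(key)
    @store.delete(key)
  end
end

db_queries = 0
cache = QueryCache.new(lambda { |show| db_queries += 1; "performances of #{show}" })
3.times { cache.fetch("Chicago") }
puts "database queries: #{db_queries}, cache hits: #{cache.hits}"
```

Deciding when to call `invalidate` is the part that frameworks integrate with app logic in different ways.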

72

Page 73: tutorial

“Scale makes availability affordable”

• Goal: interchangeability (send any user request to any available server)

– each server handles 1/N load

– affinity can be used to “soft-pin” users to particular servers

– requires good support for session state abstraction in app framework

• lose 1 server => lose 1/N capacity

– Load Balancers have logic to detect failed servers & remove from rotation until they are resurrected
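The rotation logic can be sketched as a round-robin balancer that skips servers marked as failed. This is an illustration only; real L4/L7 balancers detect failures with health checks, and `LoadBalancer`, `mark_down`, `mark_up`, and `pick` are hypothetical names.

```ruby
# Round-robin balancer sketch with failed servers removed from rotation.
# LoadBalancer, mark_down, mark_up, and pick are hypothetical names.
class LoadBalancer
  def initialize(servers)
    @servers = servers
    @down = []
    @next = 0
  end

  # Take a failed server out of rotation.
  def mark_down(server)
    @down << server unless @down.include?(server)
  end

  # Put a resurrected server back into rotation.
  def mark_up(server)
    @down.delete(server)
  end

  # Any healthy server may take any request (interchangeability).
  def pick
    healthy = @servers - @down
    raise "no servers available" if healthy.empty?
    server = healthy[@next % healthy.length]
    @next += 1
    server
  end
end

lb = LoadBalancer.new(%w[app1 app2 app3])
lb.mark_down("app2")
puts 4.times.map { lb.pick }.join(", ")
```

Because any server can handle any request, losing one server only shrinks the healthy list; the remaining servers absorb its share of the load.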

73


Page 74: tutorial

Asset Servers

• For serving static assets (images, sound clips, CSS, etc.)

• Separate Web server process, configuration optimized for fast static file serving

• Web 2.0: use Amazon S3 (blob store) or CloudFront (CDN)

– helps to have good asset-server abstraction in app framework

74

Page 75: tutorial

Deploying a new release

• Check out new code on production server(s)

• Run database schema migrations if any

• Quiesce old version, soft-restart new version

• If necessary, temporarily disable access during quasi-atomic switchover

• Differentiate between asset servers, code servers, database machines

• Be prepared to roll back if any problems

• Tools like Capistrano help automate the above steps
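The release steps above, including the "be prepared to roll back" rule, can be sketched as an ordered list of actions that stops and rolls back on the first failure. This is plain Ruby for illustration and not Capistrano's API; `Deployer`, `release_steps`, and the lambdas standing in for real commands are all hypothetical.

```ruby
# Deployment sequence sketch with rollback. Deployer and release_steps are
# hypothetical; the lambdas stand in for real shell commands.
class Deployer
  attr_reader :log

  def initialize(steps)
    @steps = steps   # ordered list of [name, action] pairs
    @log = []
  end

  # Run each step in order; on any failure, record a rollback and stop.
  def deploy
    @steps.each do |name, action|
      action.call
      @log << "ok: #{name}"
    end
    true
  rescue StandardError => e
    @log << "failed: #{e.message}"
    @log << "rolled back to previous release"
    false
  end
end

def release_steps(migration_ok)
  [
    ["checkout new code", lambda {}],
    ["run schema migrations", lambda { raise "migration error" unless migration_ok }],
    ["quiesce old version, restart new", lambda {}]
  ]
end

deployer = Deployer.new(release_steps(false))
deployer.deploy
puts deployer.log
```

A real rollback has to undo work already done (for example, reverting a migration), which is a large part of what deployment tools automate.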

75

Page 76: tutorial

Deployment scenarios (& approximate pricing)

• Buy/rack/install/configure it yourself...that’s so Web 1.0

• Shared hosting ($3/month)

– turnkey support for popular frameworks, hosted versions of popular building blocks (e.g. MySQL)

– highly variable performance, multitenant per machine

• Virtual private host ($10/month)

– better isolation and security through virtualization

– substantially more administration

• “Framework VM” or “curated” environments (Heroku, Google AppEngine, Force.com)

– pricing varies

– hosted extensions: memcached, profiling, etc.

– integration of 3rd party hosted services, e.g. Amazon S3 backup

• Cloud Computing

76

Page 77: tutorial

Pay-as-you-go Cloud Computing

77

“Instances” | Price | Platform | Cores | Memory | Disk

Small | $0.085/hr | 32-bit | 1 | 1.7 GB | 160 GB

Large | $0.34/hr | 64-bit | 4 | 7.5 GB | 850 GB (2 spindles)

XLarge | $0.68/hr | 64-bit | 8 | 15.0 GB | 1690 GB (3 spindles)

Options: extra memory, extra CPU, extra disk, ...

Page 78: tutorial

A Berkeley View of Cloud Computing (2/09)

abovetheclouds.cs.berkeley.edu

• Goal: stimulate discussion on what’s new

– Clarify terminology

– Quantify comparisons

– Identify challenges & opportunities

• UC Berkeley perspective

– industry engagement but no axe to grind

– users of Cloud Computing since late 2007

• New: pay-as-you-go, utility computing

– Illusion of infinite resources on demand (minutes)

– Fine-grained billing: release == don’t pay, no minimum

78

Page 79: tutorial

Cloud Economics 101

• Cloud Computing User: Static provisioning for peak - wasteful, but necessary for SLA

[Figure: two panels plotting Demand and Capacity against Time: “Statically provisioned” data center (with the area labeled “Unused resources”) and “Virtual” data center in the cloud]
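The waste from provisioning for peak can be made concrete with a little arithmetic. The sketch below compares server-hours under fixed peak capacity with server-hours when capacity follows demand; the hourly demand figures are made-up illustration data, and the function names are hypothetical.

```ruby
# Compare static provisioning for peak with pay-as-you-go capacity.
# The hourly demand figures are made-up illustration data.
def static_server_hours(demand)
  demand.max * demand.length      # capacity is fixed at the peak
end

def elastic_server_hours(demand)
  demand.sum                      # capacity tracks demand each hour
end

# Fraction of statically provisioned capacity that goes unused.
def unused_fraction(demand)
  static = static_server_hours(demand)
  (static - elastic_server_hours(demand)).to_f / static
end

demand = [20, 30, 100, 40, 10]    # servers needed in each hour (assumed)
puts "static:  #{static_server_hours(demand)} server-hours"
puts "elastic: #{elastic_server_hours(demand)} server-hours"
puts "unused under static provisioning: #{(unused_fraction(demand) * 100).round}%"
```

The spikier the demand, the larger the gap between the two totals, which is the area labeled as unused resources in the static case.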

79

Page 80: tutorial

Cloud Economics 101

• Cloud Computing Provider: Could save energy

[Figure: two panels plotting Demand and Capacity against Time: “Statically provisioned” data center (with the area labeled “Unused resources”) and Real data center in the cloud]

80

Page 81: tutorial

Risk of Overprovisioning

• Underutilization results if “peak” predictions are too optimistic

[Figure: Static data center panel plotting Demand and Capacity against Time, with the area labeled “Unused resources”]

81

Page 82: tutorial

New Scenarios Enabled by “Risk Transfer” to Cloud

• “Cost associativity” from linear pricing: 1,000 CPUs for 1 hour same price as 1 CPU for 1,000 hours (@$0.10/hour)

– Washington Post converted Hillary Clinton’s travel documents to post on WWW <1 day after released

– RAD Lab graduate students demonstrate improved Hadoop (batch job) scheduler on 1,000 servers

• Major enabler for SaaS startups

– Animoto traffic doubled every 12 hours for 3 days when released as Facebook plug-in

– Scaled from 50 to >3500 servers

– ...then scaled back down

• Goal: fix any transient problem by adding/removing nodes

– Single-node performance becomes much less important
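Both numbers on this slide can be checked with a few lines of arithmetic. The sketch below uses the $0.10/hour rate from the slide, expressed in cents to avoid floating-point rounding; `cost_cents` and `growth_factor` are hypothetical helper names.

```ruby
# Linear pricing: the bill depends only on total CPU-hours.
# Amounts are in cents to avoid floating-point rounding.
RATE_CENTS_PER_CPU_HOUR = 10   # the $0.10/hour figure from the slide

def cost_cents(cpus, hours)
  cpus * hours * RATE_CENTS_PER_CPU_HOUR
end

# Growth when traffic doubles once every `period_hours`.
def growth_factor(total_hours, period_hours)
  2**(total_hours / period_hours)
end

puts cost_cents(1000, 1) == cost_cents(1, 1000)   # same bill either way
puts growth_factor(72, 12)                         # doublings over 3 days
```

Doubling every 12 hours for 3 days is six doublings, a 64-fold increase; starting from 50 servers that is about 3,200, the same order as the >3500 reported for Animoto.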

82

Page 83: tutorial

Classifying Clouds for Web 2.0

• Instruction Set VM (Amazon EC2)

• Managed runtime VM (Microsoft Azure)

• Curated “IDE-as-a-service” (Heroku)

• Platform as service (Google AppEngine, Force.com)

• flexibility/portability vs. built-in functionality

[Figure: spectrum running from “Lower-level, less managed” to “Higher-level, more managed, more value-added SW”; offerings shown: EC2, Joyent, Azure, Heroku, AppEngine, Force.com]

83

Page 84: tutorial

Summary: Deployment

• “Deployment-as-a-service” increasingly common

– monthly pay-as-you-go curated environment (Heroku)

– hourly pay-as-you-go cloud computing (EC2)

– hybrid: overflow from fixed capacity to elastic capacity

– Remember administration costs when comparing!

• Good framework can help at deployment time

– Separate abstractions for different types of state: session state, asset server, caching, database

– ORM: natural fit for social computing, and abstracts away from SQL (vs Web 1.0 PHP, e.g.)

– REST: encourages you to make your app RESTful from start, so that “SOA”-ifying it is trivial

• Scaling structured storage: open challenge

84

Page 85: tutorial

UC Berkeley

EDUCATION

85

Page 86: tutorial

Software Education in 2010 (or: the case for teaching SaaS)

• “depth first” CS curricula vs. Web 2.0 breadth

– DB, Networks, OS, SW Eng/Languages, Security, ...

– Medium of instruction for SW Eng. courses not tracking languages/tools/techniques actually in use

– Students learn bad practices by osmosis so they can create Web apps

• New: languages & tools are actually good now

– Ruby, Python, etc. are tasteful and allow reinforcing important CS concepts (higher-order programming, closures, etc.)

– order-of-magnitude greater productivity than 1 generation ago, including for testing

86

Page 87: tutorial

Team Skills

• Web 2.0 apps increasingly composed of loosely coupled teams doing DADO

• Technical as well as “social” team skills needed

– repository management

– branching, tagging, merging

– distributing responsibility during collaboration

• Web 2.0 SaaS == Great fit for ugrad education

– Apps can be developed/deployed on semester timescale

– Rapid gratification => projects outlive the course

– Team skills in context of agile development

87

Page 88: tutorial

SaaS Using RoR at Cal: Course Goals

• What’s different about DADO for SaaS

– Basic *ilities: Horizontal scaling, load balancing, H/A

– Consistency, caching, database scaling, CAP

– Benchmarking, tuning, understanding SLA’s

• How CS “big ideas” make RoR high productivity

– H.O. programming, metaprogramming, introspection => ActiveRecord ORM

– runtime code generation => AJAX support

• Major Vehicle: DADO an app of your choice, in teams of 2-3; deploy to public cloud

– zero to prototype in ~6 weeks

– assume OOP skills, but no DB or web programming
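As an illustration of how metaprogramming yields ORM-style productivity, here is a toy dynamic finder built on `method_missing`. It is a sketch in the spirit of an ORM, not ActiveRecord's implementation; `TinyModel`, `Customer`, and the in-memory `records` array are hypothetical.

```ruby
# Toy dynamic finder in the spirit of an ORM, using method_missing.
# NOT ActiveRecord's implementation; TinyModel and Customer are hypothetical.
class TinyModel
  # In-memory stand-in for a database table, one array per subclass.
  def self.records
    @records ||= []
  end

  def self.create(attrs)
    records << attrs
    attrs
  end

  # Intercept calls such as find_by_login("tom") and turn the method
  # name into a lookup on the matching attribute.
  def self.method_missing(name, *args)
    attr = name.to_s[/\Afind_by_(\w+)\z/, 1]
    return super unless attr
    records.find { |r| r[attr.to_sym] == args.first }
  end

  def self.respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("find_by_") || super
  end
end

class Customer < TinyModel; end

Customer.create(:login => "tom", :subscriber => true)
p Customer.find_by_login("tom")
```

No `find_by_login` method is ever written; the class derives it from the method name at call time, which is the kind of leverage the course ties back to CS "big ideas".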

88

Page 89: tutorial

Comparison to other SW Eng./programming courses

• Open-ended project

– vs. “fill in blanks” programming

• Focus on SaaS

– vs. Android, Java desktop apps, etc.

• Focus on RoR as high-level framework

• Projects expected to work

– vs. working pieces but no artifact

– most projects actually do work, some continue life outside class

• Focus on how “big ideas” in languages/programming enable high productivity

89

Page 90: tutorial

Topic coverage & labs

• “Hello World” web app in Rails

• Unit-test-driven design of a specified module

• User-story-driven design of an app (work in teams of 2 or 3 students)

• Deploy own app to Amazon EC2

• Use Cloudstone benchmark app to saturate MySQL database (using EC2)

• Experiment with different types of caching to observe effect on database saturation

• Final demo: publicly-deployed app, short talk

90

Page 91: tutorial

Web 2.0 SaaS as Course Driver

• Majority of students: ability to design own app was key to appeal of the course

– design things they or their peers would use

• High productivity frameworks => projects work

– actual gratification from using CS skills, vs. getting N complex pieces of Java code to work but not integrate

• Fast-paced semester is good fit for agile iteration-based design

• Tools used are same as in industry

91

Page 92: tutorial

Cloud Computing as a Supporting Technology

• Elasticity is great for courses!

– Donation from AWS; ~$100/student

– Watch a database fall over: ~200 servers needed

– Lab deadlines, final project demos

• VM image simplifies courseware distribution

– Prepare image ahead of time

– Students can be root if need to install weird SW, libs...

• students get better hardware

– cost associativity

– cloud provider updates HW more frequently

• VM images compatible with Eucalyptus, enabling hybrid cloud computing

92

Page 93: tutorial

Moving to cloud computing

What | Before | After

Compute servers | 4 nodes of R cluster | EC2

Storage | local Thumper | S3, EBS

Authentication | login per student, MySQL username/tables per student, ssh key for SVN per student | EC2 keypair + Google account

Database | Berkeley ITS shared MySQL | MySQL on EC2

Version control | local SVN repository | Google Code SVN

Horizontal scaling | ??? | EC2 + haproxy/nginx

Software stack management | burden Jon Kuroda | create AMI

93

Page 94: tutorial

Success stories

94

Page 95: tutorial

Success stories, cont.

• Fall 2009 project: matching undergrads to research opportunities

• Fall 2009 project: Web 2.0 AJAXy course scheduler with links to professor reviews

• Spring 2010 projects: apps to stress RAD Lab infrastructure

– gRADit: vocabulary review as a game

– RADish: comment filtering taken to a whole new level

95

Page 96: tutorial

SaaS Courses at Cal

Columns: Lower div. | Upper div. | Grad.

Understand Web 2.0 app structure: ✔

Understand high-level abstraction toolkits like RoR: ✔ ✔

How high-level abstractions implemented: ✔ ✔

Scaling/operational challenges of SaaS: ✔ ✔

Develop & deploy SaaS app: ✔ ✔

Implement new abstractions, languages, or analysis for SaaS: ✔

96

Page 97: tutorial

Planning a SaaS course?

• Pick a highly-productive framework

– Projects can be deployed, and will actually work

– Students can use production-quality tools & methods

– We used Ruby on Rails; Google AppEngine probably also a good choice

• Avail yourself of *-as-a-service

– Google Code for Subversion version control

– PivotalTracker for project tracking

– EC2 for app deployment (Amazon is very good about donating AWS credits for education)

• Tie high-productivity mechanisms back to CS “big ideas”

– Code generation, introspection/reflection, metaprogramming, higher order programming

• Steal our materials (http://radlab.cs.berkeley.edu)

97

Page 98: tutorial

Summary: Education

• Web 2.0 SaaS is a great motivator for teaching software skills

– students get to build artifacts they themselves use

– some projects continue after course is over

– opportunity to (re-)introduce “big ideas” in software development/architecture

• Cloud computing is great fit for CS courses

– elasticity around project deadlines

– easier administration of courseware

– students can take work product with them after course (e.g. use of Eucalyptus in RAD Lab)

98

Page 99: tutorial

UC Berkeley

WEB 2.0 RESEARCH

99

Page 100: tutorial

What’s New in Web 2.0

• Very large structured data storage that scales elastically with app

• Understanding & generating large spikes

• Operational problems: finding the “needle in the haystack”

• Renewed focus on client side challenges (JavaScript, client security, browser performance)

• Cloud Computing enables large scale and elasticity

100

Page 101: tutorial

Cloud Computing

• Cost associativity makes it possible to obtain results on 100’s or 1000’s of servers

– Console log mining

– BOOM (declarative cloud programming)

– SCADS (SIGMOD 2010 demo)

• Eucalyptus makes hybrid cloud computing reasonably practical

– Run small experiments locally, then “scale up” to cloud for paper results

• Why aren’t you using cloud computing yet?

101

Page 102: tutorial

Example: Facebook

• Facebook has 2 datacenters, 1 per coast

– reads spread across both

– writes only to W. Coast; periodically (~10 minutes) replicated to E. Coast

– >2000 MySQL servers, >25TB RAM for memcached

• Challenge: inconsistency due to stale data

– I change status message => Friends on East Coast datacenter don’t see change for 10 min

– What if E. Coast person changes own status??
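The stale-read problem can be reproduced with a small simulation of one-way, periodic replication. This is a sketch of the general pattern, not Facebook's system; `Datacenter` and `replicate_to` are hypothetical names, and a Hash stands in for the database.

```ruby
# Sketch of stale reads under periodic one-way replication.
# Datacenter and replicate_to are hypothetical names.
class Datacenter
  attr_reader :data

  def initialize
    @data = {}
  end

  def write(key, value)
    @data[key] = value
  end

  def read(key)
    @data[key]
  end

  # Copy the primary's state to a replica (in the slide, every ~10 minutes).
  def replicate_to(replica)
    replica.data.replace(@data)
  end
end

west = Datacenter.new   # accepts all writes
east = Datacenter.new   # replica that serves reads
west.write("status", "at EuroSys")
puts east.read("status").inspect   # stale: replication has not run yet
west.replicate_to(east)
puts east.read("status").inspect
```

Between a write and the next replication pass, readers on the replica see the old value, and a user who writes in the West and then reads from the East may not see their own update.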

102

Page 103: tutorial

Web at 100 feet: georeplication & CDN’s

Source: “How Facebook Works”,Technology Review, Jul/Aug 2008

103

Page 104: tutorial

SCADS: Scalable, Consistency-Adjustable Data Storage

• Most popular websites follow the same pattern

– Outgrow initial prototype (on MySQL) due to scale

– Build large, complicated ad-hoc systems to deal with scaling limitations as they arise

• Want Scale Independence as new users join:

– No changes to application

– Cost per user & request latency don’t increase

• Key Innovations

1. Performance-{safe,insightful} query language

2. Declarative performance/consistency tradeoffs

3. Automatic scale up & down using machine learning

M. Armbrust et al., SCADS: Scalable Consistency-Adjustable Data Storage for Interactive Applications. Proc. CIDR 2009.

M. Armbrust et al., PIQL: A Performance-Insightful Query Language. Proc. SOCC 2010.

Page 105: tutorial
Page 106: tutorial
Page 107: tutorial
Page 108: tutorial
Page 109: tutorial
Page 110: tutorial

UC Berkeley

Page 111: tutorial
Page 112: tutorial
Page 113: tutorial

UC Berkeley

Page 114: tutorial